Named Entity Extractor

Extract named entities from text: dates, times, money, percentages, emails, URLs, phone numbers, quantities, and proper nouns. Results appear as color-coded inline highlights with a per-category breakdown, and all processing runs client-side.

How to Use

Extract entities from text in three steps:

Paste your text — Enter a news article, business report, scientific paper, or any text containing factual information. Click a sample button to try pre-loaded examples.
Select entity types — Toggle which categories to extract: dates, times, money, percentages, emails, URLs, phone numbers, quantities, and proper nouns. All types are enabled by default.
Review results — See color-coded inline highlighting in the annotated text, a categorized entity list, and a summary of counts per type. Export all entities as JSON or copy to clipboard.
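
For readers curious how inline highlighting can be produced from extracted spans, here is a minimal sketch. It assumes each entity carries a type plus end-exclusive start/end character offsets; the function names and CSS class names are hypothetical and are not taken from the tool's source.

```javascript
// Escape characters that would otherwise be interpreted as HTML.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap each entity span in a <mark> tag whose class encodes the entity type.
// Assumption: offsets are end-exclusive; overlapping spans are skipped.
function highlightEntities(text, entities) {
  const sorted = [...entities].sort((a, b) => a.start - b.start);
  let html = "";
  let cursor = 0;
  for (const e of sorted) {
    if (e.start < cursor) continue; // overlaps an earlier span
    html += escapeHtml(text.slice(cursor, e.start));
    html += `<mark class="entity entity-${e.type}">${escapeHtml(text.slice(e.start, e.end))}</mark>`;
    cursor = e.end;
  }
  return html + escapeHtml(text.slice(cursor));
}

const highlighted = highlightEntities("Pay $5 & go", [{ type: "money", start: 4, end: 6 }]);
```

A stylesheet could then assign one background color per `entity-*` class to get the color coding.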

About This Tool

Regex + Heuristic Detection

The extractor uses regular expressions for structured entities. Dates are detected in multiple formats: "March 15, 2024", "15 March 2024", "03/15/2024", and "2024-03-15". Monetary amounts cover currency symbols ($ € £ ¥ ₹) and spelled-out currencies. Phone numbers match US formats with or without a country code, as well as international formats starting with +.
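
As an illustration of this approach, the sketch below matches the four date formats and symbol-prefixed amounts listed above. The patterns are simplified assumptions written for this example, not the tool's actual expressions; spelled-out currencies and phone numbers are left out for brevity.

```javascript
// Illustrative patterns only; the tool's real expressions are not shown on this page.
const MONTH_NAMES =
  "(?:January|February|March|April|May|June|July|August|September|October|November|December)";

const STRUCTURED_PATTERNS = {
  date: new RegExp(
    [
      `\\b${MONTH_NAMES} \\d{1,2}, \\d{4}\\b`, // March 15, 2024
      `\\b\\d{1,2} ${MONTH_NAMES} \\d{4}\\b`,  // 15 March 2024
      `\\b\\d{1,2}/\\d{1,2}/\\d{4}\\b`,        // 03/15/2024
      `\\b\\d{4}-\\d{2}-\\d{2}\\b`,            // 2024-03-15
    ].join("|"),
    "g"
  ),
  // Currency symbol, optional space, digits with optional thousands groups and decimals.
  money: /[$€£¥₹]\s?\d+(?:,\d{3})*(?:\.\d+)?/g,
};

// Run every pattern and return matches with end-exclusive character offsets.
function extractStructured(text) {
  const entities = [];
  for (const [type, pattern] of Object.entries(STRUCTURED_PATTERNS)) {
    for (const match of text.matchAll(pattern)) {
      entities.push({
        type,
        value: match[0],
        start: match.index,
        end: match.index + match[0].length,
      });
    }
  }
  return entities.sort((a, b) => a.start - b.start);
}

const structuredDemo = extractStructured(
  "Invoice dated March 15, 2024 for $1,250.50, due 2024-04-01."
);
```

Because each match records which pattern produced it, the result stays easy to inspect. Note that these patterns check shape only, so an impossible date such as 2024-13-45 would still match.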

Proper noun detection uses a capitalization heuristic: sequences of two or more capitalized words are flagged as name candidates. A filter of 100+ common sentence starters and abbreviations reduces false positives. This catches most person names ("John Smith"), organizations ("United Nations"), and locations ("New York City"). However, single-word names and names in unusual formats may be missed.
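
The heuristic can be sketched roughly as follows. The stop list here is a tiny hypothetical stand-in for the 100+ entry filter mentioned above, and the function name and word pattern are assumptions made for this example.

```javascript
// Hypothetical stop list; the real filter is described as having 100+ entries.
const STARTER_WORDS = new Set(["The", "However", "This", "In", "On", "Mr", "Mrs", "Ms", "Dr"]);

function findProperNouns(text) {
  const results = [];
  // Runs of two or more capitalized words separated by single spaces.
  const runPattern = /\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b/g;
  for (const match of text.matchAll(runPattern)) {
    const words = match[0].split(" ");
    let start = match.index;
    // Drop leading sentence starters and abbreviations.
    while (words.length > 0 && STARTER_WORDS.has(words[0])) {
      start += words[0].length + 1;
      words.shift();
    }
    // Keep the candidate only if two or more words remain.
    if (words.length >= 2) {
      const value = words.join(" ");
      results.push({ type: "proper_noun", value, start, end: start + value.length });
    }
  }
  return results;
}

const nounDemo = findProperNouns(
  "However John Smith visited the United Nations in New York City."
).map((e) => e.value);
// nounDemo is ["John Smith", "United Nations", "New York City"]
```

The sketch also shows the limitation described above: a single capitalized word such as "Paris" never forms a run of two, so it is not reported.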

Named Entity Recognition (NER)

Named Entity Recognition is a fundamental task in Natural Language Processing. Production NER systems typically use machine learning models — particularly BERT-family transformer models fine-tuned on annotated corpora like CoNLL-2003 or OntoNotes. These models achieve F1 scores above 90% for standard entity types (PER, ORG, LOC, MISC) by understanding context: they can distinguish "Apple" (company) from "apple" (fruit) based on surrounding words.

This tool takes a lighter approach: regex patterns for structured data and heuristics for proper nouns. The advantages are no network latency, no model download, and full transparency, since you can see exactly which pattern matched. The trade-off is lower recall and no context-based disambiguation for ambiguous entities. For privacy-focused entity detection, see PII Detector.

JSON Export

Extracted entities can be exported as a JSON array with type, label, value, and character position (start/end offsets) for each entity. Character-offset spans are a common representation in NER annotation tools, so the export can be converted for use as training data or as input to downstream pipelines. For text analysis, see Keyword Extractor and Sentiment Analyzer.
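
To make the export shape concrete, here is a small sketch. The field names follow the description above; the sample text, the sample values, and the choice of end-exclusive offsets are assumptions for illustration, so check an actual export before relying on them.

```javascript
// Serialize entity records with the fields described above.
function toExportJson(entities) {
  const records = entities.map(({ type, label, value, start, end }) => ({
    type,
    label,
    value,
    start,
    end,
  }));
  return JSON.stringify(records, null, 2);
}

// Hypothetical sample data; offsets are end-exclusive (an assumption).
const exportText = "Payment of $250 is due on 2024-03-15.";
const exportEntities = [
  { type: "money", label: "Money", value: "$250", start: 11, end: 15 },
  { type: "date", label: "Date", value: "2024-03-15", start: 26, end: 36 },
];

const exportJson = toExportJson(exportEntities);

// A consumer can recover each value from the original text using the offsets.
const recovered = JSON.parse(exportJson).map((e) => exportText.slice(e.start, e.end));
```

Storing offsets alongside values lets a downstream pipeline tell apart repeated occurrences of the same string.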

Why Use This Tool

Instant Extraction

All extraction runs in your browser with zero network calls. Regex patterns process text at native JavaScript speed — thousands of words in milliseconds. No model downloads, no API keys, no rate limits. Entity types can be toggled on or off for focused extraction.

Common Use Cases

Data extraction: Pull dates, amounts, and contact information from unstructured documents like invoices, contracts, and reports without manual reading.
Content analysis: Identify which people, organizations, and locations are mentioned in news articles or research papers for indexing and tagging.
Document preprocessing: Extract structured entities before feeding text to downstream tools like sentiment analysis or text classification.
Research annotation: Export entity positions as JSON for building labeled datasets or integrating with annotation pipelines.
Quality assurance: Verify that generated or translated text contains the expected entities (dates, amounts, names) from the source material.
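
The quality assurance case can be sketched as a comparison of two entity lists, for example one extracted from the source text and one from the generated text. This is a simplified example with a hypothetical function name; it uses exact string matching, so a date that was reformatted in translation would be reported as missing.

```javascript
// Return entities from the source list that have no exact (type, value)
// counterpart in the target list.
function findMissingEntities(sourceEntities, targetEntities) {
  const seen = new Set(targetEntities.map((e) => `${e.type}:${e.value}`));
  return sourceEntities.filter((e) => !seen.has(`${e.type}:${e.value}`));
}

const sourceList = [
  { type: "date", value: "2024-03-15" },
  { type: "money", value: "$250" },
];
const targetList = [{ type: "date", value: "2024-03-15" }];

const missing = findMissingEntities(sourceList, targetList);
// missing contains only the "$250" money entity
```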

Privacy

100% client-side processing. Your text never leaves your browser. Related tools: PII Detector, Keyword Extractor, Language Detector, and Text Diff.

FAQ

What types of entities does it detect?

Nine entity types: dates (multiple formats including 'March 15, 2024' and '2024-03-15'), times (12h and 24h), monetary amounts (with currency symbols and names), percentages, email addresses, URLs, phone numbers, numbers with units (km, million, GB), and multi-word proper nouns (person, organization, and place name candidates).
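
As a rough illustration of how several of these types can be matched and toggled, the sketch below covers times, percentages, emails, and URLs. The patterns and names are simplified assumptions for this example rather than the tool's own code, and the remaining types are omitted.

```javascript
// Simplified, illustrative patterns keyed by entity type.
const TYPE_PATTERNS = {
  time: /\b(?:[01]?\d|2[0-3]):[0-5]\d(?:\s?[AaPp][Mm])?\b/g, // 9:30 AM, 14:05
  percentage: /\b\d+(?:\.\d+)?\s?%/g,                        // 12.5%
  email: /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g,
  // Stops before whitespace and avoids ending on sentence punctuation.
  url: /\bhttps?:\/\/[^\s<>"']*[^\s<>"'.,;:!?)]/g,
};

// Extract only the entity types present in the enabledTypes set.
function extractByType(text, enabledTypes) {
  const entities = [];
  for (const [type, pattern] of Object.entries(TYPE_PATTERNS)) {
    if (!enabledTypes.has(type)) continue;
    for (const match of text.matchAll(pattern)) {
      entities.push({
        type,
        value: match[0],
        start: match.index,
        end: match.index + match[0].length,
      });
    }
  }
  return entities.sort((a, b) => a.start - b.start);
}

const typeDemoText =
  "Email jane@example.com by 9:30 AM; growth was 12.5% per https://example.com/report.";
const allTypes = extractByType(typeDemoText, new Set(Object.keys(TYPE_PATTERNS)));
const emailsOnly = extractByType(typeDemoText, new Set(["email"]));
```

Passing a smaller set of types mirrors switching categories off in the interface: disabled patterns are never run.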

How does proper noun detection work?

The extractor identifies sequences of two or more capitalized words as proper noun candidates. It filters out common sentence starters ('The', 'However'), abbreviations ('Mr', 'Dr'), and single capitalized words to reduce false positives. This heuristic catches most person names ('John Smith'), organizations ('United Nations'), and places ('New York City') but may miss single-word entities.

How accurate is the detection?

Structured entities (dates, money, emails, URLs, phone numbers) are detected with high precision using regex patterns. Proper noun detection is heuristic-based and may produce false positives (capitalized words that aren't names) or false negatives (names in unusual formats). For higher accuracy on names and organizations, use an ML-based NER system such as a fine-tuned transformer model.

Is my text sent to a server?

No. All entity extraction runs entirely in your browser using JavaScript regex patterns. No text is transmitted over the network.

Can I export the extracted entities?

Yes. Copy the entity summary or the full annotated text using the clipboard buttons. The JSON export includes the type, label, value, and character position (start/end offsets) of each detected entity.