Named Entity Extractor
Extract named entities from text: dates, times, money, percentages, emails, URLs, phone numbers, quantities, and proper nouns. Results appear as color-coded inline highlights with a per-category breakdown, and all processing is client-side.
How to Use
Extract entities from text in three steps:
- Paste your text — Enter a news article, business report, scientific paper, or any text containing factual information. Click a sample button to try pre-loaded examples.
- Select entity types — Toggle which categories to extract: dates, times, money, percentages, emails, URLs, phone numbers, quantities, and proper nouns. All types are enabled by default.
- Review results — See color-coded inline highlighting in the annotated text, a categorized entity list, and a summary of counts per type. Export all entities as JSON or copy to clipboard.
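The type toggles and the per-type summary in the steps above can be pictured as a filter followed by a tally. The sketch below is illustrative only: the names FoundEntity, filterByEnabled, and countByType are hypothetical and are not taken from the tool's source.

```typescript
// Hypothetical entity shape; the tool's internal representation is not documented here.
interface FoundEntity {
  type: string;  // e.g. "date", "email"
  value: string; // the matched text
}

// Keep only entities whose type is switched on in the toggle map.
function filterByEnabled(
  entities: FoundEntity[],
  enabled: { [type: string]: boolean }
): FoundEntity[] {
  return entities.filter((e) => enabled[e.type] === true);
}

// Tally how many entities of each type remain, for the summary view.
function countByType(entities: FoundEntity[]): { [type: string]: number } {
  const counts: { [type: string]: number } = {};
  for (const e of entities) {
    counts[e.type] = (counts[e.type] || 0) + 1;
  }
  return counts;
}
```

Filtering before counting keeps the summary consistent with what is highlighted: a disabled type contributes neither highlights nor counts.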
About This Tool
Regex + Heuristic Detection
The extractor uses validated regular expressions for structured entities. Dates are detected in multiple formats: "March 15, 2024", "15 March 2024", "03/15/2024", and "2024-03-15". Monetary amounts cover currency symbols including $ € £ ¥ ₹, as well as spelled-out currencies. Phone numbers match US formats with or without a country code, and international formats starting with +.
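As a rough illustration of this kind of pattern-based detection, the sketch below matches the four date formats and symbol-prefixed amounts mentioned above. The patterns are assumptions written for this example, not the tool's actual expressions, and the names StructuredMatch, STRUCTURED_PATTERNS, and extractStructured are hypothetical.

```typescript
// Hypothetical match record with character offsets (end is exclusive).
interface StructuredMatch {
  type: string;
  value: string;
  start: number;
  end: number;
}

const MONTH_NAMES =
  "(?:January|February|March|April|May|June|July|August|September|October|November|December)";

// Illustrative patterns only; real-world validation would be stricter.
const STRUCTURED_PATTERNS: { type: string; source: string }[] = [
  { type: "date", source: "\\b" + MONTH_NAMES + "\\s+\\d{1,2},\\s+\\d{4}\\b" }, // March 15, 2024
  { type: "date", source: "\\b\\d{1,2}\\s+" + MONTH_NAMES + "\\s+\\d{4}\\b" },  // 15 March 2024
  { type: "date", source: "\\b\\d{2}/\\d{2}/\\d{4}\\b" },                        // 03/15/2024
  { type: "date", source: "\\b\\d{4}-\\d{2}-\\d{2}\\b" },                        // 2024-03-15
  { type: "money", source: "[$€£¥₹]\\s?\\d+(?:,\\d{3})*(?:\\.\\d+)?" },          // $1,200.50
];

// Run every pattern over the text and return matches in document order.
function extractStructured(text: string): StructuredMatch[] {
  const found: StructuredMatch[] = [];
  for (const p of STRUCTURED_PATTERNS) {
    const re = new RegExp(p.source, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      found.push({
        type: p.type,
        value: m[0],
        start: m.index,
        end: m.index + m[0].length,
      });
    }
  }
  found.sort((a, b) => a.start - b.start);
  return found;
}
```

Keeping one pattern per format, rather than a single combined expression, makes it easy to report which pattern produced a given match.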
Proper noun detection uses a capitalization heuristic: sequences of two or more capitalized words are flagged as name candidates. A filter of 100+ common sentence starters and abbreviations reduces false positives. This catches many multi-word person names ("John Smith"), organizations ("United Nations"), and locations ("New York City"). Because the heuristic requires at least two capitalized words, single-word names ("Apple") are missed, and names in unusual formats, such as those containing lowercase particles, may be missed as well.
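A minimal sketch of such a capitalization heuristic is shown below. The stop list here is a tiny illustrative stand-in for the 100+ entry filter described above, and the names NAME_STOP_WORDS and findNameCandidates are hypothetical.

```typescript
// Tiny illustrative stop list; the tool's own filter is described as 100+ entries.
const NAME_STOP_WORDS: string[] = ["The", "This", "However", "In", "On", "A", "An"];

// Flag runs of two or more capitalized words as name candidates.
function findNameCandidates(text: string): string[] {
  const re = /\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b/g;
  const candidates: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    const words = m[0].split(" ");
    // Drop leading sentence starters such as "The" before re-checking the length.
    while (words.length > 0 && NAME_STOP_WORDS.indexOf(words[0]) !== -1) {
      words.shift();
    }
    if (words.length >= 2) {
      candidates.push(words.join(" "));
    }
  }
  return candidates;
}
```

In this sketch, "However Apple declined" yields no candidate: once "However" is removed only one capitalized word remains, which illustrates why single-word names fall outside the heuristic.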
Named Entity Recognition (NER)
Named Entity Recognition is a fundamental task in Natural Language Processing. Production NER systems typically use machine learning models — particularly BERT-family transformer models fine-tuned on annotated corpora like CoNLL-2003 or OntoNotes. These models achieve F1 scores above 90% for standard entity types (PER, ORG, LOC, MISC) by understanding context: they can distinguish "Apple" (company) from "apple" (fruit) based on surrounding words.
This tool takes a lighter approach: regex patterns for structured data and heuristics for proper nouns. The advantages are near-instant results, no model download, and complete transparency, since you can see exactly which pattern matched. The trade-off is that it cannot use context to disambiguate, so recall is lower on ambiguous entities. For privacy-focused entity detection, see PII Detector.
JSON Export
Extracted entities can be exported as a JSON array with type, label, value, and character position (start/end offsets) for each entity. Span-based output of this kind maps readily onto the formats used by common NER annotation tools, and can serve as training data or as input to downstream pipelines. For text analysis, see Keyword Extractor and Sentiment Analyzer.
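The sketch below shows one plausible shape for such an export and a sanity check a consumer might run on it. The field names follow the description above, but the exact schema, the exclusive end offset, and the names ExportedEntity, toExportJson, and offsetsAreConsistent are assumptions made for illustration.

```typescript
// Assumed record shape: field names from the description, semantics assumed.
interface ExportedEntity {
  type: string;  // machine-readable category, e.g. "money"
  label: string; // human-readable category name, e.g. "Money"
  value: string; // the matched text
  start: number; // offset of the first character
  end: number;   // offset just past the last character (assumed exclusive)
}

// Serialize entities in document order as a pretty-printed JSON array.
function toExportJson(entities: ExportedEntity[]): string {
  const sorted = entities.slice().sort((a, b) => a.start - b.start);
  return JSON.stringify(sorted, null, 2);
}

// Check that every entity's offsets actually point at its value in the text.
function offsetsAreConsistent(text: string, entities: ExportedEntity[]): boolean {
  return entities.every((e) => text.slice(e.start, e.end) === e.value);
}
```

Running a check like offsetsAreConsistent before importing the file into another tool catches off-by-one differences in how offsets are counted.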
Why Use This Tool
Instant Extraction
All extraction runs in your browser with zero network calls. Regex patterns process text at native JavaScript speed — thousands of words in milliseconds. No model downloads, no API keys, no rate limits. Entity types can be toggled on or off for focused extraction.
Common Use Cases
- Data extraction: Pull dates, amounts, and contact information from unstructured documents like invoices, contracts, and reports without manual reading.
- Content analysis: Identify which people, organizations, and locations are mentioned in news articles or research papers for indexing and tagging.
- Document preprocessing: Extract structured entities before feeding text to downstream tools like sentiment analysis or text classification.
- Research annotation: Export entity positions as JSON for building labeled datasets or integrating with annotation pipelines.
- Quality assurance: Verify that generated or translated text contains the expected entities (dates, amounts, names) from the source material.
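For the quality assurance case above, a check can be as simple as confirming that each entity value extracted from the source also appears in the output text. This is a sketch under a strong assumption, namely that values are carried over verbatim; the helper name findMissingEntities is hypothetical.

```typescript
// Hypothetical helper: returns the source entity values that do not appear
// verbatim in the output text. Exact matching is an assumption; a reformatted
// or translated date would need normalization before comparison.
function findMissingEntities(sourceValues: string[], outputText: string): string[] {
  return sourceValues.filter((v) => outputText.indexOf(v) === -1);
}

const missingFromOutput = findMissingEntities(
  ["March 15, 2024", "$1,200.50", "John Smith"],
  "John Smith confirmed the payment of $1,200.50."
);
// missingFromOutput holds the one value absent from the output: "March 15, 2024"
```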
Privacy
100% client-side processing. Your text never leaves your browser. Related tools: PII Detector, Keyword Extractor, Language Detector, and Text Diff.