Email, URL & IP Extractor
Extract emails, URLs, IPv4, and IPv6 addresses from any text with deduplication and batch export.
How to Use
This tool scans any block of text and extracts email addresses, URLs, IPv4 addresses, and IPv6 addresses using pattern matching. All extraction runs locally in your browser with no server processing.
- Paste your text into the input area, or click "Upload file" to load a .txt, .log, or .csv file. You can also click "Load Sample" for ready-made test data.
- Toggle extraction types in the sidebar. Enable or disable Emails, URLs, IPv4, and IPv6 independently. Only enabled types are scanned.
- Review grouped results below the input. Each type shows a count badge, a scrollable list of unique matches, and per-type copy and download buttons.
- Copy or download individual groups or all results at once. The "Download All" button exports every group into a single TXT file with labeled sections.
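As a rough illustration of the "Download All" step, the sketch below joins grouped results into one text body with a labeled section per type. The function name buildExportText and the "=== Label (count) ===" header format are assumptions made for this example; the tool's actual file layout is not documented here.

```javascript
// Hypothetical helper: builds a single TXT body with one labeled section per group.
// The "=== Label (count) ===" header is an assumed format, not the tool's real one.
function buildExportText(groups) {
  return Object.entries(groups)
    // Skip groups that produced no matches so the file has no empty sections.
    .filter(([, items]) => items.length > 0)
    .map(([label, items]) => `=== ${label} (${items.length}) ===\n${items.join("\n")}`)
    // Separate sections with a blank line.
    .join("\n\n");
}
```

In a browser, the resulting string could be wrapped in a Blob and offered as a download; that part is omitted to keep the sketch small.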
About This Tool
Data extraction from unstructured text is one of the most common tasks in data processing, log analysis, and security auditing. This tool uses JavaScript regular expressions optimized for each data type, with built-in validation and deduplication.
Email Extraction (RFC 5322)
Email addresses are matched using a simplified version of the RFC 5322 address syntax. The pattern captures the local part (alphanumeric characters plus . _ % + -), the @ symbol, and a domain with at least one dot and a top-level domain of two or more characters. This covers the vast majority of real-world email addresses. It deliberately skips rare forms such as quoted local parts or IP-literal domains, which seldom appear in practice and tend to cause false positives. Deduplication is case-insensitive: domain names are case-insensitive, and although RFC 5321 technically allows case-sensitive local parts, most email providers treat the local part as case-insensitive too.
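The description above can be sketched as follows. This is a minimal illustration, not the tool's actual pattern: the names EMAIL_RE and extractEmails are hypothetical, and the regex is one plausible reading of "local part, @, dotted domain, TLD of two or more letters".

```javascript
// Assumed simplified pattern: local part, "@", one or more domain labels,
// then a top-level domain of at least two letters.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}/g;

// Returns unique addresses, comparing case-insensitively and keeping the
// first spelling encountered.
function extractEmails(text) {
  const seen = new Set();
  const unique = [];
  for (const match of text.match(EMAIL_RE) ?? []) {
    const key = match.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(match);
    }
  }
  return unique;
}
```

Keeping the first-seen spelling rather than the lowercased key is a design choice of this sketch; a real implementation might prefer to emit the normalized form.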
URL Detection
URLs are detected by matching the http:// and https:// schemes followed by non-whitespace characters. The extractor then strips trailing punctuation marks (periods, commas, parentheses) that commonly appear when URLs are embedded in prose or Markdown text. For example, a sentence ending in https://example.com. yields https://example.com, without the sentence-ending period.
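A minimal sketch of this match-then-trim approach is shown below. The names URL_RE, TRAILING_PUNCT_RE, and extractUrls are hypothetical, and exact-string deduplication is an assumption, since the text does not say how URLs are compared.

```javascript
// Scheme followed by any run of non-whitespace characters.
const URL_RE = /https?:\/\/\S+/gi;
// Trailing periods, commas, and parentheses to strip from each match.
const TRAILING_PUNCT_RE = /[.,()]+$/;

// Returns unique URLs in first-seen order (exact string comparison, an assumption).
function extractUrls(text) {
  const unique = new Set();
  for (const raw of text.match(URL_RE) ?? []) {
    unique.add(raw.replace(TRAILING_PUNCT_RE, ""));
  }
  return [...unique];
}
```

One trade-off of trimming parentheses unconditionally is that a URL which legitimately ends in ")" loses its last character; a stricter version could trim only unbalanced closing parentheses.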
IPv4 Address Validation
IPv4 addresses are matched as four dot-separated groups of one to three digits. After pattern matching, each octet is validated to fall within the 0-255 range and to not have leading zeros (e.g., 192.168.001.001 is rejected because 001 is not a canonical octet representation). This two-phase approach — regex match followed by numeric validation — is more reliable than trying to encode all constraints in a single regular expression.
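The two-phase approach might look like the sketch below: a loose regex finds candidates, then each octet is checked numerically. The names IPV4_CANDIDATE_RE, isValidOctet, and extractIPv4 are hypothetical, and the word-boundary anchors are an assumption of this example.

```javascript
// Phase 1: four dot-separated groups of one to three digits.
const IPV4_CANDIDATE_RE = /\b\d{1,3}(?:\.\d{1,3}){3}\b/g;

// Phase 2: reject leading zeros (e.g. "001") and values above 255.
function isValidOctet(octet) {
  if (octet.length > 1 && octet.startsWith("0")) return false;
  return Number(octet) <= 255;
}

// Returns unique, validated addresses in first-seen order.
function extractIPv4(text) {
  const candidates = text.match(IPV4_CANDIDATE_RE) ?? [];
  const valid = candidates.filter((c) => c.split(".").every(isValidOctet));
  return [...new Set(valid)];
}
```

Keeping the range check out of the regex keeps the pattern short and makes the validation rules easy to read and change.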
IPv6 Address Patterns
IPv6 address detection supports the full 8-group hexadecimal notation (e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334) and the abbreviated double-colon notation (e.g., fe80::1 or ::1). The regex uses multiple alternation branches to handle the variable number of groups that appear before and after the :: abbreviation. Deduplication normalizes addresses to lowercase for consistent comparison.
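One way to build such an alternation is sketched below. It is an illustrative pattern, not the tool's actual regex: the names H, BRANCHES, IPV6_RE, and extractIPv6 are hypothetical, the boundary lookarounds are an added assumption to reduce false positives (for example in std::vector), and forms with an embedded IPv4 tail or a bare :: are intentionally not handled.

```javascript
// One hexadecimal group of 1 to 4 digits.
const H = "[0-9A-Fa-f]{1,4}";

// Branches ordered so that longer tails after "::" are tried first.
const BRANCHES = [
  `(?:${H}:){7}${H}`,              // full 8-group form
  `${H}:(?::${H}){1,6}`,           // 1 group, "::", up to 6 groups
  `(?:${H}:){1,2}(?::${H}){1,5}`,
  `(?:${H}:){1,3}(?::${H}){1,4}`,
  `(?:${H}:){1,4}(?::${H}){1,3}`,
  `(?:${H}:){1,5}(?::${H}){1,2}`,
  `(?:${H}:){1,6}:${H}`,
  `(?:${H}:){1,7}:`,               // trailing "::", e.g. "fe80::"
  `:(?::${H}){1,7}`,               // leading "::", e.g. "::1"
];

// Lookarounds require that the match is not glued to other alphanumerics,
// colons, or a dotted-decimal tail.
const IPV6_RE = new RegExp(
  `(?<![0-9A-Za-z:])(?:${BRANCHES.join("|")})(?![0-9A-Za-z:]|\\.\\d)`,
  "g"
);

// Returns unique addresses, normalized to lowercase for comparison.
function extractIPv6(text) {
  const unique = new Set();
  for (const match of text.match(IPV6_RE) ?? []) {
    unique.add(match.toLowerCase());
  }
  return [...unique];
}
```

The sketch relies on regex lookbehind, which is available in current V8 and SpiderMonkey but may be missing in older browsers.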
Performance Characteristics
All extraction runs synchronously on the main thread using native JavaScript String.prototype.match() with compiled RegExp objects. Modern V8 and SpiderMonkey engines compile regex patterns to native code, which can reach throughput on the order of 100 MB/s for simple patterns; actual figures depend on the pattern, the input, and the device. A typical 1 MB server log file with thousands of IP addresses processes in under 100 ms. The statistics ribbon shows the measured processing time for every extraction.
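Processing time of the kind shown in the statistics ribbon could be measured with a wrapper like the one below. This is a sketch under assumptions: timeExtraction is a hypothetical name, and it uses the standard performance.now() clock, which may or may not be what the tool itself uses.

```javascript
// Runs an extraction function once and reports how long it took in milliseconds.
function timeExtraction(extractFn, text) {
  const start = performance.now();
  const results = extractFn(text);
  const elapsedMs = performance.now() - start;
  return { results, elapsedMs };
}

// Usage: time a simple digit-run extraction on a small input.
const { results, elapsedMs } = timeExtraction((t) => t.match(/\d+/g) ?? [], "a1 b22 c333");
console.log(`${results.length} matches in ${elapsedMs.toFixed(2)} ms`);
```

A single run on a small input is noisy; for a meaningful throughput estimate, it helps to time a realistically sized input several times and compare the runs.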
Why Use This Tool
Extracting structured data from unstructured text is a recurring need across development, security, marketing, and operations. Here are the most common use cases:
- Log analysis — Extract IP addresses from web server access logs, application logs, or firewall logs to identify traffic sources, detect brute-force attacks, or build IP allowlists and blocklists.
- Data cleaning — Pull email addresses from unstructured customer feedback forms, support tickets, or CRM exports. Deduplication ensures each address appears only once for clean mailing lists.
- Security auditing — Scan code repositories, configuration files, or documentation for accidentally committed email addresses, internal URLs, or server IP addresses that should not be public.
- Content migration — Extract all URLs from a website's content dump to build a link inventory, check for broken links, or map redirects during domain migration.
- Network inventory — Parse network device configurations (routers, switches, firewalls) to build an inventory of all IPv4 and IPv6 addresses in use across the infrastructure.
- Marketing and outreach — Extract email addresses from business directories, event attendee lists, or published contact pages for targeted outreach campaigns.
Privacy
All extraction runs entirely in your browser using JavaScript regular expressions. No text is transmitted to any server. Your data never leaves your machine, making this safe for processing proprietary logs, internal network configurations, customer databases, or any sensitive data containing personal information.