Email, URL & IP Extractor

Extract emails, URLs, IPv4, and IPv6 addresses from any text with deduplication and batch export.

How to Use

This tool scans any block of text and extracts email addresses, URLs, IPv4 addresses, and IPv6 addresses using pattern matching. All extraction runs locally in your browser with no server processing.

Paste your text into the input area, or click "Upload file" to load a .txt, .log, or .csv file. You can also click "Load Sample" for ready-made test data.
Toggle extraction types in the sidebar. Enable or disable Emails, URLs, IPv4, and IPv6 independently. Only enabled types are scanned.
Review grouped results below the input. Each type shows a count badge, a scrollable list of unique matches, and per-type copy and download buttons.
Copy or download individual groups or all results at once. The "Download All" button exports every group into a single TXT file with labeled sections.

About This Tool

Data extraction from unstructured text is one of the most common tasks in data processing, log analysis, and security auditing. This tool uses JavaScript regular expressions optimized for each data type, with built-in validation and deduplication.

Email Extraction (RFC 5322)

Email addresses are matched using a simplified version of the RFC 5322 address syntax. The pattern captures the local part (alphanumeric characters plus . _ % + -), the @ symbol, and a domain with at least one dot and a top-level domain of two or more characters. This covers the vast majority of real-world email addresses. It deliberately skips rare forms such as quoted local parts and IP-literal domains, which seldom appear in practice and would make the pattern more prone to false positives. Deduplication is case-insensitive: domain names are case-insensitive under RFC 5321, and although the same RFC technically treats the local part as case-sensitive, most email providers ignore case there too.
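As a rough illustration of the simplified pattern described above, the sketch below shows one way such a regex might look. The names EMAIL_RE and extractEmails are hypothetical, and the exact expression used by the tool may differ.

```javascript
// Hypothetical pattern: local part (letters, digits, . _ % + -), "@",
// a dotted domain, and a top-level domain of two or more letters.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Returns every email-like match in order of appearance, duplicates included.
function extractEmails(text) {
  return text.match(EMAIL_RE) ?? [];
}

const sample = "Contact Alice@Example.com or bob.smith+tag@mail.example.org.";
console.log(extractEmails(sample));
// [ 'Alice@Example.com', 'bob.smith+tag@mail.example.org' ]
```

Note that the sentence-ending period is not captured, because the pattern requires at least two letters after the final dot. Quoted local parts and IP-literal domains such as user@[192.168.0.1] are not matched by this sketch.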

URL Detection

URLs are detected by matching http:// and https:// schemes followed by non-whitespace characters. The extractor strips trailing punctuation marks (periods, commas, parentheses) that commonly appear when URLs are embedded in prose or Markdown text. For example, the text https://example.com. yields https://example.com, without the sentence-ending period.
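The match-then-trim behaviour described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: URL_RE, TRAILING_PUNCT_RE, and extractUrls are hypothetical names, and the set of stripped characters is assumed to be exactly periods, commas, and parentheses.

```javascript
// Hypothetical pattern: scheme followed by any run of non-whitespace.
const URL_RE = /https?:\/\/\S+/g;

// Assumed set of trailing characters to strip: periods, commas, parentheses.
const TRAILING_PUNCT_RE = /[.,()]+$/;

// Match candidates first, then trim punctuation from the end of each one.
function extractUrls(text) {
  const matches = text.match(URL_RE) ?? [];
  return matches.map((url) => url.replace(TRAILING_PUNCT_RE, ""));
}

const sample = "See https://example.com. Also (https://example.org/path?q=1), ok.";
console.log(extractUrls(sample));
// [ 'https://example.com', 'https://example.org/path?q=1' ]
```

One trade-off of this approach is that a URL which legitimately ends in a closing parenthesis loses that character, so a production implementation might balance parentheses before trimming.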

IPv4 Address Validation

IPv4 addresses are matched as four dot-separated groups of one to three digits. After pattern matching, each octet is validated to fall within the 0-255 range and to have no leading zeros (for example, 192.168.001.001 is rejected because 001 is not a canonical octet representation). This two-phase approach, a regex match followed by numeric validation, is simpler and less error-prone than encoding every constraint in a single regular expression.
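The two-phase approach can be sketched like this. The names IPV4_CANDIDATE_RE, isValidOctet, and extractIPv4 are hypothetical, and the word-boundary anchors are an assumption about how candidates are delimited.

```javascript
// Phase 1 (assumed pattern): four dot-separated groups of 1-3 digits.
const IPV4_CANDIDATE_RE = /\b\d{1,3}(?:\.\d{1,3}){3}\b/g;

// Phase 2: an octet is valid if it has no leading zero and is at most 255.
function isValidOctet(octet) {
  if (octet.length > 1 && octet.startsWith("0")) return false; // "01", "001"
  return Number(octet) <= 255;
}

// Collect regex candidates, then keep only those whose octets all validate.
function extractIPv4(text) {
  const candidates = text.match(IPV4_CANDIDATE_RE) ?? [];
  return candidates.filter((ip) => ip.split(".").every(isValidOctet));
}

const sample = "ok 192.168.1.1 and 10.0.0.255; bad 256.1.1.1, 192.168.001.001";
console.log(extractIPv4(sample));
// [ '192.168.1.1', '10.0.0.255' ]
```

Keeping the numeric check in plain code makes the rule easy to read and to change, for instance if leading zeros should be tolerated in a particular log format.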

IPv6 Address Patterns

IPv6 address detection supports the full 8-group hexadecimal notation (e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334) and the abbreviated double-colon notation (e.g., fe80::1 or ::1). The regex uses multiple alternation branches to handle the variable number of groups that appear before and after the :: abbreviation. Deduplication normalizes addresses to lowercase for consistent comparison.
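To illustrate the multi-branch idea, the sketch below builds one alternative per possible position of the :: abbreviation. It is an assumption-laden example rather than the tool's actual regex: H, IPV6_BRANCHES, IPV6_RE, and extractIPv6 are hypothetical names, the lookbehind and lookahead guards are an added choice to avoid partial matches, and IPv4-embedded forms and the bare :: address are intentionally left out.

```javascript
// One hexadecimal group of 1-4 digits.
const H = "[0-9A-Fa-f]{1,4}";

// Hypothetical branch list: groups before "::" plus groups after never exceed 7.
const IPV6_BRANCHES = [
  `(?:${H}:){7}${H}`,             // full form: 8 groups
  `(?:${H}:){1,7}:`,              // 1-7 groups, then a trailing "::"
  `(?:${H}:){1,6}:${H}`,          // "::" before the last group, e.g. fe80::1
  `(?:${H}:){1,5}(?::${H}){1,2}`,
  `(?:${H}:){1,4}(?::${H}){1,3}`,
  `(?:${H}:){1,3}(?::${H}){1,4}`,
  `(?:${H}:){1,2}(?::${H}){1,5}`,
  `${H}:(?::${H}){1,6}`,          // one group, "::", then up to 6 groups
  `:(?::${H}){1,7}`,              // leading "::", e.g. ::1
];

// Guards reject matches glued to letters, digits, or colons on either side,
// so a branch cannot stop early in the middle of a longer token.
const IPV6_RE = new RegExp(
  `(?<![0-9A-Za-z:])(?:${IPV6_BRANCHES.join("|")})(?![0-9A-Za-z:])`,
  "g"
);

// Lowercase every match, then deduplicate with a Set.
function extractIPv6(text) {
  const matches = text.match(IPV6_RE) ?? [];
  return [...new Set(matches.map((addr) => addr.toLowerCase()))];
}

console.log(extractIPv6("FE80::1, fe80::1 and ::1 at 12:30:45"));
// [ 'fe80::1', '::1' ]
```

The timestamp 12:30:45 is not reported because no branch accepts three groups without a double colon. The lookbehind assertion requires a reasonably recent JavaScript engine.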

Performance Characteristics

All extraction runs synchronously on the main thread using native JavaScript String.prototype.match() with RegExp objects. Modern V8 and SpiderMonkey engines compile regex patterns to native code, which can reach throughput of roughly 100 MB/s for simple patterns, depending on the hardware and the pattern. A typical 1 MB server log file with thousands of IP addresses processes in under 100 ms. The statistics ribbon shows the measured processing time for every extraction.
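A processing-time figure like the one in the statistics ribbon could be obtained by wrapping the extraction call with performance.now(). The helper below is a minimal sketch under that assumption. timeExtraction and findIPv4Like are hypothetical names, and the sample log is synthetic.

```javascript
// Runs an extractor synchronously and reports how long it took.
function timeExtraction(extractFn, text) {
  const start = performance.now();
  const matches = extractFn(text);
  const elapsedMs = performance.now() - start;
  return { matches, elapsedMs };
}

// Hypothetical extractor used only for this timing demonstration.
const findIPv4Like = (text) => text.match(/\d{1,3}(?:\.\d{1,3}){3}/g) ?? [];

// Synthetic log: 1,000 identical lines, each containing one address.
const log = "10.0.0.1 - GET /index.html 200\n".repeat(1000);
const { matches, elapsedMs } = timeExtraction(findIPv4Like, log);
console.log(`${matches.length} matches in ${elapsedMs.toFixed(2)} ms`);
```

Actual timings vary with the machine, the browser, and the input, so the numbers printed here are not a benchmark.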

Why Use This Tool

Extracting structured data from unstructured text is a recurring need across development, security, marketing, and operations. Here are the most common use cases:

Log analysis — Extract IP addresses from web server access logs, application logs, or firewall logs to identify traffic sources, detect brute-force attacks, or build IP allowlists and blocklists.
Data cleaning — Pull email addresses from unstructured customer feedback forms, support tickets, or CRM exports. Deduplication ensures each address appears only once for clean mailing lists.
Security auditing — Scan code repositories, configuration files, or documentation for accidentally committed email addresses, internal URLs, or server IP addresses that should not be public.
Content migration — Extract all URLs from a website's content dump to build a link inventory, check for broken links, or map redirects during domain migration.
Network inventory — Parse network device configurations (routers, switches, firewalls) to build an inventory of all IPv4 and IPv6 addresses in use across the infrastructure.
Marketing and outreach — Extract email addresses from business directories, event attendee lists, or published contact pages for targeted outreach campaigns.

Privacy

All extraction runs entirely in your browser using JavaScript regular expressions. No text is transmitted to any server. Your data never leaves your machine, making this safe for processing proprietary logs, internal network configurations, customer databases, or any sensitive data containing personal information.

FAQ

What regex patterns does this tool use to extract data?

Emails are matched using a simplified RFC 5322 pattern that captures local-part@domain.tld formats, where the local part may contain alphanumeric characters, dots, underscores, percent signs, plus signs, and hyphens. URLs match http:// and https:// schemes followed by non-whitespace characters, with trailing punctuation stripped. IPv4 addresses match four dot-separated octets, each validated to the 0-255 range with no leading zeros. IPv6 addresses match either the full form of eight colon-separated hex groups or a form abbreviated with the double-colon notation.

How does deduplication work?

Every extraction type uses a JavaScript Set to track unique values. Emails are compared case-insensitively, since RFC 5321 specifies that the domain part is case-insensitive and most providers treat the local part the same way. IPv6 addresses are normalized to lowercase before comparison. URLs and IPv4 addresses are compared as exact strings. The results panel shows both total matches and unique counts.
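A Set-based deduplication step with a per-type comparison key might look like the sketch below. The dedupe function and its keyOf parameter are hypothetical, and keeping the first spelling encountered is an assumption, not something the tool documents.

```javascript
// Deduplicates matches using a Set of comparison keys.
// keyOf maps a match to the value used for comparison (identity by default).
function dedupe(matches, keyOf = (value) => value) {
  const seen = new Set();
  const unique = [];
  for (const match of matches) {
    const key = keyOf(match);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(match); // assumption: keep the first spelling seen
    }
  }
  return { total: matches.length, uniqueCount: unique.length, unique };
}

// Case-insensitive comparison, as described for emails.
const emails = ["A@x.com", "a@X.com", "b@x.com"];
console.log(dedupe(emails, (e) => e.toLowerCase()));
// { total: 3, uniqueCount: 2, unique: [ 'A@x.com', 'b@x.com' ] }

// Exact-string comparison, as described for URLs.
console.log(dedupe(["http://a/B", "http://a/b"]).uniqueCount);
// 2
```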

Can it handle large text inputs like server logs?

Yes. The extractor processes text synchronously using native JavaScript regular expressions, which the V8 and SpiderMonkey engines optimize heavily. A 1 MB log file with thousands of matches typically processes in under 100 milliseconds. You can also upload .txt, .log, and .csv files directly instead of pasting.

Is my data sent to a server?

No. All extraction runs entirely in your browser using JavaScript regular expressions. No text is transmitted to any server. Your data never leaves your machine, making this safe for processing proprietary logs, internal emails, and sensitive network data.