Text Similarity Checker
Compare two texts using five similarity metrics: cosine similarity, Jaccard index, Sørensen-Dice coefficient, Levenshtein distance, and LCS ratio. The tool also provides sentence alignment and word overlap analysis, and it runs fully client-side.
How to Use
Compare two texts in three steps:
- Paste both texts — Enter the original in Text A and the comparison text in Text B. Click a sample button to try pre-loaded pairs: paraphrased texts, same-topic texts, or unrelated texts.
- Click Compare — The tool computes five similarity metrics simultaneously: Cosine Similarity, Jaccard Index, Sørensen-Dice Coefficient, Normalized Levenshtein Distance, and Longest Common Subsequence Ratio.
- Review results — See the overall weighted similarity score, individual metric breakdown with explanations, aligned sentence pairs, and shared/unique word analysis. Copy the summary with the clipboard button.
About This Tool
Five Complementary Metrics
No single metric captures all aspects of text similarity, so this tool computes five. Cosine Similarity represents each text as a term-frequency vector and measures the angle between the two vectors. Because only the angle matters, it is largely insensitive to length: a 100-word text and a 1,000-word text on the same topic can still score high, provided their words occur in similar proportions. Jaccard Index is the ratio of shared unique words to total unique words across both texts. It ignores word frequency and penalizes vocabulary divergence more harshly than cosine.
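The two word-level metrics above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the tool's own code: the `tokenize` helper and its lowercase word-splitting rule are assumptions, since the tool's actual tokenizer is not documented here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens; the tool's real tokenizer may differ.
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a, b):
    # Term-frequency vectors stored as word -> count mappings.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_index(a, b):
    # Shared unique words divided by total unique words.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0
```

For the pair "a b c" and "b c d", cosine similarity is 2/3 while the Jaccard index is 0.5, which illustrates how Jaccard penalizes the non-shared vocabulary more heavily. Repeating a text ten times leaves its cosine similarity to the original at 1, because the word proportions are unchanged.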
Sørensen-Dice Coefficient operates on character bigrams (overlapping pairs of characters), making it sensitive to spelling variations and word order. Normalized Levenshtein Distance counts the minimum number of character edits (insertions, deletions, substitutions) needed to transform one text into the other, normalized by the longer text's length. Longest Common Subsequence (LCS) Ratio finds the longest sequence of words that appears in both texts in the same order, though not necessarily contiguously, which makes it well suited to detecting preserved passages with minor insertions.
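As a rough illustration of the bigram-based metric, the sketch below computes a Sørensen-Dice coefficient over character bigrams. Two details are assumptions rather than documented behavior of the tool: bigrams are counted as a multiset (so repeated pairs count more than once), and whitespace is kept, so bigrams can span word boundaries.

```python
from collections import Counter


def bigrams(text):
    # Overlapping character pairs, e.g. "night" -> ni, ig, gh, ht.
    # Assumption: lowercase the text and keep spaces as ordinary characters.
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_coefficient(a, b):
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((ba & bb).values())
    return 2 * overlap / total
```

The classic pair "night" and "nacht" shares only the bigram "ht" out of four bigrams each, giving 2 × 1 / 8 = 0.25.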
Sentence Alignment
Beyond aggregate scores, the tool performs sentence-level alignment. Every sentence from Text A is compared against every sentence in Text B using cosine similarity. Pairs scoring above 30% are displayed as aligned matches, sorted by similarity. This reveals exactly which passages overlap — essential for identifying copied or paraphrased sections in academic, legal, or journalistic contexts.
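The all-pairs alignment described above might look like the following sketch. The sentence splitter (a split after `.`, `!`, or `?`) and the tokenizer are simplifying assumptions; only the cosine comparison, the 30% threshold, and the sort by similarity come from the description.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Assumption: naive split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * \
        math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def align_sentences(text_a, text_b, threshold=0.30):
    # Compare every sentence of A with every sentence of B.
    pairs = []
    for sa in split_sentences(text_a):
        for sb in split_sentences(text_b):
            score = cosine(sa, sb)
            if score > threshold:
                pairs.append((score, sa, sb))
    # Highest-scoring matches first.
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs
```

Because every sentence is compared with every other, the work grows with the product of the two sentence counts, which is fine for documents of a few thousand words.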
Interpreting Scores
As a rough guide, scores above 80% indicate near-identical or verbatim-copied text. Scores of 50-80% suggest heavy paraphrasing or shared source material. Scores of 20-50% typically mean same-topic or same-domain content. Scores below 20% usually indicate unrelated texts. The word overlap analysis shows which vocabulary is shared and which is unique to each text, providing insight into how the similarity score was formed. For deeper text analysis, see Text Diff and Keyword Extractor.
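The score bands and the word overlap breakdown can be expressed as two small helpers. This is a sketch only: the function names are hypothetical, the treatment of scores that fall exactly on a boundary (50%, 80%) is an assumption, and the tokenizer is a simple lowercase word split.

```python
import re


def interpret(score):
    # score is a fraction between 0 and 1.
    # Assumption: boundary values fall into the lower band listed first.
    if score > 0.80:
        return "near-identical or verbatim copy"
    if score >= 0.50:
        return "heavy paraphrasing or shared source"
    if score >= 0.20:
        return "same topic or domain"
    return "unrelated"


def word_overlap(a, b):
    # Unique lowercase words in each text.
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return {
        "shared": sorted(wa & wb),
        "only_a": sorted(wa - wb),
        "only_b": sorted(wb - wa),
    }
```

For "the cat sat" and "the dog sat", the shared words are "sat" and "the", with "cat" unique to the first text and "dog" unique to the second.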
Why Use This Tool
Instant Multi-Metric Comparison
All five metrics are computed in your browser with zero network calls. The Levenshtein and LCS algorithms use two-row dynamic programming, which keeps only the previous and current rows of the table in memory instead of the full grid. Processing is near-instantaneous for texts up to several thousand words. No API keys, no rate limits, no usage quotas.
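A minimal sketch of the two-row technique for both algorithms follows. It is written in Python for illustration and is not the tool's browser code. Two points are assumptions: the Levenshtein similarity is taken as 1 minus the distance divided by the longer text's length, and the LCS ratio divides the LCS length by the word count of the longer text.

```python
def levenshtein(a, b):
    # Keep the shorter string on the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr  # only two rows are ever alive
    return prev[-1]


def levenshtein_similarity(a, b):
    # Assumption: similarity = 1 - distance / length of the longer text.
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0


def lcs_length(xs, ys):
    prev = [0] * (len(ys) + 1)
    for x in xs:
        curr = [0]
        for j, y in enumerate(ys, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(a, b):
    # Word-level LCS; assumption: divide by the longer text's word count.
    wa, wb = a.lower().split(), b.lower().split()
    longest = max(len(wa), len(wb))
    return lcs_length(wa, wb) / longest if longest else 1.0
```

With this layout, memory use grows with the length of one text rather than with the product of both lengths, while the running time is unchanged. For example, `levenshtein("kitten", "sitting")` is 3.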
Common Use Cases
- Plagiarism screening: Compare student submissions, articles, or reports against known sources to detect copied or closely paraphrased content.
- Content deduplication: Identify near-duplicate articles, product descriptions, or knowledge base entries that should be merged or removed.
- Translation QA: Compare back-translated text against the original to assess translation fidelity at the vocabulary level.
- SEO content auditing: Check if pages on your site have too much content overlap, which can dilute search rankings. Google recommends unique content per URL.
- Writing revision tracking: Quantify how much a text changed between drafts by comparing revisions side by side.
Privacy
100% client-side processing. Both texts remain in your browser. Related tools: Text Diff, Readability Analyzer, Sentiment Analyzer, and Word Counter.