How TextCompare Works
TextCompare finds the differences between two texts instantly, entirely in your browser. Here is a plain-English explanation of the algorithms and techniques behind each step — from raw input to highlighted output.
Five Steps from Input to Diff
Every comparison follows the same pipeline: tokenize, diff, highlight, render, and optionally share.
- Tokenize: split each text into a sequence of lines, words, or characters, depending on the selected diff granularity.
- Diff: run the Myers O(ND) algorithm to find the shortest edit script, the minimum set of insertions and deletions needed.
- Highlight: for modified lines, run a second, character-level pass to mark exactly which characters changed.
- Render: paint the result as side-by-side or unified panels with color-coded rows, line numbers, and change markers.
- Share (optional): compress both texts into a URL so someone else can open the same comparison.
1. The Myers O(ND) Diff Algorithm
TextCompare's core engine is based on the Myers diff algorithm, introduced by Eugene W. Myers in his 1986 paper "An O(ND) Difference Algorithm and Its Variations." It is the default algorithm in git diff and the basis of GNU diff, and many other diff tools build on it.
The key insight is that any diff can be expressed as a path through an edit graph: a grid where each cell (x, y) represents the state after consuming x tokens from the original and y tokens from the revised text. A diagonal move means the tokens match (no edit needed); a right move means a deletion; a down move means an insertion.
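Finding the shortest edit script is therefore the same as finding a path from (0, 0) to the opposite corner that uses the fewest non-diagonal moves. Myers calls that number D and shows how to find such a path in O(ND) time, where N is the combined length of the two texts. Because the cost grows with the number of differences rather than with file size, the algorithm is extremely fast when the two texts are similar (small D). That is the common case for real document edits.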
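To make the search concrete, here is a minimal JavaScript sketch of the greedy forward pass described in Myers's paper. It returns only D. Recovering the edit script itself also requires recording the intermediate states so that the path can be traced back. The name myersDistance is illustrative and is not TextCompare's actual code.

```javascript
// Minimal sketch of the greedy forward pass from Myers (1986).
// Returns D, the number of insertions plus deletions in a shortest edit
// script turning token array `a` into token array `b`.
function myersDistance(a, b) {
  const n = a.length;
  const m = b.length;
  const max = n + m;
  // v[offset + k] = furthest x reached so far on diagonal k = x - y.
  const offset = max;
  const v = new Array(2 * max + 2).fill(0);
  for (let d = 0; d <= max; d++) {
    // After d edits, only diagonals -d, -d + 2, ..., d are reachable.
    for (let k = -d; k <= d; k += 2) {
      let x;
      if (k === -d || (k !== d && v[offset + k - 1] < v[offset + k + 1])) {
        x = v[offset + k + 1]; // down move from diagonal k + 1: an insertion
      } else {
        x = v[offset + k - 1] + 1; // right move from diagonal k - 1: a deletion
      }
      let y = x - k;
      // Follow the "snake": diagonal moves over matching tokens are free.
      while (x < n && y < m && a[x] === b[y]) {
        x++;
        y++;
      }
      v[offset + k] = x;
      if (x >= n && y >= m) return d; // reached the bottom-right corner
    }
  }
  return max; // not reached: d = n + m edits always suffice
}
```

On the example from the paper, myersDistance("ABCABBA".split(""), "CBABAC".split("")) returns 5.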
Why not Longest Common Subsequence (LCS)?
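The textbook dynamic-programming approach to the longest common subsequence (LCS) fills a table with one cell for every pair of tokens. It therefore takes O(N²) time and space no matter how similar the inputs are. For two 10,000-line files, that is 100 million cells. Myers reaches the same minimal result, because a shortest edit script corresponds to a longest common subsequence, but it never builds the full table. For d = 0, 1, 2, and so on, it extends only the furthest-reaching path on each diagonal within d of the main diagonal, sliding along runs of matching tokens for free. For typical document edits where fewer than 20% of lines change, Myers is 5–10× faster than a table-filling LCS diff in practice.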
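For contrast, here is a minimal sketch of the table-filling approach. The function names are hypothetical. When only insertions and deletions are allowed, D = (length of A) + (length of B) - 2 × LCS. The two methods therefore agree on the answer and differ only in how much work they do.

```javascript
// Classic dynamic-programming LCS: fills an (n + 1) x (m + 1) table, so time
// and memory grow with n * m regardless of how similar the inputs are.
function lcsLength(a, b) {
  const n = a.length;
  const m = b.length;
  const table = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      table[i][j] = a[i - 1] === b[j - 1]
        ? table[i - 1][j - 1] + 1 // tokens match: extend the diagonal
        : Math.max(table[i - 1][j], table[i][j - 1]);
    }
  }
  return table[n][m];
}

// With only insertions and deletions, the shortest edit script has
// length n + m - 2 * LCS.
function editDistanceViaLcs(a, b) {
  return a.length + b.length - 2 * lcsLength(a, b);
}
```

For ABCABBA and CBABAC, the table has 8 × 7 = 56 cells and the LCS length is 4, so D = 7 + 6 - 2 × 4 = 5, matching the Myers result.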
2. Line, Word, and Character Tokenization
Before diffing, each text is split into a sequence of tokens. The granularity changes what the diff considers a single unit of comparison:
- Line mode — Split on newline characters. Each line is one token. This is the default and fastest mode. Identical lines are always treated as equal, even if they are long.
- Word mode — Tokenize using a regex that splits on whitespace boundaries. Punctuation attached to words is kept together. This mode is useful for prose documents where you care about individual word changes.
- Character mode — Each Unicode code point is a token. Most useful for short texts, passwords, or structured strings where even a single character matters.
TextCompare normalizes line endings (converting \r\n to \n) before tokenization. Optional pre-processing — trimming trailing whitespace, collapsing blank lines, lowercasing — is applied before tokenization when the corresponding options are checked.
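A minimal sketch of the pre-processing and tokenization steps follows. The option names trimTrailing, collapseBlankLines, and lowercase are hypothetical stand-ins for the checkboxes. In word mode, this sketch keeps each run of whitespace as its own token so that spacing can be re-rendered. TextCompare's actual regex may differ.

```javascript
// Hypothetical option names standing in for the UI checkboxes.
function preprocess(text, { trimTrailing = false, collapseBlankLines = false, lowercase = false } = {}) {
  // Normalize Windows line endings first so "\r\n" and "\n" compare equal.
  let out = text.replace(/\r\n/g, "\n");
  if (trimTrailing) out = out.replace(/[ \t]+$/gm, ""); // strip spaces/tabs at each line end
  if (collapseBlankLines) out = out.replace(/\n{3,}/g, "\n\n"); // runs of blank lines become one
  if (lowercase) out = out.toLowerCase();
  return out;
}

function tokenize(text, mode) {
  switch (mode) {
    case "line":
      return text.split("\n");
    case "word":
      // Split on whitespace but keep the whitespace runs as tokens;
      // punctuation stays attached to its word.
      return text.split(/(\s+)/).filter((t) => t.length > 0);
    case "char":
      // Array.from iterates by Unicode code point, so an emoji is one token.
      return Array.from(text);
    default:
      throw new Error(`unknown mode: ${mode}`);
  }
}
```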
3. Intra-Line Diff Highlighting
Sometimes the diff removes a line from the original and adds a line at the same position in the revised text. In that case TextCompare checks whether the two lines are similar but not identical. If they are, it flags the pair as modified rather than as a plain removal plus addition, and runs a second, character-level diff pass on it.
The character-level pass uses the same Myers algorithm on the character sequences of both lines. The result is used to wrap changed characters in <del> (removed characters) and <ins> (added characters) elements, which are styled with subtle background highlights. This lets you spot a single changed word in a long line at a glance — a significant usability improvement over line-only diffs.
A similarity threshold controls when two lines are considered "modified" vs "completely replaced." If the character-level edit distance is greater than 60% of the longer line's length, the lines are treated as an unrelated removal and addition rather than a modification.
4. Side-by-Side vs Unified View
Side-by-side view places the original text on the left and the revised text on the right. Each line is assigned a line number from its respective source. Empty placeholder rows are inserted on whichever side has fewer lines in a change, so corresponding lines stay visually aligned. This view is ideal for reviewing changes when you have enough horizontal space.
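Aligning the two panels comes down to pairing removed and added lines within each change and padding the shorter side. The sketch below assumes a line-level edit script given as [type, text] pairs, which is a hypothetical format. It also assumes that the k-th removed line in a change sits opposite the k-th added line.

```javascript
// ops: e.g. [["equal", "a"], ["delete", "b"], ["insert", "B"]]
function sideBySideRows(ops) {
  const rows = [];
  let leftNum = 1;
  let rightNum = 1;
  let dels = [];
  let inss = [];
  const flush = () => {
    // Pair removed and added lines row by row; the shorter side gets null placeholders.
    const count = Math.max(dels.length, inss.length);
    for (let r = 0; r < count; r++) {
      rows.push({
        left: r < dels.length ? { num: leftNum++, text: dels[r], kind: "delete" } : null,
        right: r < inss.length ? { num: rightNum++, text: inss[r], kind: "insert" } : null,
      });
    }
    dels = [];
    inss = [];
  };
  for (const [type, text] of ops) {
    if (type === "delete") dels.push(text);
    else if (type === "insert") inss.push(text);
    else {
      flush(); // an unchanged line closes the current change
      rows.push({
        left: { num: leftNum++, text, kind: "equal" },
        right: { num: rightNum++, text, kind: "equal" },
      });
    }
  }
  flush();
  return rows;
}
```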
Unified view interleaves removed and added lines in a single column, prefixed with − and + markers respectively. Both original and revised line numbers appear side-by-side in the gutter. This mirrors the output of git diff and is familiar to developers. It also works better on narrow screens and is easier to copy-paste into a bug report or email.
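Unified view needs no alignment, only a gutter with both line numbers. The sketch below uses the same hypothetical [type, text] input. It prints ASCII - and + as git diff does, so the output pastes cleanly into a plain-text bug report. The gutter width of 4 is an arbitrary choice.

```javascript
function unifiedLines(ops) {
  // Right-align line numbers in a 4-character column; blank when absent.
  const gutter = (num) => (num === null ? "" : String(num)).padStart(4);
  let oldNum = 1;
  let newNum = 1;
  const out = [];
  for (const [type, text] of ops) {
    if (type === "equal") {
      out.push(`${gutter(oldNum++)} ${gutter(newNum++)}   ${text}`);
    } else if (type === "delete") {
      out.push(`${gutter(oldNum++)} ${gutter(null)} - ${text}`);
    } else {
      out.push(`${gutter(null)} ${gutter(newNum++)} + ${text}`);
    }
  }
  return out;
}
```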
Both views support keyboard navigation: press n / p to jump to the next or previous change hunk, and ? to open the full keyboard shortcuts reference.
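Jumping between hunks only requires the index of the first row in each run of changes. The sketch below uses hypothetical names. The browser wiring assumes rows rendered as .diff-row elements with a data-changed attribute. That markup is an illustration, not TextCompare's.

```javascript
// Index of the first row of each run of changed rows.
function hunkStarts(isChanged) {
  const starts = [];
  for (let i = 0; i < isChanged.length; i++) {
    if (isChanged[i] && (i === 0 || !isChanged[i - 1])) starts.push(i);
  }
  return starts;
}

// Next (dir = 1) or previous (dir = -1) hunk start relative to row `current`, or null.
function jumpTarget(starts, current, dir) {
  if (dir > 0) return starts.find((s) => s > current) ?? null;
  for (let i = starts.length - 1; i >= 0; i--) {
    if (starts[i] < current) return starts[i];
  }
  return null;
}

if (typeof document !== "undefined") {
  let current = -1; // nothing selected yet, so "n" goes to the first hunk
  document.addEventListener("keydown", (event) => {
    if (event.ctrlKey || event.metaKey || event.altKey) return;
    // Don't hijack n/p while the user is typing into the text inputs.
    if (event.target instanceof Element && event.target.closest("input, textarea")) return;
    if (event.key !== "n" && event.key !== "p") return;
    const rows = Array.from(document.querySelectorAll(".diff-row"));
    const starts = hunkStarts(rows.map((row) => row.dataset.changed === "true"));
    const target = jumpTarget(starts, current, event.key === "n" ? 1 : -1);
    if (target === null) return;
    current = target;
    rows[target].scrollIntoView({ block: "center" });
  });
}
```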
5. Share URL: Gzip + Base64 Encoding
When you click Share, TextCompare serializes both texts into a single JSON payload, compresses it using the gzip algorithm via the browser's built-in CompressionStream API, then encodes the compressed bytes as Base64url (URL-safe Base64 without padding). The result is appended to the page URL as a query parameter.
The recipient opens the URL, the browser decodes the Base64url string back to bytes, decompresses using DecompressionStream, and parses the JSON to restore both texts. The entire round-trip happens client-side — no server receives or stores the data.
Gzip typically achieves 60–80% compression on natural-language text, so a 10 KB pair of documents compresses to roughly 2–4 KB. Base64url encoding then adds about a third (4 characters for every 3 bytes), so the encoded parameter comes to roughly 2.7–5.3 KB. Very large texts (hundreds of kilobytes) may produce URLs too long for some browsers or link-sharing services. In those cases TextCompare warns you and suggests downloading the diff instead.
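The round-trip can be sketched with standard web APIs: CompressionStream, DecompressionStream, Blob, Response, and btoa/atob. These also exist in Node 18 and later. The payload shape { original, revised } and the query parameter name d are assumptions for illustration.

```javascript
// Gzip a string (UTF-8) into bytes using the built-in CompressionStream.
async function gzipBytes(text) {
  const stream = new Blob([text]).stream().pipeThrough(new CompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Inverse: gunzip bytes back into a UTF-8 string.
async function gunzipText(bytes) {
  const stream = new Blob([bytes]).stream().pipeThrough(new DecompressionStream("gzip"));
  return new Response(stream).text();
}

// Base64url: standard Base64 with "+" -> "-", "/" -> "_", and no "=" padding.
function toBase64Url(bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function fromBase64Url(s) {
  const b64 = s.replace(/-/g, "+").replace(/_/g, "/") + "=".repeat((4 - (s.length % 4)) % 4);
  return Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
}

async function buildShareUrl(baseUrl, original, revised) {
  const json = JSON.stringify({ original, revised });
  const url = new URL(baseUrl);
  url.searchParams.set("d", toBase64Url(await gzipBytes(json)));
  return url.toString();
}

async function readShareUrl(href) {
  const encoded = new URL(href).searchParams.get("d");
  if (encoded === null) return null;
  return JSON.parse(await gunzipText(fromBase64Url(encoded)));
}
```

One design note about the sketch: browsers send the query string to the server along with the page request, while the fragment (the part after #) stays in the browser. A variant that reads and writes url.hash instead of url.searchParams keeps the payload out of request URLs entirely.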
Technical Note: Performance & Responsiveness
Large inputs can produce computationally expensive diffs. TextCompare applies two strategies to keep the browser UI responsive:
- Debouncing: the diff is triggered only after a 300 ms pause in typing, so the engine does not run on every keystroke for long inputs.
- scheduler.yield(): for very large edit scripts, the rendering loop yields control back to the browser's event loop between rendering chunks via scheduler.yield() (with a setTimeout(0) fallback). This prevents the page from freezing even on files with thousands of changed lines.
The diff computation itself is synchronous JavaScript — it does not use Web Workers — which keeps the implementation simple and avoids the serialization overhead of cross-thread message passing for typical document sizes.
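Both strategies fit in a few lines. The 300 ms delay in this minimal sketch comes from the text above. The chunk size of 200 rows and the function names are illustrative assumptions. scheduler.yield() is currently available only in some browsers, which is why the sketch checks for it before calling it.

```javascript
// Run `fn` only after `waitMs` ms have passed without another call.
function debounce(fn, waitMs = 300) {
  let timer;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Give the event loop a chance to handle input: scheduler.yield() when
// supported, otherwise a zero-delay timeout.
function yieldToMain() {
  if (globalThis.scheduler && typeof globalThis.scheduler.yield === "function") {
    return globalThis.scheduler.yield();
  }
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Render rows in fixed-size chunks, yielding between chunks.
async function renderInChunks(rows, renderRow, chunkSize = 200) {
  for (let start = 0; start < rows.length; start += chunkSize) {
    for (const row of rows.slice(start, start + chunkSize)) renderRow(row);
    if (start + chunkSize < rows.length) await yieldToMain();
  }
}
```

A fuller version would also abandon an in-progress render when a newer diff arrives. That bookkeeping is omitted here.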
Frequently Asked Questions
Which diff algorithm does TextCompare use?
TextCompare uses the Myers O(ND) diff algorithm, first published by Eugene Myers in 1986. It finds the shortest edit script (the minimum number of insertions and deletions) needed to transform one text into another. It is the same algorithm used by Git and GNU diff. It was chosen because it always finds a minimal edit script and runs fastest in the common case: similar texts with relatively few changes.
Is my text uploaded to a server?
No. All processing (tokenization, diff computation, highlighting, and rendering) runs entirely in your browser using JavaScript. Your text never leaves your device. There are no accounts, no server logs, and no data collection of any kind. This makes TextCompare safe to use with confidential contracts, source code, or personal documents.
How does the Share link work?
Clicking Share gzip-compresses both texts using the browser's built-in CompressionStream API, encodes the result as Base64url, and appends it to the page URL. The recipient's browser decodes and decompresses the texts locally. No server is involved. Gzip typically reduces text by 60–80%, so even moderately long documents produce shareable URLs. Very large texts may exceed browser URL length limits; TextCompare will warn you if that happens.
Can TextCompare handle large files?
Yes, with caveats. Inputs under 100 KB diff nearly instantly. For larger files, TextCompare debounces the diff (waiting 300 ms after the last input change) and yields the render loop via scheduler.yield() so the UI stays responsive. Files above 1 MB may take several seconds to compute on slower devices. If you regularly work with very large files, line-mode diff (the default) is significantly faster than word or character mode.
See It in Action
The best way to understand how TextCompare works is to try it. Paste any two texts and watch the diff appear in real time.