How TextCompare Works — Text Diff Algorithm Explained

Q: What algorithm does TextCompare use to diff text?

TextCompare uses the Myers O(ND) diff algorithm, first published by Eugene Myers in 1986. It finds the shortest edit script — the minimum number of insertions and deletions — to transform one text into another. This is the same algorithm used by Git and many professional code-review tools.

Q: Does TextCompare send my text to a server?

No. All comparison logic runs entirely in your browser using JavaScript. Your text never leaves your device. This makes TextCompare completely private — no accounts, no logs, no data collection.

Q: How does the share URL work?

When you click Share, TextCompare gzip-compresses both texts, encodes the result as Base64url, and appends it to the URL as a query parameter. The recipient's browser decodes and decompresses the texts locally — no server involved.

Q: Can TextCompare handle large files?

Yes, with caveats. The diff itself runs synchronously, but rendering of the result is scheduled in small chunks using scheduler.yield() to avoid blocking the browser UI. Files under 100 KB compare instantly. Very large files (1 MB+) may take a few seconds, but the UI remains responsive while the result is painted.

The Process

Five Steps from Input to Diff

Every comparison follows the same pipeline: tokenize, diff, highlight, render, and optionally share.

1

Tokenize

Split each text into a sequence of lines, words, or characters depending on the selected diff granularity.

2

Myers Diff

Run the O(ND) Myers algorithm to find the shortest edit script — the minimum insertions and deletions needed.

3

Intra-Line Diff

For modified lines, run a second character-level pass to highlight exactly which characters changed.

4

Render View

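Paint the result as side-by-side or unified panels with color-coded rows, line numbers, and change markers.

5

Share

Optionally compress both texts and encode them into a URL that restores the comparison in the recipient's browser.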

1. The Myers O(ND) Diff Algorithm

TextCompare's core engine is based on the Myers diff algorithm, introduced by Eugene W. Myers in his 1986 paper "An O(ND) Difference Algorithm and Its Variations." It is the same algorithm that powers git diff, GNU diff, and many professional code-review platforms.

The key insight is that any diff can be expressed as a path through an edit graph: a grid where each cell (x, y) represents the state after consuming x tokens from the original and y tokens from the revised text. A diagonal move means the tokens match (no edit needed); a right move means a deletion; a down move means an insertion.

The length D of the shortest edit script equals the number of non-diagonal moves on the shortest path through this graph. Myers' algorithm finds that path in O(ND) time, where N is the combined length of the two token sequences, so it is extremely fast when the two texts are similar (small D), which is the common case for real document edits.
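
To make the edit-graph search concrete, here is a minimal sketch of the greedy forward search from Myers' paper. It returns only the distance D, not the edit script itself. The function name shortestEditDistance and the array layout are illustrative assumptions, not TextCompare's actual code.

```javascript
// Returns D, the number of insertions plus deletions in a shortest edit script
// that turns token array `a` into token array `b`.
// Hypothetical helper for illustration; not taken from TextCompare's source.
function shortestEditDistance(a, b) {
  const n = a.length;
  const m = b.length;
  const max = n + m;
  const offset = max + 1;
  // v[offset + k] stores the furthest x reached so far on diagonal k = x - y.
  const v = new Array(2 * max + 3).fill(0);

  for (let d = 0; d <= max; d++) {
    for (let k = -d; k <= d; k += 2) {
      let x;
      if (k === -d || (k !== d && v[offset + k - 1] < v[offset + k + 1])) {
        x = v[offset + k + 1]; // move down: insert a token from b
      } else {
        x = v[offset + k - 1] + 1; // move right: delete a token from a
      }
      let y = x - k;
      // Follow the diagonal for free while tokens match.
      while (x < n && y < m && a[x] === b[y]) {
        x++;
        y++;
      }
      v[offset + k] = x;
      if (x >= n && y >= m) return d;
    }
  }
  return max;
}
```

The outer loop tries D = 0, 1, 2, and so on, and stops at the first D whose furthest-reaching path hits the bottom-right corner, which is why similar inputs finish after very few iterations.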

Why not Longest Common Subsequence (LCS)?

The classic dynamic-programming LCS algorithm fills an N × N matrix, so it needs O(N²) time and space, which makes it impractical for large files. Myers achieves O(ND) by searching outward from the start along diagonals, one edit at a time, rather than filling the entire matrix. For typical document edits where less than 20% of lines change, Myers is 5–10× faster than LCS in practice.

2. Line, Word, and Character Tokenization

Before diffing, each text is split into a sequence of tokens. The granularity changes what the diff considers a single unit of comparison:

Line mode — Split on newline characters. Each line is one token. This is the default and fastest mode. Identical lines are always treated as equal, even if they are long.
Word mode — Tokenize using a regex that splits on whitespace boundaries. Punctuation attached to words is kept together. This mode is useful for prose documents where you care about individual word changes.
Character mode — Each Unicode code point is a token. Most useful for short texts, passwords, or structured strings where even a single character matters.

TextCompare normalizes line endings (converting \r\n to \n) before tokenization. Optional pre-processing — trimming trailing whitespace, collapsing blank lines, lowercasing — is applied before tokenization when the corresponding options are checked.
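
The three granularities and the line-ending normalization could be sketched as follows. The function name tokenize, the mode strings, and the exact word regex are assumptions made for illustration; in particular, this sketch keeps whitespace runs as their own tokens, which the description above does not specify.

```javascript
// Splits text into tokens at the requested granularity.
// Hypothetical helper; the real tokenizer may differ in its regex and options.
function tokenize(text, mode = "line") {
  // Normalize Windows line endings before any splitting.
  const normalized = text.replace(/\r\n/g, "\n");
  switch (mode) {
    case "line":
      return normalized.split("\n");
    case "word":
      // Runs of non-whitespace stay together, so "Hello," is one token.
      // Whitespace runs are kept as tokens (an assumption of this sketch).
      return normalized.match(/\s+|\S+/g) ?? [];
    case "char":
      // Array.from iterates by Unicode code point, not by UTF-16 code unit.
      return Array.from(normalized);
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}
```

Because every mode returns a plain array of strings, the same diff routine can run on the result regardless of the granularity chosen.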

3. Intra-Line Diff Highlighting

When the line-level diff pairs a removed line in the original with an added line in the revised text, TextCompare checks whether the two are similar but not identical. If they are, it flags the pair as modified rather than as a plain remove + add pair, and runs a second diff pass on it at character level.

The character-level pass uses the same Myers algorithm on the character sequences of both lines. The result is used to wrap changed characters in <del> (removed characters) and <ins> (added characters) elements, which are styled with subtle background highlights. This lets you spot a single changed word in a long line at a glance — a significant usability improvement over line-only diffs.
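
As a rough illustration of the wrapping step, the sketch below turns an already computed character-level result into markup. The shape of the ops array (objects with type and text fields) and the names escapeHtml and renderIntraLine are hypothetical, chosen only for this example.

```javascript
// Escapes the characters that would otherwise be parsed as HTML.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// ops: [{ type: "equal" | "delete" | "insert", text: string }, ...]
// This input format is an assumption of the sketch.
function renderIntraLine(ops) {
  return ops
    .map(({ type, text }) => {
      const safe = escapeHtml(text);
      if (type === "delete") return `<del>${safe}</del>`;
      if (type === "insert") return `<ins>${safe}</ins>`;
      return safe; // unchanged characters are emitted without a wrapper
    })
    .join("");
}
```

Escaping before wrapping matters here because the compared text is arbitrary user input and may itself contain angle brackets.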

A similarity threshold controls when two lines are considered "modified" vs "completely replaced." If the character-level edit distance is greater than 60% of the longer line's length, the lines are treated as an unrelated removal and addition rather than a modification.
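
The threshold rule can be sketched like this. For brevity the distance is computed with a simple dynamic-programming LCS rather than a Myers pass; both give the same insert-plus-delete count. The names charEditDistance and classifyLinePair are illustrative assumptions.

```javascript
// Insert/delete-only edit distance between two strings, computed as
// len(a) + len(b) - 2 * LCS(a, b). A simple DP stand-in for the Myers pass.
function charEditDistance(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = new Array(t.length + 1).fill(0);
  for (let i = 1; i <= s.length; i++) {
    const curr = [0];
    for (let j = 1; j <= t.length; j++) {
      curr[j] = s[i - 1] === t[j - 1]
        ? prev[j - 1] + 1
        : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return s.length + t.length - 2 * prev[t.length];
}

// Returns "modified" when the lines are similar enough for intra-line
// highlighting, or "replaced" when they should be shown as remove + add.
function classifyLinePair(oldLine, newLine, threshold = 0.6) {
  const longer = Math.max(Array.from(oldLine).length, Array.from(newLine).length);
  const distance = charEditDistance(oldLine, newLine);
  return distance > threshold * longer ? "replaced" : "modified";
}
```

For example, changing "10" to "12" in a 17-character line costs 2 edits, far below the 60% cutoff, so the pair is treated as a modification.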

4. Side-by-Side vs Unified View

Side-by-side view places the original text on the left and the revised text on the right. Each line is assigned a line number from its respective source. Empty placeholder rows are inserted on the shorter side so that corresponding lines stay visually aligned. This view is ideal for reviewing changes when you have enough horizontal space.

Unified view interleaves removed and added lines in a single column, prefixed with − and + markers respectively. Both original and revised line numbers appear side-by-side in the gutter. This mirrors the output of git diff and is familiar to developers. It also works better on narrow screens and is easier to copy-paste into a bug report or email.
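
The placeholder-row alignment used by the side-by-side view might look roughly like the sketch below. It assumes a line-level result given as an array of ops with type and text fields; that format and the name toSideBySideRows are hypothetical.

```javascript
// ops: [{ type: "equal" | "delete" | "insert", text: string }, ...], one per line.
// Returns rows of { left, right }, where each side is { no, text } or null
// (null marks an empty placeholder cell).
function toSideBySideRows(ops) {
  const rows = [];
  let leftNo = 0;
  let rightNo = 0;
  let i = 0;
  while (i < ops.length) {
    if (ops[i].type === "equal") {
      leftNo++;
      rightNo++;
      rows.push({
        left: { no: leftNo, text: ops[i].text },
        right: { no: rightNo, text: ops[i].text },
      });
      i++;
      continue;
    }
    // Gather one run of consecutive changes, then pair them row by row.
    const deleted = [];
    const inserted = [];
    while (i < ops.length && ops[i].type !== "equal") {
      (ops[i].type === "delete" ? deleted : inserted).push(ops[i].text);
      i++;
    }
    const height = Math.max(deleted.length, inserted.length);
    for (let r = 0; r < height; r++) {
      rows.push({
        left: r < deleted.length ? { no: ++leftNo, text: deleted[r] } : null,
        right: r < inserted.length ? { no: ++rightNo, text: inserted[r] } : null,
      });
    }
  }
  return rows;
}
```

Each side keeps its own line counter, so the numbers shown always match the line's position in its source text even when placeholder rows are inserted.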

Both views support keyboard navigation: press n / p to jump to the next or previous change hunk, and ? to open the full keyboard shortcuts reference.

5. Share URL: Gzip + Base64 Encoding

When you click Share, TextCompare serializes both texts into a single JSON payload, compresses it using the gzip algorithm via the browser's built-in CompressionStream API, then encodes the compressed bytes as Base64url (URL-safe Base64 without padding). The result is appended to the page URL as a query parameter.

When the recipient opens the URL, the browser decodes the Base64url string back to bytes, decompresses them using DecompressionStream, and parses the JSON to restore both texts. The entire round trip happens client-side; no server receives or stores the data.

Gzip typically shrinks natural-language text by 60–80%, so a 10 KB pair of documents compresses to roughly 2–4 KB. Base64url encoding then adds about a third, giving a URL of roughly 3–5 KB. Very large texts (hundreds of kilobytes) may produce URLs too long for some browsers or link-sharing services; in those cases TextCompare warns you and suggests downloading the diff instead.
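
A minimal sketch of the encode and decode round trip, using the standard CompressionStream, DecompressionStream, Blob, and Response APIs. The function names and the JSON field names original and revised are assumptions; the real payload layout is not documented here.

```javascript
// Bytes -> URL-safe Base64 without padding.
function toBase64Url(bytes) {
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// URL-safe Base64 without padding -> bytes.
function fromBase64Url(encoded) {
  const base64 = encoded.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const binary = atob(padded);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Serialize both texts, gzip them, and return the query-parameter value.
async function encodeShareParam(original, revised) {
  const json = JSON.stringify({ original, revised }); // field names assumed
  const stream = new Blob([json]).stream().pipeThrough(new CompressionStream("gzip"));
  const compressed = new Uint8Array(await new Response(stream).arrayBuffer());
  return toBase64Url(compressed);
}

// Reverse of encodeShareParam: decode, gunzip, and parse the JSON payload.
async function decodeShareParam(param) {
  const stream = new Blob([fromBase64Url(param)])
    .stream()
    .pipeThrough(new DecompressionStream("gzip"));
  return JSON.parse(await new Response(stream).text());
}
```

The two Base64url helpers are kept separate from the async compression steps so that the URL-safe character substitution can be checked on its own.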

Technical Note: Performance & Responsiveness

Large inputs can produce computationally expensive diffs. TextCompare applies two strategies to keep the browser UI responsive:

Debouncing — The diff is only triggered after a 300 ms pause in typing, so the engine does not run on every keystroke for long inputs.
scheduler.yield() — For very large edit scripts, the rendering loop yields control back to the browser event loop between rendering chunks via scheduler.yield() (with a setTimeout(0) fallback). This prevents the page from freezing even on files with thousands of changed lines.

The diff computation itself is synchronous JavaScript — it does not use Web Workers — which keeps the implementation simple and avoids the serialization overhead of cross-thread message passing for typical document sizes.
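
The debouncing and chunked-rendering strategies described above could be sketched as follows. The helper names and the chunk size of 200 rows are assumptions for illustration; only the 300 ms delay and the scheduler.yield() with setTimeout(0) fallback come from the description.

```javascript
// Delays calls to fn until delayMs have passed without a new call.
function debounce(fn, delayMs = 300) {
  let timer;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Hands control back to the event loop, preferring scheduler.yield() when
// the browser provides it and falling back to setTimeout(0) otherwise.
function yieldToBrowser() {
  if (typeof scheduler !== "undefined" && typeof scheduler.yield === "function") {
    return scheduler.yield();
  }
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Splits an array into consecutive slices of at most `size` items.
function splitIntoChunks(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Renders rows chunk by chunk, yielding between chunks so input and paint
// events can be processed. chunkSize = 200 is an assumed value.
async function renderInChunks(rows, renderRow, chunkSize = 200) {
  for (const chunk of splitIntoChunks(rows, chunkSize)) {
    for (const row of chunk) renderRow(row);
    await yieldToBrowser();
  }
}
```

A text input handler would typically be wrapped once, for example with debounce(runDiff), where runDiff stands for whatever function starts the comparison.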

See It in Action

The best way to understand how TextCompare works is to try it. Paste any two texts and watch the diff appear in real time.
