A diff is the output of a comparison between two versions of a text, showing what was added, removed, or changed. The term comes from the Unix diff command introduced in 1974. In software, a diff describes the minimum set of changes needed to transform one version of a file into another.

What is the Myers diff algorithm?

The Myers algorithm, published by Eugene W. Myers in 1986, finds the shortest edit script between two texts in O(ND) time, where N is the combined length of both texts and D is the number of differences. It is the default algorithm in Git and the basis of GNU diff, because it is fastest in the common case of similar texts with few changes, where D is small.

What is a patch file?

A patch file is a text file containing one or more diffs in a standardized format. It describes the changes needed to transform an original file into a revised version. The patch command (or git apply) can read a patch file and apply those changes to the original, reproducing the revised version without needing the revised file itself.

Diff & Text Comparison Glossary — Key Terms Explained

Diff: A diff is the output of comparing two versions of a text, showing what was added, removed, or changed. The term originates from the Unix diff command, first distributed in 1974 by Douglas McIlroy at Bell Labs. In modern usage, "diff" refers both to the tool that produces the comparison and to the output itself. A diff describes the minimum set of operations needed to transform one text into another — it does not store both full texts, only the changes.
Patch: A patch is a text file containing one or more diffs in a standardized format. It describes the exact changes needed to transform an original file into a revised version. The patch command (and git apply) can read a patch file and apply those changes to the original, reproducing the revised version without needing the complete revised file. Patches are commonly emailed between developers, submitted to open-source projects, or used to distribute security fixes.
Unified Diff: The unified diff format is the standard output format used by git diff, GNU diff with the -u flag, and most modern diff tools. It interleaves removed lines (prefixed with -) and added lines (prefixed with +) in a single column, surrounded by context lines (prefixed with a space). The header of each hunk shows the starting line numbers and line counts in both the original and revised file, making it easy to locate the change. Unified diff is the format TextCompare uses in its unified view mode.
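As a small illustration of the format, the sketch below uses Python's standard difflib module to produce a unified diff from two in-memory line lists. The file labels a.txt and b.txt are placeholders, and difflib uses its own matching heuristic rather than the Myers algorithm, so treat this as a demonstration of the output format only.

```python
import difflib

original = ["alpha\n", "beta\n", "gamma\n"]
revised = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]

# fromfile/tofile are only labels for the header; no files are read.
diff_lines = list(
    difflib.unified_diff(original, revised, fromfile="a.txt", tofile="b.txt")
)
print("".join(diff_lines), end="")
# --- a.txt
# +++ b.txt
# @@ -1,3 +1,4 @@
#  alpha
# -beta
# +BETA
#  gamma
# +delta
```

Each output line starts with a space (context), a - (removed), or a + (added), and the @@ line is the hunk header.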
Edit Script: An edit script is a sequence of elementary operations — insertions and deletions of tokens — that transforms one text into another. The goal of a diff algorithm is to find the shortest edit script, meaning the one with the fewest operations. Shorter edit scripts are generally more useful because they represent changes that are more closely related to what a human actually edited. The Myers algorithm is defined precisely as finding the shortest edit script.
Myers Algorithm: The Myers diff algorithm, published by Eugene W. Myers in 1986, finds the shortest edit script between two sequences in O(ND) time, where N is the combined length of both sequences and D is the number of differences. The basic version may need up to O(ND) space to recover the script; a linear-space refinement described in the same paper reduces this to O(N). It works by modeling the comparison as a path through an edit graph, where diagonal moves are matches and horizontal or vertical moves are deletions or insertions, and searching for the path with the fewest non-diagonal moves rather than filling a complete dynamic programming matrix. The algorithm is used by Git, GNU diff, and TextCompare because it is fastest when texts are similar, which is the common case for real document edits.
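The core search can be sketched in a few lines. The function below is a minimal version of the greedy forward pass that returns only D, the length of the shortest edit script, and does not reconstruct the script itself. The name myers_distance is illustrative and not taken from any particular tool.

```python
def myers_distance(a, b):
    """Return D, the number of insertions plus deletions in a shortest edit script."""
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] is the furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Extend from the neighboring diagonal that reaches further:
            # moving down is an insertion, moving right is a deletion.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Follow the diagonal for free while elements match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d


print(myers_distance("ABCABBA", "CBABAC"))  # 5
```

The outer loop runs D + 1 times and each pass touches at most 2D + 1 diagonals, which is why the work stays small when the inputs are similar.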
LCS (Longest Common Subsequence): The Longest Common Subsequence (LCS) of two sequences is the longest sequence of elements that appear in the same relative order in both sequences, but not necessarily contiguously. Finding the LCS is mathematically equivalent to finding the shortest edit script: the elements not in the LCS are the ones that must be deleted or inserted. Classic dynamic programming LCS algorithms run in O(N²) time and space for inputs of length N, which is impractical for large files. The Myers algorithm achieves better performance by exploiting properties of the edit graph rather than computing the full LCS matrix.
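To make the equivalence concrete, here is a sketch of the classic dynamic programming table, followed by the arithmetic that links LCS length to edit script length (insertions and deletions only). The function name lcs_length is an illustrative choice.

```python
def lcs_length(a, b):
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] is the LCS length of a[:i] and b[:j].
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


a, b = "ABCABBA", "CBABAC"
lcs = lcs_length(a, b)
# Every element outside the LCS is either deleted from a or inserted from b.
edits = len(a) + len(b) - 2 * lcs
print(lcs, edits)  # 4 5
```

The table has (len(a) + 1) × (len(b) + 1) cells, which is the quadratic cost mentioned above.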
Hunk: A hunk (also called a change block or diff chunk) is a contiguous group of changed lines in a diff, along with the surrounding context lines. In unified diff format, each hunk begins with a header line like @@ -5,7 +5,9 @@, which indicates the starting line numbers and lengths in the original and revised files. A diff can contain many hunks if changes occur in multiple non-adjacent locations in the file. Tools like Git use hunks as the unit for selective staging (git add -p).
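A hunk header can be read with a short regular expression. The sketch below assumes the standard unified layout, in which a length that is omitted means 1; the helper name parse_hunk_header is hypothetical.

```python
import re

# Matches "@@ -start[,length] +start[,length] @@", optionally followed by text.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len = match.groups()
    # An omitted length means the range covers exactly one line.
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )


print(parse_hunk_header("@@ -5,7 +5,9 @@"))  # (5, 7, 5, 9)
```

For the header above, the hunk covers 7 lines starting at line 5 of the original and 9 lines starting at line 5 of the revised file.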
Context Lines: Context lines are the unchanged lines shown above and below each changed section in a unified diff. By default, git diff shows 3 context lines on each side of a hunk. Context lines help readers understand what surrounds each change without needing to open the full file. They also allow the patch tool to locate the correct position in the original file even if the line numbers have shifted due to earlier edits. TextCompare renders context lines in the normal text color, visually separating them from the colored changed lines.
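The effect of the context setting is easy to observe with Python's difflib, whose unified_diff function takes the number of context lines as the parameter n (default 3). This sketch changes one line in the middle of a nine-line text and counts the context lines produced; count_context is an illustrative helper.

```python
import difflib

original = [f"line {i}\n" for i in range(1, 10)]
revised = list(original)
revised[4] = "line five\n"  # change only the fifth line


def count_context(n):
    diff = difflib.unified_diff(original, revised, n=n)
    # Context lines are the ones prefixed with a single space.
    return sum(1 for line in diff if line.startswith(" "))


print(count_context(3), count_context(1), count_context(0))  # 6 2 0
```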
Added, Removed, Modified: These are the three fundamental change types in a diff. Added lines exist only in the revised text (shown in green with a + marker). Removed lines exist only in the original text (shown in red with a - marker). Modified is not a primitive operation in line-level diff: TextCompare detects it when a removed line and an added line are sufficiently similar, suggesting they represent an edit to the same line rather than an unrelated deletion and insertion. Modified lines trigger intra-line diff highlighting.
Similarity Score: A similarity score is a numeric measure of how alike two strings are, typically expressed as a percentage from 0% (completely different) to 100% (identical). TextCompare displays a similarity score for each comparison, calculated as the proportion of tokens shared between the two texts relative to their total length. A high similarity score with a small number of changes indicates minor revision; a low score indicates substantial rewriting. Similarity scores are useful for deciding which diff mode to use: closely similar texts benefit from intra-line character diff, while very different texts are easier to review at line level.
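One common way to compute such a score is the ratio 2 × M / T, where M is the number of matching tokens and T is the total number of tokens in both inputs. The sketch below uses difflib.SequenceMatcher.ratio, which implements that formula, on whitespace-separated words. This is an assumption for illustration and is not necessarily the exact formula TextCompare uses.

```python
import difflib


def similarity(text_a, text_b):
    tokens_a, tokens_b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(None, tokens_a, tokens_b, autojunk=False)
    # ratio() returns 2 * matches / (len(tokens_a) + len(tokens_b)).
    return matcher.ratio()


score = similarity("the quick brown fox", "the quick red fox")
print(f"{score:.0%}")  # 75%
```

Here three of the four words match on each side, so the score is 2 × 3 / 8.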
Intra-Line Diff: Intra-line diff (also called word diff or character diff within a line) is a second-pass diff that runs on the individual characters of two matched lines, highlighting exactly which characters were inserted or deleted. Without intra-line diff, a modified line is shown entirely in red and green — the reader cannot tell whether a single character or the entire line content changed. Intra-line diff adds a darker highlight to only the changed characters, making it easy to spot a single digit change or a spelling correction in a long line.
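The second pass can be sketched by running a sequence comparison over the characters of the two matched lines and keeping only the non-equal spans. The example uses difflib.SequenceMatcher as a stand-in matcher; intraline_changes is a hypothetical helper name.

```python
import difflib


def intraline_changes(old_line, new_line):
    matcher = difflib.SequenceMatcher(None, old_line, new_line, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record the operation and the exact characters involved on each side.
            changes.append((tag, old_line[i1:i2], new_line[j1:j2]))
    return changes


print(intraline_changes("timeout = 30 seconds", "timeout = 90 seconds"))
# [('replace', '3', '9')]
```

Only the single changed digit is reported, which is the span a viewer would highlight more strongly.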
Side-by-Side Diff: A side-by-side diff (also called a split diff) displays the original text in a left panel and the revised text in a right panel, with corresponding lines aligned horizontally. Empty rows are inserted on one side when lines are added or deleted on the other, maintaining visual alignment. Side-by-side is generally preferred for interactive review sessions because both versions are visible simultaneously and the reader can easily look up context in either direction. TextCompare's default view is side-by-side on desktop.
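The row alignment described above can be sketched as follows: matched blocks are paired line by line, and wherever one side has more lines than the other, the shorter side is padded with None to represent an empty row. The function name side_by_side and the use of difflib for matching are assumptions made for this illustration.

```python
import difflib
from itertools import zip_longest


def side_by_side(left, right):
    rows = []
    matcher = difflib.SequenceMatcher(None, left, right, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # zip_longest pads the shorter side with None, i.e. an empty row.
        rows.extend(zip_longest(left[i1:i2], right[j1:j2]))
    return rows


for row in side_by_side(["a", "b", "c"], ["a", "c", "d"]):
    print(row)
# ('a', 'a')
# ('b', None)
# ('c', 'c')
# (None, 'd')
```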
Merge Conflict: A merge conflict occurs in version control when two branches both modify the same region of a file and the VCS cannot automatically determine which change to keep. The conflicted file is annotated with conflict markers: <<<<<<< HEAD opens the current branch's version, ======= separates the two versions, and >>>>>>> branch-name closes the other branch's version. Resolving a merge conflict requires a developer to manually choose one version, combine them, or write a third alternative. A diff tool is useful for understanding exactly what each branch changed before resolving the conflict.
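The marker layout can be read mechanically. The sketch below separates a conflicted region into the two competing versions, assuming the two-way marker style described above and a single conflict in the input; split_conflict and the sample content are hypothetical.

```python
def split_conflict(lines):
    ours, theirs = [], []
    target = None  # which side the current line belongs to, if any
    for line in lines:
        if line.startswith("<<<<<<< "):
            target = ours
        elif line.startswith("======="):
            target = theirs
        elif line.startswith(">>>>>>> "):
            target = None
        elif target is not None:
            target.append(line)
    return ours, theirs


conflicted = [
    "<<<<<<< HEAD",
    "timeout = 30",
    "=======",
    "timeout = 90",
    ">>>>>>> branch-name",
]
print(split_conflict(conflicted))  # (['timeout = 30'], ['timeout = 90'])
```

Comparing the two returned lists with a diff tool shows exactly where the branches disagree.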
Text Tokenization: Tokenization is the process of splitting a text into discrete units (tokens) before running a diff algorithm. The choice of token granularity determines what counts as a single "unit of change." Line tokenization splits on newline characters — each line is one token. Word tokenization splits on whitespace and punctuation boundaries. Character tokenization treats each Unicode code point as a token. Coarser tokenization is faster and produces more readable output for large files; finer tokenization is slower but catches subtler changes. TextCompare offers all three modes and applies tokenization consistently to both inputs before diffing.
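The three granularities can be sketched with one small function. The word rule used here (runs of word characters, runs of whitespace, or single punctuation marks) is one reasonable choice among several and is not claimed to match TextCompare's own rules.

```python
import re


def tokenize(text, mode):
    if mode == "line":
        # Keep the newline with each line so the tokens rejoin to the input.
        return text.splitlines(keepends=True)
    if mode == "word":
        return re.findall(r"\w+|\s+|[^\w\s]", text)
    if mode == "char":
        # Iterating a Python str yields Unicode code points.
        return list(text)
    raise ValueError(f"unknown mode: {mode!r}")


print(tokenize("Hi, you", "word"))  # ['Hi', ',', ' ', 'you']
```

Whichever mode is chosen, the same function should be applied to both inputs so that the diff compares like with like.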
Edit Distance: Edit distance (also called Levenshtein distance when insertions, deletions, and substitutions are all counted) is the minimum number of single-character operations needed to transform one string into another. It is closely related to, but distinct from, the diff problem: diff finds the minimum edit script for sequences of lines or words, while Levenshtein distance typically operates on individual characters. TextCompare uses edit distance calculations internally to compute similarity scores and to decide whether two lines are similar enough to trigger intra-line diff highlighting rather than treating them as an unrelated deletion and addition.
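For reference, Levenshtein distance can be computed with a standard two-row dynamic programming sketch like the one below. It counts insertions, deletions, and substitutions at a cost of 1 each; how TextCompare computes or thresholds the value internally is not shown here.

```python
def levenshtein(a, b):
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Note the contrast with an insert/delete-only edit script: the same pair needs 5 operations without substitutions, because each substitution counts as one deletion plus one insertion.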

See These Terms in Action

The best way to understand diff terminology is to try a real comparison. Paste any two texts and observe the diff output — then refer back to this glossary for any terms you encounter.
