Question 1

Does the tool preserve the order of the first occurrence?

Accepted Answer

Yes, by default. The deduplication pass walks the input top to bottom and keeps the first occurrence of each line, skipping every later duplicate. This matches the behavior of awk '!seen[$0]++' on the command line (the quotes stop the shell from interpreting the ! and $0). Enabling the Sort option tells the tool to sort the survivors after deduplication, which discards the original order but is often what you want for audit-friendly output.
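The first-occurrence behavior described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the function name dedupeLines and its sort option are hypothetical stand-ins for the tool's behavior and its Sort toggle.

```javascript
// Hypothetical helper illustrating first-occurrence deduplication.
// A Set remembers which lines were already seen; the kept array
// records survivors in the order they first appeared.
function dedupeLines(text, { sort = false } = {}) {
  const seen = new Set();
  const kept = [];
  for (const line of text.split("\n")) {
    if (!seen.has(line)) {
      seen.add(line);
      kept.push(line);
    }
  }
  // Optional sort happens after deduplication, so original order is lost.
  if (sort) kept.sort();
  return kept;
}

const fruitList = "pear\napple\npear\nfig\napple";
console.log(dedupeLines(fruitList));                 // [ 'pear', 'apple', 'fig' ]
console.log(dedupeLines(fruitList, { sort: true })); // [ 'apple', 'fig', 'pear' ]
```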

Question 2

Why do two lines look identical but still count as different?

Accepted Answer

The comparison is byte-exact unless you opt into normalization. Common culprits are a trailing space or tab, a CRLF line ending versus an LF line ending, a non-breaking space U+00A0 masquerading as a regular space, or a smart quote U+2019 replacing an ASCII apostrophe. Turn on Trim whitespace to ignore leading and trailing whitespace; run the input through the invisible-character detector if you suspect hidden Unicode characters.
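If you want to check by hand why two lines differ, one approach is to print each character's code point. The helper below is a hypothetical debugging sketch, not part of this tool or its invisible-character detector.

```javascript
// Hypothetical debugging helper: list every code point in a line as U+XXXX
// so that look-alike characters become visible.
function codePoints(line) {
  return Array.from(line, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const asciiQuote = "it's";       // ASCII apostrophe U+0027
const smartQuote = "it\u2019s";  // right single quotation mark U+2019

console.log(asciiQuote === smartQuote);          // false
console.log(codePoints(asciiQuote).join(" "));   // U+0069 U+0074 U+0027 U+0073
console.log(codePoints(smartQuote).join(" "));   // U+0069 U+0074 U+2019 U+0073

// A CRLF file split on "\n" leaves a trailing carriage return on each line.
console.log(codePoints("ok\r").join(" "));       // U+006F U+006B U+000D
```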

Question 3

Does my list get uploaded for processing?

Accepted Answer

No. Deduplication is a JavaScript function call inside your browser tab. The Set data structure, string methods, and array operations all execute locally in V8, SpiderMonkey, or JavaScriptCore depending on your browser. There is no fetch, no worker, no analytics pixel capturing lines, and closing the tab releases the strings to garbage collection. You can disconnect from the network after the page loads and keep deduplicating.

Question 4

Can I deduplicate only specific columns of a CSV?

Accepted Answer

Not directly - this tool treats each line as a single opaque key. For multi-column CSV dedup you have three options: preprocess the CSV to extract the key column into its own file first (csvkit's csvcut -c column-name does this); use a dedicated command-line tool such as Miller, where mlr --csv head -n 1 -g column-name file.csv keeps the first row for each distinct value of that column; or import the CSV into SQLite and run SELECT DISTINCT col FROM t. For simple cases where the whole row is the key, this tool works well.

Question 5

How does sort order work for emoji and non-Latin text?

Accepted Answer

The default sort uses Array.prototype.sort without a comparator, which compares strings by UTF-16 code units. That is not linguistic order: uppercase sorts before lowercase, ASCII before accented characters, and most emoji, which are stored as surrogate pairs, land after nearly all other text but before the characters in the range U+E000 to U+FFFF. For locale-aware sorting that handles German umlauts, Czech accents, or Chinese pinyin, use the Text Sorter tool on this site, which is built around Intl.Collator.
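The difference between the default sort and a locale-aware one is easy to see on a small list. The snippet below is an illustrative sketch; the sample words and the "de" locale are assumptions chosen for the example, and the exact collated order depends on the locale data your JavaScript engine ships with.

```javascript
const words = ["zebra", "Zoo", "\u00E4pfel", "apple", "Banana"]; // \u00E4 is a-umlaut

// Default sort: compares UTF-16 code units, so every uppercase ASCII
// letter sorts before every lowercase one, and the umlaut sorts last.
const byCodeUnit = [...words].sort();
console.log(byCodeUnit); // [ 'Banana', 'Zoo', 'apple', 'zebra', 'äpfel' ]

// Locale-aware sort: Intl.Collator applies the collation rules of the
// requested locale (German here) instead of raw code-unit values.
const collator = new Intl.Collator("de");
const byLocale = [...words].sort(collator.compare);
console.log(byLocale);

// Characters outside the Basic Multilingual Plane are stored as surrogate
// pairs (0xD800-0xDFFF), so an emoji sorts before U+FF21 (fullwidth A)
// even though its code point U+1F600 is numerically larger.
const astral = ["\uFF21", "\u{1F600}"].sort();
console.log(astral[0] === "\u{1F600}"); // true
```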

Question 6

What happens with a million-line input?

Accepted Answer

A million short lines typically deduplicate in well under a second on a modern laptop because the underlying Set uses a hash table. The browser may briefly pause while rendering the output textarea, because that is the expensive step. If you routinely clean files that large, consider awk '!seen[$0]++' on the terminal - it streams the input line by line and holds only the unique lines in memory, rather than the whole file plus a rendered copy.

Question 7

How do I keep duplicates only and discard uniques?

Accepted Answer

This tool's output is the set of unique lines. For the inverse - finding lines that appeared more than once - use the POSIX tool uniq -d on a sorted file (sort file | uniq -d), or the awk one-liner awk '++count[$0] == 2' file, which prints each repeated line once, at its second occurrence. Many spreadsheet tools also have a duplicate-highlighting option under conditional formatting that is helpful for investigating why duplicates appeared.
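The same idea as the awk one-liner can be sketched in JavaScript with a Map of counts. The function name duplicatesOnly is hypothetical; this is not a feature of the tool.

```javascript
// Hypothetical helper: return each line that occurs more than once,
// reported a single time, in the order of its second occurrence.
function duplicatesOnly(text) {
  const counts = new Map();
  const dupes = [];
  for (const line of text.split("\n")) {
    const n = (counts.get(line) ?? 0) + 1;
    counts.set(line, n);
    // Emit exactly when the count reaches 2, so a line that appears
    // three or more times is still listed only once.
    if (n === 2) dupes.push(line);
  }
  return dupes;
}

console.log(duplicatesOnly("a\nb\na\nc\nb\na")); // [ 'a', 'b' ]
```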

Question 8

Is there a difference between this and sort -u?

Accepted Answer

In the set of lines that survive, none if you turn on this tool's Sort option, although the order can still differ because sort -u collates according to your locale (set LC_ALL=C to get byte order). The bigger difference is that sort -u sorts first and then deduplicates, meaning the survivors are in sorted order and the first-seen guarantee is lost. This tool preserves first-occurrence order by default, which is what awk '!seen[$0]++' provides in the shell. Choose whichever ordering matches your downstream consumer's expectation.

Question 9

What if my lines have Unicode composition differences?

Accepted Answer

A character like e-with-acute can be encoded as the single code point U+00E9 (NFC form) or as e followed by the combining acute accent U+0301 (NFD form). They render identically but have different byte sequences, so default comparison treats them as distinct. If you are mixing inputs from different sources, run them through String.prototype.normalize("NFC") first; this tool does not normalize automatically because normalization can itself be a source of surprise.
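A short sketch of the composed versus decomposed case, using only the standard String.prototype.normalize method. The sample strings are illustrative.

```javascript
const composed = "caf\u00E9";     // e-with-acute as one code point (NFC)
const decomposed = "cafe\u0301";  // e followed by combining acute (NFD)

console.log(composed === decomposed);             // false
console.log(composed.length, decomposed.length);  // 4 5

// After normalizing both to NFC, the two spellings compare equal,
// so a Set collapses them into a single entry.
const normalized = [composed, decomposed].map((s) => s.normalize("NFC"));
console.log(normalized[0] === normalized[1]);     // true
console.log(new Set(normalized).size);            // 1
```

Normalizing before pasting into the tool keeps the choice explicit, which fits the tool's decision not to normalize on its own.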

Question 10

Does trim mode affect the output itself?

Accepted Answer

No. Trim mode only affects comparison - the output preserves the original line exactly as pasted, including any leading or trailing whitespace. So two lines "hello" and "hello   " compare as duplicates with trim on, and whichever appeared first in the input is the one written to the output, whitespace and all. If you want the output trimmed as well, follow up with a pass through the Whitespace Remover tool.
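One way to picture this is a comparison key that is separate from the stored line. The sketch below is an assumption about how such a mode could work, not the tool's actual code; dedupeTrimmed is a hypothetical name.

```javascript
// Hypothetical sketch: the trimmed text is used only as the lookup key,
// while the original, untrimmed line is what gets written to the output.
function dedupeTrimmed(lines) {
  const seen = new Set();
  const kept = [];
  for (const line of lines) {
    const key = line.trim();
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(line); // original whitespace preserved
    }
  }
  return kept;
}

console.log(dedupeTrimmed(["hello   ", "hello", "  world"]));
// [ 'hello   ', '  world' ]
```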

Remove Duplicate Lines

