Remove Duplicate Lines
Remove duplicate lines from text with case-sensitive and sorting options.
Reviewed by Aygul Dovletova · Last reviewed
How to Use the Duplicate Line Remover
- Paste your list into the input area - one entry per line. CSV rows, emails, log entries, URLs, or any newline-separated content are all fine.
- Toggle the options that match your data: Case-sensitive keeps User and user distinct, Trim whitespace ignores leading and trailing spaces during comparison, and Sort output rearranges the survivors alphabetically.
- Read the stats card that appears above the output: original line count, unique line count, and the number of duplicates discarded.
- Copy the cleaned output with the Copy button. The result preserves original order by default - the first occurrence wins and later copies are dropped.
- Paste the result back into your spreadsheet, database import, or text editor. The tool does not modify the original input so you can adjust options and re-run as many times as needed.
How the Deduplication Works
The tool splits the input on \n, then walks the resulting array while building a JavaScript Set keyed by the normalized form of each line. Normalization is cheap: a toLowerCase call when case-insensitive mode is on, and a trim call when trim mode is on. The first time a key appears, it is added to the Set and the original (un-normalized) line is pushed into the output array; every later occurrence of that key is skipped. This gives first-seen-wins semantics in O(n) expected time and O(n) memory, where n is the number of lines. Sorting, when enabled, runs Array.prototype.sort on the survivors with the default comparator, which orders strings by UTF-16 code units and is deterministic and locale-agnostic. The whole transformation runs synchronously in the browser tab - no worker, no server roundtrip.
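The pass described above can be sketched in TypeScript as follows. This is a minimal illustration, not the tool's actual source. The function name dedupeLines and the DedupeOptions shape are hypothetical, chosen only to mirror the three toggles.

```typescript
// Hypothetical option names mirroring the Case-sensitive, Trim whitespace, and Sort toggles.
interface DedupeOptions {
  caseSensitive: boolean;
  trim: boolean;
  sort: boolean;
}

// First-seen-wins deduplication keyed by a normalized form of each line.
function dedupeLines(input: string, opts: DedupeOptions): string[] {
  const seen = new Set<string>();
  const output: string[] = [];
  for (const line of input.split("\n")) {
    // Normalize only the comparison key; the original line is what gets kept.
    let key = opts.trim ? line.trim() : line;
    if (!opts.caseSensitive) key = key.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      output.push(line);
    }
  }
  // Default comparator (UTF-16 code-unit order), applied only after deduplication.
  return opts.sort ? output.sort() : output;
}

const text = "banana\nApple\napple \nbanana\nCherry";
console.log(dedupeLines(text, { caseSensitive: false, trim: true, sort: false }));
// → [ 'banana', 'Apple', 'Cherry' ]
```

Because only the key is normalized, "Apple" survives with its original capitalization, and "apple " is dropped as a later duplicate.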
When to Reach for This
- Cleaning a newsletter-subscriber export before uploading to your ESP so duplicates do not inflate your unsubscribe rate.
- Deduplicating a list of domains or IPs pulled from several log sources before piping into a firewall deny-list.
- Tidying a shopping-cart product list where scrapers sometimes emit the same SKU twice.
- Collapsing a sorted-but-noisy journalctl dump so repeated error lines do not drown the signal.
- Preparing a wordlist for a security test where only unique candidates are useful to the cracker.
- Consolidating CSV rows exported from two different dashboards where the same rows, byte for byte, appear in both exports.
Common Pitfalls and Edge Cases
- Trailing whitespace differences. Two lines that look identical on screen can differ by a trailing space or tab, which by default keeps them apart. Enable Trim whitespace to treat them as duplicates.
- Smart quotes and soft hyphens. Copy-paste from a word processor often introduces U+2019 (curly apostrophe) or U+00AD (soft hyphen) that make two lines compare as different. The invisible-character detector on this site will reveal them.
- Mixed line endings. DOS-style CRLF lines will have a trailing \r that makes them differ from LF-only duplicates. Enable trim mode or normalize the input first.
- Sorted output changes the first-seen order. If you enable sort, the "first occurrence wins" guarantee is preserved during deduplication, but the survivors are then reordered alphabetically.
- Non-ASCII sort order. The default sort compares UTF-16 code units, so Unicode characters land in code-unit order (the same as code-point order for characters in the Basic Multilingual Plane) rather than linguistic order. For locale-aware sorting use the Text Sorter tool on this site.
- Blank lines count. Empty lines are treated as a line with empty content; multiple blanks are collapsed to a single blank under deduplication.
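If you would rather normalize line endings yourself than rely on trim mode, splitting on /\r?\n/ instead of \n removes the stray \r from CRLF lines before comparison. This is a minimal sketch run outside the tool. splitLines and dedupeExact are hypothetical helper names.

```typescript
// Split on LF or CRLF so Windows-style lines lose their trailing "\r".
function splitLines(input: string): string[] {
  return input.split(/\r?\n/);
}

// Exact, first-seen-wins deduplication (no case folding, no trimming).
function dedupeExact(lines: string[]): string[] {
  const seen = new Set<string>();
  return lines.filter((line) => {
    if (seen.has(line)) return false;
    seen.add(line);
    return true;
  });
}

const mixed = "alpha\r\nbeta\nalpha\nbeta\r\ngamma";
console.log(dedupeExact(mixed.split("\n")).length); // 5: "alpha\r" and "alpha" differ
console.log(dedupeExact(splitLines(mixed)).length); // 3: alpha, beta, gamma
```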
Uniqueness as a Set Operation
What you are computing here is the underlying set of a multiset: given a sequence with repetition, produce the collection of its distinct members. In relational databases this is SELECT DISTINCT. In shell this is the combination sort | uniq - the sort step is required because uniq only collapses adjacent duplicates. The linear-time awk '!seen[$0]++' idiom does exactly what this tool does: it keeps a hash table keyed by the line and prints each line the first time its key is unseen, in streaming first-seen order. That one-liner has become a staple of Unix toolchains because it preserves order without a full sort. POSIX defines uniq in IEEE Std 1003.1, and its -u, -d, and -c flags print only unique lines, print only duplicated lines, or prefix each line with its count, respectively.
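To see why uniq needs sorted input, the sketch below compares an adjacent-only collapse with a sort-first pipeline. uniqAdjacent is a hypothetical TypeScript stand-in for uniq's default behavior, not a port of the POSIX utility.

```typescript
// Collapse runs of identical adjacent lines, like uniq with no flags.
function uniqAdjacent(lines: string[]): string[] {
  const out: string[] = [];
  for (const line of lines) {
    if (out.length === 0 || out[out.length - 1] !== line) out.push(line);
  }
  return out;
}

const lines = ["b", "a", "b", "b", "a"];
console.log(uniqAdjacent(lines));             // [ 'b', 'a', 'b', 'a' ]: non-adjacent repeats survive
console.log(uniqAdjacent([...lines].sort())); // [ 'a', 'b' ]: like sort | uniq
```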
Comparison to Alternatives
sort -u file is the canonical Unix one-liner and finishes in O(n log n) time with locale-aware collation if your LC_ALL is set. awk '!seen[$0]++' runs in O(n) but uses memory proportional to the number of distinct lines, which is only an issue on multi-gigabyte inputs. For CSV files where duplicates are defined by specific columns rather than the entire line, a SQL approach is the right fit: csvkit's csvsql --query runs a query directly against a CSV file, and after an import into PostgreSQL a DISTINCT ON clause keeps one row per key. Excel has a Remove Duplicates button under the Data tab that handles multi-column keys with a nice UI. Use this web tool when your list is short-to-medium (up to a few hundred thousand lines), you want to toggle case-insensitivity or trim without crafting a regex, and you do not want to open a terminal - especially on a locked-down work laptop.
Frequently Asked Questions
Does the tool preserve the order of the first occurrence?
Yes, by default. The deduplication pass walks the input top-to-bottom and keeps the first time each line appears, skipping every later duplicate. This matches the behavior of awk !seen[$0]++ on the command line. Enabling the Sort option tells the tool to re-order the survivors alphabetically after deduplication, which discards the original order but is often what you want for audit-friendly output.
Why do two lines look identical but still count as different?
The comparison is byte-exact unless you opt into normalization. Common culprits are a trailing space or tab, a CRLF line ending versus an LF line ending, a non-breaking space U+00A0 masquerading as a regular space, or a smart quote U+2019 replacing an ASCII apostrophe. Turn on Trim whitespace to ignore leading and trailing whitespace; run the input through the invisible-character detector if you suspect hidden Unicode characters.
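When two lines look the same, listing their code points usually exposes the difference. This is a small sketch and no substitute for the site's invisible-character detector. codePoints is a hypothetical helper name.

```typescript
// List each code point of a string as U+XXXX so invisible differences show up.
function codePoints(s: string): string[] {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  );
}

const a = "it's";
const b = "it\u2019s"; // curly apostrophe
console.log(a === b);       // false
console.log(codePoints(a)); // [ 'U+0069', 'U+0074', 'U+0027', 'U+0073' ]
console.log(codePoints(b)); // [ 'U+0069', 'U+0074', 'U+2019', 'U+0073' ]
```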
Does my list get uploaded for processing?
No. Deduplication is a JavaScript function call inside your browser tab. The Set data structure, string methods, and array operations all execute locally in V8, SpiderMonkey, or JavaScriptCore depending on your browser. There is no fetch, no worker, no analytics pixel capturing lines, and closing the tab releases the strings to garbage collection. You can disconnect from the network after the page loads and keep deduplicating.
Can I deduplicate only specific columns of a CSV?
Not directly - this tool treats each line as a single opaque key. For column-based CSV dedup you have a few options: preprocess the CSV to extract the key column into its own file first, query the file with csvkit's csvsql --query, or import the CSV into SQLite and run SELECT DISTINCT col FROM t to list the unique keys. For simple cases where the whole row is the key, this tool works perfectly.
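For a simple CSV (no quoted fields that contain commas), a first-seen-wins pass keyed on one column takes only a few lines. This is a hypothetical sketch, not a feature of this tool. dedupeByColumn is an illustrative name, and real-world CSV with quoting needs a proper parser.

```typescript
// Keep the header plus the first row for each value in the given zero-based column.
// Assumes a naive comma split: quoted fields containing commas are NOT handled.
function dedupeByColumn(csv: string, column: number): string[] {
  const [header, ...rows] = csv.split("\n");
  const seen = new Set<string>();
  const kept: string[] = [header];
  for (const row of rows) {
    const fields = row.split(",");
    const key = column < fields.length ? fields[column] : "";
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(row);
    }
  }
  return kept;
}

const csv = "id,name\n1,Ada\n2,Grace\n1,Ada L.";
console.log(dedupeByColumn(csv, 0)); // [ 'id,name', '1,Ada', '2,Grace' ]
```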
How does sort order work for emoji and non-Latin text?
The default sort uses Array.prototype.sort without a comparator, which compares UTF-16 code units. That gives you code-point order, not linguistic order: uppercase before lowercase, ASCII before accented characters, and emoji clustered together by their Unicode block assignment. For locale-aware sorting that handles German umlauts, Czech accents, or Chinese pinyin, use the Text Sorter tool on this site, which is built around Intl.Collator.
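The difference between the default comparator and a locale-aware one is easiest to see side by side. This sketch uses the standard Intl.Collator API. The sample words are arbitrary.

```typescript
const words = ["zebra", "Apple", "\u00C9clair", "banana"];

// Default sort: UTF-16 code-unit order, so "É" (U+00C9) sorts after every ASCII letter.
const byCodeUnit = [...words].sort();

// Locale-aware sort: Intl.Collator compares letters linguistically.
const collator = new Intl.Collator("en");
const byLocale = [...words].sort(collator.compare);

console.log(byCodeUnit); // [ 'Apple', 'banana', 'zebra', 'Éclair' ]
console.log(byLocale);   // [ 'Apple', 'banana', 'Éclair', 'zebra' ]
```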
What happens with a million-line input?
A million short lines deduplicate in well under a second on a modern laptop because the underlying Set uses a hash table. The browser may briefly pause while rendering the output textarea because that is the expensive step. If you routinely clean files that large, consider awk '!seen[$0]++' on the terminal - it streams the input line by line instead of loading the whole file at once, though its seen table still grows with the number of distinct lines.
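The awk idiom's streaming behavior can be mirrored with a generator that yields each line the first time it appears, so lines flow through without first collecting the whole input in an array. This is a sketch under that assumption. The seen set still grows with the number of distinct lines, just as awk's table does. dedupeStream and source are hypothetical names.

```typescript
// Yield each line the first time it is seen, like awk '!seen[$0]++'.
function* dedupeStream(lines: Iterable<string>): Generator<string> {
  const seen = new Set<string>();
  for (const line of lines) {
    if (!seen.has(line)) {
      seen.add(line);
      yield line;
    }
  }
}

// Any iterable works as the source, e.g. lines produced lazily by another generator.
function* source(): Generator<string> {
  yield* ["x", "y", "x", "z", "y"];
}

console.log([...dedupeStream(source())]); // [ 'x', 'y', 'z' ]
```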
How do I keep duplicates only and discard uniques?
This tool's output is the uniques set. For the inverse - finding lines that appeared more than once - use the POSIX tool uniq -d on a sorted file (sort file | uniq -d), or the awk one-liner awk '++count[$0] == 2' file. Many spreadsheet tools also have a duplicate-highlighting option under conditional formatting that is helpful for investigating why duplicates appeared.
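The awk one-liner above prints a line at the moment its count reaches two. Each duplicated line therefore appears once, in order of its second occurrence. Here is the same logic in TypeScript, written as a hypothetical helper named duplicatesOnly:

```typescript
// Emit a line exactly once, when it is seen for the second time (awk '++count[$0] == 2').
function duplicatesOnly(lines: string[]): string[] {
  const counts = new Map<string, number>();
  const out: string[] = [];
  for (const line of lines) {
    const n = (counts.get(line) ?? 0) + 1;
    counts.set(line, n);
    if (n === 2) out.push(line);
  }
  return out;
}

console.log(duplicatesOnly(["a", "b", "c", "b", "a", "b"])); // [ 'b', 'a' ]
```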
Is there a difference between this and sort -u?
With this tool's Sort option on, the output is usually the same set of lines, but the order can still differ: sort -u follows your locale's collation (LC_ALL), while this tool sorts by UTF-16 code units. Without Sort, the difference is ordering. sort -u sorts first and then deduplicates, so the survivors come out in sorted order and the first-seen order is lost. This tool preserves first-occurrence order by default, which is what awk '!seen[$0]++' provides on the shell. Choose whichever ordering matches your downstream consumer's expectation.
What if my lines have Unicode composition differences?
A character like e-with-acute can be encoded as the single code point U+00E9 (NFC form) or as e followed by the combining acute accent U+0301 (NFD form). They render identically but have different byte sequences, so default comparison treats them as distinct. If you are mixing inputs from different sources, run them through String.prototype.normalize("NFC") first; this tool does not normalize automatically because normalization can itself be a source of surprise.
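If you decide normalization is what you want, you can apply NFC to each comparison key before deduplicating. This is a sketch you would run yourself before pasting, since the tool does not do it. dedupeNfc is a hypothetical name.

```typescript
// Normalize each line to NFC for comparison, so composed and decomposed forms match.
function dedupeNfc(lines: string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const line of lines) {
    const key = line.normalize("NFC");
    if (!seen.has(key)) {
      seen.add(key);
      out.push(line); // keep the original spelling of the first occurrence
    }
  }
  return out;
}

const composed = "caf\u00E9";    // é as one code point
const decomposed = "cafe\u0301"; // e + combining acute accent
console.log(composed === decomposed);                  // false
console.log(dedupeNfc([composed, decomposed]).length); // 1
```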
Does trim mode affect the output itself?
No. Trim mode only affects comparison - the output preserves the original line exactly as pasted, including any leading or trailing whitespace. So two lines "hello" and "hello " compare as duplicates with trim on, and whichever appeared first in the input is the one written to the output, whitespace and all. If you want the output trimmed as well, follow up with a pass through the Whitespace Remover tool.