Keyword Density Checker
Analyze word frequency across single words, bigrams, and trigrams, with a density percentage for each.
Maintained by Aygul Dovletova
Using the Keyword Density Checker
- Paste your draft content into the textarea. Plain text is easiest; HTML is stripped of tags before counting, so you can paste a blog post's markup straight from View Source.
- Optionally enter a specific keyword to spotlight - its exact frequency and density percentage appear at the top of the results.
- Review the three tables - unigrams (single words), bigrams (two-word phrases), trigrams (three-word phrases). Each row shows raw count and density as a percentage of total words.
- Read the signal, not the score - the goal is to spot over-optimization. Density above roughly 3-4% on a single phrase is a symptom of unnatural repetition; below 0.5% for your target topic may signal the page does not actually cover the term.
- Edit and rerun - the count updates on each submission, so you can trim or reword obvious stuffing and verify the density drops.
What This Tool Is For (and What It Is Not)
Keyword density is a diagnostic, not a target. It is useful as a symptom check: a paragraph that mentions "affordable dental implants Chicago" seven times in 400 words gives that phrase a density of 1.75% (7 / 400), and the 28 words it occupies make up 7% of the text, which reads as unnatural to any editor and to modern search engines. The tool tokenises your input using String.prototype.split(/\s+/), lowercases each token, strips punctuation, and builds frequency counts for unigrams, bigrams, and trigrams with basic stop-word filtering. The density formula is straightforward: (occurrences / total_tokens) * 100.
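The pipeline described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the function names tokenise and unigramDensity, the short STOP_WORDS list, and the choice to trim punctuation only at token edges (so inner hyphens survive) are all assumptions made for the example.

```javascript
// Hypothetical stop-word list; the tool's real list is not documented here.
const STOP_WORDS = new Set(["the", "a", "an", "and", "of", "to", "in", "is", "for"]);

// Lowercase, split on whitespace, then trim punctuation from both ends of each token.
function tokenise(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .map((token) => token.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, ""))
    .filter((token) => token.length > 0);
}

// Count non-stop-word tokens and express each as a percentage of ALL tokens
// (assumption: stop words stay in the denominator).
function unigramDensity(text) {
  const tokens = tokenise(text);
  const counts = new Map();
  for (const token of tokens) {
    if (STOP_WORDS.has(token)) continue;
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return [...counts.entries()]
    .map(([term, count]) => ({ term, count, density: (count / tokens.length) * 100 }))
    .sort((a, b) => b.count - a.count);
}
```

Calling unigramDensity on a draft returns rows sorted by count, each carrying the (occurrences / total_tokens) * 100 figure from the formula above.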
What it cannot tell you is whether your page will rank. Modern search engines moved past keyword-density ranking in roughly 2013 with Hummingbird and have since added BERT (2019), MUM (2021), and passage ranking - all of which model semantic intent rather than literal phrase counts. Use this tool to catch over-optimization and under-coverage; do not use it to hit a magic number. If a density report tells you "add more occurrences of your keyword to reach 2%," that is bad advice - read the report only in the other direction, as a check for excess rather than a quota to fill.
When to Actually Run a Density Check
- After publishing a page that is ranking for the wrong query - often the density report reveals the page is more about a secondary phrase than the primary target.
- When editing a piece written by multiple authors, to catch inadvertent repetition where each author thought they needed to "establish" the topic.
- Auditing content flagged as "thin" by an SEO crawler (Screaming Frog, Sitebulb) - density combined with word-count gives a fuller picture than either alone.
- Reviewing AI-generated draft content for its tendency to repeat the prompt subject across paragraphs, which inflates unigram density well past natural levels.
- Comparing your draft to competitor pages that rank for the same query - not to match their density, but to spot topic areas they cover that you do not.
Edge Cases That Distort the Number
- Stop words inflate unigram lists - "the," "and," "of" dominate any English text. The tool filters a standard stop-word list, but the filter is language-specific; pasting Polish or Japanese gets unfiltered noise.
- Inflected forms count separately - "implant," "implants," and "implanted" are three distinct tokens to the tool. A light stemmer could merge them, but stemming distorts other statistics; the tool leaves inflection as-is.
- Brand names and product SKUs - counted as ordinary words, which can inflate density for branded content where your product name legitimately appears in every section.
- Hyphenated compounds - "long-tail" counts as one token in most tokenisers and as two ("long" and "tail") in others. This tool treats hyphens as intra-word characters, so "long-tail" is a single unigram.
- Code blocks, URLs, and citations - all counted. If your page has a pasted JSON schema or a reference list with DOIs, those tokens distort the density. Strip them before checking or mentally subtract.
- Very short content - densities on pieces under 300 words swing wildly with single-word edits, which makes the metric unreliable for snippet-length text.
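For the URL and citation case above, one way to strip the noise before pasting is a small pre-cleaning pass. The sketch below is an assumption, not a feature of the tool: stripNoise is a hypothetical helper, and its two patterns (http/https URLs and DOI-shaped strings beginning with "10.") are deliberately rough.

```javascript
// Hypothetical pre-cleaning helper: removes URLs and DOI-like strings,
// then collapses the leftover whitespace.
function stripNoise(text) {
  return text
    .replace(/https?:\/\/\S+/gi, " ")      // http and https URLs
    .replace(/\b10\.\d{4,9}\/\S+/g, " ")   // DOI-shaped strings such as 10.1000/xyz123
    .replace(/\s+/g, " ")
    .trim();
}
```

Run the cleaned string through the checker instead of the raw draft; pasted code or JSON still needs to be removed by hand.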
How Modern Search Engines Actually Read Text
Google integrated BERT into Search in October 2019 and extended it globally that December. BERT is a transformer-based language model that analyses the relationships between words in a query rather than matching them as a bag of words. MUM, announced in May 2021, added cross-lingual and multimodal understanding, and passage ranking lets Google rank individual sections of a long page. Together these mean that repeating a keyword in every paragraph no longer correlates with relevance - what matters is whether the surrounding content actually discusses the concept the keyword represents. Google Search Central documentation on content quality explicitly cautions against producing lots of content on many different topics in the hope that some of it will perform well in search results.
Alternative Approaches to the Same Question
For "is my page well-optimized for query X," topical coverage is a better signal: does your page discuss the entities, subtopics, and related questions that top-ranking pages discuss? Tools like Clearscope, SurferSEO, and MarketMuse build topic models from SERP data and score drafts against them. For "is my page over-optimized," a density check is actually helpful, but so is a human re-read. For free alternatives, word-freq in Node.js or Python's collections.Counter give similar output. This tool wins for local analysis without uploading drafts to a third-party SEO service.
Frequently Asked Questions
What is a "good" keyword density?
There is no good number to target. The pragmatic interpretation is: above roughly 3-4% for a single phrase in body content usually reads as unnatural and can trigger quality flags; below 0.5% for the target topic may mean the page does not cover the term at all. Anything in between is normal and the specific value does not predict ranking. Optimize for readability and topical depth first; check density as a post-hoc sanity check.
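The interpretation above can be written down as a tiny classifier. This is only a sketch of the rule of thumb: readSignal and its two constants are hypothetical names, the cut-offs are the rough figures quoted in this answer, and nothing here predicts ranking.

```javascript
// Rough heuristics from the text above, not ranking rules.
const HIGH_PERCENT = 3;   // lower edge of the "roughly 3-4%" band
const LOW_PERCENT = 0.5;  // below this, the target topic may not be covered

// Map a density percentage for a single phrase to a plain-language signal.
function readSignal(densityPercent) {
  if (densityPercent > HIGH_PERCENT) return "possible over-optimization";
  if (densityPercent < LOW_PERCENT) return "possible under-coverage";
  return "normal range";
}
```

Treat the returned label as a prompt to re-read the passage, not as a score to optimize.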
Does Google penalize high keyword density?
Not density specifically, but keyword stuffing is explicitly listed as a spam violation in Google's Spam Policies documentation. Stuffing includes "lists of phone numbers without substantial added value," "blocks of text that list cities and regions that a web page is trying to rank for," and "repeating the same words or phrases so often that it sounds unnatural." Density is how you measure that symptom; the underlying penalty is for unnatural content, not for the number.
What are bigrams and trigrams useful for?
They catch phrase-level repetition that unigram counts miss. "Affordable," "dental," and "implants" might each sit at 1.5% individually - reasonable - while "affordable dental implants" as a trigram sits at 1.2%, which is extreme for a three-word exact match. Bigram and trigram tables surface the specific phrases that search engines and readers will flag as forced, even when no single word looks over-used.
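Phrase-level counting is usually done with a sliding window over the token list. The sketch below assumes tokens are already lowercased and cleaned; ngramCounts is a hypothetical name, and whether the tool filters stop words before or after forming phrases is not specified here.

```javascript
// Slide a window of size n across the tokens and count each phrase it sees.
function ngramCounts(tokens, n) {
  const counts = new Map();
  for (let i = 0; i + n <= tokens.length; i++) {
    const phrase = tokens.slice(i, i + n).join(" ");
    counts.set(phrase, (counts.get(phrase) ?? 0) + 1);
  }
  return counts;
}
```

For the seven tokens of "affordable dental implants are affordable dental implants", ngramCounts(tokens, 3) counts "affordable dental implants" twice, while every single word appears at most twice as well, so only the trigram table makes the repeated phrase obvious.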
Should I aim for a specific density on my target keyword?
No. The mental model "density X = ranking Y" is obsolete. Write the page to comprehensively cover the topic; use the primary keyword naturally in the title, the first paragraph, and one H2; then use the density check to confirm nothing is excessive. If you end up at 4% because you hammered it, rewrite.
Why do stop words dominate the unigram list?
They would if left in, because English text is about 25-30% stop words by token count ("the," "a," "of," "and," "to," "in" alone account for roughly 17%). The tool filters a standard list of English stop words before ranking unigrams, which is why they do not appear in the result. For non-English content, the stop-word filter falls back to a generic list and may leave some high-frequency function words in the output.
Does the tool handle HTML?
Yes. It strips tags with a simple regex (<[^>]+> replaced with a space) and normalizes whitespace before tokenising. This handles paste-from-View-Source workflows. What it does not do is parse semantically - text hidden by style attributes or sitting inside <noscript> blocks gets counted if present. For clean measurement, paste the rendered visible text rather than raw HTML.
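A minimal sketch of that tag-stripping step, using the regex quoted above; the function name stripTags is an assumption, and the sketch leaves HTML entities undecoded.

```javascript
// Replace every tag with a space, then collapse runs of whitespace.
// Text inside hidden elements or <noscript> survives, because only the
// tags themselves are removed.
function stripTags(html) {
  return html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}
```

Feeding it markup that contains a display:none paragraph or a noscript block shows the limitation: the hidden words come back in the output alongside the visible ones.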
Should keyword variations count as the same keyword?
Philosophically yes, mechanically no. "Implant," "implants," "implanted," and "implantation" are morphological variants of a single concept, but stemming to merge them introduces its own distortions - false matches across unrelated terms that happen to share a stem. This tool leaves inflected forms separate; you can mentally add related rows to estimate the true concept frequency.
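The "mentally add related rows" step can also be done explicitly. The helper below is hypothetical: it assumes you already have a Map of token counts and that you supply the list of variants yourself, so no stemming is involved.

```javascript
// Sum the counts of hand-picked variants to estimate one concept's frequency.
function conceptFrequency(counts, variants) {
  return variants.reduce((sum, variant) => sum + (counts.get(variant) ?? 0), 0);
}
```

Because the variant list is chosen by hand, unrelated words that merely share a stem never get merged.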
Can I check density for multiple keywords at once?
Not in a single pass in this tool, but the bigram and trigram tables cover the most common multi-word cases automatically. For explicit multi-keyword checks, run the tool multiple times with each keyword in the spotlight field, or paste the text into a tool like SurferSEO or Clearscope that accepts a target keyword list. For ad-hoc use, rerunning is usually fast enough.
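If rerunning by hand becomes tedious, the same exact-match check is easy to script locally. The sketch below is an assumption about how such a check could look, not the tool's spotlight implementation; phraseDensities, toTokens, and trimPunctuation are hypothetical names, and overlapping matches are each counted.

```javascript
// Trim punctuation from both ends of a token, keeping inner hyphens.
const trimPunctuation = (token) =>
  token.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, "");

// Split text into lowercase word tokens.
const toTokens = (text) =>
  text.toLowerCase().split(/\s+/).map(trimPunctuation).filter(Boolean);

// Report exact-match count and density for every phrase in the list.
function phraseDensities(text, phrases) {
  const tokens = toTokens(text);
  return phrases.map((phrase) => {
    const target = toTokens(phrase);
    let count = 0;
    if (target.length > 0) {
      for (let i = 0; i + target.length <= tokens.length; i++) {
        if (target.every((word, j) => tokens[i + j] === word)) count++;
      }
    }
    const density = tokens.length ? (count / tokens.length) * 100 : 0;
    return { phrase, count, density };
  });
}
```

Pass the draft and an array of target phrases; each result row mirrors what the spotlight field shows for a single keyword.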
Does this tool store or transmit my content?
No. Tokenisation and counting happen in a Preact component running in your browser. The textarea value never leaves client-side state. You can verify via the Network tab in devtools - type, click Analyze, and watch for outbound requests; none are made. This matters for embargoed content, client drafts under NDA, and anything you would not want to leak.
What about LSI keywords?
"LSI keywords" is a term SEO marketers use loosely to mean "topically related terms." Google does not use LSI - John Mueller has confirmed this multiple times. What modern engines do use is semantic embeddings (BERT-style vector representations) where related terms share vector space. This density tool does not model embeddings; it counts tokens.
Is density the same across languages?
No. Agglutinative languages (Turkish, Finnish) pack more meaning per token, and a single keyword surfaces in many suffixed forms that count as separate tokens, so natural exact-match density runs structurally lower than in English. When in doubt, compare to top-ranking native-language pages for the same query.