How to Use the HTML to Markdown Converter
- Paste HTML into the input pane. A blog post copied from a CMS, an excerpt from a Notion export, a rendered email, or the HTML view of a documentation page all work.
- Watch the Markdown appear on the right as you type. Headings, inline formatting, links, images, code blocks, lists, and blockquotes are converted in one pass.
- Copy the Markdown using the clipboard button. The output is plain text that any CommonMark or GitHub Flavored Markdown parser accepts.
- Paste into your destination - a README, an Obsidian vault, a Hugo content file, or a GitHub issue. Preview in your target tool since flavor differences (tables, task lists) can affect rendering.
How the Conversion Works
The converter parses the pasted HTML with the browser's native DOMParser, which produces a real DOM tree rather than relying on brittle regex. A recursive walk visits each node, emits the appropriate Markdown token for its tag name, and recurses into children. Headings map to the ATX form (# through ######), <strong> and <b> become **bold**, <em> and <i> become *italic*, <a> becomes [text](url), and <img> becomes ![alt](src).
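As a rough illustration of the walk described above, the sketch below covers only headings, paragraphs, and a few inline tags. The names toMarkdown and childrenToMarkdown are hypothetical, and the tool's real implementation may be organised differently. The browser-only part is guarded so the file also loads outside a browser.

```javascript
// Minimal sketch of a recursive DOM-to-Markdown walk (assumed names, not the tool's code).
// The constants mirror the DOM's Node.ELEMENT_NODE and Node.TEXT_NODE values.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

function childrenToMarkdown(node) {
  // Convert every child in document order and concatenate the results.
  return Array.from(node.childNodes).map(toMarkdown).join("");
}

function toMarkdown(node) {
  if (node.nodeType === TEXT_NODE) return node.textContent;
  if (node.nodeType !== ELEMENT_NODE) return ""; // comments and similar nodes

  const tag = node.nodeName.toLowerCase();
  if (tag === "script" || tag === "style") return ""; // skip the whole subtree

  const inner = childrenToMarkdown(node);
  const heading = /^h([1-6])$/.exec(tag);
  if (heading) return "#".repeat(Number(heading[1])) + " " + inner + "\n\n";

  switch (tag) {
    case "strong":
    case "b":
      return "**" + inner + "**";
    case "em":
    case "i":
      return "*" + inner + "*";
    case "a":
      return "[" + inner + "](" + (node.getAttribute("href") || "") + ")";
    case "img":
      return "![" + (node.getAttribute("alt") || "") + "](" + (node.getAttribute("src") || "") + ")";
    case "p":
      return inner + "\n\n";
    default:
      return inner; // unknown tags: keep the text, drop the wrapper
  }
}

// Browser-only usage: DOMParser is not available in plain Node.js.
if (typeof DOMParser !== "undefined") {
  const doc = new DOMParser().parseFromString("<h2>Hi <em>there</em></h2>", "text/html");
  console.log(toMarkdown(doc.body).trim()); // ## Hi *there*
}
```

Because the walker only touches nodeType, nodeName, childNodes, textContent, and getAttribute, it can be exercised with plain objects in a test as well as with real DOM nodes.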
Code gets special handling. Inline <code> wraps in single backticks, but if the content itself contains a backtick the converter uses a longer backtick delimiter (double or triple), as the CommonMark spec allows. <pre><code> blocks use triple-backtick fences with a language hint extracted from class names like language-python. Lists walk recursively so nested ordered and unordered lists indent correctly. Blockquotes prepend > to every line. <script> and <style> subtrees are skipped entirely because including their text would dump JavaScript or CSS source into the output as noise.
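The backtick rule for inline code can be sketched as below. The function name inlineCode is hypothetical, and the space padding for content that starts or ends with a backtick is an assumption based on CommonMark's code-span rules rather than something this page documents.

```javascript
// Sketch: wrap text in a backtick delimiter that cannot collide with the content.
function inlineCode(text) {
  // Find the longest run of consecutive backticks inside the content.
  const runs = text.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);

  // A delimiter one backtick longer than any run in the content is always safe.
  const delimiter = "`".repeat(longest + 1);

  // Assumption: pad with one space when the content touches a backtick at
  // either end; CommonMark strips one leading and one trailing space.
  const pad = /^`|`$/.test(text) ? " " : "";
  return delimiter + pad + text + pad + delimiter;
}

console.log(inlineCode("npm install")); // `npm install`
console.log(inlineCode("a`b"));         // ``a`b``
```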
When to Convert
- Migrating content out of a legacy CMS (WordPress, Drupal, Ghost) into a Git-based static site generator like Hugo or Astro.
- Copying an article from a web page into a Markdown note-taking app (Obsidian, Bear, Logseq) without losing structure.
- Converting a rich-text email into Markdown for pasting into GitHub issues or Linear comments.
- Extracting documentation from a rendered help-center page to seed a README.
- Preparing a reference snippet for a Stack Overflow answer where Markdown is the required input format.
- Feeding HTML into a Markdown-based CMS import tool that expects structured text rather than tag soup.
Edge Cases
- Nested inline formatting. Bold inside italics (<em><strong>text</strong></em>) becomes ***text***. Some parsers render this differently, so preview in the target engine.
- Tables. HTML tables do not map cleanly to CommonMark, which has no table syntax. GFM tables are supported in many renderers; this converter emits simple tables in GFM pipe syntax, which strict CommonMark parsers display as plain text. Complex tables (merged cells, row spans) lose structure.
- Images with wrapping links. <a><img></a> becomes [![alt](src)](url), which every Markdown renderer supports.
- Inline HTML inside Markdown. CommonMark allows raw HTML blocks in Markdown output. If you need to preserve an iframe or a custom element, the converter outputs it literally rather than trying to translate it.
- Whitespace-sensitive content. <pre> blocks preserve their content byte-for-byte, including leading tabs or trailing newlines. Those can change how later paragraphs render; inspect the output before pasting.
CommonMark in Context
Markdown was introduced by John Gruber in 2004 as a lightweight plain-text format. It was never rigorously specified, leading to incompatible implementations (Markdown.pl vs Pandoc vs Discount vs MultiMarkdown). CommonMark, first published in 2014 at commonmark.org, is the modern standards-based interpretation. GitHub Flavored Markdown (GFM) builds on CommonMark and adds tables, task lists, strikethrough, and autolinks - it is the flavor used in READMEs, issues, and pull requests on GitHub. This converter targets CommonMark as the lowest common denominator; where a GFM-specific extra has to fall back to raw HTML, that HTML renders the same way in CommonMark-only parsers.
Alternatives Worth Knowing
pandoc is the industrial-strength converter: pandoc -f html -t markdown input.html produces output that preserves more structure than any regex or DOM-walk approach, including footnotes, tables, and definition lists. turndown is the JavaScript equivalent and is excellent if you want to integrate conversion into a build script; it also handles custom rules for element-to-Markdown mapping. Browser extensions like Markdownload automate the "read this article, save as Markdown" workflow. Choose this in-browser tool when you need an immediate result without installing anything; choose pandoc when the input has unusual structure worth preserving precisely.
Frequently Asked Questions
Which Markdown flavor does the output target?
CommonMark, which is the modern standardised dialect documented at commonmark.org. Headings use ATX form (#), fenced code blocks use triple backticks, and link syntax matches the [text](url) form. GFM-specific features like task lists and tables are handled opportunistically but may render differently on strict CommonMark parsers. For maximum portability, keep your HTML structure simple and avoid relying on GFM extensions.
Does it use a real HTML parser or regex?
Real parser. The input is fed to the browser's DOMParser with the text/html MIME type, producing the same DOM tree that a browser would render. A depth-first walk then emits Markdown tokens for each element. That approach handles nested structures, implicit tag closure, and unusual attribute quoting correctly - situations where a regex-based converter would silently drop content or emit malformed output.
Are <script> and <style> blocks included in the Markdown?
No. The walker explicitly skips these elements because emitting their text would inject JavaScript or CSS into your Markdown, which every sane renderer treats as plain text - producing noisy paragraphs that clutter the content. This is especially useful when pasting an entire HTML page where the <head> contains analytics snippets you do not want in your README.
Is my content uploaded anywhere?
No. DOMParser is a synchronous in-process API, and the Markdown walk is a local function call. No fetch request is made during conversion, no websocket is opened, and nothing is persisted to localStorage or IndexedDB. The content you paste and the Markdown you copy both live in JavaScript memory only and are released when you close the tab.
How are tables handled?
Simple tables with <thead>, <tbody>, <tr>, and <td> convert to the GitHub Flavored Markdown table syntax with pipe separators and a dashed separator row. Tables that use row-spans or column-spans cannot be expressed in GFM syntax, so those lose the span and emit a flat grid. If your source has complex tables, consider keeping them as HTML inside the Markdown output (both CommonMark and GFM permit raw HTML blocks).
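To make the pipe-and-dash layout concrete, here is a small sketch that builds a GFM table from already-extracted cell text. The function name toGfmTable and the array-based input are assumptions for illustration; the converter presumably reads cells from the DOM instead.

```javascript
// Sketch: build a GFM pipe table from a header row and body rows of cell text.
function toGfmTable(header, rows) {
  // Escape literal pipes and collapse whitespace (including newlines) so
  // every cell stays on one line and cannot split the row.
  const cell = (text) => String(text).replace(/\|/g, "\\|").replace(/\s+/g, " ").trim();
  const line = (cells) => "| " + cells.map(cell).join(" | ") + " |";

  // The dashed separator row is what marks the first line as a header in GFM.
  const separator = "| " + header.map(() => "---").join(" | ") + " |";
  return [line(header), separator, ...rows.map(line)].join("\n");
}

console.log(toGfmTable(["Name", "Qty"], [["Apple", "3"], ["Pear", "5"]]));
// | Name | Qty |
// | --- | --- |
// | Apple | 3 |
// | Pear | 5 |
```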
Does it preserve image alt text?
Yes. The alt attribute is read from each <img> and placed inside the square brackets of ![alt](src). Images without alt text (accessibility anti-pattern, unfortunately common) emit ![](src), which is valid Markdown but renders with no alternative text. Add meaningful alt text after conversion if it was missing in the source.
What happens with inline links containing query strings or parentheses?
CommonMark accepts parentheses in a link destination only when they are balanced or backslash-escaped, or when the whole destination is wrapped in angle brackets. A link like [click](https://a.com/(path)) is valid under that rule but can still trip up less strict parsers, so the converter URL-encodes the inner parentheses to %28 and %29. Query strings with ?, &, and = pass through unmodified because they are safe in the link syntax.
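The encoding step amounts to two replacements. The helper names encodeParens and markdownLink below are hypothetical, and the sketch deliberately leaves every other character untouched, matching the behaviour described above.

```javascript
// Sketch: percent-encode parentheses in a link destination, nothing else.
function encodeParens(url) {
  return url.replace(/\(/g, "%28").replace(/\)/g, "%29");
}

function markdownLink(text, url) {
  return "[" + text + "](" + encodeParens(url) + ")";
}

console.log(markdownLink("click", "https://a.com/(path)?q=1&r=2"));
// [click](https://a.com/%28path%29?q=1&r=2)
```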
Can I convert an entire HTML page including head and nav?
Yes, but the output will contain everything including menus, footers, and metadata. If you want just the article body, paste only the relevant fragment - for example the contents of <article> or <main>. The Readability algorithm (used by Firefox Reader View) is the canonical way to extract article content; browser extensions like Markdownload combine that extraction with Markdown conversion in one step.
How are fenced code blocks labeled with a language?
The converter reads the class attribute on the <code> element inside a <pre>. Classes matching language-xxx, lang-xxx, or just xxx (common from highlighters like Prism, Shiki, and Rouge) are recognised. The language hint appears immediately after the opening triple backtick and lets syntax highlighters in your target renderer apply correct colouring.
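A possible shape for the class lookup is sketched below. The names languageFromClass and fencedBlock are hypothetical, and the rule that the first bare class wins when no prefixed class is present is an assumption; the page does not say how ties are resolved.

```javascript
// Sketch: derive a fence language hint from a class attribute value.
function languageFromClass(className) {
  const classes = (className || "").split(/\s+/).filter(Boolean);

  // Prefer explicit prefixes such as "language-python" or "lang-js".
  for (const cls of classes) {
    const match = /^(?:language|lang)-(.+)$/.exec(cls);
    if (match) return match[1];
  }

  // Assumption: otherwise treat the first bare class as the language.
  return classes.length > 0 ? classes[0] : "";
}

function fencedBlock(code, className) {
  const fence = "`".repeat(3); // triple-backtick fence
  return fence + languageFromClass(className) + "\n" + code + "\n" + fence;
}

console.log(languageFromClass("language-python")); // python
console.log(languageFromClass("hljs lang-js"));    // js
```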
Will nested lists indent correctly?
Yes. Each list level adds four spaces of indentation, which is enough to satisfy CommonMark's rule that nested content be indented at least as far as the parent item's text for the short markers used here. Unordered lists use - markers and ordered lists use 1. through n. markers. A nested ordered list inside an unordered list produces "- Item" on one line followed by "1. Sub item" indented four spaces on the next, which CommonMark parsers render correctly.
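The indentation scheme can be sketched with a small recursive renderer. The input shape ({ ordered, items: [{ text, children }] }) and the name renderList are hypothetical stand-ins for whatever the converter builds from <ul> and <ol> elements.

```javascript
// Sketch: render a nested list with four spaces of indentation per level.
// Assumed shape: { ordered: boolean, items: [{ text: string, children?: list }] }
function renderList(list, depth = 0) {
  const indent = " ".repeat(4 * depth);
  const lines = [];

  list.items.forEach((item, index) => {
    // Ordered lists number from 1; unordered lists use a hyphen marker.
    const marker = list.ordered ? index + 1 + ". " : "- ";
    lines.push(indent + marker + item.text);

    // Recurse one level deeper for a nested list under this item.
    if (item.children) lines.push(renderList(item.children, depth + 1));
  });

  return lines.join("\n");
}

console.log(renderList({
  ordered: false,
  items: [
    { text: "Item", children: { ordered: true, items: [{ text: "Sub item" }] } },
    { text: "Next" },
  ],
}));
// - Item
//     1. Sub item
// - Next
```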