Question 1

Which characters are encoded by default?

Accepted Answer

The five HTML/XML special characters: & becomes &amp;, < becomes &lt;, > becomes &gt;, " becomes &quot;, and the apostrophe becomes &#39;. That is the minimal set required to keep a string safe inside HTML text content or attribute values. Every other character passes through unchanged, keeping your emoji, accented letters, and CJK text readable in the source.
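As a rough illustration of that default behaviour, here is a minimal Python sketch. The names DEFAULT_ENTITIES and encode_default are hypothetical and are not taken from the tool's source; the sketch only mirrors the five replacements listed above.

```python
# Hypothetical mapping that mirrors the five default replacements.
DEFAULT_ENTITIES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
}


def encode_default(text: str) -> str:
    """Replace only the five special characters; pass everything else through."""
    # Mapping character by character means an "&" produced by one
    # replacement is never encoded a second time.
    return "".join(DEFAULT_ENTITIES.get(ch, ch) for ch in text)


print(encode_default('<a href="x">Tom & Jerry\'s</a>'))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```

Non-ASCII characters are not in the mapping, so they come out exactly as they went in.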

Question 2

What does the "Encode all non-ASCII" option do?

Accepted Answer

With that toggle on, any character whose code point is above 127 gets replaced with a decimal numeric entity (&#code;). The output becomes pure ASCII, which is useful for contexts that are unreliable with UTF-8 - legacy mail transports, some SMS gateways, or systems that default to ISO-8859-1. The downside is the output is much less readable; turn it off if your pipeline handles UTF-8 end to end.
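The non-ASCII step on its own can be sketched in a few lines of Python. The function name encode_non_ascii is an assumption for illustration, and the sketch deliberately leaves out the five special characters so that only the code-point rule is shown.

```python
def encode_non_ascii(text: str) -> str:
    """Replace every code point above 127 with a decimal numeric entity."""
    return "".join(ch if ord(ch) <= 127 else f"&#{ord(ch)};" for ch in text)


sample = "caf\u00e9 \u20ac5"          # "cafe" with e-acute, then a euro sign
encoded = encode_non_ascii(sample)
print(encoded)                         # caf&#233; &#8364;5

# The result is pure ASCII, so this would raise UnicodeEncodeError otherwise.
encoded.encode("ascii")
```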

Question 3

Does the decoder handle named entities?

Accepted Answer

Yes. The decoder accepts every named character reference in the HTML Living Standard list, which is over 2,000 entries long. Common ones like &copy;, &reg;, &euro;, &nbsp;, and &mdash; decode to their Unicode equivalents. It also tolerates a handful of legacy entities that worked without a trailing semicolon in old browsers, though the encoder always emits the semicolon-terminated form.
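If you want to see the same rules outside the browser, Python's standard html.unescape follows the HTML5 named character reference list, including the legacy no-semicolon forms. It is used here as a stand-in, not as this tool's decoder, so treat it as an illustration of the behaviour rather than of the implementation.

```python
import html

# Semicolon-terminated named references decode to single Unicode characters.
for ref in ["&copy;", "&reg;", "&euro;", "&nbsp;", "&mdash;"]:
    char = html.unescape(ref)
    print(f"{ref} -> U+{ord(char):04X}")

# Legacy form: a few old entities are accepted without the semicolon.
print(html.unescape("&copy 2024"))   # the bare &copy is decoded
print(html.unescape("&euro 5"))      # left alone: &euro has no legacy form
```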

Question 4

Is this safe to use on untrusted input?

Accepted Answer

Encoding the five special characters is the foundation of XSS prevention, and this tool implements that encoding correctly. However, safe HTML output requires more than entity encoding - you also need to avoid dangerous attributes (javascript: URLs), script contexts, and unsafe uses of user input in inline event handlers. If you are handling untrusted content, do encoding at the output boundary in your web framework rather than as a copy-paste step.
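A small Python demonstration of why entity encoding alone is not enough: a javascript: URL contains none of the five special characters, so it survives encoding untouched. Python's html.escape stands in for the encoder here, and the payload and link text are made up for illustration.

```python
import html

payload = "javascript:alert(1)"
encoded = html.escape(payload, quote=True)
print(encoded == payload)  # True: there is nothing in the payload to encode

# The markup below is well-formed, yet the link still runs script when clicked.
link = f'<a href="{encoded}">profile</a>'
print(link)
```

Catching this case needs a check on the URL scheme at the point where the attribute is written, which is a separate step from entity encoding.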

Question 5

Is my text sent to a server?

Accepted Answer

No. The codec runs inside your browser tab as a Preact component and uses in-memory string operations only. There is no fetch call, no websocket, and no logging. People often test encoding on sensitive strings (API keys, internal URLs, personal data) and the local-only guarantee matters; you can verify with DevTools Network showing zero requests while you type.

Question 6

How are Unicode code points above U+FFFF encoded?

Accepted Answer

In numeric form they appear as a single decimal or hex reference - for example the pile of poo emoji 💩 is &#128169; in decimal or &#x1F4A9; in hex. JavaScript strings internally store these as UTF-16 surrogate pairs, but the encoder converts pairs to their original code point before emitting the entity. The decoder does the reverse, reassembling the surrogate pair on the way out.
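The surrogate arithmetic can be written out explicitly. This Python sketch uses hypothetical helper names (to_surrogates, from_surrogates) and shows the standard UTF-16 conversion in both directions; it illustrates the math, not the tool's actual code.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 (high, low) pair."""
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = 0x1F4A9                                   # pile of poo
high, low = to_surrogates(cp)
print(hex(high), hex(low))                     # 0xd83d 0xdca9
print(f"&#{from_surrogates(high, low)};")      # &#128169;
print(f"&#x{from_surrogates(high, low):X};")   # &#x1F4A9;
```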

Question 7

Can I use the output directly in an XML document?

Accepted Answer

Yes. The default encoding covers exactly the five characters for which XML 1.0 section 4.6 predefines entities, and the &#39; it emits for the apostrophe is a numeric character reference, which XML also accepts. If you encoded with the non-ASCII option on, numeric entities are also valid XML. Avoid named entities beyond the five predefined ones - &copy;, &nbsp;, and the rest are HTML-specific and an XML parser without a DTD will reject them.
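You can check this with any DTD-less XML parser. The sketch below uses Python's xml.etree.ElementTree; the helper name parses_as_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if a DTD-less XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Predefined entities and numeric references are fine.
print(parses_as_xml("<p>Tom &amp; Zo&#235;&#39;s caf&#233;</p>"))  # True
# An HTML-only named entity is an undefined entity in XML.
print(parses_as_xml("<p>&copy; 2024</p>"))                          # False
```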

Question 8

Why use &#39; instead of &apos;?

Accepted Answer

Historical compatibility. The named entity &apos; is valid XML 1.0 and HTML5 but was not defined in HTML 4.01; older Internet Explorer versions and some email clients display it literally instead of decoding. The numeric form &#39; works everywhere that entities work, so the encoder uses it for the apostrophe by default.

Question 9

What about double-encoded text?

Accepted Answer

Double-encoding happens when text is encoded twice by accident - &amp;lt; for <. One Decode pass yields &lt;; a second pass yields <. Run Decode repeatedly (the Swap button helps chain operations) until the output stops changing. The root cause is usually a web form that re-encodes data on submission; fix the pipeline rather than relying on manual decoding.
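The "decode until it stops changing" loop looks like this in Python. The function name decode_until_stable and the max_passes cap are assumptions for this sketch, and html.unescape stands in for the tool's Decode button.

```python
import html


def decode_until_stable(text: str, max_passes: int = 10) -> str:
    """Decode repeatedly until a pass makes no further change."""
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text


print(decode_until_stable("&amp;lt;b&amp;gt;"))       # <b>
print(decode_until_stable("&amp;amp;amp; so on"))     # & so on
```

Use this only on data you know was over-encoded: text that was meant to show a literal &lt; will be decoded as well.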

Question 10

How does HTML encoding differ from URL encoding?

Accepted Answer

They solve different problems. HTML encoding (this tool) makes text safe inside HTML element content or attribute values by replacing structural characters with entities. URL encoding (percent-encoding, RFC 3986) makes text safe inside a URL by replacing reserved characters with %XX sequences. A string inside a query parameter of an HTML link needs both - first URL-encoded to form a valid URL, then HTML-encoded so the & separators do not break the HTML. Use the URL Encoder/Decoder tool for the percent-encoding step.
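The two-step order for a link can be sketched with Python's standard library. The example.com URL and the parameter names are placeholders, and html.escape stands in for this tool's encoder.

```python
import html
from urllib.parse import urlencode

params = {"q": "fish & chips", "lang": "en"}

# Step 1: URL-encode the parameter values to build a valid URL.
url = "https://example.com/search?" + urlencode(params)
print(url)   # https://example.com/search?q=fish+%26+chips&lang=en

# Step 2: HTML-encode the whole URL before placing it in the attribute.
href = html.escape(url, quote=True)
print(f'<a href="{href}">Search</a>')
# <a href="https://example.com/search?q=fish+%26+chips&amp;lang=en">Search</a>
```

The & inside the search term becomes %26 in step 1, while the & that separates the two parameters becomes &amp; in step 2.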

HTML Entity Encode / Decode

