How to use HTML Entity Encode / Decode
What it does & when you need it
You need HTML entity encoding whenever text has to appear literally inside an
HTML document instead of being interpreted as markup. Paste a code snippet, an
error message, or a chunk of user input into a page unescaped and the browser
will happily read a stray <script> as a real element. Encoding rewrites the
dangerous characters as entities so they render as text; decoding does the
reverse, turning &lt;, &eacute;, or &copy; back into the characters they
stand for. Both directions run entirely in your browser, so pasted markup and
customer data never leave your machine.
Reach for it when you are escaping content before dropping it into a template,
reading a scraped page whose text is full of &amp; and &nbsp;, preparing a
code sample for a blog post, or debugging why a & in a query string turned
into &amp; somewhere in your pipeline.
How to use
- Pick Encode or Decode with the toggle in the toolbar.
- Paste your text into the left plain text / html buffer, press
Sample to load a realistic example, or Upload an
.html file.
- In Encode mode, tick Encode non-ASCII if you also want every character above code point 127 (accented letters, symbols, emoji) turned into numeric references, which is useful for ASCII-only channels.
- The result updates in the right buffer as you type. Press
Ctrl/Cmd+Enter (or the Copy result button) to copy it, and use Clear to reset the input.
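The Encode non-ASCII option described above can be sketched in a few lines of Python. This is an illustration of the idea, not the tool's actual code; the function name `encode_non_ascii` is hypothetical.

```python
def encode_non_ascii(text: str) -> str:
    """Rewrite every character above code point 127 as a decimal reference.

    Run this AFTER the basic escape pass: if it ran first, the '&' in each
    new reference would itself be escaped to '&amp;'.
    """
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


encoded = encode_non_ascii("caf\u00e9 \u00a9 2024")
print(encoded)            # caf&#233; &#169; 2024
print(encoded.isascii())  # True
```

Python strings iterate by code point, so a character outside the Basic Multilingual Plane, such as an emoji, becomes a single reference rather than a surrogate pair.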
Things worth knowing
You only need five escapes. Although HTML defines more than 2,000 named
entities, the only characters you have to escape in HTML text are &, <, >, ", and ',
written as &amp;, &lt;, &gt;, &quot;, and &#39;. This tool
deliberately emits &#39; rather than &apos;: &apos; is defined by XML and
was not part of HTML 4, so it can fail to render in older or non-conforming
parsers, whereas the numeric reference is universally safe.
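As a rough sketch of the five-character escape, here is one way to write it in Python. The names `ESCAPES` and `escape_five` are made up for this example. Python's built-in `html.escape` does the same job but writes the apostrophe in hexadecimal form, so the sketch uses its own table to show the decimal reference.

```python
import html

# One-pass translation table, so an '&' produced by one replacement
# is never escaped again by another.
ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",   # numeric reference instead of &apos;
})


def escape_five(text: str) -> str:
    return text.translate(ESCAPES)


sample = """<a title="it's">R&D</a>"""
print(escape_five(sample))
# &lt;a title=&quot;it&#39;s&quot;&gt;R&amp;D&lt;/a&gt;

# The standard library picks the hexadecimal form for the apostrophe.
print(html.escape("it's"))   # it&#x27;s
```

Both apostrophe spellings decode to the same character, so the choice only matters for readability and for consumers that compare escaped strings byte for byte.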
This is your first line of defence against XSS. Cross-site scripting —
stored or reflected — happens when attacker-controlled text is interpolated
into a page and the browser executes it as markup. Escaping < and & on
output, in the correct context, is what neutralises it. The key word is
output: encode at the moment you insert data into HTML, not when you store it,
and remember that attribute values, URLs, and inline scripts each need their own
escaping rules on top of this. For query strings specifically, reach for the
URL Encoder rather than HTML entities.
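To make "encode on output, in the correct context" concrete, here is a small Python sketch. The helpers `render_comment` and `search_link` and the `/search` path are hypothetical; only `html.escape` and `urllib.parse.quote` are real standard-library calls.

```python
import html
from urllib.parse import quote


def render_comment(author: str, body: str) -> str:
    # Stored values stay raw; escaping happens only as they enter HTML text.
    return f"<p><b>{html.escape(author)}</b>: {html.escape(body)}</p>"


def search_link(term: str) -> str:
    # Query-string context: percent-encode the value first, then
    # HTML-escape the finished attribute value.
    query = quote(term, safe="")
    return f'<a href="/search?q={html.escape(query)}">{html.escape(term)}</a>'


stored = "<script>alert(1)</script>"
print(render_comment("Mallory", stored))
# <p><b>Mallory</b>: &lt;script&gt;alert(1)&lt;/script&gt;</p>

print(search_link("R&D"))
# <a href="/search?q=R%26D">R&amp;D</a>
```

The same input ends up as `R%26D` inside the URL and `R&amp;D` in the link text, which is the point: each context gets its own encoding.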
Numeric references come in two flavours for the same character. A code point
can be written in decimal (&#169;) or in hexadecimal (&#xA9;), and
both forms resolve to U+00A9, the copyright sign ©. The decoder here accepts either
form (and an uppercase &#X…;), so you can throw mixed content at it and get
consistent output.
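You can check the equivalence of the two notations with Python's standard `html.unescape`, used here as a stand-in for this tool's decoder rather than its actual implementation:

```python
import html

# Decimal, lowercase-x hex, and uppercase-X hex spellings of U+00A9.
for ref in ("&#169;", "&#xA9;", "&#XA9;"):
    print(ref, "->", html.unescape(ref))

# All three name the same code point: 169 decimal is A9 hexadecimal.
print(chr(169) == chr(0xA9) == "\u00a9")   # True
```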
Named entities are convenient but optional. Friendly names such as &nbsp;,
&mdash;, and &euro; are easier to read, but the full HTML named-character
list runs past 2,000 entries, and not every tool recognises all of them. Numeric
references sidestep the lookup table entirely: any Unicode code point can be
written as &#N; and it will always decode, which is why encoders lean on them
for anything outside the common set. If you prefer to inspect characters as
\u-style escapes instead, the Unicode Escape tool
covers that representation, and the Base64 Encode / Decode tool
handles the binary-to-text case entirely differently.
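The trade-off between named and numeric references can be sketched in Python using the tables in `html.entities`. The helper `to_reference` is hypothetical and only illustrates the "name if known, number otherwise" strategy. Note that `codepoint2name` covers the smaller HTML 4 set, while `html5` holds the full list.

```python
from html.entities import codepoint2name, html5


def to_reference(ch: str) -> str:
    """Use a named entity when the HTML 4 table has one, else a numeric one."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else f"&#{ord(ch)};"


print(to_reference("\u20ac"))      # &euro;
print(to_reference("\u2014"))      # &mdash;
print(to_reference("\U0001F600"))  # &#128512;  (no name, still encodable)

# Decoding a name needs the lookup table; the full HTML list is large.
print(html5["euro;"] == "\u20ac")  # True
print(len(html5) > 2000)           # True
```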
A last gotcha: encoding is not idempotent. Because & is itself escaped to
&amp;, running Encode twice turns < into &amp;lt;. If you see doubled
entities in production output, something in the chain is escaping already-escaped
text — decode once and check where the extra pass came from.
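The double-encoding effect is easy to reproduce. This sketch uses Python's `html.escape` and `html.unescape` as stand-ins for the Encode and Decode modes:

```python
import html

once = html.escape("<")     # first pass
twice = html.escape(once)   # second pass escapes the '&' from the first

print(once)    # &lt;
print(twice)   # &amp;lt;

# One decode undoes exactly one layer of escaping.
print(html.unescape(twice) == once)   # True
print(html.unescape(once) == "<")     # True
```

If a single decode of production output still shows entities, that is a sign the text went through the encoder more than once.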