UTF-8 Encoder / Decoder

Encode text to UTF-8 byte sequences or decode byte sequences back to text. View hex, decimal, binary, and percent-encoded formats.

What is UTF-8?

UTF-8 (Unicode Transformation Format - 8-bit) is the dominant character encoding on the internet, used by over 98% of web pages. It encodes each Unicode code point using 1 to 4 bytes: ASCII characters (U+0000 to U+007F) use 1 byte; accented Latin letters and scripts such as Greek, Cyrillic, Hebrew, and Arabic (U+0080 to U+07FF) use 2 bytes; most CJK characters (U+0800 to U+FFFF) use 3 bytes; and emoji and other characters above U+FFFF typically use 4 bytes.
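As a quick illustration of those byte counts, the following minimal Python sketch encodes one sample character from each range and prints its length and bytes. The sample characters and the helper name `utf8_hex` are chosen here for illustration only.

```python
# One sample character per UTF-8 length class (illustrative choices).
samples = ["A", "é", "中", "😀"]


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated uppercase hex."""
    return ch.encode("utf-8").hex(" ").upper()


for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the number of bytes, and the bytes themselves.
    print(f"U+{ord(ch):04X} {ch!r}: {len(encoded)} byte(s) -> {utf8_hex(ch)}")
```

The separator argument to `bytes.hex` requires Python 3.8 or later.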

Understanding UTF-8 byte sequences is important for systems programming, network protocols, file format work, and debugging character encoding issues.

How to Use

Choose Text → Bytes (encode) or Bytes → Text (decode)
Select your preferred output format in Settings
Paste or type your text
Enable the byte table to see detailed character breakdowns
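For readers who want to reproduce the two directions outside the tool, here is a rough Python sketch of an encode/decode round trip. The function names `text_to_hex` and `hex_to_text` are hypothetical, and this is not necessarily how the tool itself is implemented.

```python
def text_to_hex(text: str) -> str:
    """Text -> Bytes: encode as UTF-8 and show space-separated hex."""
    return text.encode("utf-8").hex(" ").upper()


def hex_to_text(hex_string: str) -> str:
    """Bytes -> Text: parse hex (whitespace is ignored) and decode as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    return bytes.fromhex(hex_string).decode("utf-8")


encoded = text_to_hex("café")
decoded = hex_to_text(encoded)
print(encoded)
print(decoded)
```

Invalid byte sequences raise an error here rather than being silently replaced, which tends to be the safer default when debugging encoding issues.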

Frequently Asked Questions

Why does é use 2 bytes in UTF-8?

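The letter é (e with acute accent) has Unicode code point U+00E9 (233 in decimal). Only code points up to U+007F (127) fit in a single UTF-8 byte; code points from U+0080 to U+07FF need 2 bytes. The 11 bits of the code point are split across a lead byte of the form 110xxxxx and a continuation byte of the form 10xxxxxx, which for é gives C3 A9.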
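The bit arithmetic behind C3 A9 can be sketched by hand. The function below is a minimal illustration that only handles the 2-byte range; its name `encode_two_byte` is made up for this example, and in practice `str.encode("utf-8")` does this work.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point is outside the 2-byte range")
    # Lead byte: marker bits 110 followed by the top 5 bits of the code point.
    lead = 0b11000000 | (code_point >> 6)
    # Continuation byte: marker bits 10 followed by the low 6 bits.
    continuation = 0b10000000 | (code_point & 0b00111111)
    return bytes([lead, continuation])


manual = encode_two_byte(0xE9)
print(manual.hex(" ").upper())         # bytes built by hand
print(manual == "é".encode("utf-8"))   # compare with the built-in encoder
```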

What is the difference between UTF-8 and ASCII?

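ASCII uses 7 bits and covers only 128 characters (the English alphabet, digits, punctuation, and control codes). UTF-8 is a superset: its first 128 code points are encoded as the same single bytes as ASCII, so any valid ASCII file is also valid UTF-8. Beyond that, UTF-8 can encode the entire Unicode code space, which spans 1,114,112 code points (U+0000 to U+10FFFF).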
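A small Python check makes the compatibility concrete: ASCII-only text produces identical bytes under both encodings, while text with any non-ASCII character cannot be encoded as ASCII at all. The helper name `is_ascii_encodable` is an assumption for this sketch.

```python
def is_ascii_encodable(text: str) -> bool:
    """Return True if every character in text fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


greeting = "Hello, world!"
# For ASCII-only text the two encodings yield byte-for-byte identical output.
same_bytes = greeting.encode("ascii") == greeting.encode("utf-8")

print(same_bytes)
print(is_ascii_encodable("café"))  # é is outside ASCII
```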

Why do emoji use 4 bytes?

Most emoji have Unicode code points above U+FFFF (the Basic Multilingual Plane limit), requiring 4 bytes in UTF-8 encoding.
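The length rule can be written as a simple range check. The sketch below uses a hypothetical helper, `utf8_length`, and compares an emoji above U+FFFF with an older emoji-style symbol that sits inside the Basic Multilingual Plane and therefore needs only 3 bytes.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for a given code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        # Note: surrogates (U+D800..U+DFFF) fall in this range but are
        # not valid in UTF-8; this sketch does not reject them.
        return 3
    return 4


for ch in ["😀", "☺"]:
    # Compare the range rule with what the built-in encoder produces.
    print(f"U+{ord(ch):04X}: rule={utf8_length(ord(ch))}, "
          f"actual={len(ch.encode('utf-8'))}")
```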

What is the BOM in UTF-8?

The Byte Order Mark (BOM, U+FEFF) is sometimes added at the start of UTF-8 files as a signature. It appears as EF BB BF in hex. Most UTF-8 files should not have a BOM, as it can cause issues in some tools.
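In Python, one way to handle a BOM is the standard `utf-8-sig` codec, which drops a leading BOM when decoding. The sketch below builds a small byte string with a BOM (the sample content is made up) and shows the difference between the two codecs.

```python
import codecs

# codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

has_bom = data.startswith(codecs.BOM_UTF8)

# Plain "utf-8" keeps the BOM as an invisible U+FEFF first character.
with_bom = data.decode("utf-8")
# "utf-8-sig" strips a leading BOM if one is present.
without_bom = data.decode("utf-8-sig")

print(has_bom)
print(repr(with_bom))
print(repr(without_bom))
```

A stray U+FEFF at the start of a string is a common cause of comparisons or parsers failing on the first line of a file.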

How does this differ from URL encoding?

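UTF-8 converts characters into bytes; URL encoding (percent-encoding) is a way of writing bytes inside a URL, representing each one as %XX. For non-ASCII text in URLs, the bytes being percent-encoded are normally the UTF-8 bytes, so é becomes %C3%A9. This tool shows the same bytes in multiple formats, including hex (\xXX), decimal, binary, and percent-encoded.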
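To show that these formats are just different spellings of the same bytes, here is a minimal Python sketch. The function `render_formats` and its exact output layout are assumptions for illustration and may not match the tool's formatting. The standard `urllib.parse.quote` is included for comparison: it leaves unreserved ASCII characters unescaped.

```python
from urllib.parse import quote


def render_formats(data: bytes) -> dict[str, str]:
    """Render the same bytes as hex escapes, decimal, binary, and %XX."""
    return {
        "hex": "".join(f"\\x{b:02X}" for b in data),
        "decimal": " ".join(str(b) for b in data),
        "binary": " ".join(f"{b:08b}" for b in data),
        "percent": "".join(f"%{b:02X}" for b in data),
    }


formats = render_formats("é".encode("utf-8"))
for name, value in formats.items():
    print(f"{name}: {value}")

# quote() percent-encodes the UTF-8 bytes but leaves plain letters as-is.
print(quote("café"))
```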