UTF-8 Encoder / Decoder
Encode text to UTF-8 byte sequences or decode byte sequences back to text. View hex, decimal, binary, and percent-encoded formats.
What is UTF-8?
UTF-8 (Unicode Transformation Format - 8-bit) is the dominant character encoding on the web, used by over 98% of web pages. It encodes each Unicode code point in 1 to 4 bytes: ASCII characters use 1 byte; accented Latin letters and scripts such as Greek, Cyrillic, Hebrew, and Arabic use 2 bytes; most CJK characters use 3 bytes; and most emoji use 4 bytes.
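As a rough illustration of these size classes, the sketch below encodes one sample character per class with Python's built-in `str.encode`. The sample characters and the helper name `utf8_len` are chosen for illustration only.

```python
# Code point ranges and their UTF-8 lengths:
#   U+0000..U+007F    -> 1 byte
#   U+0080..U+07FF    -> 2 bytes
#   U+0800..U+FFFF    -> 3 bytes
#   U+10000..U+10FFFF -> 4 bytes

def utf8_len(ch: str) -> int:
    """Return how many bytes the given character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# One illustrative sample per size class (hypothetical choices).
samples = {"A": "ASCII", "é": "accented Latin", "中": "CJK", "😀": "emoji"}

for ch, label in samples.items():
    hex_bytes = ch.encode("utf-8").hex(" ").upper()
    print(f"{label:15} U+{ord(ch):04X}  {utf8_len(ch)} byte(s)  {hex_bytes}")
```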
Understanding UTF-8 byte sequences is important for systems programming, network protocols, file format work, and debugging character encoding issues.
How to Use
- Choose Text → Bytes (encode) or Bytes → Text (decode)
- Select your preferred output format in Settings
- Paste or type your text
- Enable the byte table to see detailed character breakdowns
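The two directions can be approximated in a few lines of Python. This is a minimal sketch, not the tool's implementation: the function names `encode_text` and `decode_hex` are hypothetical, and it assumes bytes are entered as space-separated hex pairs.

```python
def encode_text(text: str) -> str:
    """Text -> Bytes: return the UTF-8 bytes as space-separated hex pairs."""
    return text.encode("utf-8").hex(" ").upper()

def decode_hex(hex_text: str, errors: str = "strict") -> str:
    """Bytes -> Text: parse hex pairs and decode them as UTF-8.

    With errors="strict", an invalid byte sequence raises UnicodeDecodeError.
    With errors="replace", it is turned into U+FFFD (the replacement character).
    """
    return bytes.fromhex(hex_text).decode("utf-8", errors=errors)

print(encode_text("é"))                      # C3 A9
print(decode_hex("48 65 6C 6C 6F"))          # Hello
print(repr(decode_hex("C3", "replace")))     # a lone lead byte is incomplete
```

A lone `C3` is a lead byte with no continuation byte, so strict decoding rejects it. That is the kind of error this round trip helps debug.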
Frequently Asked Questions
Why does é use 2 bytes in UTF-8?
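The letter é (e with acute accent) has Unicode code point U+00E9 (233 in decimal). Single-byte values in UTF-8 are reserved for ASCII (0 to 127), so every code point from U+0080 to U+07FF, including é, is encoded as 2 bytes. For é those bytes are C3 A9.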
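The bytes C3 A9 come from splitting the code point's bits across the 2-byte pattern `110xxxxx 10xxxxxx`. The sketch below works through that arithmetic by hand; `encode_two_byte` is a hypothetical helper written for this example, and real code would simply call `str.encode`.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF with the pattern 110xxxxx 10xxxxxx."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point is outside the 2-byte range")
    lead = 0b11000000 | (code_point >> 6)          # marker 110 + top 5 bits
    cont = 0b10000000 | (code_point & 0b00111111)  # marker 10 + low 6 bits
    return bytes([lead, cont])

# U+00E9 = 0b00011101001 -> top 5 bits 00011, low 6 bits 101001
#   lead = 110 00011  = 0xC3
#   cont = 10  101001 = 0xA9
result = encode_two_byte(0xE9)
print(result.hex(" ").upper())  # C3 A9
```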
What is the difference between UTF-8 and ASCII?
ASCII uses 7 bits and covers only 128 characters (English letters, digits, punctuation, and control characters). UTF-8 is a superset: the first 128 code points are encoded as the same single bytes as ASCII, and multi-byte sequences extend the range to every Unicode code point, more than 1.1 million in total.
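The superset relationship is easy to check: pure-ASCII text produces identical bytes under both encodings, while a non-ASCII character cannot be encoded as ASCII at all. A small sketch with illustrative variable names:

```python
text = "Hello, world!"

# Pure-ASCII text yields the same bytes in both encodings.
same_bytes = text.encode("ascii") == text.encode("utf-8")
print(same_bytes)  # True

# A character outside the 128 ASCII code points has no ASCII encoding.
try:
    "é".encode("ascii")
    ascii_can_encode = True
except UnicodeEncodeError:
    ascii_can_encode = False
print(ascii_can_encode)  # False
```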
Why do emoji use 4 bytes?
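Most emoji have Unicode code points above U+FFFF, the upper limit of the Basic Multilingual Plane (BMP), and every code point above that limit takes 4 bytes in UTF-8. A few older symbols that are displayed as emoji, such as ❤ (U+2764), sit inside the BMP and take 3 bytes.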
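For code points above U+FFFF the bits are spread over the 4-byte pattern `11110xxx 10xxxxxx 10xxxxxx 10xxxxxx`. The sketch below applies that pattern to 😀 (U+1F600); `encode_four_byte` is a hypothetical helper for illustration, not a replacement for `str.encode`.

```python
def encode_four_byte(code_point: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF as 4 UTF-8 bytes."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the 4-byte range")
    return bytes([
        0xF0 | (code_point >> 18),           # marker 11110 + top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # marker 10 + next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),   # marker 10 + next 6 bits
        0x80 | (code_point & 0x3F),          # marker 10 + low 6 bits
    ])

grinning = encode_four_byte(0x1F600)
print(grinning.hex(" ").upper())     # F0 9F 98 80

# A BMP symbol such as U+2764 needs only 3 bytes.
heart_len = len("\u2764".encode("utf-8"))
print(heart_len)                     # 3
```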
What is the BOM in UTF-8?
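The Byte Order Mark (BOM, U+FEFF) is sometimes added at the start of UTF-8 files as a signature. It appears as EF BB BF in hex. UTF-8 has a fixed byte order, so the BOM is not needed to read the file, and most UTF-8 files should not have one: tools that do not expect it may treat the three bytes as content, which can break scripts, parsers, and file concatenation.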
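In Python, the standard `utf-8` codec keeps a leading BOM as the character U+FEFF, while the `utf-8-sig` codec strips it on decode. The sketch below shows the difference on a small hand-built byte string; the variable names are illustrative.

```python
import codecs

# EF BB BF followed by the UTF-8 bytes for "hi".
data = codecs.BOM_UTF8 + "hi".encode("utf-8")
print(data.hex(" ").upper())          # EF BB BF 68 69

kept = data.decode("utf-8")           # BOM survives as U+FEFF
stripped = data.decode("utf-8-sig")   # BOM is removed

print(repr(kept))                     # '\ufeffhi'
print(repr(stripped))                 # 'hi'
```

Decoding with `utf-8-sig` is also safe for input without a BOM, which makes it a reasonable default when reading files of unknown origin.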
How does this differ from URL encoding?
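URL encoding (percent-encoding) is a notation for bytes inside URLs: each byte that is not allowed to appear literally is written as %XX, where XX is its hex value. For non-ASCII text those bytes are normally the UTF-8 bytes, so é becomes %C3%A9. This tool shows the same UTF-8 bytes in multiple formats, including hex (\xXX), decimal, binary, and percent-encoded.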
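The sketch below renders one set of UTF-8 bytes in the four notations mentioned above and compares the result with `urllib.parse.quote` from the Python standard library. The function `byte_views` and its exact output layout are assumptions made for this example, not the tool's actual format.

```python
from urllib.parse import quote

def byte_views(text: str) -> dict[str, str]:
    """Render the UTF-8 bytes of text in four notations (hypothetical layout)."""
    data = text.encode("utf-8")
    return {
        "hex": "".join(f"\\x{b:02X}" for b in data),
        "decimal": " ".join(str(b) for b in data),
        "binary": " ".join(f"{b:08b}" for b in data),
        "percent": "".join(f"%{b:02X}" for b in data),
    }

views = byte_views("é")
for name, value in views.items():
    print(f"{name:8} {value}")

# quote() percent-encodes only the bytes that need it in a URL,
# so unreserved ASCII characters such as "a" stay literal.
url_form = quote("a é")
print(url_form)
```

Note the difference in scope: `byte_views` writes every byte as %XX, whereas real URL encoding leaves unreserved ASCII characters untouched.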