ConvertCaseTool

UTF-8 Encoder / Decoder

Encode text to UTF-8 byte sequences or decode byte sequences back to text. View hex, decimal, binary, and percent-encoded formats.


What is UTF-8?

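UTF-8 (Unicode Transformation Format - 8-bit) is the dominant character encoding on the web, used by over 98% of web pages. It encodes each Unicode code point in 1 to 4 bytes: ASCII characters (U+0000 to U+007F) use 1 byte; accented Latin letters and scripts such as Greek, Cyrillic, Hebrew, and Arabic (U+0080 to U+07FF) use 2 bytes; the rest of the Basic Multilingual Plane, including most CJK characters (U+0800 to U+FFFF), uses 3 bytes; and code points above U+FFFF, including most emoji, use 4 bytes.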
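As a quick check of the byte counts described above, here is a minimal Python sketch. The sample characters and the helper name utf8_length are illustrative choices, not part of this tool.

```python
# Sample code points, one per UTF-8 length class (illustrative choices).
samples = [
    "\u0041",      # LATIN CAPITAL LETTER A (ASCII)
    "\u00e9",      # LATIN SMALL LETTER E WITH ACUTE
    "\u4e2d",      # a CJK ideograph
    "\U0001F600",  # GRINNING FACE emoji
]


def utf8_length(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    # Print the code point and bytes rather than the character itself,
    # so the output is safe on terminals with limited encodings.
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s) -> {encoded.hex(' ').upper()}")
```

Running it prints one line per sample, with the byte count growing from 1 to 4.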

Understanding UTF-8 byte sequences is important for systems programming, network protocols, file format work, and debugging character encoding issues.

How to Use

  1. Choose Text → Bytes (encode) or Bytes → Text (decode)
  2. Select your preferred output format in Settings
  3. Paste or type your text
  4. Enable the byte table to see detailed character breakdowns
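The same encode and decode round trip can be sketched in Python. The function names format_bytes and parse_hex_escapes are hypothetical, and the exact spacing and separators this tool uses for each format are assumptions.

```python
def format_bytes(data: bytes) -> dict:
    """Render a byte sequence in four common notations."""
    return {
        "hex": "".join(f"\\x{b:02X}" for b in data),
        "decimal": " ".join(str(b) for b in data),
        "binary": " ".join(f"{b:08b}" for b in data),
        "percent": "".join(f"%{b:02X}" for b in data),
    }


def parse_hex_escapes(escaped: str) -> bytes:
    """Turn a string such as '\\xC3\\xA9' back into raw bytes."""
    return bytes.fromhex(escaped.replace("\\x", ""))


# Text -> Bytes (encode)
formats = format_bytes("\u00e9".encode("utf-8"))
for name, value in formats.items():
    print(f"{name:8} {value}")

# Bytes -> Text (decode)
decoded = parse_hex_escapes(formats["hex"]).decode("utf-8")
```

Decoding raises UnicodeDecodeError if the bytes are not valid UTF-8, which is a useful signal when debugging mangled input.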

Frequently Asked Questions

Why does é use 2 bytes in UTF-8?

The letter é (e with acute accent) has Unicode code point U+00E9 (233 in decimal). A single UTF-8 byte can only represent code points up to U+007F (127), so é falls into the 2-byte range (U+0080 to U+07FF) and is encoded as C3 A9.
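To see where C3 A9 comes from, the 2-byte layout (110xxxxx 10xxxxxx) can be built by hand. This is a sketch for illustration only; the function name encode_two_byte is hypothetical, and in practice str.encode does this for you.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Manually encode a code point from the 2-byte UTF-8 range."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point is outside the 2-byte range")
    # Lead byte: marker bits 110 followed by the top 5 bits of the code point.
    lead = 0b11000000 | (code_point >> 6)
    # Continuation byte: marker bits 10 followed by the low 6 bits.
    continuation = 0b10000000 | (code_point & 0b00111111)
    return bytes([lead, continuation])


manual = encode_two_byte(0xE9)
builtin = "\u00e9".encode("utf-8")
print(manual.hex(" ").upper(), manual == builtin)
```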

What is the difference between UTF-8 and ASCII?

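ASCII uses 7 bits and covers only 128 characters (control characters, the English alphabet, digits, and punctuation). UTF-8 is a superset: its first 128 code points are encoded exactly as in ASCII, so every ASCII file is also valid UTF-8, and it extends to the full Unicode range of over 1.1 million code points (up to U+10FFFF).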
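The superset relationship is easy to verify in Python. The sample strings below are arbitrary illustrations.

```python
ascii_text = "Hello, UTF-8!"

# Pure ASCII text produces identical bytes under both encodings.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Text outside the ASCII range cannot be encoded as ASCII at all.
try:
    "caf\u00e9".encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The highest Unicode code point, U+10FFFF, still fits in 4 UTF-8 bytes.
highest = chr(0x10FFFF).encode("utf-8")

print(same_bytes, ascii_failed, highest.hex(" ").upper())
```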

Why do emoji use 4 bytes?

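Most emoji have Unicode code points above U+FFFF (the upper limit of the Basic Multilingual Plane), and every code point from U+10000 to U+10FFFF requires 4 bytes in UTF-8. A few older symbols that render as emoji sit inside the Basic Multilingual Plane and use 3 bytes.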
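The byte count depends only on the code point's range, which the following sketch makes explicit. The function name utf8_length_for is hypothetical, and the two sample code points are illustrative.

```python
def utf8_length_for(code_point: int) -> int:
    """Return the UTF-8 byte count for a code point, by range."""
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4  # U+10000 to U+10FFFF


grinning_face = 0x1F600  # above the Basic Multilingual Plane
heavy_heart = 0x2764     # inside the Basic Multilingual Plane

for cp in (grinning_face, heavy_heart):
    actual = len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: predicted {utf8_length_for(cp)}, actual {actual}")
```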

What is the BOM in UTF-8?

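The Byte Order Mark (BOM, U+FEFF) is sometimes added at the start of UTF-8 files as a signature. It appears as EF BB BF in hex. Because UTF-8 has no byte-order ambiguity, the BOM serves only to mark the file as UTF-8. Most UTF-8 files should not have one, as some tools treat it as ordinary content and display or mis-parse it.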
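In Python, the standard "utf-8-sig" codec handles the BOM for you. The sketch below shows the difference from plain "utf-8"; the sample payload is arbitrary.

```python
import codecs

# A byte stream that starts with the UTF-8 BOM (EF BB BF).
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Plain "utf-8" keeps the BOM as a leading U+FEFF character.
kept = data.decode("utf-8")

# "utf-8-sig" strips a leading BOM if one is present.
stripped = data.decode("utf-8-sig")

# When encoding, "utf-8-sig" writes the BOM; plain "utf-8" does not.
with_bom = "hello".encode("utf-8-sig")

print(repr(kept), repr(stripped), with_bom.hex(" ").upper())
```

Decoding with "utf-8-sig" is also safe for input without a BOM, so it is a reasonable default when reading files of unknown origin.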

How does this differ from URL encoding?

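URL encoding (percent-encoding) represents bytes that are not allowed to appear literally in a URL as %XX, where XX is the byte's hex value; for non-ASCII text those bytes are normally the UTF-8 encoding. URL encoding is therefore one notation for UTF-8 bytes, not a separate encoding. This tool shows the same bytes in multiple formats, including hex (\xXX), decimal, binary, and percent-encoded.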
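The relationship can be seen with Python's standard urllib.parse module. Whether this tool percent-encodes every byte or only the reserved ones is an assumption here; the sketch shows both.

```python
from urllib.parse import quote, unquote

text = "caf\u00e9 au lait"

# quote() leaves unreserved ASCII characters alone and percent-encodes
# the UTF-8 bytes of everything else.
url_encoded = quote(text)

# Percent-encoding every byte, as a byte-level view might do.
all_bytes = "".join(f"%{b:02X}" for b in text.encode("utf-8"))

# unquote() reverses either form back to the original text.
round_trip = unquote(url_encoded)

print(url_encoded)
print(all_bytes)
```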
