Encoding

  • Maps characters (text) to byte sequences and vice versa. Used for text fields in serialization.
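A minimal illustration in Python (chosen here only for brevity). The same text maps to different byte sequences under different encodings, and decoding with the wrong encoding silently produces garbled text instead of an error.

```python
text = "h\u00e9llo"  # "héllo"

# Encoding: characters -> bytes. The same text gives different bytes per encoding.
utf8_bytes = text.encode("utf-8")      # 'é' becomes two bytes, 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # 'é' becomes one byte, 0xE9

# Decoding: bytes -> characters, using the encoding the bytes were produced with.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text

# Decoding with the wrong encoding raises no error here; it silently yields "hÃ©llo".
garbled = utf8_bytes.decode("latin-1")

print(utf8_bytes)    # b'h\xc3\xa9llo'
print(latin1_bytes)  # b'h\xe9llo'
```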

Naming Confusion
  • When someone says "custom packet encoding", they usually mean one of:

    • A framing protocol (how the start and end of each message are delimited).

    • A custom serialization/deserialization strategy.

    • A binary or textual format for transmitting structures over the network.

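  • Calling a packet framing strategy an "encoding" is technically valid but ambiguous, because the same word also covers text encodings such as UTF-8.

  • In networking, prefer more specific terms: "framing" for message delimiting, "serialization format" or "wire format" for how structures are laid out as bytes.

  • The word "encoding" itself isn't wrong, but it should be interpreted in its technical context.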

  • In Odin, JSON and CBOR are classed as "encoding": the core library places them under core:encoding (core:encoding/json, core:encoding/cbor).
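To keep the three meanings apart, here is a minimal sketch in Python that assumes a hypothetical wire format: a 4-byte big-endian length prefix followed by a UTF-8 JSON payload. The length prefix is the framing, JSON is the serialization, and UTF-8 is the text encoding. The names `frame`, `read_frames`, and `HEADER` are illustrative, not from any library.

```python
import json
import struct

HEADER = struct.Struct(">I")  # hypothetical wire format: 4-byte big-endian payload length


def frame(message: dict) -> bytes:
    """Serialize to JSON (serialization), encode as UTF-8 (text encoding), add a length prefix (framing)."""
    payload = json.dumps(message, ensure_ascii=False).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


def read_frames(buffer: bytes) -> tuple[list[dict], bytes]:
    """Extract every complete frame from buffer; return the messages and any leftover partial bytes."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        if len(buffer) - start < length:
            break  # the rest of this frame has not arrived yet
        messages.append(json.loads(buffer[start:start + length].decode("utf-8")))
        offset = start + length
    return messages, buffer[offset:]


stream = frame({"text": "h\u00e9llo"}) + frame({"n": 1})
msgs, rest = read_frames(stream[:-2])  # simulate the second frame arriving incomplete
assert msgs == [{"text": "h\u00e9llo"}]
assert rest == frame({"n": 1})[:-2]    # kept until more bytes arrive
```

The reader has to tolerate partial frames because a TCP connection delivers a byte stream, not discrete messages. Delimiting those messages is the framing problem, which is independent of how the text inside each payload is encoded.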

Text

UTF-8
  • Unicode Transformation Format – 8-bit

  • Size:

    • ASCII characters (0–127) use 1 byte

    • Non-ASCII characters use 2 to 4 bytes

    • For languages with many non-ASCII characters (e.g., Chinese, Japanese), it can take more space than UTF-16, since most CJK characters take 3 bytes in UTF-8 but 2 in UTF-16

  • Web standard (the default or required encoding for HTML, JSON, XML, etc.)

  • Backward compatible with ASCII; valid ASCII text is valid UTF-8

  • Serialization:

    • UTF-8 can be viewed as a serialization of text: it turns a sequence of Unicode code points into a byte stream that decodes back without loss
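The byte counts above follow directly from UTF-8's bit layout. A minimal hand-written encoder for a single code point, in Python for illustration only (it skips the surrogate-range validation that real encoders perform), checked against Python's built-in codec; `utf8_encode` is a hypothetical helper:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (no surrogate validation)."""
    if cp < 0x80:        # 0xxxxxxx: ASCII, 1 byte
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx: 2 bytes
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx: 3 bytes (rest of the BMP)
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 4 bytes
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError(f"not a Unicode code point: {cp:#x}")


# "A", "é", "中", "😀": one sample from each length class
for ch in "A\u00e9\u4e2d\U0001F600":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) {encoded.hex(' ')}")
# U+0041: 1 byte(s) 41
# U+00E9: 2 byte(s) c3 a9
# U+4E2D: 3 byte(s) e4 b8 ad
# U+1F600: 4 byte(s) f0 9f 98 80
```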

UTF-16
  • Size:

    • BMP characters (Basic Multilingual Plane, U+0000 to U+FFFF) use 2 bytes

    • Characters outside the BMP (e.g., emojis, historical scripts) use 4 bytes (a surrogate pair)

    • More compact than UTF-8 for text dominated by characters in U+0800 to U+FFFF (e.g., Chinese, Japanese, Korean), which take 3 bytes in UTF-8 but 2 in UTF-16; for ASCII-heavy text it takes twice the space of UTF-8

  • Widely used in platform APIs and language runtimes (e.g., Java and JavaScript strings, the Windows API, .NET)
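A short Python sketch of how a code point outside the BMP is split into a surrogate pair, followed by a size comparison with UTF-8. It uses the `utf-16-le`/`utf-16-be` codecs (explicit byte order, so no byte order mark is added) so the counts cover only the encoded text; `to_surrogate_pair` is a hypothetical helper.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a surrogate pair")
    v = cp - 0x10000            # leaves a 20-bit value
    high = 0xD800 + (v >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return high, low


high, low = to_surrogate_pair(0x1F600)  # U+1F600 GRINNING FACE
print(f"{high:04X} {low:04X}")          # D83D DE00
assert "\U0001F600".encode("utf-16-be") == bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])

# Size comparison: ASCII text favors UTF-8, CJK text favors UTF-16.
for sample in ["hello world", "\u4f60\u597d\u4e16\u754c"]:  # the second is "你好世界"
    print(len(sample.encode("utf-8")), len(sample.encode("utf-16-le")))
# prints "11 22", then "12 8"
```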

UTF-32
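  • Size: Every code point takes exactly 4 bytes, so the n-th code point can be located in constant time. The cost is space (ASCII text takes four times as much as in UTF-8), and a user-perceived character may still span several code points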
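With a fixed width of 4 bytes, the n-th code point starts at byte offset 4n, so no scan is needed. A small Python sketch using the `utf-32-le` codec (explicit byte order, no byte order mark); `code_point_at` is a hypothetical helper. The last lines show the caveat that one visible character can be more than one code point.

```python
import struct


def code_point_at(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-32-LE data in O(1): it starts at byte 4*n."""
    (cp,) = struct.unpack_from("<I", data, 4 * n)
    return chr(cp)


text = "a\u00e9\U0001F600z"         # "aé😀z": 4 code points
data = text.encode("utf-32-le")
assert len(data) == 4 * len(text)   # 16 bytes, fixed width
assert code_point_at(data, 2) == "\U0001F600"

# Caveat: "e" + COMBINING ACUTE ACCENT renders as one "é" but is two code points.
combined = "e\u0301"
assert len(combined.encode("utf-32-le")) == 8
```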

ASCII
  • American Standard Code for Information Interchange

  • Legacy system compatibility: For old systems or devices that only support ASCII

  • Simple English text: When text contains only basic characters (A–Z letters, 0–9 digits, basic punctuation)

  • Simplicity: ASCII is a 7-bit code (128 characters), normally stored as exactly one 8-bit byte per character with the high bit set to 0, which simplifies processing in very basic systems
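A quick Python check of the points above: ASCII defines only values 0 to 127, so every ASCII byte has its high bit clear, and the same bytes decode identically as UTF-8. `is_ascii_bytes` is an illustrative helper; Python also provides the built-in `bytes.isascii()` (3.7+).

```python
def is_ascii_bytes(data: bytes) -> bool:
    """ASCII is a 7-bit code: every byte must be below 0x80 (high bit clear)."""
    return all(b < 0x80 for b in data)


plain = "Hello, World! 123".encode("ascii")
assert is_ascii_bytes(plain) and plain.isascii()
assert plain.decode("utf-8") == plain.decode("ascii")  # valid ASCII is valid UTF-8

# Characters outside the 128-character set cannot be encoded as ASCII.
try:
    "caf\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)  # not ASCII: ordinal not in range(128)
```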