Text Processing Pipeline

Segmentation

  • Divides the input text into several substrings called “runs”.

  • Runs are sequences of characters that share a common direction (left-to-right or right-to-left) and a common script.

  • A “hard line break”, like an explicit newline or carriage return character, signals the end of a run.

  • Segmentation only produces bounds between runs. It does not modify the input text in any way.
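
  • Sketch : a minimal Python illustration of the points above, assuming direction alone decides run boundaries. The names char_direction and segment_runs are made up for these notes. Script detection and the full Unicode Bidirectional Algorithm are left out, and neutral characters (spaces, digits, punctuation) simply join the current run.

```python
import unicodedata

def char_direction(ch):
    """Return 'rtl', 'ltr', or None for characters with no strong direction."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "rtl"
    if bidi == "L":
        return "ltr"
    return None  # neutral in this sketch

def segment_runs(text):
    """Return (start, end, direction) bounds. The text itself is never modified."""
    runs = []
    start = 0
    current = None
    for i, ch in enumerate(text):
        if ch in "\n\r":
            # A hard line break always ends the current run.
            if i > start:
                runs.append((start, i, current or "ltr"))
            start = i + 1
            current = None
            continue
        d = char_direction(ch)
        if d is None:
            continue  # neutral characters stay in the current run
        if current is None:
            current = d
        elif d != current:
            # Direction changed: close the run and open a new one here.
            runs.append((start, i, current))
            start = i
            current = d
    if start < len(text):
        runs.append((start, len(text), current or "ltr"))
    return runs

runs = segment_runs("abc \u05d0\u05d1\u05d2\nxyz")
```

  • Each run is only a pair of indices plus a direction, so later stages can slice the original string without copying or altering it.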

Shaping

  • Instead of mapping each character (code point) directly to one glyph and drawing it, a shaping engine turns a run of characters into a sequence of positioned glyphs, applying the operations each script requires.

  • Operations :

    • Substitution :

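      • Replaces one or more glyphs with another, e.g. to form ligatures.

        • Optional in Latin-script text (e.g. the “fi” ligature), but scripts like Arabic render incorrectly without contextual substitution, since letters change form depending on their position in a word.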

    • Reordering :

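      • Needed when the visual order of glyphs differs from the logical (typed) order, e.g. Indic vowel signs that are drawn before the consonant they follow in the text.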

    • Positioning :

      • Depending on the context, glyphs may need to shift slightly.

      • One example of this is kerning.

        • Kerning moves a glyph closer to (or farther from) its neighbor. Precise positioning is especially important for cursive fonts, where strokes must connect.

      • Depends on directionality (glyph direction -> line direction).

        • Latin:

          • Left to Right -> Top to Bottom.

        • Arabic:

          • Right to Left -> Top to Bottom.

        • Japanese (Hiragana, Katakana, Kanji), in traditional vertical writing:

          • Top to Bottom -> Right to Left.

  • Libs :

    • HarfBuzz.

    • Uniscribe.
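
  • Sketch : a toy Python version of the three operations, to make them concrete. Everything here is an assumption for illustration: the rule tables, the glyph names ("f_i", "f_l"), and the advance widths are made up, and the input starts as one glyph per character. Real engines such as HarfBuzz read the equivalent rules from the font's tables rather than from hard-coded dictionaries.

```python
# Toy rule tables (hypothetical values, not taken from any real font).
LIGATURES = {("f", "i"): "f_i", ("f", "l"): "f_l"}
PRE_BASE = {"\u093f"}  # DEVANAGARI VOWEL SIGN I: typed after, drawn before
ADVANCE = {"f": 300, "i": 250, "n": 550, "e": 500, "f_i": 520}

def substitute(glyphs):
    """Replace known glyph pairs with a single ligature glyph."""
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out

def reorder(glyphs):
    """Move pre-base signs in front of the glyph they follow in logical order."""
    out = list(glyphs)
    for i in range(1, len(out)):
        if out[i] in PRE_BASE:
            out[i - 1], out[i] = out[i], out[i - 1]
    return out

def position(glyphs):
    """Assign an x offset to each glyph by accumulating advance widths."""
    x = 0
    placed = []
    for g in glyphs:
        placed.append((g, x))
        x += ADVANCE[g]
    return placed

shaped = position(substitute(list("fine")))
visual = reorder(["\u0915", "\u093f"])  # KA + VOWEL SIGN I
```

  • The input text is left untouched in all three steps. Only the glyph list and its offsets change.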

Kerning

  • Adjustment of horizontal space between specific pairs of glyphs (for example, “A” followed by “V”).

  • Kerning tables in font files specify fine-tuning offsets so that letter combinations appear optically balanced.

  • Applying :

    • Shaping Engines :

      • The shaping engine is the primary mechanism for applying kerning in many modern text rendering systems.

    • Manual :

      • A layout engine or rendering system can apply kerning adjustments independently of the shaping engine, either using font metrics directly or user-defined adjustments.

    • Design Tools :

      • Software like Adobe Illustrator or Figma allows manual kerning (user overrides), separate from the shaping engine's automatic adjustments.

    • Web Browsers / CSS :

      • In web environments, the font-kerning CSS property controls whether the font's kerning data is applied (font-kerning: none disables it), while letter-spacing adds uniform spacing on top of the shaped result.

  • Kerning Data can reside in :

    • GPOS table :

      • More flexible, modern positioning.

      • OpenType.

    • kern table :

      • Legacy.

      • TrueType.
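
  • Sketch : how pair kerning changes pen positions, in Python. The ADVANCE and KERN values below are invented font-unit numbers for illustration. In a real font the pair adjustments would come from the GPOS or kern table, and the function name pen_positions is made up for these notes.

```python
# Made-up metrics in font units (assumption: 1000 units per em).
ADVANCE = {"A": 600, "V": 600, "T": 550, "o": 500}
KERN = {("A", "V"): -80, ("T", "o"): -60}  # negative = pull glyphs together

def pen_positions(text, kerning=True):
    """Return (x offsets per glyph, total width), optionally applying pair kerning."""
    x = 0
    positions = []
    prev = None
    for ch in text:
        if kerning and prev is not None:
            # Look up the adjustment for this specific pair; default is no change.
            x += KERN.get((prev, ch), 0)
        positions.append(x)
        x += ADVANCE[ch]
        prev = ch
    return positions, x

with_kerning = pen_positions("AV")
without_kerning = pen_positions("AV", kerning=False)
```

  • With the values above, the “V” starts at 520 instead of 600 units, and the total width shrinks from 1200 to 1120 units.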

Rasterization

  • The process of converting a font's vector glyph outlines into bitmaps (pixels) at a specific size.

  • Challenges :

    • Hinting :

      • Adjusting glyph shapes to align with pixel grids for clarity at small sizes.

    • Anti-aliasing :

      • Smoothing jagged edges using grayscale or subpixel techniques.

    • DPI Scaling :

      • Scaling fonts appropriately for the screen’s physical density (dots per inch).
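
  • Sketch : DPI scaling and grayscale anti-aliasing in Python, assuming the glyph outline is already flattened to a simple polygon in pixel coordinates. The function names are made up for these notes, and supersampling is just one simple way to estimate coverage, not necessarily what a production rasterizer does.

```python
def points_to_pixels(point_size, dpi):
    """1 point = 1/72 inch, so pixels = points * dpi / 72."""
    return point_size * dpi / 72.0

def inside(px, py, polygon):
    """Even-odd point-in-polygon test against a list of (x, y) vertices."""
    hit = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 > py) != (y1 > py):
            # Edge crosses the horizontal line through py; find where.
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                hit = not hit
    return hit

def rasterize(polygon, width, height, samples=4):
    """Return rows of coverage values in [0, 1].

    samples=1 tests only the pixel center (hard edges);
    larger values average a samples x samples grid (grayscale anti-aliasing).
    """
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            hits = 0
            for sy in range(samples):
                for sx in range(samples):
                    px = x + (sx + 0.5) / samples
                    py = y + (sy + 0.5) / samples
                    hits += inside(px, py, polygon)
            row.append(hits / (samples * samples))
        rows.append(row)
    return rows

size_px = points_to_pixels(12, 96)
stem = [(0.5, 0), (2, 0), (2, 2), (0.5, 2)]  # left edge falls mid-pixel
coverage = rasterize(stem, 3, 2, samples=4)
```

  • The left edge of the stem covers half of the first pixel column, so that column gets a coverage of 0.5 (a mid-gray) instead of being fully on or off.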

Layout

  • Text layout is the process of breaking text into lines that fit a given width, and optionally improving the fit through adjustments such as justification and hyphenation.

  • These adjustments are typically very dependent on the script being processed.
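
  • Sketch : greedy line wrapping and simple justification in Python. This assumes space-separated words and a monospace measure (len), which only suits some scripts. The names wrap_greedy and justify are made up for these notes, and hyphenation is left out.

```python
def wrap_greedy(words, max_width, measure=len, space=1):
    """Put as many words on each line as fit within max_width."""
    lines = []
    line = []
    width = 0
    for word in words:
        w = measure(word)
        extra = w if not line else space + w
        if line and width + extra > max_width:
            # Word does not fit: close this line and start a new one.
            lines.append(line)
            line = [word]
            width = w
        else:
            line.append(word)
            width += extra
    if line:
        lines.append(line)
    return lines

def justify(line, max_width):
    """Pad the gaps between words so the line is exactly max_width wide."""
    if len(line) == 1:
        return line[0]  # a single word cannot be stretched
    gaps = len(line) - 1
    total = max_width - sum(len(w) for w in line)
    base, extra = divmod(total, gaps)
    out = ""
    for i, w in enumerate(line[:-1]):
        # The leftmost gaps absorb any remainder, one extra space each.
        out += w + " " * (base + (1 if i < extra else 0))
    return out + line[-1]

lines = wrap_greedy("the quick brown fox jumps".split(), 11)
first = justify(lines[0], 11)
```

  • A real layout engine would pass a measure function backed by shaped glyph advances instead of len.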

Hinting

  • Instructions embedded in an outline font (TrueType or PostScript) that tell the rasterizer how to align strokes to the pixel grid at small sizes.

  • Proper hinting reduces artifacts like uneven stroke widths or blurry edges when rendering on low-resolution screens.
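
  • Sketch : the core idea of grid-fitting a vertical stem, in Python. This is a deliberately simplified stand-in: real hints are instructions interpreted by the rasterizer, not a single rounding function, and the name snap_stem is made up for these notes.

```python
def snap_stem(left, right):
    """Snap a stem's edges (in pixel units) to whole pixels.

    The width is rounded first and kept at least 1 pixel, so stems with the
    same outline width render equally wide and thin stems do not vanish.
    """
    width = max(1, round(right - left))
    snapped_left = round(left)
    return snapped_left, snapped_left + width

# Two stems with the same outline width (1.3 px) at different offsets.
a = snap_stem(2.4, 3.7)
b = snap_stem(5.8, 7.1)
thin = snap_stem(4.2, 4.6)
```

  • Both 1.3 px stems come out exactly one pixel wide regardless of where they fall on the grid, which is the even stroke width that hinting aims for.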