Sans Bullshit Sans: leveraging the synergy of ligatures

February 26, 2015

Ever wonder what exactly is inside a font? We know it contains letters — or actually, drawings of letters, but what else is in there? Well, why not crack one open and see, and while we’re there, find a way to make this world a little less buzzwordy?

Laying out our plan

We’re going to create a font called Sans Bullshit Sans and it’ll do something like this:

A text full of buzzwords is being typed out, with all the buzzwords being replaced with a sign saying 'bullshit'

When using Sans Bullshit Sans, every buzzword will be replaced by a Comic Sans-styled censorship bar. And the cool part is, we won’t be using CSS or JavaScript for this, but plain ol’ OpenType font technology.

We’ll need three things for this project:

A font we can examine and edit
Tools that allow us to edit fonts with a text editor
Our own ligatures

We’ll take care of step one right away. Since we’re more or less reverse-engineering a font, we need one that allows us to legally do this. I decided on the open source font Droid Sans, as its Apache license allows us to modify and redistribute it afterwards. Download it and save it to your favorite font hackin’ directory.

Picking our font hacking tools

Most font tools focus on the designey part of fonts: drawing the actual letter shapes, and tweaking their details like kerning and hinting. Think of them like Photoshop for fonts: very graphical and with quite a steep learning curve.

But there is another way to edit fonts. Once they have been saved as a TTF, they hold all their data in a special format. This is the OpenType format, in which each letter has its own entry in various tables, each describing a specific property of that letter.

For example, the glyf table holds information about the shape of the characters in a font — it actually stores a drawing for each character. Other tables hold different info: the name of the type’s designer, the maximum height of the font, or which combinations of letters should be replaced by a ligature.

Fonts are binary files, generally unsuitable for being picked apart by humans. So if we want to look at those tables, we’d better translate them into something more readable, like plain text. Luckily for us, there’s TTX/Fonttools, a command line tool that converts fonts to XML and vice versa. The XML document can be opened, read, and edited with your favorite text editor and is actually quite readable for us humans.

You can download the original version on Sourceforge, but I recommend a fork by Behdad Esfahbod. It’s actively maintained and gets bugfixes and new features (for instance, decoding of the new color tables). Follow either page’s installation instructions, or this detailed one for Windows.

Looking under Droid Sans’ hood

With TTX/Fonttools installed, let’s fire it up and have it decode Droid Sans:

$ ttx DroidSans.ttf

TTX/Fonttools translates all the binary data in DroidSans.ttf to XML. While it does so, the output looks a little like this:

Dumping "DroidSans.ttf" to "DroidSans.ttx"...
Dumping 'GlyphOrder' table...
Dumping 'head' table...
Dumping 'hhea' table...
Dumping 'maxp' table...
Dumping 'OS/2' table...
Dumping 'hmtx' table...
Dumping 'cmap' table...
Dumping 'fpgm' table...
Dumping 'prep' table...
Dumping 'cvt ' table...
Dumping 'loca' table...
Dumping 'glyf' table...
Dumping 'name' table...
Dumping 'post' table...
Dumping 'gasp' table...
Dumping 'FFTM' table...
Dumping 'GDEF' table...
Dumping 'GPOS' table...
Dumping 'GSUB' table...

You can see that besides the glyf table we talked about, there are about 18 other tables in Droid Sans. Some fonts have fewer, some more, but there’s always a basic set of tables that are needed for a valid font.

These tables hold every single detail about the font: a list of which letters are in the font, the shapes of the letters, the name of the font, everything. Let’s take a quick look at Droid Sans’ head table, which contains global information about the font:

<head>
  <!-- Most of this table will be recalculated by the compiler -->
  <tableVersion value="1.0"/>
  <fontRevision value="1.0"/>
  <checkSumAdjustment value="0x2e9ffae7"/>
  <magicNumber value="0x5f0f3cf5"/>
  <flags value="00000000 00011111"/>
  <unitsPerEm value="2048"/>
  <created value="Mon May 17 19:56:38 2010"/>
  <modified value="Fri Jul  9 07:55:05 2010"/>
  <xMin value="-352"/>
  <yMin value="-492"/>
  <xMax value="1966"/>
  <yMax value="1907"/>
  <macStyle value="00000000 00000000"/>
  <lowestRecPPEM value="8"/>
  <fontDirectionHint value="2"/>
  <indexToLocFormat value="0"/>
  <glyphDataFormat value="0"/>
</head>

This table is required for all OpenType fonts, and contains info like the bounding box that encloses all glyphs in the font (xMin, yMin, xMax and yMax), and the number of units per em, which is the grid the glyphs are drawn against.

For now we’re only interested in those units per em:

<unitsPerEm value="2048"/>

When we create our ligatures, we’ll use the same value. That way the dimensions of our characters, most importantly the vertical height, will match up precisely with that of Droid Sans.
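If you would rather not scroll through the XML by hand, the value can also be read with a few lines of Python. This is only a sketch using the standard library: the inline snippet stands in for the real DroidSans.ttx, and `units_per_em` is a helper name made up for this example.

```python
import xml.etree.ElementTree as ET

# A trimmed-down stand-in for the TTX dump. For the real file you would use
# ET.parse("DroidSans.ttx").getroot() instead of ET.fromstring(...).
TTX_SNIPPET = """
<ttFont>
  <head>
    <unitsPerEm value="2048"/>
    <xMin value="-352"/>
    <yMin value="-492"/>
    <xMax value="1966"/>
    <yMax value="1907"/>
  </head>
</ttFont>
"""


def units_per_em(root):
    """Return the unitsPerEm value stored in the head table of a TTX tree."""
    return int(root.find("head/unitsPerEm").get("value"))


root = ET.fromstring(TTX_SNIPPET.strip())
upm = units_per_em(root)
print(upm)  # 2048
```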

For now we’ll leave the TTX file as it is. We’ll come back to it later when we add our actual ligatures.

Designing our ligatures

This is the plan: we have a list of buzzwords, and we’re going to (ab)use the OpenType feature of ligatures to replace those words with something of our own choosing: a sign that says “bullshit”:

Crude hand drawn black bar with the word 'bullshit' written over it

A ligature is when a string of characters (or a sequence of glyphs representing those characters) is replaced by one single glyph.

Ligatures improve legibility: instead of awkwardly overlapping parts of the letters (like the upper curve of the f and the dot of the i), the type designer can create a custom shape that looks a lot nicer. Another function of ligatures is joining letters in a different way in handwritten or brush typefaces. Fun fact: the well known ampersand, &, was originally a ligature for “et”, meaning “and” in Latin.

Not all fonts offer ligatures: it’s the choice of the type designer, or the distributor of the font, to create or include them. For instance, Google’s web versions are stripped of ligatures so they result in smaller files. Type aficionados might cringe, but it makes sense: ligature support in web browsers is still kinda shaky, so it would be a waste to have people download stuff they can’t use.

For ligatures to work, a font has to have a GSUB table (the Glyph Substitution table), which has entries about which combination of letters should be replaced by a ligature. It’ll say “for the letter combination f and i, show the fi-ligature.”

We’re going to use this table to turn words like “synergy” into a “bullshit”-ligature.
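To make the mechanics a bit more concrete, here is a rough Python model of what such a substitution does. It is a simplification for illustration, not how any real text shaping engine is implemented: `apply_ligatures` is a made-up name, and single characters stand in for glyph names.

```python
def apply_ligatures(glyphs, ligature_sets):
    """Replace glyph sequences with ligature glyphs, scanning left to right.

    ligature_sets maps a first glyph to a list of
    (remaining components, ligature glyph) pairs. The pairs are tried in
    order, so the first one that matches wins.
    """
    out = []
    i = 0
    while i < len(glyphs):
        for components, ligature in ligature_sets.get(glyphs[i], []):
            end = i + 1 + len(components)
            if glyphs[i + 1:end] == components:
                out.append(ligature)  # swap the whole sequence for one glyph
                i = end
                break
        else:
            # No ligature starts here: keep the glyph and move on.
            out.append(glyphs[i])
            i += 1
    return out


# "s" followed by "y,n,e,r,g,y" becomes our (future) bullshit glyph.
LIGATURES = {"s": [(list("ynergy"), "uniE601")]}

print(apply_ligatures(list("synergy!"), LIGATURES))  # ['uniE601', '!']
print(apply_ligatures(list("sync"), LIGATURES))      # ['s', 'y', 'n', 'c']
```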

It’s probably worth noting that what we’re doing here is far from proper use of the ligature functionality. True type-folks will rightfully scoff at this juvenile misuse of a feature like this, but I thought it was a neat way to learn about how fonts work.

With that disclaimer out of the way, let’s draw three ligatures — one for short words, one for medium words, and one for long words. Fire up Illustrator or any other vector drawing tool that can save SVGs, and draw our soon-to-be ligatures:

Looking good? I think so. Let’s turn these into ligatures!

Turning SVG files into a font

We now have three SVG files and a reverse-engineered font waiting for ‘em. But even though both SVG and glyphs are vector images, they’re different formats, so we can’t just copy them over. We need to convert them to something OpenType can understand.

To do this, I find it easiest to create a new font from our SVGs, with specs matching our target font. If we then pass that font through TTX, we can simply copy and paste the relevant parts of the XML.

To create a font from SVGs, we can choose from a wide range of command line tools, web apps, or full-fledged font editors. I used the web based tool Icomoon.

The steps are straightforward: create a new empty font, import the SVG images, and export the font. All Icomoon’s settings can be left on default, except the units per em. If we keep those the same as Droid Sans’, we can simply copy our glyphs over without recalculating dimensions and offsets. On the download screen, click the preferences-button and head over to the font metrics. Under “Em square height” we fill in 2048, the number we noted from Droid Sans.

To make creating our own ligatures a lot easier, we already define one for each SVG. Click the “fi” button on the top of the screen to enable ligatures for our font, and you’ll see an input field appear for each character. Enter a word, like “ninja” for the smallest ligature, to create a GSUB table entry that we’ll use as blueprint for our word list.

We’re ready to download the font! Unzip it, and move the TTF over to your font hackin’ directory.

Transplanting our ligatures

We now have a font that contains only our three bullshit drawings — no letters, no punctuation marks, no icons, just our bullshit images. At this point, they’re mapped to characters in Unicode’s PUA (0xE600, 0xE601 and 0xE602). As you will see in a minute, these codes also make up their internal name.

Next step: moving our bullshit characters over to Droid Sans. We do this by converting bullshit.ttf to bullshit.ttx so we can open it in our favorite text editor. This is the same thing we did with Droid Sans:

$ ttx bullshit.ttf

We don’t need everything in this TTX file. All the data we’re interested in is contained in five tables:

GlyphOrder (entry GlyphID)
hmtx (entry mtx)
cmap (entry map)
glyf (entry TTGlyph)
GSUB

As we’ve seen before, each table handles one specific feature or detail for each of the characters in a font. Let’s take a look:

Table one: GlyphOrder

This is a table generated by the TTX/Fonttools tool, and is not part of the OpenType spec. It simply lists all the characters inside the font. The glyph ordering is stored explicitly-but-implicitly in the glyf table, TTX/Fonttools just makes it easier for you to reorder them by giving you this GlyphOrder table.

We need to take our three entries in bullshit.ttx and stick ‘em to the end of the GlyphOrder table in droidsans.ttx. Don’t forget to change the ID values:

<GlyphID id="209" name="fraction"/>
<GlyphID id="210" name="foursuperior"/>
<GlyphID id="211" name="uniE600"/>
<GlyphID id="212" name="uniE601"/>
<GlyphID id="213" name="uniE602"/>

You’ll notice the name “uniE600”, based on the original glyph-to-character mapping. It’s just an internal name, so we can leave it like that.
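Renumbering by hand is easy to get wrong, so here is a small Python sketch that appends the new names and continues the ID sequence automatically. It works on an inline excerpt rather than the real file, and `append_glyphs` is a hypothetical helper, not part of TTX/Fonttools.

```python
import xml.etree.ElementTree as ET

# Excerpt of the end of Droid Sans' GlyphOrder table.
GLYPH_ORDER = """
<GlyphOrder>
  <GlyphID id="209" name="fraction"/>
  <GlyphID id="210" name="foursuperior"/>
</GlyphOrder>
"""


def append_glyphs(glyph_order, names):
    """Append GlyphID entries, numbering them after the highest existing ID."""
    existing = glyph_order.findall("GlyphID")
    next_id = max(int(g.get("id")) for g in existing) + 1
    for offset, name in enumerate(names):
        ET.SubElement(glyph_order, "GlyphID", id=str(next_id + offset), name=name)


glyph_order = ET.fromstring(GLYPH_ORDER.strip())
append_glyphs(glyph_order, ["uniE600", "uniE601", "uniE602"])

new_ids = [(g.get("id"), g.get("name")) for g in glyph_order.findall("GlyphID")][-3:]
print(new_ids)  # [('211', 'uniE600'), ('212', 'uniE601'), ('213', 'uniE602')]
```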

Table two: hmtx, Horizontal metrics

This table tells the OS how much horizontal space to reserve for a character. You might expect that the width of a character represents how much horizontal space it occupies, but this is not always the case. In brush-style letters, for example, letter shapes could overlap each other slightly to create that connected handwritten effect. So the “horizontal advance” might be a little less than the letter’s width. That sort of info is stored in the hmtx table, and we need to append our three bullshit glyphs to it:

<mtx name="uniE600" width="12174" lsb="61"/>
<mtx name="uniE601" width="6458" lsb="19"/>
<mtx name="uniE602" width="4138" lsb="107"/>

Note that these widths are based on the grid size defined by the units per em value. If we had used another value, we’d have to recalculate these widths here.
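That recalculation would be a plain proportional scaling. A minimal sketch, assuming the values only need to be scaled and rounded to whole font units; the 1024 and 1000 grid sizes are made-up examples, and `rescale` is a hypothetical helper:

```python
def rescale(value, source_upm, target_upm):
    """Scale a metric from one units-per-em grid to another."""
    return round(value * target_upm / source_upm)


# A glyph drawn 6087 units wide on a 1024 grid is 12174 units wide on Droid Sans' 2048 grid.
print(rescale(6087, 1024, 2048))   # 12174

# Going the other way, our widest glyph on a 1000-unit grid:
print(rescale(12174, 2048, 1000))  # 5944
```

The same scaling would have to be applied to the lsb values and to every point in the glyph’s contours, which is why matching the units per em up front saves so much work.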

Table three: cmap, Character To Glyph Index Mapping Table

The cmap table defines the mapping of character codes to the glyph index values used in the font. It matches the Unicode value for the letter “P” to the font’s internal ID for the “P” glyph. There’s a lot more to this table, but for now let’s just add our three new characters. We need to do this for the two cmap_format_4 tables, so you can ignore the smaller cmap_format_0 table. Just add these to the end:

<map code="0x2044" name="fraction"/><!-- FRACTION SLASH -->
<map code="0xe600" name="uniE600"/><!-- ???? -->
<map code="0xe601" name="uniE601"/><!-- ???? -->
<map code="0xe602" name="uniE602"/><!-- ???? -->
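Since the same three lines go into both cmap_format_4 tables, this step can also be scripted. The sketch below uses a stripped-down cmap table whose platformID and platEncID attributes are only illustrative, and `add_mappings` is a hypothetical helper. It derives the glyph name from the code point the same way our entries are named (uni plus the hexadecimal code).

```python
import xml.etree.ElementTree as ET

# Stripped-down cmap table; the attribute values are illustrative only.
CMAP = """
<cmap>
  <cmap_format_0 platformID="1" platEncID="0" language="0"/>
  <cmap_format_4 platformID="0" platEncID="3" language="0">
    <map code="0x2044" name="fraction"/>
  </cmap_format_4>
  <cmap_format_4 platformID="3" platEncID="1" language="0">
    <map code="0x2044" name="fraction"/>
  </cmap_format_4>
</cmap>
"""


def add_mappings(cmap, code_points):
    """Add a map entry per code point to every cmap_format_4 subtable."""
    for table in cmap.findall("cmap_format_4"):
        for cp in code_points:
            # 0xE600 -> code="0xe600", name="uniE600"
            ET.SubElement(table, "map", code=hex(cp), name="uni%04X" % cp)


cmap = ET.fromstring(CMAP.strip())
add_mappings(cmap, [0xE600, 0xE601, 0xE602])

for table in cmap.findall("cmap_format_4"):
    print([(m.get("code"), m.get("name")) for m in table.findall("map")])
```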

Table four: glyf, Glyph Table

Alright, those previous three tables were just administration, and now we’ve reached the table where all the magic happens: the actual shapes. The glyf table holds TTGlyph entries, and each one of those contains a drawing of a character. These are noted as contour elements.

These entries are pretty large, but be sure to copy all three TTGlyph entries over.

<TTGlyph name="uniE600" xMin="79" yMin="-243" xMax="12032" yMax="1790">
  <contour>
    <pt x="8333" y="1122" on="1"/>
    ...
    <pt x="11847" y="946" on="1"/>
  </contour>
  <instructions><assembly>
    </assembly></instructions>
</TTGlyph>
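With entries spread over several tables, it is easy to forget one. The following Python sketch checks that every name in GlyphOrder also shows up in hmtx and glyf. It runs on a tiny inline example in which one TTGlyph entry was deliberately left out; `missing_entries` is a made-up helper, and the check covers only these two tables.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a TTX file; the TTGlyph for uniE602 is missing on purpose.
TTX = """
<ttFont>
  <GlyphOrder>
    <GlyphID id="0" name="uniE600"/>
    <GlyphID id="1" name="uniE601"/>
    <GlyphID id="2" name="uniE602"/>
  </GlyphOrder>
  <hmtx>
    <mtx name="uniE600" width="12174" lsb="61"/>
    <mtx name="uniE601" width="6458" lsb="19"/>
    <mtx name="uniE602" width="4138" lsb="107"/>
  </hmtx>
  <glyf>
    <TTGlyph name="uniE600" xMin="79" yMin="-243" xMax="12032" yMax="1790"/>
    <TTGlyph name="uniE601"/>
  </glyf>
</ttFont>
"""


def missing_entries(root):
    """For hmtx and glyf, list the GlyphOrder names that have no entry."""
    ordered = [g.get("name") for g in root.findall("GlyphOrder/GlyphID")]
    report = {}
    for path in ("hmtx/mtx", "glyf/TTGlyph"):
        present = {entry.get("name") for entry in root.findall(path)}
        report[path] = [name for name in ordered if name not in present]
    return report


report = missing_entries(ET.fromstring(TTX.strip()))
print(report)  # {'hmtx/mtx': [], 'glyf/TTGlyph': ['uniE602']}
```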

(Pro tip: while poking around in Droid Sans, take a look at the Aacute entry of Droid Sans and see if you can spot how cleverly this letter is made!)

Table five: GSUB, Glyph Substitution table

If we converted Droid Sans back to a TTF at this point, we’d have our three bullshit characters in there, but they wouldn’t be ligatures yet. So the last step is to create the GSUB table where we point the words in our list to one of the bullshit-characters.

The GSUB table holds a simple mapping: which sequence of letters should be replaced by which glyph? Droid Sans already has a GSUB table, but it’s an empty placeholder. Since it has no function, we can replace it with the GSUB table from our ligature font. It’ll serve as a basic blueprint, to which we can add all our other words.

Depending on which words you used, the GSUB table of our ligature font looks something like this:

<LookupList>
  <!-- LookupCount=1 -->
  <Lookup index="0">
    <!-- LookupType=4 -->
    <LookupFlag value="0"/>
    <!-- SubTableCount=1 -->
    <LigatureSubst index="0">
      <LigatureSet glyph="n">
        <Ligature components="i,n,j,a" glyph="uniE602"/>
      </LigatureSet>
      <LigatureSet glyph="s">
        <Ligature components="y,n,e,r,g,y" glyph="uniE601"/>
      </LigatureSet>
      <LigatureSet glyph="p">
        <Ligature components="a,r,a,d,i,g,m,space,s,h,i,f,t" glyph="uniE600"/>
      </LigatureSet>
    </LigatureSubst>
  </Lookup>
</LookupList>

Every combination of characters that invokes a ligature is captured in this LigatureSubst list. Each first letter of a word has a LigatureSet entry, and all letters that follow are contained in a Ligature entry. So if we add another word starting with “s”, we get:

<LigatureSet glyph="s">
  <Ligature components="y,n,e,r,g,y" glyph="uniE601"/>
  <Ligature components="t,a,r,t,u,p" glyph="uniE601"/>
</LigatureSet>

I slapped together a little Python script that takes our word list and outputs it as a valid set of GSUB entries.
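That script isn’t reproduced here, but a sketch of what such a generator might look like follows. Everything in it is an assumption made for illustration: the `pick_glyph` length thresholds, the restriction to plain letters and spaces, and the function names. It puts longer sequences first within a set because, as far as I know, the ligatures in a set are tried in order, so “startup” listed first would otherwise win over “startups”.

```python
from collections import defaultdict
import string

# Glyph names for plain letters are the letters themselves; a space is "space".
# Other characters have other glyph names and are not handled in this sketch.
GLYPH_NAMES = {c: c for c in string.ascii_letters}
GLYPH_NAMES[" "] = "space"


def pick_glyph(word):
    """Pick the short, medium or long bullshit glyph (thresholds are made up)."""
    if len(word) <= 5:
        return "uniE602"
    if len(word) <= 9:
        return "uniE601"
    return "uniE600"


def gsub_entries(words):
    """Return LigatureSet XML for a list of buzzwords."""
    sets = defaultdict(list)
    for word in words:
        names = [GLYPH_NAMES[c] for c in word]
        sets[names[0]].append((names[1:], pick_glyph(word)))
    lines = []
    for first in sorted(sets):
        lines.append('<LigatureSet glyph="%s">' % first)
        # Longest component list first, so longer words are matched first.
        for rest, glyph in sorted(sets[first], key=lambda entry: -len(entry[0])):
            lines.append('  <Ligature components="%s" glyph="%s"/>' % (",".join(rest), glyph))
        lines.append("</LigatureSet>")
    return "\n".join(lines)


xml = gsub_entries(["ninja", "synergy", "startup", "paradigm shift"])
print(xml)
```

The output can be pasted inside the LigatureSubst element of the GSUB table.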

At last, Sans Bullshit Sans is ready!

Well, that’s it: we’ve now added our custom ligatures to Droid Sans, and it’s time to taste the sweet fruits of our labor. We’re compiling our hacked Droid Sans back to a font:

$ ttx droidsans.ttx

And if we made no mistakes, it’ll compile droidsans.ttx and create a brand new droidsans.ttf file for us. Rename this to sansbullshitsans.ttf and it’s ready to be used as a webfont and on the systems of your company’s managers/social media gurus. Didn’t follow along with the hackin’? Just cheat and download Sans Bullshit Sans here or check out the Github repo!

Check out the official Sans Bullshit Sans™ product page to see our baby in action. Have fun proactively creating paradigm shifts in your bespoke… ehrm, bullshit!