I rendered 1,418 confusables over 230 fonts. Most aren't confusable to the eye

paultendo

Comments (39)

jonhohle
About 20 years ago I used Cyrillic confusables to watermark internal documentation that was being leaked by a disgruntled customer service employee. The document would render dynamically and include the employee ID encoded as bits in the text. It survived copy/paste to plain text well. I did run into some issues in early versions when characters in Linux commands or visible web addresses were replaced. Fortunately the source docs were HTML, and it was easy to exclude code or pre nodes when rendering. I thought this was so clever, but the leaker was never caught using it, to the best of my knowledge.
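A minimal sketch of the watermarking idea described above, not the original implementation. The letter mapping, the 16-bit ID width, and the function names `embed` and `extract` are all assumptions for illustration, and it assumes the source text contains no Cyrillic of its own.

```python
# Hypothetical carrier set: Latin letters and their Cyrillic lookalikes.
CONFUSABLES = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
}
REVERSE = {cyr: lat for lat, cyr in CONFUSABLES.items()}


def embed(text: str, employee_id: int, bits: int = 16) -> str:
    """Encode employee_id (least significant bit first) into carrier letters."""
    out = []
    used = 0
    for ch in text:
        if ch in CONFUSABLES and used < bits:
            bit = (employee_id >> used) & 1
            # Bit 1 swaps in the Cyrillic lookalike, bit 0 keeps the Latin letter.
            out.append(CONFUSABLES[ch] if bit else ch)
            used += 1
        else:
            out.append(ch)
    if used < bits:
        raise ValueError("text has too few carrier letters for the ID")
    return "".join(out)


def extract(text: str, bits: int = 16) -> int:
    """Recover the ID from the first `bits` carrier letters."""
    value = 0
    seen = 0
    for ch in text:
        if seen >= bits:
            break
        if ch in REVERSE:        # Cyrillic lookalike: bit is 1
            value |= 1 << seen
            seen += 1
        elif ch in CONFUSABLES:  # untouched Latin carrier: bit is 0
            seen += 1
    return value
```

A real deployment would also need to skip code, commands, and URLs, as the comment notes, since swapped letters break anything that is typed or executed.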
albert_e
> 82 pairs are pixel-identical
> a string like “аpple.com” with Cyrillic а (U+0430) is pixel-identical to “apple.com” in 40+ fonts. The user, the browser’s address bar, and any visual review process all see the same pixels. This is not theoretical. It is a measured property of the font files shipping on every Mac.
Current implementations of "Computer Use" agentic AI tools mostly rely on visuals -- screenshotting a computer screen and interpreting it. These pixel-identical character pairs will be a straight failure mode for those automations and could be a threat vector if crafted well.
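Pixels may match, but the underlying code points do not, so a check that reads the text rather than a screenshot can still tell the strings apart. A small sketch using only the standard library; the helper name `describe_non_ascii` is made up for this example.

```python
import unicodedata


def describe_non_ascii(s: str) -> list[tuple[int, str, str]]:
    """List (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(s)
        if ord(ch) > 127
    ]


# The first letter is written as an escape so the difference is visible in source.
print(describe_non_ascii("\u0430pple.com"))
# [(0, 'U+0430', 'CYRILLIC SMALL LETTER A')]
print(describe_non_ascii("apple.com"))
# []
```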
apothegm
Maybe not at super large font sizes. But even lowercase i and l are easy enough to confuse at a glance mid-word in most sans-serif fonts, not to mention uppercase I and lowercase l. You don’t even need “confusable” glyphs to create a domain name that will stand up to a casual visual confirmation from a busy user in a phishing context.
ordu
But what about 'Ы'? It looks like 'bl', doesn't it? 'Ы' is one codepoint and one glyph, though 'bl' is a sequence of two letters. I believe that the method described will miss such things. Cyrillic also has 'Ю'; I suppose it is possible to design a font that makes it look like 'lO'? Are there any fonts like this in the wild?
vivid242
Thanks for the effort! I'm always intrigued by how the German FE-Schrift ("fälschungserschwerende Schrift", "more-difficult-to-forge font") chooses shapes for characters that make it hard for them to be turned into one another (like a 3 into an 8 or so): https://en.wikipedia.org/wiki/FE-Schrift
jeroenhd
An interesting attempt, Claude. However, your prompt is missing an important step to measure effectiveness against humans: wait 40-60 years for your vision to degrade naturally, and check the confusables again, preferably on a small phone screen. Bonus points if you can find someone with visual disabilities from birth. Obviously most attacks aren't pixel-perfect, but that's not the point; all you need to confuse are human eyes. Things like the Fraktur characters are obvious mismatches in any font I know, so I do wonder why they're on the list.
nnevatie
Hmm, is SSIM a good metric for comparing fonts? I'd imagine it isn't ideal, as fonts are mostly textureless and SSIM has no concept of glyph identity or typographic intent.
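For context on what SSIM does and does not see, here is a simplified single-window version of the standard formula in plain Python. It is an illustration only, not the article's pipeline: real implementations slide a local window over the image and average the results, and the flattened grayscale lists and the name `global_ssim` are assumptions.

```python
def global_ssim(x: list[float], y: list[float], dynamic_range: float = 255.0) -> float:
    """SSIM over one window covering the whole image (flattened pixel lists)."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population variance and covariance, for simplicity.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Stabilising constants from the usual definition (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Two hypothetical 3x3 glyph bitmaps: a bare stroke and a stroke with a foot.
stroke = [0, 255, 0, 0, 255, 0, 0, 255, 0]
footed = [0, 255, 0, 0, 255, 0, 255, 255, 255]
print(global_ssim(stroke, stroke), global_ssim(stroke, footed))
```

The score depends only on means, variances, and covariance of pixel values, which is the commenter's point: nothing in it knows which strokes make a glyph recognisable.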
Grom_PE
0 and O, or l and I, looking the same within a single font is a crime of modern typography. Also, I remember the 8x16 VGA font that came with KeyRus had some slight differences between Cyrillic and Latin lookalikes, which brought a strange sense of comfort when reading, and especially when typing the letter c, because its Cyrillic lookalike is located on the same key.
serial_dev
Was it a demo site? The font looks very wonky, not sure if I should copy-paste from it.
rustyhancock
Ooph, I couldn't get far in this; the font is giving me motion sickness somehow. Was that the intention?
chii
> A domain using only Cyrillic characters that happen to spell a Latin word (like “аpple” in all-Cyrillic) may still render in the address bar’s font and look identical.
That is very interesting. I imagine the browser could take some context clues and switch rendering to Punycode if the user's locale is nowhere near a Cyrillic region. But that is only going to patch some edge cases and miss others. Ideally, the solution is password managers everywhere, which don't have this vulnerability, instead of relying on human eyes to visually recognize web URLs, which is vulnerable.
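The locale heuristic suggested above could look roughly like this. It is a sketch under loud assumptions: the first word of each character's Unicode name stands in for its script (the standard library has no script property), `expected_scripts` stands in for the user's locale, and Python's built-in `idna` codec (IDNA 2003) does the Punycode conversion. Real browsers apply more elaborate rules.

```python
import unicodedata


def label_scripts(label: str) -> set[str]:
    """Approximate the scripts in a label from Unicode character names."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def display_host(host: str, expected_scripts: frozenset = frozenset({"LATIN"})) -> str:
    """Return the Punycode form if any label uses a script the user would not expect."""
    for label in host.split("."):
        if not label_scripts(label) <= expected_scripts:
            return host.encode("idna").decode("ascii")
    return host


print(display_host("apple.com"))        # unchanged: all letters are Latin
print(display_host("\u0430pple.com"))   # shown in its xn-- form instead
```

As the comment says, this only patches some cases: a user whose `expected_scripts` includes Cyrillic would still see the lookalike.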
recursivecaveat
This seems misguided. The fact that 'ρ' isn't a pixel-for-pixel match for 'p' doesn't mean they're not confusable. The threat model is not being unable to solve a spot-the-difference puzzle. Unless you are familiar with every pixel of your system fonts, and carefully scrutinize every character on your screen, the lack of an exact match in jρmorgan[.]com in a URL is going to do very little for you. There are many English characters that have multiple totally distinct ways to write them, so you can have two 'a' variants that are distinct but equally 'normal' looking. I guess if you get an LLM to write your blog posts they don't have to make much sense to begin with.
Oarch
This is really cool. I loved the technical breakdown and side-by-side comparisons. Surprised to hear that the Microsoft and macOS default fonts didn't score so well!
Cool_Caribou
Why are all the descending letters truncated in the titles? Not sure if it's a CSS glitch or a terrible font choice. A bit ironic on an article about fonts.
doctorpangloss
well, you didn't really do anything, did you? Claude Code rendered these things and wrote the blog post haha
> "This is not theoretical. It is a measured property of the font files shipping on every Mac."
some patterns of speech are so recognizably LLM, i am convinced that the AI detection startups have a very strong chance to succeed on text.
arlattimore
This is very cool, impressive piece of work Paul.