Booking.com customers learn the hard way that Unicode is tricky


It's easy to mistake an "l" for a "1" or an "I" with a poorly designed typeface. (Ahem.) Fortunately, modern fonts tend to use a variety of techniques to disambiguate those easily confused alphanumeric characters. But those designs rarely account for ambiguity that results from similarities across different character sets, as a recent phishing campaign targeting Booking.com users demonstrates.

BleepingComputer reported that "the attack, first spotted by security researcher JAMESWT, abuses the Japanese hiragana character 'ん' (Unicode U+3093), which closely resembles the Latin letter sequence '/n' or '/~', at a quick glance in some fonts." The attacker's hope is that people will gloss over the funky character, follow the malicious link, and then fall prey to the malware they're distributing via this campaign.
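To get a feel for how such a character hides in plain sight, here is a minimal sketch that lists every non-ASCII character in a link along with its Unicode name. The URL and the function name `non_ascii_chars` are made up for illustration; they are not taken from the actual campaign.

```python
import unicodedata

def non_ascii_chars(url: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    findings = []
    for index, char in enumerate(url):
        if ord(char) > 127:  # anything outside the ASCII range
            name = unicodedata.name(char, "UNNAMED")
            findings.append((index, f"U+{ord(char):04X}", name))
    return findings

# Hypothetical lookalike URL: the hiragana character stands in for "/n".
suspicious = "https://example.com/account\u3093verify"
for index, code_point, name in non_ascii_chars(suspicious):
    print(index, code_point, name)
```

A check like this only says that an unusual character is present. It cannot say whether the character is there to deceive anyone.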

Unicode has been exploited like this many times before—this is a relatively common way for spammers to make it past email filters, for example, or for particularly dedicated trolls to harass people online despite the prevalence of profanity filters. Yet it remains a difficult problem to solve because text rendering, much like DNS, is more cursed than most people realize. So let's go through a crash course on characters.

Early computers commonly supported the minimal American Standard Code for Information Interchange, or, as sane people call it, ASCII. That was relatively simple: it allowed computers to deal with the 26 letters of the English alphabet in both their lowercase and uppercase forms, the digits 0 through 9, a smattering of critical punctuation, and various control codes that told the computer when to draw a new line, indent text, etc.
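For the curious, ASCII is small enough to poke at directly. The snippet below is a quick sketch that prints the numeric code for a few characters, including two control codes; the helper name `is_ascii` is just an illustrative choice.

```python
# ASCII assigns each character a number from 0 to 127.
for char in ["A", "a", "!", "\n", "\t"]:
    print(repr(char), ord(char))

def is_ascii(text: str) -> bool:
    """True if every character fits in ASCII's 7-bit range."""
    return all(ord(char) < 128 for char in text)

print(is_ascii("Booking.com"))  # plain English text fits
print(is_ascii("\u3093"))       # the hiragana character does not
```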

But it turns out that even the British Empire couldn't make the English alphabet the only writing system on the planet, and the people who use other scripts wanted to use computers, too. That led to the creation of the Unicode standard, which is used to encode characters on every modern device. (Let's not get into the actual encoding being UTF-8 on all sensible systems, or, more specifically, non-Windows systems.)
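If you do want to get into it a little, the sketch below shows how UTF-8 spends a different number of bytes on different characters. The helper name `utf8_bytes` is an assumption made for this example.

```python
def utf8_bytes(char: str) -> str:
    """Return the UTF-8 encoding of a character as space-separated hex bytes."""
    return char.encode("utf-8").hex(" ")

# One to four bytes, depending on the code point.
for char in ["A", "\u00e9", "\u3093", "\U0001F600"]:
    print(f"U+{ord(char):04X}", utf8_bytes(char))
```

ASCII characters keep their single-byte values under UTF-8, which is a big part of why it won out on most systems.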

The Unicode Consortium says that Unicode "can encode up to roughly 1.1 million characters, allowing it to support all of the world’s languages and scripts in a single, universal standard" and that "all modern operating systems, computing environments, programming languages, and applications support the core of the Unicode Standard." So we can have cool things like emojis, punctuation, and non-English letters.
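The "roughly 1.1 million" figure is commonly derived from the size of the Unicode code space. The quote doesn't spell out the arithmetic, so treat the following as a sketch of the usual derivation: 17 planes of 65,536 code points each, minus the range reserved for UTF-16 surrogates.

```python
import sys

PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000       # 65,536
SURROGATES = 0xDFFF - 0xD800 + 1      # 2,048 code points reserved for UTF-16

total = PLANES * CODE_POINTS_PER_PLANE
encodable = total - SURROGATES

print(f"{total:,} code points in total")
print(f"{encodable:,} available for characters")
print(sys.maxunicode == total - 1)    # Python's highest code point is U+10FFFF
```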

We can also have attacks like the one targeting Booking.com users, though, and preventing them is non-trivial. An operating system, browser, etc., knows how to handle Unicode characters, but that doesn't mean it can determine when a character is being used deceptively. Sometimes people want to use mixed character sets to communicate effectively; sometimes they just want to make something look cool.
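Here is a rough sketch of why automatic detection is hard. Python's standard library doesn't expose a character's script directly, so this example assumes the first word of each character's Unicode name is a good enough stand-in. The function names `scripts_in` and `looks_mixed` are hypothetical, and the heuristic is deliberately naive.

```python
import unicodedata

def scripts_in(text: str) -> set[str]:
    """Approximate the scripts in text using the first word of each letter's name."""
    scripts = set()
    for char in text:
        if char.isalpha():
            name = unicodedata.name(char, "")
            if name:
                scripts.add(name.split()[0])
    return scripts

def looks_mixed(text: str) -> bool:
    """Flag text whose letters appear to come from more than one script."""
    return len(scripts_in(text)) > 1

print(looks_mixed("booking"))                 # Latin only
print(looks_mixed("b\u043e\u043eking"))       # Cyrillic "о" swapped in for Latin "o"
print(looks_mixed("\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8"))  # ordinary Japanese
```

The last line is the catch: perfectly ordinary Japanese mixes kanji, hiragana, and katakana, so a naive "mixed scripts" rule flags legitimate text right alongside the lookalikes.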

Just to drive home the point about this not being an easy problem to solve: Unicode makes it difficult to do seemingly basic things like counting the number of characters in a given text snippet, for example, or determining whether two characters are visually aligned. That isn't to say that addressing this problem is impossible, but I suspect it's a lot more complicated than most people would expect.
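A small sketch makes the counting problem concrete. The same visible "é" can be stored as one code point or two, and a Latin "a" and a Cyrillic "а" are different characters even when a font draws them almost identically. The variable and function names here are illustrative only.

```python
import unicodedata

def counts(text: str) -> dict[str, int]:
    """Count a string two ways: by code points and by UTF-8 bytes."""
    return {"code_points": len(text), "utf8_bytes": len(text.encode("utf-8"))}

decomposed = "e\u0301"    # "e" followed by a combining acute accent
precomposed = "\u00e9"    # the same letter as a single code point

print(counts(decomposed), counts(precomposed))
print(decomposed == precomposed)                                # differ as stored
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # equal once normalized

latin_a = "a"
cyrillic_a = "\u0430"
print(latin_a == cyrillic_a)                                    # lookalikes, not equal
print(unicodedata.normalize("NFKC", cyrillic_a) == latin_a)     # normalization doesn't help
```

Normalization handles the first case but not the second, which is why lookalike detection needs more than the standard normalization forms.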

As for what people can do to avoid falling victim to schemes like this one targeting Booking.com users, my official recommendation is to never read your email or click links. Unless they're to even more thorough explanations of why text rendering (and editing!) is cursed. Then, by all means, click away. Nothing wrong with a little cursed knowledge, or at least that's what I tell myself when I try to go to sleep at night.


Nathaniel Mott is a freelance news and features writer for Tom's Hardware US, covering breaking news, security, and the silliest aspects of the tech industry.

Comments from the forums

Alvar "Miles" Udell

A start would be to use "AI" in browsers to flag links that use random non regional characters, as exampled in the article, that throw up a prompt that requires multiple clicks to pass (similar to the "are you sure you want to proceed to this site" prompt for potentially insecure connections) while highlighting and explaining the reason for the prompt. This could be done at the local level by Copilot, Gemini, or whatever the device's "AI" is, on device.
USAFRet

Alvar Miles Udell said:
A start would be to use "AI" in browsers to...
Oh my...How did we ever survive before "AI"?
Alvar "Miles" Udell

USAFRet said:
Oh my...How did we ever survive before "AI"?

According to AI (copilot)

In some ways, it’s wild to think about — but before AI, survival was just…slower, more manual, and way more dependent on human memory, collaboration, and sheer grit.

But really we survived by paying attention and using common sense, though that was also before social media brought us things like the Tide Pod Challenge...
USAFRet

Alvar Miles Udell said:
According to AI (copilot)

But really we survived by paying attention and using common sense, though that was also before social media brought us things like the Tide Pod Challenge...
haha...

You asked an AI model "How did we survive before AI?"

And are trusting the answer it gives you?

This may be the funniest thing I've heard all week. And there were some major zingers along the way.
logainofhades

Whatever AI Google search is using is quite dumb. I was trying to find differences in a couple different fishing rods, and I got an answer comprised of both fishing rods, and engine connecting rods. :rofl:
Notton

USAFRet said:
Oh my...How did we ever survive before "AI"?
IDK what you're talking about. I've had to fix so many computers back in the day because they were infected with a virus.
These days it's all about phishing because everything moved to an online only service and hijacking cookies/tokens is the faster way to steal money.
AFAIK, phishing wasn't as prevalent even just 10yrs ago, and the ones that were got quickly picked up by news as something to watch out for. These days the news quality has gone down the drain and the cycle is so quick that it seems to have broken advocacy.
I don't have anything to back up my claims, so it's all IMO.
USAFRet

Notton said:
IDK what you're talking about. I've had to fix so many computers back in the day because they were infected with a virus.
These days it's all about phishing because everything moved to an online only service and hijacking cookies/tokens is the faster way to steal money.
AFAIK, phishing wasn't as prevalent even just 10yrs ago, and the ones that were got quickly picked up by news as something to watch out for. These days the news quality has gone down the drain and the cycle is so quick that it seems to have broken advocacy.
I don't have anything to back up my claims, so it's all IMO.
And my comment was along the lines of "AI is supposed to fix this? yeah, right..."
DS426

Don't go bashing DNS. That's for another day.
Alvar "Miles" Udell

USAFRet said:
And my comment was along the lines of "AI is supposed to fix this? yeah, right..."

"AI" won't fix it, but it will help. I use quotation marks because in this case an algorithm (and any algorithm these days is called machine learning or AI it seems), while it may be no more advanced than a program's spell check these days to detect out of place unicode characters and scam redirects like Microsoft.com.zip, it would give that extra line of defense to people who didn't notice it before, and be a slight extension to the "Smart Screen" like functionality that already exists to help catch mistyped addresses.

Or do you have a better idea of something that's essentially a quick fix to a problem with a complex real solution?
USAFRet

Alvar Miles Udell said:
Or do you have a better idea of something that's essentially a quick fix to a problem with a complex real solution?
No, I don't.

There is no 'quick fix', and AI certainly isn't a quick fix.