Website backup crippled by 1.6MB Friends GIF that was replicated 246,173 times, breaking Linux's EXT4 filesystem limit — Jennifer Aniston's 'happy dance' animation ate up 377 gigabytes of data due to security policy


A single tiny GIF, frequently used in chats by a site's community members, ended up adding 377GB to the website's backups, hitting a Linux filesystem limit and crippling the backup process. The Jennifer Aniston ‘happy dance’ reaction GIF weighs in at 1.6MB, and in the headlining case, it was duplicated 246,173 times in the backup, writes Discourse tech blogger Jake Goldsborough. The problem was precipitated by, dare we say, an overuse of the happy dance GIF, combined with the way Discourse implements its file security policy. Fixing it wasn’t entirely straightforward.

Discourse is a company and open‑source software project that builds one of the most widely used modern community‑discussion platforms, currently powering over 22,000 online communities. Its real-time chat platform allows users to insert emojis and GIFs in their discussions to liven up debates. But the platform’s ‘secure uploads’ feature means that “when a file moves between security contexts (say, from a private message to a public post), the system creates a new copy with a randomized SHA1,” explains Goldsborough. “The original content is identical, but Discourse treats it as a new file.” So, a popular image or reaction GIF will spread across posts, reposts, and PMs, and each context creates another file copy.

Discourse’s first attempt at a fix for the system being swamped by duplicates was to track original content by its hash. Then, during backup, it grouped uploads by that hash and downloaded only the first file in each group, creating hardlinks for any duplicates.
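As a rough illustration of that first approach, the Python sketch below groups files by a hash of their contents, copies only the first file in each group, and hardlinks the rest. It is not Discourse's code: the function names, the use of SHA-256 over local files, and the local copy standing in for a remote download are all assumptions made for the example.

```python
import hashlib
import os
import shutil
from collections import defaultdict
from pathlib import Path


def content_hash(path: Path) -> str:
    """Hash a file's bytes so identical content maps to the same key."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_hardlinks(uploads: list[Path], dest: Path) -> int:
    """Copy one file per distinct content hash into dest and hardlink the rest.

    Returns the number of full copies made (the stand-in for "downloads").
    Assumes every upload has a unique file name.
    """
    dest.mkdir(parents=True, exist_ok=True)

    # Group uploads by content hash, keeping the original order in each group.
    groups: dict[str, list[Path]] = defaultdict(list)
    for upload in uploads:
        groups[content_hash(upload)].append(upload)

    downloads = 0
    for paths in groups.values():
        first, *duplicates = paths
        primary = dest / first.name
        shutil.copyfile(first, primary)  # hypothetical stand-in for one download
        downloads += 1
        for duplicate in duplicates:
            # A hardlink adds another name for the same inode, using no extra data blocks.
            os.link(primary, dest / duplicate.name)
    return downloads
```

Note that this version has no handling for a filesystem's per-inode link limit, so `os.link` would simply raise an error once that limit is reached.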


No one told them life was gonna be this way

This seemed like an elegant solution until one of Discourse’s larger customers made everyone aware of the ext4 limit of roughly 65,000 hardlinks per inode. In the headlining case, the backup worked with this first fix, but “instead of one download for all 246,173 duplicates, we got one download plus ~181,000 fallback downloads after hitting the limit,” explains the firm’s blog. “Not the win I expected.”
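The "~181,000" figure follows from simple subtraction, as this short Python check shows. It treats the article's "roughly 65,000" as an exact limit, which is an assumption; the function name is made up for the example.

```python
LINK_LIMIT = 65_000      # per-inode hardlink limit quoted above, treated as exact here
TOTAL_COPIES = 246_173   # copies of the GIF in the backup


def fallback_downloads(total_files: int, link_limit: int) -> int:
    """Files that can no longer be hardlinked once one inode holds link_limit names."""
    return max(0, total_files - link_limit)


print(fallback_downloads(TOTAL_COPIES, LINK_LIMIT))  # 181173
```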

One of Discourse’s other customers had 432GB of uploads and a correspondingly hefty backup. However, analysis indicated that the unique content was just 26GB. In other words, duplicates were behind a 16x inflation factor.

The absurdly duplicated file that created 377GB of bloat was Rachel from Friends doing her happy dance. So, the problematic site was obviously quite a happy one, with the reaction GIF “used constantly in posts, PMs, everywhere,” noted Discourse.

Also happily, Discourse managed to work out a fix for its earlier fix. The new fix begins like the old one, by creating hardlinks. But when the filesystem returns an EMLINK error (too many links), the backup copies the file locally and treats that copy as the new ‘primary’ for subsequent hardlinks, until the limit is reached again. This new measure “works on any filesystem, no configuration needed,” says Discourse, with some satisfaction.
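A minimal Python sketch of that fallback pattern might look like the following. It is an illustration, not Discourse's implementation: the function names are hypothetical, and `make_limited_link` is a helper invented here to simulate a small link limit so the behaviour can be tried without creating tens of thousands of files.

```python
import errno
import os
import shutil
from pathlib import Path
from typing import Callable


def link_with_copy_fallback(
    primary: Path,
    targets: list[Path],
    link: Callable[[Path, Path], None] = os.link,
) -> int:
    """Hardlink each target to primary; on EMLINK, copy instead and switch primary.

    Returns the number of full local copies that were needed.
    """
    copies = 0
    for target in targets:
        try:
            link(primary, target)
        except OSError as exc:
            if exc.errno != errno.EMLINK:
                raise  # only the "too many links" case is handled here
            shutil.copyfile(primary, target)  # new inode with a fresh link count
            primary = target                  # later hardlinks point at the copy
            copies += 1
    return copies


def make_limited_link(max_links: int) -> Callable[[Path, Path], None]:
    """Test helper (an assumption of this sketch): mimic a per-inode link limit."""
    def limited_link(src: Path, dst: Path) -> None:
        if os.stat(src).st_nlink >= max_links:
            raise OSError(errno.EMLINK, os.strerror(errno.EMLINK))
        os.link(src, dst)
    return limited_link
```

With a simulated limit of 4 links, one original file and 9 targets end up as three physical copies in this sketch. If the real implementation behaves similarly, and taking 65,000 as the exact limit, 246,173 identical files would need only four physical copies (one download plus three local copies) rather than roughly 181,000 extra downloads.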

Discourse ends by highlighting the lessons learned from its confounding animated GIF duplication frenzy, wryly observing that “now I know Jennifer Aniston can stress-test infrastructure.”


Mark Tyson
News Editor

Mark Tyson is a news editor at Tom's Hardware. He enjoys covering the full breadth of PC tech, from business and semiconductor design to products approaching the edge of reason.

  • dtemple
    Guess no one told them life was gonna be this way.
  • Exploding PSU
    Jennifer Aniston stress-testing infrastructure is a sentence I never thought I'd ever read my whole life, that's wondrous..
  • S58_is_the_goat
    I guess Linux isn't coming to the desktop this year...
  • Lieutenant Barclay
    Total chicks show

    ...is what I said before I watched all 10 seasons in about 6 weeks.
    Ross and Rachel end up together. LOL
  • PEnns
    S58_is_the_goat said:
    I guess Linux isn't coming to the desktop this year...
    And apparently not even in the near future. That's embarrassing!
  • remixedcat
    Exploding PSU said:
    Jennifer Aniston stress-testing infrastructure is a sentence I never thought I'd ever read my whole life, that's wondrous..
    janet jackson has a CVE
  • derekullo
    They should have been using ZFS ... inline compression and deduplication !!!

    In LZ4 we trust!
  • bit_user
    I wonder if XFS would've allowed more hard links. However, I think 64k hard links per inode is a reasonable limit.

    In any case, I do think the forum software developers should've seen this coming and implemented their fix from the outset.

    On the plus side, the forum load times probably improved after the fix.
  • Grobe
    S58_is_the_goat said:
    I guess Linux isn't coming to the desktop this year...
    Likely true, but how is that related to this case, where a company made some poor choices of filesystem and backup solution, and had no solution for repeated files?
  • CParsons
    dtemple said:
    Guess no one told them life was gonna be this way.
    Your GIF's a joke, your servers broke, your backups D.O.A.