AI-Generated Pizza Commercial Impresses, But Visits Uncanny Valley

Pepperoni Hug Spot Commercial
(Image credit: Tom's Hardware)

In the last few months, we've seen how large language models such as ChatGPT can generate text copy, how image generators like Stable Diffusion can create pictures on demand and even how some can do text-to-speech. One enterprising developer, who goes by the handle Pizza Later, combined five different AI models to create a live-action commercial for a fictional pizza restaurant called "Pepperoni Hug Spot."

The resulting video, which I've embedded below, is both horrifying and impressive at the same time. The commercial features photo-realistic people who are eating, cooking and delivering some very appetizing pepperoni pizza. It even has human-sounding dialog and decent background music. However, the facial expressions and dead eyes on some of the characters are a little much.

Obviously, the quality of the output leaves something to be desired. At times, objects appear to blend into each other; my son said that it looked like the people were eating pizza that grew out of the plate. 

The people all look like residents of the uncanny valley. And the somewhat incoherent script reads like text from another language that was improperly translated into English (though it was not).

However, it's impressive to see just how close these technologies are to being ready for prime time. We can see how, in short order, the photo-realistic video images could become a lot more convincing.

To be fair, this video did require some human editing. Pizza Later told us that they used five different models to make various assets for the video and then spent some time using Adobe After Effects to stitch the video, dialog, music and some custom images together. Overall, it took them three hours to complete the project.

Pizza Later said they got the idea for the commercial after gaining access to Runway Gen-2, a text-to-video model that's in private beta. In an email interview, the developer told me that their initial prompt for the video was just "a happy man/woman/family eating a slice of pizza in a restaurant, tv commercial." Runway Gen-1, which creates videos based on existing footage, is available to try for free right now, either on the web or via a brand-new iOS app.

After seeing the high quality of video that Runway Gen-2 created, Pizza Later used GPT-4 (the engine behind ChatGPT and Bing Chat) to come up with a name for the fictional pizza joint (Pepperoni Hug Spot) and to write the script. The developer then used ElevenLabs Prime Voice AI to provide realistic narration with a male voice. They used Midjourney to generate some images that appear in the video, including the restaurant exterior and some pizza patterns. They also used Soundraw to create background music.

Most of the tools Pizza Later used are paid, but offer some kind of free trial, lower-end free account or initial set of free credits. Clearly, this is far from a plug-and-play operation, as the developer had to stitch the end results together.

Perhaps, in the near future, a multi-model tool like Microsoft Jarvis will be able to perform all these tasks via a single chat prompt. Or maybe an autonomous agent such as AutoGPT will generate commercials if you give it the broad goal of marketing a restaurant. For now, though, this video is really impressive, even knowing that it required human editing.

Avram Piltch
Avram Piltch is Tom's Hardware's editor-in-chief. When he's not playing with the latest gadgets at work or putting on VR helmets at trade shows, you'll find him rooting his phone, taking apart his PC or coding plugins. With his technical knowledge and passion for testing, Avram developed many real-world benchmarks, including our laptop battery test.
  • USAFRet
    "Obviously, the quality of the output leaves something to be desired."

    In the running for understatement of the decade?
    (and we're only a couple of years into it)


    Although we've all seen worse from major companies and human created ads.
  • UWguy
    Horror film
  • I don’t know it’s about as good as anything I’ve seen a human do and I like the caveman talk lol

    I detest advertising by the way in all forms because it’s nothing but a huge lie. Your tummy won’t be happy, you won’t be happy. The only difference is you won’t be hungry. It’s all a lie to sell you a product.
  • TJ Hooker
    I saw this on reddit, one of the comments was "looks like something you'd see on Adult Swim at 1:00 AM".
  • bigdragon
    No writers. No artists. No actors. No extras. No stock videos or images. No narrator. No production company. No studio space. Just some person prompting an AI and gluing together the results. Give this a couple years of development time and it'll be production-ready.

    I am so glad I don't have children destined to live in the basement forever thanks to AI. Plenty of available space for a future holodeck with AI-generated environments!
  • ThatMouse
    How is any of the video actually generated? Looks like random clips.
  • atomicWAR
    I feel like I crossed into the twilight zone watching that....
  • OriginFree
    TJ Hooker said:
    I saw this on reddit, one of the comments was "looks like something you'd see on Adult Swim at 1:00 AM".
    "Tim and Eric Awesome Commercial, Great Job!"
  • lorfa
    TJ Hooker said:
    I saw this on reddit, one of the comments was "looks like something you'd see on Adult Swim at 1:00 AM".
    I definitely got Lynk's disease from watching it.
  • _dawn_chorus_
    Needs more Balenciaga.