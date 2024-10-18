AI researcher and data journalist Simon Willison used Google AI Studio to convert a 35-second screen recording of 12 emails into a single spreadsheet. The result surprised Willison, who did not expect the AI to return accurate results at such a low cost. According to his blog, the task used 11,018 tokens, and at a price of 7.5 cents per million tokens, the exercise cost less than a tenth of a cent.

Willison needed to collect numerical values from 12 different emails. Rather than spend time copying and pasting the source data into a spreadsheet, he enlisted the help of AI to review a screen capture of his emails and pluck the data from the video. The prompt that Willison gave Google's AI Studio was a simple "Turn this into a JSON array where each item has a yyyy-mm-dd date and a floating point dollar amount for that date".

Willison provided an example of the JSON-formatted output:

[ { "date": "2023-01-01", "amount": 2... }, ... ]
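Getting from that JSON array to a spreadsheet is a small step. The following is a minimal Python sketch, not Willison's own method: the function name json_array_to_csv and the sample values are hypothetical, and it assumes every item carries exactly the "date" and "amount" keys requested in the prompt.

```python
import csv
import io
import json


def json_array_to_csv(json_text: str) -> str:
    """Convert a JSON array of {"date", "amount"} objects into CSV text."""
    rows = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["date", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Coerce the amount to float so stray strings fail loudly here.
        writer.writerow({"date": row["date"], "amount": float(row["amount"])})
    return buffer.getvalue()


# Hypothetical sample data, invented for illustration only.
sample = (
    '[{"date": "2023-01-01", "amount": 12.5},'
    ' {"date": "2023-02-01", "amount": 30.0}]'
)
csv_text = json_array_to_csv(sample)
print(csv_text)
```

The resulting text can be saved with a .csv extension and opened in any spreadsheet application.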

Willison reveals that the end cost was less than 1/10th of a cent. AI Studio reported 11,018 tokens for the request, of which 10,326 were for the video. The Gemini 1.5 Flash 002 model, a cheaper model than Gemini 1.5 Pro, charges $0.075 per one million tokens. Willison helpfully shows the math that led to this conclusion.

11018/1000000 = 0.011018

0.011018 * $0.075 = $0.00082635
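The same arithmetic can be reproduced in a few lines of Python. This is only a sketch: the helper name estimate_cost_usd is hypothetical, and the rate is the $0.075 per million tokens quoted above, which may not match current pricing. Decimal is used instead of float so the result matches the hand calculation exactly.

```python
from decimal import Decimal


def estimate_cost_usd(tokens: int, usd_per_million_tokens: str) -> Decimal:
    """Return the dollar cost of `tokens` at a given per-million-token rate."""
    return Decimal(tokens) / Decimal(1_000_000) * Decimal(usd_per_million_tokens)


# Figures taken from the article: 11,018 tokens at $0.075 per million.
cost = estimate_cost_usd(11_018, "0.075")
cost_in_cents = cost * 100

print(f"${cost:.8f}")
print(f"{cost_in_cents:.6f} cents")
```

Swapping in a larger token count gives a rough estimate for longer recordings, assuming the per-token rate stays the same.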

For the time being, however, Google AI Studio is free of charge, so Willison didn't spend a cent!

While scraping data from a few messages in your inbox might seem like an easy task that doesn’t require any sort of automated assistance, it is a different story if you have to pull data from a hundred or even a thousand emails. There are alternatives to screen recording and feeding the video to AI, like using an API to scrape your inbox or using Google’s own Gemini in Gmail tool. However, the former requires some programming knowledge, which most users likely lack, while the latter has its own issues that might make you nervous about granting Gemini complete access to your inbox.

What makes video scraping such a powerful tool is that it doesn’t take much effort for anyone to use it. All you need is a way to capture your screen and a multi-modal tool (like Gemini 1.5), which can then produce structured data from the information you’ve recorded on your screen. Aside from not requiring any specialized knowledge, you could scrape data from potentially any source, including web pages.

This is actually the same concept as the controversial Recall tool that Microsoft introduced with its Copilot+ PCs and the third-party Rewind AI tool available for macOS. However, even if these tools only process your data locally on compatible devices, they still have an inherent privacy issue because they record your screen the entire time you use your computer and store the captures in a local folder. Even if the screenshots aren’t uploaded to the cloud, the fact that they’re saved in one place on your computer makes your data vulnerable.

Willison's process is intriguing and will surely spark others to investigate how AI can be used to perform other such tasks.