New Deepseek model drastically reduces resource usage by converting text and documents into images — 'vision-text compression' uses up to 20 times fewer tokens

Deepseek logo on an iPhone
(Image credit: Getty Images / NurPhoto)

Chinese AI developer Deepseek has released a new model that leverages multi-modal capabilities to handle complex documents and large blocks of text more efficiently by converting them into images first, as reported by SCMP. Its vision encoder takes large quantities of text and converts them into images which, when accessed later, require between seven and 20 times fewer tokens while maintaining an impressive level of accuracy.
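To illustrate the arithmetic behind vision-text compression, the sketch below compares a rough text-token count against the number of image patches needed to represent the same content as a rendered page. The 4-characters-per-token heuristic, the 16x16 patch size, and the page dimensions are illustrative assumptions for this sketch, not DeepSeek-OCR's actual parameters.

```python
import math

# Illustrative assumptions (NOT DeepSeek-OCR's real parameters):
# - a text token covers roughly 4 characters (a common BPE heuristic)
# - a vision encoder splits the page image into 16x16-pixel patches,
#   each patch becoming one vision token
CHARS_PER_TEXT_TOKEN = 4
PATCH_SIZE = 16

def text_token_estimate(text: str) -> int:
    """Rough count of text tokens for a string."""
    return math.ceil(len(text) / CHARS_PER_TEXT_TOKEN)

def vision_token_estimate(width_px: int, height_px: int) -> int:
    """Number of patch tokens for a page image of the given size."""
    return math.ceil(width_px / PATCH_SIZE) * math.ceil(height_px / PATCH_SIZE)

def compression_ratio(text: str, width_px: int, height_px: int) -> float:
    """How many times fewer tokens the image representation uses."""
    return text_token_estimate(text) / vision_token_estimate(width_px, height_px)

# A dense page of ~10,000 characters, downsampled to a 256x256 image.
page_text = "x" * 10000
print(f"text tokens:   {text_token_estimate(page_text)}")    # 2500
print(f"vision tokens: {vision_token_estimate(256, 256)}")   # 256
print(f"compression:   {compression_ratio(page_text, 256, 256):.2f}x")
```

The point of the sketch is simply that a compact image of a dense page can cost far fewer tokens than the raw text, which is the effect Deepseek reports in the seven-to-20-times range.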

Deepseek is the Chinese-developed AI that shocked the world in early 2025, showcasing capabilities similar to those of OpenAI's ChatGPT, or Google's Gemini, despite requiring far less money and data to develop. The creators have continued to work on making the AI more efficient since, and with the latest release known as DeepSeek-OCR (optical character recognition), the AI can deliver an impressive understanding of large quantities of textual data without the usual token overhead.

“Through DeepSeek-OCR, we demonstrated that vision-text compression can achieve significant token reduction – seven to 20 times – for different historical context stages, offering a promising direction” to handle long-context calculations, the developer said.

Deepseek tokenization pipeline. (Image credit: Deepseek/AI Engineering/Medium)

The approach works particularly well for tabulated data, graphs, and other visual representations of information, and could be of particular use in finance, science, and medicine, the developers suggest.

In benchmarking, the developers claim that when the number of tokens is reduced by less than a factor of 10, DeepSeek-OCR maintains a 97% accuracy rating when decoding the information. If the compression ratio is pushed to 20 times, accuracy falls to 60%. That's less desirable and shows the technology has diminishing returns, but if a near-100% accuracy rate could be achieved with even a 1-2x compression rate, that could still make a huge difference in the cost of running many of the latest AI models.

It's also being pitched as a way of generating training data for future models, although introducing errors at that stage, even just a few percent, seems like a bad idea.

If you want to play around with the model yourself, it's available via online developer platforms Hugging Face and GitHub.

Jon Martindale
Freelance Writer

Jon Martindale is a contributing writer for Tom's Hardware. For the past 20 years, he's been writing about PC components, emerging technologies, and the latest software advances. His deep and broad journalistic experience gives him unique insights into the most exciting technology trends of today and tomorrow.

  • anoldnewb
    "This works really well for handling tabulated data"

    It seems very counterintuitive that this process:
text -> image -> via AI -> organized text document (with tables and proper flow of text) -> via AI -> AI tokens
is better than
text -> via AI -> AI tokens


    However, after trying and failing many times to get a table in a pdf document copied into a spreadsheet or as a table in a document editor, I understand how it could be better.

    Basically, a PDF file is often an unorganized file that contains a jumble of text. Try highlighting just what you want to copy in some pdf files and you can get crazy bits of text from here and there instead of what seems obvious on the screen.

Deepseek was forced to operate with fewer hardware resources and has developed several innovative AI advancements that have enabled them to overcome some of the hardware restrictions that the US has imposed on them. They have leveraged all the tech that they could "borrow" and have found novel paths forward. Maybe it is time to consider "borrowing" some of their developments.