Internet standards body proposes new header field disclosing AI — will make it easier for machines to determine if AI was used on a site
Machines cannot recognize machines, apparently.

The Internet Engineering Task Force (IETF), the body responsible for the standards used across the web, has just released a draft document that will introduce a new header to make it easier to determine whether AI was used on a web page. According to the AI Content Disclosure Header draft, this proposed metadata will make it easier for machines to determine how AI is involved in the production of a particular site for easier automation, indexing, and compliance.
“The goal of AI-Disclosure is to offer a low-overhead, easily parsable signal primarily for automated systems like web crawlers, archiving tools, or user agents that may need a quick indication of AI usage without processing complex manifests,” the draft document said. “This header is intended to be applied at the entire response level.”
There is no standardized, machine-readable way to determine if artificial intelligence was used in the creation of a site. While there are some existing ways to warn the consumer that content is AI-generated — either through written disclaimers on the page or watermarks on the video or image — these can’t be easily detected by machines and apps.
Information in the header will include mode, which indicates whether AI was used on the page; model, which identifies the AI model used to generate or modify the content; provider, which names the organization the model came from; reviewed-by, which indicates who reviewed the content; and date, which records the date and time the content was generated.
Mode has four values:

| Mode value | Description |
| --- | --- |
| none | AI was not used to create or modify the content on the page. |
| ai-modified | The source material was created by humans but was modified by AI. Examples include spell checking, style suggestions, and summary generation. |
| ai-originated | The content was initially generated by AI but subsequently edited by humans. The content was manually checked for accuracy, although its originality may still be questioned. |
| machine-generated | The content is mostly generated by AI, with little to no human intervention. |
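To make this concrete, here is a rough sketch of how a client might consume such a header. The exact wire syntax is defined by the IETF draft (which builds on HTTP structured fields); this hypothetical example simply assumes a comma-separated key=value form with optional double quotes, using the AI-Disclosure header name quoted above and made-up field values.

```python
def parse_ai_disclosure(header_value):
    """Parse a hypothetical AI-Disclosure header value such as
    'mode=ai-modified, model="example-model"' into a plain dict,
    stripping whitespace and optional double quotes."""
    fields = {}
    for item in header_value.split(","):
        if "=" not in item:
            continue  # skip malformed fragments
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields

# Hypothetical header value; the model/provider names are placeholders.
example = 'mode=ai-modified, model="example-model", provider="Example AI", date="2025-04-30"'
disclosure = parse_ai_disclosure(example)
print(disclosure["mode"])  # ai-modified
```

A crawler or archiving tool could use the resulting dict to, say, skip pages whose mode is machine-generated, without parsing the page body at all.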
The group noted that the header is meant only to give the accessing client a rough indication of a page's contents, and that you still need to rely on "more comprehensive mechanisms such as C2PA" to determine which specific parts of the page involve AI.
The header also isn't a secured field, so its contents may be modified by third parties or intermediaries, and it shouldn't be relied on for security decisions. Nevertheless, the AI Content Disclosure Header gives devices a quick and easy way to gauge the provenance of a particular page, helping machines know at a glance whether the page they're looking at has been touched by a machine.
Note that this is still in the draft stage — it’s not yet a standard, and adoption is purely voluntary. Still, if the internet body adopts this, it’s one more way we can determine if the content we’re looking at was created by people or sourced purely from AI.

Jowi Morales is a tech enthusiast with years of experience working in the industry. He's been writing for several tech publications since 2021, focusing on tech hardware and consumer electronics.
SonoraTechnical: It's a good first step... an official acknowledgement that a distinction is needed.
Is it enforceable? Will standard website editing tools (e.g. Visual Studio) have a way of auto-generating this header? Will web servers (e.g. Apache, IIS) auto-generate this header upon page requests?
DS426: Quoting the article: "The Internet Engineering Task Force (IETF), the body responsible for the standards used across the web, has just released a draft document..."
"Just" released? It has a published date of April 30th. Surely the IETF doesn't take that long to put up a new draft document on that website.
Alvar "Miles" Udell It's generally pretty easy to tell when some content has been AI generated without even using Scribbr or another tool as it will typically be filled with copy reword paste, or straight copy paste, things from the source, not quote a source, and just generally have a feeling it was either written by AI or a first year journalism student.Reply
Plus, these days with Grammarly and other tools, it would surprise me if most writers didn't use one of them to generate at least an outline or draft, especially ones that contribute to multiple outlets, and doubly so if the source is a long presentation. So is it really a good idea to label those pages as "AI"?
Kindaian: And where is the line drawn between those distinct levels?
IMHO it is irrelevant. If the scrapers are not able to distinguish good content from lower-quality content, then they get just what they deserve.
At the end of the day, the next step for this LLM AI thing is to properly curate the data; you know, like people do when making an encyclopedia or a dictionary?
Apart from that, only with "new" tech are the existing LLM AI models going to evolve and become more useful.
edzieba: On the one hand, AI companies will want some sort of automated signal their scrapers can detect to avoid the slop-self-ingestion-loop degradation that is already well known for large language models. On the other, a trivially parseable signal also means end users can easily block AI slop pages on their end too, making the AI that generates the slop worthless, since nobody will actually see it.
That puts them in a catch-22 situation for public AI differentiator tags. The 'solution' (for the AI companies, not for consumers) is to do such signalling covertly, either by steganographic embedding or via a completely independent private side-channel.
EzzyB: SonoraTechnical said: "It's a good first step... an official acknowledgement that a distinction is needed. Is it enforceable? Will standard website editing tools (e.g. Visual Studio) have a way of auto-generating this header? Will web servers (e.g. Apache, IIS) auto-generate this header upon page requests?"
Yeah, this sounds like Do Not Track. Hey, you can set the flag, but it's up to the site to honor it.
passivecool: GPT and others, as so inclined, are honoring the respective AI tags in robots.txt for crawling/scraping/indexing, whatever you want to call it.
I'm with @Kindaian. AI content generation is already a continuum, not black-and-white.
Where does the spellchecker end and reformulation begin? What about when the AI reminds you of a fact or consideration left out, or a logical or data inconsistency? Is it still my own work if I ask someone or something to look over it? Did anyone ever write a book themselves?
Studies have shown that content-ingestion incest, AI feeding LLMs, is generatively degenerative. LLM providers will therefore, in their own interest, find a way to watermark or otherwise identify content.
Or maybe the entire knowledge of humans has already basically been ingested, and it is now about cherry-picking what is new. Until, in 24-36 months, AI is creating the knowledge we could not yet create directly. But our tools will create tools.
And yes, machine learning is kind of the same: an LLM input-output loop with feedback. I'm pretty sure they will get it sorted out.
When your personal ai agent - for which you pay and are therefore the customer and not the product - is assembling your daily feed, you won't worry about it anymore.