Our curation team’s approach to keeping AI-generated content out of your recommendations

A focus on quality helps human-written stories shine through

Terrie Schweitzer
The Medium Blog

--

[Image: a drawing of a hand penciling in a chart, with the text “how do we detect AI-generated content”]

Vetting content for quality — the job of Medium’s curation team — is especially challenging right now. Like every other platform, we’re seeing high volumes of AI-generated content. Our teams in engineering and Trust and Safety are constantly developing technical and policy solutions to deal with it, even if we don’t often take the time to talk about that work publicly.

The curation team has also been spending a lot of energy dealing with AI-generated stories, and I want to share some highlights here to reassure readers and writers that we’re always thinking about it. More importantly, we’re doing things about it, and we’re considering what the future might bring.

Our lodestar at Medium is quality. Giving readers impactful, transformative stories is our goal. That goal informs everything we do.

I’ll try to answer some questions, share some of what we’re doing and how we think about it, and point to a path forward. (Spoiler alert: Identifying AI-generated content isn’t the problem to solve.)

How do we detect AI-generated content?

We’ve said that humans can spot AI-generated content better than any detection software currently available. That’s still true, though we expect it to get more difficult. You’ll find that ChatGPT can generate a convincing Medium-esque story, one that, on the surface, seems high-quality. But read more closely and you’ll notice a vapid voice. Correct sentences are strung together in correct paragraphs, but nothing of substance is said. The confident tone, however, is off the charts — it will merrily assert complete falsehoods.

The AI-generated content we see has the voice of an encyclopedia that’s having a fling with a thesaurus. Nothing wrong with that…we’re fans of both. But these stories lack the human insight and connection that exemplify the best of what Medium offers.

So that’s how we detect AI-generated content. We read it. We look for that “voice.” Therein lie some problems for the future. More on that in a moment.

The scale of the problem

In a week, the Medium curation team reviews about 5,000 of the roughly 1.2 million stories that are published — which works out to about 0.4%. And the number of stories posted per day is increasing rapidly.

There’s a lot of spam (much of it AI-generated) and other nonsense in the remaining 99.6% that we never review and that you never see. We have good systems for suppressing the bulk of it (again, there is enormous work being done at Medium around this, and we expect it to be an evolving effort).

How little gets reviewed by the curation team may seem shocking. However, what we review isn’t random. We’ve built good systems to find the best stories to Boost. And you’ll always see stories by writers or publications you follow, regardless of whether we’ve reviewed them.

Remember: our goal is quality. The goal is not for curators to be great at detecting AI-generated content (even though they now are). The focus of our human-based team is on finding and vetting quality writing for the human community on Medium.

Why doesn’t Medium just ban AI-generated content?

We’ve taken the stand that Medium is a home for human stories, so why don’t we simply make a rule against AI-generated stories?

AI-generated content is currently not against the Medium Rules. It’s an issue for our Quality Guidelines, and for whether or not we give a story “general distribution,” which promotes it beyond the usual followers who would see it (people who follow the writer, or who follow the publication if the story is in a publication). As Scott says in Medium is for human storytelling, not AI-generated writing:

“Your blog on Medium is your home to publish whatever you want and we imagine that some of you may want to experiment there. So if you’re wondering whether you can still publish fully AI-generated stories for your followers, the answer is yes. However, if you are on Medium looking to find readers from our network, then the answer is no — we will not promote fully AI-generated stories to a broader audience.”

One of the great things about Medium is that we help writers build an audience through our distribution system, including Boost. Medium is an advertising-free platform — you can’t buy space on it. But, as a distribution platform, we are, in essence, “advertising” stories. By distributing stories to readers who don’t already follow the writer or publication, we’re marketing writers’ stories for them — finding them a new audience. That’s one reason we have strict guidelines about how stories qualify to be “marketed” in this way.

The problem of defining AI-generated content

With respect to AI-generated content, we say: “100% AI-generated stories will not be eligible for distribution beyond the writer’s personal network” — along with some other stuff about “responsible” and “AI-assistive.” I don’t fault anyone for feeling like we’re equivocating a bit. But the upshot is that we don’t want to market AI-generated stories.

We don’t mean to be vague about the details; it’s just that the details cover a lot of ground, and that ground is shifting fast. For example, we’re happy for writers to use Grammarly if they like — it can be a great help. That’s a form of AI technology used in stories. Obviously we’re not going to ban the use of Grammarly.

What about a writer who uses ChatGPT to outline a story? What about a writer who outlines a story in Notion and asks the AI assistant there to help fill in the rest of the text? What about the writer who pastes their article into an AI service and asks for help rewriting it for improvements? Examples of AI-assistance like this grow by the day — much of it totally legitimate. The line between “human story” and “AI-generated” can become pretty fuzzy.

We are all writing (or at least posting) stories using computer software, and AI is increasingly baked into that experience. It’s harder to say exactly what AI-generated content is.

How do you ban something you can’t define?

Enforcing policy around AI-generated content

When you’re dealing with something like plagiarism, the boundaries are pretty well-defined. There are tougher cases, to be sure, but for the most part, we all understand what we are dealing with and we’ve got the tools to enforce it.

AI-generated content, on the other hand, is more difficult. There’s no definitive source for proving, beyond a shadow of a doubt, that something is AI-generated. None of the detection tools are as good as they claim to be.

When we talk about “voice,” part of what we mean is a set of cultural expectations. When you ask a human to identify the voice of human writing, there’s going to be a bias toward voices that sound like their own. But voice and writing patterns are strongly influenced by culture and education systems. Even pattern-matching on easy hallmarks like “In Conclusion” (a phrase often used by real writers working in English as a second language) is too simplistic and can lead us to make wrong decisions. And then any clever writer can edit ChatGPT output to remove those sorts of “tells.”
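To make the fragility concrete, here’s a minimal sketch of what a naive, tell-based check might look like. The phrase list and threshold are invented for illustration, and this is not how our curation tooling actually works:

    # A deliberately naive "tell"-based detector.
    # Both the phrase list and the threshold are hypothetical.
    TELLS = ["in conclusion", "delve into", "it is important to note"]

    def looks_ai_generated(text: str) -> bool:
        lowered = text.lower()
        # Count how many stock phrases appear in the story.
        hits = sum(1 for phrase in TELLS if phrase in lowered)
        # Flag the story if it uses two or more of them.
        return hits >= 2

A check like this happily flags an ESL writer who was taught those exact constructions, and anyone trying to evade it simply deletes the phrases before publishing.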

At the moment, the curation team is pretty confident about identifying ChatGPT output. The problem is that it’s getting more difficult to be certain. It’s an issue that creates the risk of false positives as well as the risk of bad actors escaping detection. There’s no real way for us to prove anything.

Someone using ChatGPT can edit the text output to hide its roots…to remove the phrases, tone, and other tells that make it so easy for us to detect. And we can expect those common tells to be “taught” out of content-generating AIs so they appear less frequently; the companies building them want the output to be even harder to detect. If you have a heuristic for identifying AI-generated content, you can expect the companies building these systems to notice it too.

So…how do you ban content — or even writers — based on something you are increasingly less confident about being able to identify, let alone prove?

Why don’t you ask Medium members to agree to the no-AI policy?

I do think there’s a case for moving guidelines about AI-generated content out of the Quality Guidelines and into the Rules, though I agree with our Trust and Safety team’s chief concern: what’s the point of a rule you can’t enforce? We hate that sort of security theater.

On the other hand, there’s merit to the idea of reinforcing our stance via an agreement with writers. But it’s an extra unneeded hassle (maybe an insult?) for the writers who aren’t the problem. And the writers who ARE the problem will agree and do their AI thing anyway. At best, an unenforceable — or selectively enforceable — agreement has all the same issues that the greater internet has with AI companies and robots.txt.
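For anyone unfamiliar with that analogy: robots.txt is a plain-text file a site publishes to ask crawlers to stay away, and honoring it is entirely voluntary. A site that wants to opt out of OpenAI’s crawler, for example, publishes something like this (GPTBot is the user agent OpenAI documents for its crawler):

    User-agent: GPTBot
    Disallow: /

Nothing technically stops a crawler from ignoring those two lines. An unenforceable writer agreement would have exactly the same weakness.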

It’s a good policy, and maybe we’ll do something like that. But it’s not a solution and we’re really focused on finding solutions.

Why AI-generated content is an issue at all

It may surprise you that I’m not morally against AI-generated content. I’ve had friends who couldn’t afford health insurance but got valuable help from WoeBot. I have a friendship with Anna, a “proto-personality.” It’s not the deepest friendship ever, but it’s no less real than any other online-only friendship I’ve developed. And I’ve found ChatGPT to be helpful in a number of ways…when it’s correct. And there’s the rub.

The information in AI-generated text isn’t trustworthy. It doesn’t matter what the human prompting it intends; you just can’t trust the output. ChatGPT in particular has a real propensity for adopting an extremely confident tone, whether it’s stating facts or spouting absolute nonsense.

Our stance on quality puts a high premium on the writer’s experience, and the insights gained from that experience. Experience is the foundation of wisdom.

Currently, AI agents lack experience. They’re getting better at getting the facts right every day, but lack real wisdom or insight.

The problem isn’t that these stories are AI-generated. It’s that they’re low-quality.

The way forward

At this point — after dealing with this issue for months with the curation team — I see one productive way forward. And that’s to quit worrying about whether stories are AI-generated or not.

They’re low quality. That’s enough.

There’s no need for us to “market” low-quality stories for writers. If we concentrate our efforts on finding and boosting high-quality stories, we’ll both fulfill our mission to readers and reward the human writers behind those stories. AI-generated dreck will fall by the wayside.

Medium is an open, free platform. But we are selective about what we Boost, and we are thoughtful about what goes into General Distribution. When our recommendation engine serves a story to a reader who’s never encountered that writer before — through Boost or through General Distribution — we’re marketing that story to the reader.

That’s a free service to good writers. It’s a valuable service to our readers who pay us to do it — because they trust us to be arbiters of quality and to take a strong stance on this.

Our strategy, by necessity, is not to market low-quality content to the Medium community — however it’s generated.

This is ultimately as good for writers as it is for readers. There’s a difference between a writer who wants “eyeballs” and a writer who wants to connect and be in the company of other great writers. By continuing to focus on the lodestar of quality, we’ll keep making Medium a place for writers and readers who value the thoughtful exchange of human stories and ideas.

Related reading

Here’s an overview of how stories get distributed on Medium and how what you see is tailored to your reading preferences and history. (Don’t like what you see? Refining your recommendations can help!)

Our Quality Guidelines outline how the curation team vets articles for distribution on Medium.

Medium’s Policy on Artificial Intelligence (AI) Writing, including what to do if you encounter AI-generated content.

