Can You Trust AI Content Detection Tools? Originality.ai Case Study

Recently, I was playing around with a few AI content detection tools. No surprise here, but it turns out that they all work differently. With the explosion of interest in AI-written content, what really got my attention was a different question: can human-written content be incorrectly marked as AI-written?

The results might surprise (and shock) you.

And how often does this happen? I suppose it ultimately depends on which detection tool you're using but I wanted to dive in deeper to see if this is a potentially common occurrence with widespread implications.

So I used the well-known AI content detection tool Originality.AI to confirm my suspicions.

For the study, I used 100 human-written articles and 100 AI-generated articles. I assumed a real-life scenario in which a content manager outsources blog posts and reviews them with AI detection tools to verify that they were written by humans.

I used three different score thresholds to widen the scope of the study. The results could have far-reaching consequences for the SEO and content community, especially in the face of such rapid adoption of AI tools.

How do AI content detection tools work?

Without using technical jargon, the simplest way to understand how artificial intelligence content detection tools work is that they look for patterns in content. Complicated sentences, perfect grammar and predictably common word combinations can flag content as written by AI.

What is a perfect original score for your content?

I've seen comments in the SEO community stating that to be safe, a piece of content must be rated above 80-90 as human written.

To put this to the test, I made sure to pick a credible AI detection tool instead of any random anti-AI detector. We used Originality.ai because it is popular and frequently recommended as an AI content detector tool amongst SEOs and content writers.

I also tested my samples against three different originality scores.

Data analysis on classifying human text as AI text

I've heard some folks swear by a human written score of 80 while others say 90. I say, follow the data.

Our sample consisted of 100 AI-generated and 100 human-written blog articles. We tested three scenarios to mimic real-life situations.

In each scenario, we targeted a different human originality score to test the content for writer authenticity.

Scenario 1: Human originality score of 50

In the first scenario, we tested our 200 samples for a human score of 50%. Any blog post that scored under 50 would be deemed to have been written by AI and would be discarded.

A score of 50% or higher would mean the article is treated as human-written, and so it passes the test.

You're probably thinking, 50%?!

Really? 50%?

That seems like a very low barrier to aim for. And you're right, I agree.

Take a look at the chart below showing how our samples were categorized when we aimed for a 50% human score. The top left quadrant shows that 78% of AI generated articles from our sample actually have a human score of at least 50%.

Not to mention the bottom left quadrant: 10% of the samples that were actually written by humans were classified as AI-written because they didn't reach the 50% requirement.

That's 1 out of 10 articles that are incorrectly deemed to be AI written content at the cost of human effort. One out of ten. Quite a lot, isn't it?
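To make the quadrants concrete, here is a minimal Python sketch of how articles can be sorted at a given human-score threshold. It assumes each article has a known true author and a 0-100 human score from a detector. The function names and the toy scores are made up for illustration; they are not the study's data and not Originality.ai's API.

```python
from collections import Counter


def quadrants(samples, threshold):
    """Count (true_author, predicted_author) pairs at a human-score threshold.

    samples: list of (true_author, human_score) tuples, where true_author is
    "human" or "ai" and human_score is between 0 and 100.
    A score below the threshold is treated as AI-written.
    """
    counts = Counter()
    for true_author, human_score in samples:
        predicted = "human" if human_score >= threshold else "ai"
        counts[(true_author, predicted)] += 1
    return counts


def false_positive_rate(samples, threshold):
    """Share of truly human-written samples that get flagged as AI."""
    counts = quadrants(samples, threshold)
    humans = counts[("human", "human")] + counts[("human", "ai")]
    return counts[("human", "ai")] / humans if humans else 0.0


# Hypothetical scores for illustration only; NOT the study's data.
toy_samples = [
    ("human", 95), ("human", 72), ("human", 40), ("human", 88),
    ("ai", 12), ("ai", 65), ("ai", 91), ("ai", 30),
]

for threshold in (50, 80, 90):
    print(threshold, false_positive_rate(toy_samples, threshold))
```

Raising the threshold can only move more human-written articles into the "flagged as AI" bucket, which is the trade-off the scenarios in this study explore.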

But perhaps a human score of 50% is too low. After all, you're probably gunning for a human written score of 80-90% if you're outsourcing or editing AI generated content.

Scenario 2: Human originality score of 80

Let's up the stakes by targeting a minimum human written score of 80%.

Articles that score under 80 will be considered "cheating."

Take a look at the chart below.

AI detection accuracy went up, but the bottom half of the chart is moving in a direction you don't want.

<div class="blogcomponent_callout">At an 80% human originality score, more than 20% of your favorite writer's articles will be flagged as AI generated text. That's over 1 in 5 articles.</div>

How do you feel about that?

Let's turn it up to the maximum.

Scenario 3: Human originality score of 90

The last level is the strictest acceptance criterion: a 90% human score. And you're right, things get worse.

<div class="blogcomponent_callout">In our test, 28 out of 100 human-written samples were classified as written by AI by the Originality.ai tool.</div>
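To get a feel for what these rates mean for a single honest writer, here is a rough Python sketch. It uses the false-flag rates reported in this study (10% at a threshold of 50 and 28% at a threshold of 90) and assumes, as a simplification, that each article is flagged independently of the others. That independence is my assumption, not something the study measured, and the function names are hypothetical.

```python
def expected_false_flags(false_positive_rate, n_articles):
    """Average number of human-written articles wrongly flagged as AI."""
    return false_positive_rate * n_articles


def chance_of_any_false_flag(false_positive_rate, n_articles):
    """Probability that at least one of n human-written articles is flagged.

    Assumes each article is flagged independently (a simplifying assumption).
    """
    return 1 - (1 - false_positive_rate) ** n_articles


# Rates reported in this study for thresholds 50 and 90.
reported_rates = {50: 0.10, 90: 0.28}

for threshold, rate in reported_rates.items():
    any_flag = chance_of_any_false_flag(rate, 5)
    print(f"threshold {threshold}: chance of at least one false flag "
          f"in 5 articles = {any_flag:.0%}")
```

Under that assumption, a writer who submits several genuinely human-written articles is quite likely to see at least one of them flagged at the stricter threshold.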

The findings of this experiment may have surprised some of you, especially those who've been relying on AI content detector tools to filter writers.

Can human-written text be classified as AI-generated text?

Yes. As we've just seen.

AI content detection accuracy can sometimes come at the cost of human-written effort. It's unfortunate, but there are some trade-offs when detecting AI generated content. Similar to how a new Google Search algorithm update can punish innocent sites, human generated content can be incorrectly marked as AI generated content.

So where does this leave us now? How do you go about using AI tools in content?

Is AI-generated content bad for SEO?

It's possible that you're reading this as an SEO or content writer who's using tools to either generate or detect AI written content or both. So let me explain.

It depends.

For example, filling your agency's blog with AI written content around SEO services without fact-checking them will hurt your brand reputation. Doing the same for your company's blog or affiliate website will probably not be helpful to your readers either.

In a worst-case scenario, your content will be incorrect and fail to satisfy search intent, leading readers to drop off your pages and hurting your SEO efforts.

But there are many other applications where you can use AI-generated content and still rank successfully. For example,

quick PBN setup
detailed drafts and blog outlines for writers to go the extra mile
scale up guest posting
write outreach emails and many more.

AI content can be harmful if you are using it to write articles that are thin and don't provide enough value for your readers. But if used correctly, AI content can be a game changer for your content marketing.

Does Google detect and punish AI content?

Google does not care whether your content is AI generated or human-written. Your content must have quality, period.

Why?

Because why would they care? Does it matter if the content is written by a native speaker or not, as long as it is informational and provides value?

Will you be penalized for that?

Of course not!

But this isn't news; Google has always maintained a stance against search ranking manipulation whether you're dealing with link exchanges, or keyword stuffing in the black hat years.

Google and other search engines can detect AI generated content but if the user likes what they see and is satisfied with an answer they read, that's good enough.

It does not matter to Google whether it is AI or human written content.

In fact, you can use Surfer's AI Humanizer to write content to pass AI detection, if you are still not convinced.

Turn on the anti AI detection feature to write articles that can pass AI detection tools.

Our AI tool generates search engine friendly content that can appear as human written text to Google's AI detecting algorithm.

Why are AI pages being penalized?

Not too long ago, Google introduced BERT and other AI models to understand the web's content.

These are heavy algorithms using a lot of computing resources.

Now imagine if content creation capabilities increased by a thousand times on the web.

Connect the dots.

Do you see?

Google and other search engines have to penalize AI content that is solely created to manipulate search rankings without providing value. The sad truth is that a lot of AI writing tools support this, whether they agree with it or not.

AI content has come to be associated with black hat SEO, leading to a general uneasiness amongst folks in the content and SEO communities.

But not because it is always a bad thing.

You can still use AI content writing tools to help you scale your website's content. As long as you're not trying to manipulate search engine rankings, and are focused on delivering value for your readers, you'll be fine.

Focus on quality not scores

I hope that this little experiment helped you see that following AI/human originality scores blindly can mistakenly lay the blame on your writer for using AI tools even when they are innocent. Even an AI content detector built on a giant language model can be wrong often enough that you can't rely on it too heavily.

Where is the sweet spot to using AI generated content then?

Use AI content where it belongs, and make sure that whether you use AI-generated text or human writing, it conveys accurate information about the topic, hits semantically relevant entities and sufficiently satisfies user intent.

For example, Buzzfeed is using AI written content to help them with interactive quizzes.

You can use AI to help you write content for blog posts, but don't forget about link building to improve your rankings and reach wider audiences.

Let me know what you think of the case study in the comments below. Or tweet at me here.

Happy Surfing!