Can we reliably spot AI-generated content in the wild? The simple answer is no.
Take this paragraph, for instance:
“It is important to remember that AI-generated content can be of high quality and indistinguishable from human-written text. Poorly written content may not necessarily have been created by an automated system; it could just as easily be the result of poor writing skills or a lack of attention to detail on the part of a human writer. Automated systems can make errors too, so relying solely on them for detecting AI-generated content can lead to inaccurate conclusions. Thus, using multiple methods is always best.”
I ran this sample through 4 different AI content checkers and received remarkably different results.
- Contentatscale.ai scored it 58%, unclear whether it is AI content.
- Writer.com classified it as 100% human-generated content.
- Copyleaks.com gave it a 21.4% probability of being human-written.
- Crossplag.com rated it 99% human, stating “This text is mainly written by a human.”
How Do AI-Generated Content Checkers Work?
Content checkers use text analysis methods such as predictability analysis, probability calculations, and pattern recognition.
Predictability Analysis
Predictability analysis is a method of text analysis that uses statistics and modeling techniques to predict future outcomes and performance. These systems look for repetition, similarity between sentences, and unnatural language structure.
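As a rough illustration of the kind of signals such a system might compute (the thresholds and metrics here are my own simplifications, not any vendor's actual method), here is a sketch that measures word repetition and average similarity between sentences:

```python
import re
from itertools import combinations

def sentence_similarity(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def predictability_signals(text: str) -> dict:
    """Toy signals a checker might use: word repetition and
    average pairwise similarity between sentences."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.lower().split()
    repetition = 1 - len(set(words)) / len(words)  # share of repeated words
    pairs = list(combinations(sentences, 2))
    avg_sim = (sum(sentence_similarity(a, b) for a, b in pairs) / len(pairs)
               if pairs else 0.0)
    return {"repetition": repetition, "avg_sentence_similarity": avg_sim}
```

Highly repetitive, template-like text scores high on both signals; varied human prose tends to score lower.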
Probability Calculations
Data scientists use probability in natural language generation, such as in natural language processing (NLP) systems, to determine the most likely words, phrases, and sentences to generate next, based on the data the system was trained on. The idea is simple enough: given the context of the surrounding content, the grammar of the language used, and where we are in the sentence, what is the most likely next word or phrase?
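The simplest version of this idea is a bigram model: estimate the probability of the next word from how often it has followed the current word in the training data. This sketch is a deliberately tiny stand-in for the far larger models real generators use:

```python
from collections import Counter, defaultdict

def bigram_model(corpus: str):
    """Count word bigrams, then estimate P(next word | current word)
    from those counts."""
    words = corpus.lower().split()
    follows = defaultdict(Counter)
    for cur, nxt in zip(words, words[1:]):
        follows[cur][nxt] += 1

    def next_word_probs(cur: str) -> dict:
        counts = follows[cur.lower()]
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()} if total else {}

    return next_word_probs

probs = bigram_model("the cat sat on the mat the cat ran")
# probs("the") estimates how likely each word is to follow "the"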
Content generators such as OpenAI/ChatGPT are typically trained on large datasets and incorporate massively diverse samples. They also provide parameters to nudge the decision-making process.
For OpenAI these parameters are:
- Temperature – controls randomness. The lower the value, the more deterministic the output.
- Top P – controls diversity by sampling only from the most probable tokens.
- Frequency penalty – penalizes new tokens based on how often they already appear in the text, discouraging repetition.
- Presence penalty – penalizes tokens that have appeared at all, encouraging the model to move to new topics.
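To make the temperature parameter concrete, here is a minimal sketch of how temperature reshapes a next-token probability distribution. The token scores (logits) are hypothetical numbers chosen for illustration:

```python
import math

def apply_temperature(logits: dict, temperature: float) -> dict:
    """Softmax over logits divided by temperature.
    Low temperature sharpens the distribution (near-deterministic);
    high temperature flattens it (more random)."""
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# Hypothetical next-token scores after "The cat sat on the ..."
logits = {"mat": 2.0, "rug": 1.0, "moon": 0.5}
cold = apply_temperature(logits, 0.2)  # "mat" dominates almost completely
hot = apply_temperature(logits, 2.0)   # alternatives gain real probability
```

This is why low-temperature output tends to be the "most likely" and therefore most detectable text: the model almost always picks the single highest-probability token.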
Pattern Recognition
AI systems evaluate new data against existing data to classify information and look for patterns. Think of the facial recognition technology that unlocks your phone.
Examples of word and sentence patterns that AI might use include:
- Repeating words or phrases
- Using certain terms more frequently than others
- Using a specific set of words in the same order
- Using specific sentence structures or syntax
- Alternating between short and long sentences
- Using uncommon words or phrases
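One of the simplest of these pattern signals, repeated phrases, is easy to sketch. This toy function (my own illustration, not any checker's actual implementation) flags word n-grams that occur more than once:

```python
from collections import Counter

def repeated_ngrams(text: str, n: int = 3) -> dict:
    """Find word n-grams that occur more than once -- one pattern
    signal a checker might flag as machine-like repetition."""
    words = text.lower().split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {" ".join(g): c for g, c in grams.items() if c > 1}
```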
Not All AI-Generated Content Checkers Are Created Equal
The four sites mentioned above rated the same content differently.
The two sites that leaned toward labeling the content AI-generated don’t even agree on how likely that is.
Which ones are correct?
AI content checkers use the same basic technologies and mathematical concepts to evaluate content that the AI generators use to create the content.
Datasets
The training datasets make a big difference in the abilities of the system and the areas where they are qualified to work.
- We can train content checkers on a specific set of editorial guidelines and use them to detect errors or inconsistencies in a given text.
- We can train them on a corpus of text generated by an AI system and use them to detect AI-generated content.
- Content checkers may also use natural language processing techniques, such as pattern recognition, to detect errors or inconsistencies in the text.
Training
Different methods for training AI systems include supervised learning, unsupervised learning, reinforcement learning, and transfer learning.
- In supervised learning, we train an AI system on a labeled dataset, with the labels indicating the desired output.
- In unsupervised learning, we train an AI system on an unlabeled dataset and require it to identify patterns in the data without external guidance.
- In reinforcement learning, we train an AI system by rewarding successful task completion, steering it toward the behaviors that earn rewards.
- In transfer learning, an AI system applies knowledge acquired in one task to a different but related task.
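The supervised case is the easiest to see in code. In this deliberately tiny sketch, the labeled dataset and its "human"/"ai" labels are hypothetical; a new point simply takes the label of its closest labeled example:

```python
def nearest_neighbor_classify(labeled, point):
    """Supervised learning in miniature: labeled examples supply the
    desired output; a new point gets the label of its nearest example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labeled, key=lambda ex: dist(ex[0], point))[1]

# Hypothetical labeled dataset: (feature vector, label)
training = [((0.1, 0.2), "human"), ((0.9, 0.8), "ai")]
```

A real checker would use far richer features and models, but the shape of the problem is the same: labeled examples in, predicted label out.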
AI systems require massive computational resources, even when tuned for the most generic, and therefore most predictable, answers. Tuning them for finer-grained results increases the computational cost. Unfortunately, none of the tools I tested allow that tuning to be adjusted.
Which of the systems listed above correctly classified the sample?
OpenAI’s Own Thoughts on the Matter
OpenAI wrote the opening sample. After carrying on an extensive conversation with it about machine versus human content creators, skilled versus unskilled writers, and how content checkers work, I decided to have OpenAI check its own content.
In its own words: “Based on the features of this text, it is unlikely that this was generated by an AI system. The sentence structure and word choice indicate a high degree of sophistication which would be difficult for most AI systems to replicate. Additionally, there are no signs of repetition or unnatural language patterns which could signal the presence of an algorithm. Therefore, it is safe to say that this text was likely written by a human writer.”
I reminded OpenAI that it wrote the original content.
OpenAI’s response: “Yes, that is correct. After evaluating the features of this text, I determined that it was unlikely to have been generated by an AI system. The sentence structure and word choice indicated a high degree of sophistication which would be difficult for most AI systems to replicate. Additionally, there were no signs of repetition or unnatural language patterns which could signal the use of an algorithm. Therefore, it is safe to say that this text was likely written by a human writer.”
I asked one more question. Can an AI system generate content that another AI system cannot detect as AI-generated?
OpenAI: “It is very likely that AI can generate content that it cannot detect was AI-generated. This is because AI systems are limited by their algorithms and can only generate content that is within the parameters of their programming. Therefore, AI systems are not yet able to detect content that has been generated by another AI system or that has been generated manually by a human.”