70 ways LLMs like ChatGPT give themselves away in survey responses
Survey fraud has become more sophisticated. In the past, you could recognize fake respondents by jumbled text, nonsense answers, or clunky bot behavior. Now, with the rise of large language models (LLMs) such as ChatGPT and Gemini, fraudsters can create responses that appear natural, clear, and well-written. On the surface, these answers look credible. Yet when you examine them closely, they often reveal subtle patterns.
Here are 70 writing nuances we’ve seen repeatedly in AI-generated survey responses:
- Predictable arcs 🎭 intro positive → downside → conclude balanced
- Triadic cadences 🔺 “reliable, affordable, and adaptable”
- Quad-clause sentences ➰ connected by two semicolons
- Elevated linkers 🔗 “moreover,” “furthermore,” and “additionally”
- “First, second, third” 🔢 scaffolding in 50-word open ends
- Anthropomorphising 🫀 “Brand X listens to its consumers’ hearts”
- Latinate verbs 🏛️ “ameliorate” and “facilitate” instead of plain Anglo-Saxon ones
- No regional spelling drift 🌍 the same spelling and wording regardless of the respondent’s region
- “Delve” and “tapestry” 📜 Almost unused archaic words
- “Of course! …” 🙋 Self-identifying preambles sometimes left intact
- Redundant synonyms 📦 e.g. “rapidly accelerate”
- “Paradigm shift” 🌐 Still beloved by bots, retired by most humans
- Superlative stacking 🏆 “most optimal,” “absolutely essential”
- “In today’s fast-paced world …” ⏩ Stock scene-setting
- Impossible recall 🧠 exact ad copy seen “two months ago”
- Gleaming spelling ✅ Zero typos across long free-text passages
- Broad clichés 🥱 “at the end of the day” sprinkled for no reason
- Template-ish disclaimers ⚠️ “It’s important to note that …”
- “Leverage” 🛠️ as a verb in consumer attitude items
- Use of semicolons ⚙️ inside list sentences
- Numeric hedges 📊 “approximately 70–80%”
- “Cutting-edge” ✂️ used to describe mundane products
- “Holistic approach” 🥨 in a snack brand survey
- Near-obsolete qualifiers 🇺🇸 “whilst,” “amongst” in U.S. panels
- Academic prepositions 🎓 “in terms of,” “with respect to”
- Highlighter words ✨ “significantly,” “substantially,” “notably”
- Mirrored topic sentences 🔁 that restate the question
- Polite over-apology 🙇‍♀️ “I apologize if this seems biased”
- Unneeded conditional hedging 📖 “were one to assume…”
- Future-perfect tense 🔮 “by 2027, consumers will have adopted…”
- High lexical diversity 🧐 with literally, like, you know, zero slang
- Verbose parentheticals 🗒️ that explain obvious terms
- Over-precise percentages 📐 “26.3%” with no data source
- Politeness markers 😊 “thank you for hearing my perspective”
- Uniform positivity ratio 🙏 rare use of “not,” “never,” or “don’t”
- Balanced rebuttals ⚖️ when prompted for a single viewpoint
- Contrived optimism 🌞 “I am confident the future will be bright”
- Faux humility 🤲 “I may not be an expert, but…”
- Buzzword inflation 📣 “synergy,” “ecosystem,” “robust framework”
- Empty empathy 🤗 “as a consumer myself, I deeply feel…”
- Perfect paragraph symmetry 📏 each block 3–4 sentences
- Robotic gratitude 🙏 “I greatly appreciate the opportunity to share”
- Repetitive cadence 🎶 same rhythm across multiple responses
- Artificial balance beam ⚖️ always “pros and cons” framing
- Hollow personal anecdotes 👤 that sound generic or implausible
- Predictive futurism 🚀 “we will inevitably see exponential growth”
- Generic industry nods 🏭 “technology has changed everything”
- Pseudo-statistics 📉 “studies have shown…” with no attribution
- Frictionless grammar 🧼 no contractions, always full forms
- Scripted empathy lines 💬 “I understand both perspectives”
- Overuse of “overall” 🔄 closing nearly every paragraph
- Pretend inclusivity 🤝 “as we all know…” in niche contexts
- “On the other hand …” 🤔 even when unnecessary
- Grandiose universals 🌌 “for all of humanity,” “throughout history”
- Excessive transitions 🛤️ “having said that,” “with that being noted”
- Deterministic language 🎯 “it is certain that…”
- Polished yet bland 🎨 no slang, no typos, no personality
- Fake narrative recall 📚 “last week, I noticed in the store…”
- Odd idiom mismatches 🌀 “it’s raining cats” without “and dogs”
- Emotion inflation ❤️ “extremely delighted,” “deeply passionate”
- Middle-school metaphors 📏 “like a puzzle piece fitting together”
- Universal relatability 🌎 “everyone has experienced this”
- Suspicious concision ✂️ exactly 50 or 100 words, no drift
- Subtle repetition ♻️ rephrasing same idea within one answer
- Formal sign-offs 🖊️ “thank you kindly for considering my input”
- Over-framed hypotheticals 🧩 “if one were to imagine a scenario…”
- Overcompensated neutrality 🪢 “while I see merit in both…”
- Euphemistic avoidance 🚫 soft language for negatives (“less ideal”)
- Precise but hollow claims 📍 “this improves efficiency by 43%”
- Stylistic uniformity 🧍 all responses read in the same “voice”
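As a rough illustration, a few of the tells above can be turned into simple text checks. The Python sketch below is not Rep Data’s method; the phrase list, the 40-word threshold, and the `find_tells` helper are all assumptions made up for this example, and a match is only a reason to look closer, not proof of fraud.

```python
import re

# Hypothetical phrase list drawn from the tells above; not a vetted lexicon.
STOCK_PHRASES = [
    "paradigm shift",
    "in today's fast-paced world",
    "it's important to note that",
    "holistic approach",
    "delve",
    "tapestry",
    "moreover",
    "furthermore",
]

# A percentage with a decimal part, e.g. "26.3%" or "26.3 %".
PRECISE_PERCENT = re.compile(r"\b\d+\.\d+\s?%")

# Common English contractions such as "don't" or "I'm".
# Limitation: the 's branch also matches possessives like "brand's".
CONTRACTION = re.compile(r"\b\w+'(?:t|s|re|ve|ll|d|m)\b", re.IGNORECASE)


def find_tells(response: str) -> list:
    """Return the names of the stylistic tells found in one open-end response."""
    # Lowercase and normalize curly apostrophes so phrase matching is consistent.
    text = response.lower().replace("\u2019", "'")

    # Simple substring matching against the stock phrases.
    tells = [f"phrase:{p}" for p in STOCK_PHRASES if p in text]

    if PRECISE_PERCENT.search(text):
        tells.append("over_precise_percentage")

    words = text.split()
    # Assumed threshold: only long answers are checked for missing contractions.
    if len(words) >= 40 and not CONTRACTION.search(text):
        tells.append("no_contractions")

    # Answers that land on exactly 50 or 100 words.
    if len(words) in (50, 100):
        tells.append("exact_word_count")

    return tells
```

Run against a response such as “In today’s fast-paced world, this brand represents a paradigm shift,” `find_tells` returns the two matching phrase tells, while a short, casual answer with contractions returns an empty list. Checks this simple are easy to evade, so treat them as a triage aid at most.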
These quirks may help you flag suspicious responses, but they are not enough on their own. By the time you’re searching for patterns like “paradigm shift” or over-precise percentages, fraudulent responses may already be influencing results. And relying only on manual review isn’t effective. A recent Guardian investigation showed thousands of UK students using AI to cheat on assignments, overwhelming detection systems. Tom’s Guide also tested five leading AI detectors and found them inconsistent, sometimes missing LLM-written text while incorrectly flagging human essays. Tools designed to evade detection, such as Undetectable.ai, make the problem even harder.
Our research backs this up. Rep Data identifies and flags around 32% of fraud, and 84% of that 32% appears “good enough” to slip past basic defenses. We call this good-looking fraud: answers that look authentic but quietly bias results. LLM writing tells are interesting, but fraudsters adapt quickly, and manual checks cannot keep up.
That’s why Rep Data takes a prevention-first approach. With Research Defender, fraud is blocked before it reaches your data. We combine VPN and device fingerprinting checks, hyperactivity scanning, machine learning models, and even LLM-based suppression to stop fraudulent responses at the source.
Spotting stylistic quirks can feel like detective work, but prevention is what actually protects your insights. If you want to stay ahead of evolving fraud tactics, you have to build stronger defenses rather than rely on guesswork. Curious how much AI-driven fraud might already be slipping past your current vendor? See how Rep Data can help.
###
About Research Defender
With a goal to help the sample and market research industry create a clean, healthy, and efficient ecosystem, Research Defender has created a secure platform to help our clients take control of their traffic and the quality of their product. Research Defender facilitates high-quality and efficient transactions across the online research ecosystem for both buyers and sellers of sample.
About Rep Data
Rep Data provides full-service data collection solutions for primary researchers, helping expedite data collection for primary quantitative research studies, with a hyper-focus on data quality and consistent execution. The company’s mission is to be a reliable, repeatable data collection partner for approximately 500 clients, including market research agencies, management consultancies, Fortune 500 corporations, advertising agencies, brand strategy consultancies, universities, communications agencies, public relations firms, and more.
Media Contact:
media@repdata.com