
70 ways LLMs like ChatGPT give themselves away in survey responses

Survey fraud has become more sophisticated. In the past, you could recognize fake respondents by jumbled text, nonsense answers, or clunky bot behavior. Now, with the rise of large language models (LLMs) such as ChatGPT and Gemini, fraudsters can create responses that appear natural, clear, and well-written. On the surface, these answers look credible. Yet when you examine them closely, they often reveal subtle patterns.

Here are 70 writing nuances we’ve seen repeatedly in AI-generated survey responses:

  1. Predictable arcs 🎭 intro positive → downside → conclude balanced
  2. Triadic cadences 🔺“reliable, affordable, and adaptable”
  3. Quad-clause sentences ➰ connected by two semicolons
  4. Elevated linkers 🔗 “moreover,” “furthermore,” and “additionally”
  5. “First, second, third” 🔢 scaffolding in 50-word open ends
  6. Anthropomorphising 🫀 “Brand X listens to its consumers’ hearts”
  7. Latinate verbs 🏛️ “ameliorate” and “facilitate” vs. Anglo-Saxon
  8. No regional spelling drift 🌍 never “colour” vs. “color” where you’d expect it
  9. “Delve” and “tapestry” 📜 words humans rarely use but LLMs favor
  10. “Of course! …” 🙋 Self-identifying preambles sometimes left intact
  11. Redundant synonyms 📦 e.g. “rapidly accelerate”
  12. “Paradigm shift” 🌐 Still beloved by bots, retired by most humans
  13. Superlative stacking 🏆 “most optimal,” “absolutely essential”
  14. “In today’s fast-paced world …” ⏩ Stock scene-setting
  15. Impossible recall 🧠 exact ad copy seen “two months ago”
  16. Gleaming spelling ✅ Zero typos across long free-text passages
  17. Broad clichés 🥱 “at the end of the day” sprinkled for no reason
  18. Template-ish disclaimers ⚠️ “It’s important to note that …”
  19. “Leverage” 🛠️ as a verb in consumer attitude items
  20. Use of semicolons ⚙️ inside list sentences
  21. Numeric hedges 📊 “approximately 70–80%”
  22. Words like “cutting-edge” ✂️ to describe mundane products
  23. Things like “Holistic approach” in a snack 🥨 brand survey
  24. Near-obsolete qualifiers “whilst,” “amongst” in U.S. 🇺🇸 panels
  25. Academic prepositions 🎓 “in terms of,” “with respect to”
  26. Highlighter words ✨ “significantly,” “substantially,” “notably”
  27. Mirrored topic sentences 🔁 that restate the question
  28. Polite over-apology 🙇‍♀️“I apologize if this seems biased”
  29. Unneeded conditional hedging 📖 “were one to assume…”
  30. Future-perfect tense 🔮 “by 2027, consumers will have adopted…”
  31. High lexical diversity 🧐 with literally, like, you know, zero slang
  32. Verbose parentheticals 🗒️ that explain obvious terms
  33. Over-precise percentages 📐 “26.3%” with no data source
  34. Politeness markers 😊 “thank you for hearing my perspective”
  35. Uniform positivity ratio 🙏 rare use of “not,” “never,” or “don’t”
  36. Balanced rebuttals ⚖️ when prompted for a single viewpoint
  37. Contrived optimism 🌞 “I am confident the future will be bright”
  38. Faux humility 🤲 “I may not be an expert, but…”
  39. Buzzword inflation 📣 “synergy,” “ecosystem,” “robust framework”
  40. Empty empathy 🤗 “as a consumer myself, I deeply feel…”
  41. Perfect paragraph symmetry 📏 each block 3–4 sentences
  42. Robotic gratitude 🙏 “I greatly appreciate the opportunity to share”
  43. Repetitive cadence 🎶 same rhythm across multiple responses
  44. Artificial balance beam ⚖️ always “pros and cons” framing
  45. Hollow personal anecdotes 👤 that sound generic or implausible
  46. Predictive futurism 🚀 “we will inevitably see exponential growth”
  47. Generic industry nods 🏭 “technology has changed everything”
  48. Pseudo-statistics 📉 “studies have shown…” with no attribution
  49. Frictionless grammar 🧼 no contractions, always full forms
  50. Scripted empathy lines 💬 “I understand both perspectives”
  51. Overuse of “overall” 🔄 closing nearly every paragraph
  52. Pretend inclusivity 🤝 “as we all know…” in niche contexts
  53. “On the other hand …” 🤔 even when unnecessary
  54. Grandiose universals 🌌 “for all of humanity,” “throughout history”
  55. Excessive transitions 🛤️ “having said that,” “with that being noted”
  56. Deterministic language 🎯 “it is certain that…”
  57. Polished yet bland 🎨 no slang, no typos, no personality
  58. Fake narrative recall 📚 “last week, I noticed in the store…”
  59. Odd idiom mismatches 🌀 “it’s raining cats” without “and dogs”
  60. Emotion inflation ❤️ “extremely delighted,” “deeply passionate”
  61. Middle-school metaphors 📏 “like a puzzle piece fitting together”
  62. Universal relatability 🌎 “everyone has experienced this”
  63. Suspicious concision ✂️ exactly 50 or 100 words, no drift
  64. Subtle repetition ♻️ rephrasing same idea within one answer
  65. Formal sign-offs 🖊️ “thank you kindly for considering my input”
  66. Over-framed hypotheticals 🧩 “if one were to imagine a scenario…”
  67. Overcompensated neutrality 🪢 “while I see merit in both…”
  68. Euphemistic avoidance 🚫 soft language for negatives (“less ideal”)
  69. Precise but hollow claims 📍 “this improves efficiency by 43%”
  70. Stylistic uniformity 🧍 all responses read in the same “voice”
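Several of these tells are regular enough to check automatically. The sketch below is a minimal, hypothetical heuristic scanner (the `TELLS` patterns and `flag_tells` helper are our own illustration, not a production fraud filter or any part of Research Defender): it looks for a handful of the patterns above, such as elevated linkers, LLM-favored words, over-precise percentages, template disclaimers, and double-semicolon sentences.

```python
import re

# Illustrative patterns for a few of the 70 tells above.
# A real system would use many more signals than keyword matching.
TELLS = {
    "elevated linker": re.compile(r"\b(moreover|furthermore|additionally)\b", re.I),
    "LLM-favored word": re.compile(r"\b(delve|tapestry|paradigm shift)\b", re.I),
    "over-precise percentage": re.compile(r"\b\d{1,2}\.\d%"),
    "template disclaimer": re.compile(r"it'?s important to note", re.I),
    "double semicolon clause": re.compile(r";[^;]+;"),
}

def flag_tells(response: str) -> list[str]:
    """Return the names of tells found in a free-text survey response."""
    return [name for name, pattern in TELLS.items() if pattern.search(response)]

sample = ("Moreover, it's important to note that studies have shown "
          "a 26.3% gain; adoption is rising; the paradigm shift is clear.")
print(flag_tells(sample))
```

A response tripping two or three patterns is worth a human look; as the next section argues, pattern matching alone cannot keep pace with adapting fraudsters.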

These quirks may help you flag suspicious responses, but they are not enough on their own. By the time you’re searching for patterns like “paradigm shift” or over-precise percentages, fraudulent responses may already be influencing results. And relying only on manual review isn’t effective. A recent Guardian investigation showed thousands of UK students using AI to cheat on assignments, overwhelming detection systems. Tom’s Guide also tested five leading AI detectors and found them inconsistent, sometimes missing LLM-written text while incorrectly flagging human essays. Tools designed to evade detection, such as Undetectable.ai, make the problem even harder.

Our research backs this up. Rep Data identifies and flags around 32% of fraud, and 84% of that flagged fraud appears “good enough” to slip past basic defenses. We call this good-looking fraud: answers that look authentic but quietly bias results. LLM writing tells are interesting, but fraudsters adapt quickly, and manual checks cannot keep up.

That’s why Rep Data takes a prevention-first approach. With Research Defender, fraud is blocked before it reaches your data. We combine VPN and device fingerprinting checks, hyperactivity scanning, machine learning models, and even LLM-based suppression to stop fraudulent responses at the source.

Spotting stylistic quirks can feel like detective work, but prevention is what actually protects your insights. If you want to stay ahead of evolving fraud tactics, you have to build stronger defenses rather than rely on guesswork. Curious how much AI-driven fraud might already be slipping past your current vendor? See how Rep Data can help.