70 Ways to Catch ChatGPT in Your Open Ends

At Rep Data, we use Research Defender, so we don’t get AI responses in the first place. This tool is by far the best of all possible ways to prevent fraudulent survey responses from entering your survey results, regardless of your sample source. However! For those of you who just want an idea of how to better spot and catch ChatGPT or LLM responses in your surveys, with a set of tips that go far deeper than just spotting the “—” em dash, feel free to meander through these tips.

Note 1: These are not the rules applied in Research Defender’s LLM writing-style detection; those rules are kept strictly confidential to prevent bad actors.

Note 2: These patterns aren’t hard rules or red flags on their own, nor are they meant to be applied in a “pass all 70 or it’s AI” fashion, but together, they do form a fingerprint. The more of them you spot stacked in a single open end, the more confidently you can question authorship.

1. Elevated linkers: “moreover,” “furthermore,” and “additionally”
Human panelists under time pressure rarely reach for such elevated transitions; they hammer the Enter key or drop a simple “and.” Look for sequences of three-plus sentences, each one prefaced by a textbook connector.

2. Predictable arc: intro positive → raise downside → conclude balanced
Even when explicitly nudged to be opinionated, LLMs often start courteously, raise a concern, and end on compromise. Genuine respondents, in contrast, either gush or rant depending on mood; they almost never self-edit into rhetorical symmetry.

3. Quad-clause sentences connected by two semicolons
Because semicolons solve the model’s aversion to fragments, it happily stitches four independent thoughts into one monster sentence; each clause grammatically pristine; separated by semicolons and maybe a “however.” Most non-writers fear semicolons and would break the thought into bullets or leave it unpunctuated.

4. Triadic cadence: “reliable, affordable, and adaptable”
The “rule of three” is all over the model’s training data (think speeches and ad copy). Consequently, triads surface even when the source material, like a snack bar or detergent, doesn’t invite high rhetoric.

5. “First, second, third” scaffolding in 50-word open ends
When asked to defend a viewpoint, the model enumerates. Humans in panel settings almost never bother; they just front-load their main gripe. Enumeration inside what should be a spontaneous comment betrays machine logic.

6. Anthropomorphising: “Brand X listens to its consumers’ hearts”
LLMs have soaked up decades of marketing fluff that personifies corporations. Unprompted, they speak of brands “hearing,” “caring,” or “embracing” customers. Real shoppers talk about prices, taste, or service. Spot the misplaced pathos.

7. Latinate verbs: “ameliorate” and “facilitate” vs. Anglo-Saxon
Given a choice, the model reaches for Latinate formalities because training data equates them with professionalism. A tired parent in a survey won’t say “ameliorate packaging issues”; they’ll say “fix the box.” Overly polysyllabic verbs are a red flag.

8. “Delve” and “tapestry”: almost-unused archaic words
A few archaic or literary tokens show up disproportionately in model outputs because they’re common in public-domain ebooks but rare in contemporary conversation. When you see a respondent “delve” into a “tapestry of flavors,” consider the odds: corpus ghostwriting is far more likely than a poetic supermarket shopper.

9. “Of course! …”: self-identifying preambles sometimes left intact
ChatGPT’s safety style often begins with a warm interjection (“Of course!” “Certainly!”). If a panelist’s answer starts with coaxing words that feel like they belong in a help-desk chat, you may be reading the wrapper of a system prompt that wasn’t fully clipped.

10. Redundant synonyms, e.g. “rapidly accelerate”
LLMs love intensifiers and synonym pairs because they’re statistically reinforced bigrams. Humans shortcut: if they wrote “accelerate,” they skip “rapidly.” Stacked synonyms enlarge word count without adding meaning, a classic sign of synthetic text inflation.

11. “Paradigm shift”: still beloved by bots, retired by most humans
Search n-grams for “paradigm shift” post-2015 and you’ll find a decline, except in AI outputs. The phrase persists because seminal management books (an LLM staple) use it heavily. Modern writers dropped it years ago, so its surprising resurgence often indicates an algorithmic echo.

12. Superlative stacking: “most optimal,” “absolutely essential”
Because superlatives occur near brand slogans and executive briefs, the model doubles down: “most optimal,” “truly unparalleled,” “absolutely essential.” Ordinary respondents pick one adjective or skip intensifiers entirely. Count compounded superlatives to diagnose machine exuberance.

13. “In today’s fast-paced world …”: stock scene-setting
This phrase sits in the dusty attic of corporate boilerplate yet is everywhere in LLM output because those texts feed the training diet. Shoppers venting about cereal don’t open with sweeping temporal commentary. The presence of this throat-clearing clause is almost a fluorescent marker for AI.

14. Impossible recall: exact ad copy seen “two months ago”
If a respondent quotes a 20-word slogan verbatim and timestamps it precisely (“in early May, the billboard read ‘Refresh Your World’”), suspect hallucination. LLMs invent specific bodies of text to sound authoritative. Real people won’t risk that degree of recall unless reading from a script.

15. Gleaming spelling: zero typos across long free-text passages
No fat-finger errors, no missing apostrophes, perfect curly quotes: it feels typeset, not tapped into a form. Authentic open ends are often peppered with at least minor typos, especially on mobile, but this rule is too strict on its own, because, well, some people take their spelling pretty seriously.

16. Broad clichés: “at the end of the day” sprinkled for no reason
The model knows clichés placate grade-school writing rubrics, so it drops them in as verbal scaffolding. Humans use clichés too, but theirs are tied to accent or era; LLM usage is mechanical, distribution-agnostic, and often clustered every few sentences.

17. Template-ish disclaimers: “It’s important to note that …”
Formal qualifiers appear early in knowledge articles and get learned wholesale. A shopper venting about soggy fries doesn’t shift into mini-journal-review mode. Stock disclaimers inside casual survey answers point straight to algorithmic text.

18. “Leverage” as a verb in consumer attitude items
“Leverage” dominated 1990s PowerPoint decks and still saturates consulting memos, hence its weight in training data. Unpaid panelists hardly ever conjugate it. When a streetwear fan says a brand should “leverage its omnichannel footprint,” you’re reading synthetic MBA-speak.

19. Use of semicolons inside list sentences
Most English speakers avoid semicolons altogether online. The model, steeped in editorial style guides, merrily deploys them to join list items. One semicolon: maybe a grammar-sharp respondent. Two or more: probable bot.

20. Numeric hedges: “approximately 70-80%”
Humans guess round numbers: “about three quarters.” LLMs recall statistical phrasing (“approximately 70-80 percent”) because scientific text teaches them hedged precision. The combo of “approximately” plus a range is an AI fingerprint.

21. Words like “cutting-edge” to describe mundane products
The model inherits press-release diction where every toaster is “innovative.” Shoppers are blunt: “pretty good,” “meh,” “cheap.” If a can of beans is lauded as “cutting-edge,” algorithmic overselling is the likeliest source.

22. Things like “holistic approach” in a snack brand survey
Same pattern: corporate wellness jargon appears far outside its domain because the model defaults to abstract solution language. Real eaters just want flavor. Be wary when a chips review morphs into a strategy off-site.

23. Near-obsolete qualifiers: “whilst,” “amongst” in U.S. panels
The model sees “whilst” in public-domain British novels; the median U.S. respondent does not. Spotting archaic Britishisms in a North American data set is a strong hint of a synthetic contribution.

24. Academic prepositions: “in terms of,” “with respect to”
These phrases cushion claims in research papers. Genuine opinion givers skip them and speak directly: “I like the size.” When each sentence is buffered by academic hedges, it’s likely model output.

25. Highlighter words: “significantly,” “substantially,” “notably”
Scientific writers flag statistically significant effects; LLMs reuse those markers as generic emphasis. Consumers don’t. If “significantly” appears three times in 80 words, you’re reading a ghostwritten journal abstract, not lived experience.

26. Mirrored topic sentences that restate the question
Achieving “alignment” with the prompt, the model reflexively repeats it. Q: “What do you like about Brand X?” A: “What I like about Brand X is…” Humans dive straight into content. Mechanical mirroring is a quick heuristic for AI detection.

27. Polite over-apology: “I apologize if this seems biased”
Courteous disclaimers mitigate toxicity risk in LLM deployment. Real respondents aren’t that self-aware or diplomatic. Excessive apology, especially before any offense, often signals a safety-conditioned model.

28. Unneeded conditional hedging: “were one to assume…”
Victorian subjunctive phrasing is another artifact of literature corpora. Everyday English uses “if.” The archaic conditional screams “public-domain training data.”

29. Future-perfect tense: “by 2027, consumers will have adopted…”
Forecasting language turns up in whitepapers, so LLMs use it broadly, even when the survey question is “How was your delivery today?” Temporal projection inside a present-tense feedback prompt shows a mismatch characteristic of AI.

30. High lexical diversity with literally, like, you know, zero slang
The model optimizes for richness yet filters informality to avoid unsafe outputs. That yields vocabulary variance without a single “kinda,” “gonna,” or emoji, an unnatural stylistic gap.

31. No regional spelling drift where regions use different wording
A Canadian panellist toggles between “colour” and “centre.” An LLM defaults to U.S. or U.K. spellings consistently across all answers. Consistency where diversity is expected marks centralized generation.

32. Verbose parentheticals that explain obvious terms
Because instructive prose often defines jargon on first use, the model copies the pattern: “SaaS (Software as a Service) model.” In layperson feedback, such as “the chips were stale,” explanations are gratuitous.

33. Over-precise percentages: “26.3%” with no data source
Humans round unless holding the report in hand. A stray tenth of a percent, without citation, points to a hallucinated statistic.

34. Politeness markers: “thank you for hearing my perspective”
End-of-response gratitude appears everywhere in the customer-service scripts that feed LLMs. Panelists usually just hit Submit. Closing thanks in an unpaid survey is conspicuous.

35. Uniform positivity ratio: rare use of “not,” “never,” or “don’t”
LLMs learn that positivity reduces moderation flags. Consequently, negative contractions vanish. A corpus of squeaky-clean, glass-half-full open ends is a red flag.

36. Balanced rebuttals when prompted for a single viewpoint
Ask “What did you dislike?” and the AI still offers pros and cons because debate essays dominate training. Humans answer the brief. Off-prompt balance suggests algorithmic autopilot.

37. Unwarranted use of certain words, like “respectfully”
An honorific adverb closing a critique evokes formal letter writing. Few modern texters deploy “respectfully” outside e-mails to HR. Its appearance inside a two-sentence snack review is diagnostic.

38. Capitalised ideas mid-sentence, like “Customer Experience”
LLMs read marketing decks where internal concepts get C-suite capitalization. Ordinary writers reserve caps for proper nouns. Mixed-case terminology mid-sentence flags deck-sourced phrasing.

39. Perfect grammatical gender agreement across languages
When asked in Spanish, French, then English, the model nails gender and agreement each time. Bilingual humans slip. Too-perfect consistency across languages is suspicion-worthy.

40. Extraneous explanatory commas, as if written for style guides
Serial subordinate clauses and appositives, each cordoned by commas, reflect editorial prose. Stream-of-consciousness feedback reads choppier. Count how often commas could be deleted without harming comprehension.

41. Markdown artifacts (backticks, asterisks) pasted into text
The model often formats for Markdown because its training sources do. When the response field strips styling, orphan backticks or asterisks remain: a smoking gun.

42. Line-break spacing that mirrors ChatGPT’s double return
ChatGPT typically inserts a blank line between paragraphs. If panel software preserves that, you see pillar-style spacing instead of a single compact block. Compare layout across responses; uniform double spacing hints at paste-in boilerplate.

43. ASCII horizontal rules (“---”) that no human types in surveys
Markdown’s horizontal rule appears when a model tries to “organize” thoughts. No phone keyboard surfaces triple-dash separators intuitively. Their presence equals copy-pasted AI text.

44. Title-case headings crowbarred into short-answer fields
When an LLM intends structure, it pushes headings even if the field is one paragraph long. A human confronted with a 500-character limit won’t waste 25 characters on “Pros And Cons.”

45. Oxford comma loyalty, even in countries where it is rare
The Oxford comma is baked into many English-language training sets. Regional respondents follow local convention; an unwavering Oxford comma across U.K. data sets suggests imported prose.

46. Numbered outlines (1.1, 1.2) in one-paragraph responses
Technical docs teach the model hierarchical numbering. Consumers seldom know or care about nested decimals. Seeing “1.1 Flavor profile” inside a sandwich survey is a dead giveaway.

47. “N/A” written with full stops (“N. A.”), copying academic style
A model exposed to academic abbreviations sometimes outputs punctuated “N. A.” Humans either type “n/a” or leave the field blank. Dots in an acronym show training bias.

48. Quotation-mark correctness: curly quotes everywhere
Most browsers default to straight quotes; producing curly ones requires smart punctuation or word-processor copy-paste. LLMs emit typographer’s quotes by default. Curly-punctuation consistency across a form field rings synthetic.

49. Never “prefer not to answer”: rarely invokes the privacy option
Models fill every blank; they’re rewarded for completeness. Humans occasionally defer. A respondent who never uses skip logic across dozens of sensitive items may be a bot.

50. All-caps emphasis avoided; instead, italic markup that does not render
LLMs emphasize words with asterisks or underscores, assuming Markdown. Humans turn to ALL CAPS. Italic markers showing up literally in plain text point to model output, though many survey platforms strip the markup, so this one is harder to spot.

51. Multi-paragraph answers where humans write one block
Time-pressed panelists dump everything in one chunk. The model’s “good writing” heuristic inserts line breaks for readability. Two-plus paragraphs in a micro-incentive survey can indicate algorithmic paste.

52. Textbook definitions: “Sustainability refers to…”
Instead of sharing experience, the AI tutors. When the response begins with a dictionary definition, it likely originated from a model pattern-matching “Explain.” Real consumers seldom define the terms they live daily.

53. Universal perspective: “one might argue”
Academic detachment sneaks in because the model often reads argumentative essays. A teenager rating ice cream does not shift into “one might argue that the flavor profile…” That linguistic distance suggests computational composition.

54. Speculative futurism injected into present-tense questions
Ask, “How satisfied are you with shipping?” and the model riffs on autonomous delivery drones by 2030. It senses “technology” and leaps ahead, a hallmark of wide-breadth pretraining, not personal experience.

55. Cites “recent studies” where a human wouldn’t
Panelists rarely reference external literature; they’re paid cents, not writing term papers. The model peppers claims with “recent studies” to bolster authority, usually without citation. Question the presence of academic echoes.

56. Answers without first-person pronouns: “I,” “my”
LLMs sometimes drop pronouns to sound neutral. Actual participants speak in “I” because the question explicitly asks for it. Absence of self-reference is a subtle mark of safe, impersonal alignment style.

57. Lacks personal anecdotes: no “my kids,” no “my commute”
Even when asked “Why did you choose this cereal for your kids?”, the model generalizes. Humans share vignettes. Scan for zero concrete anecdotes across multiple respondent IDs; that consistency is suspicious.

58. Brand-neutral sentiment where panelists have favourites
Humans love or hate; LLMs hedge. They’ll acknowledge “both the strengths and limitations” instead of cheering “Sprite forever!” Neutral thoroughness stands out in categories driven by brand passion.

59. Never expresses uncertainty: “I’m not sure”
The model aims for helpfulness, so it offers a confident answer rather than admit ignorance. Real users shrug or say “dunno.” A corpus with zero uncertainty phrases hints at synthetic certainty.

60. Treats “describe your experience” like a product review blog
The model turns the ask into a polished, SEO-ready review complete with pros, cons, and a verdict. Spontaneous respondents are lazier and less formally structured. Recognize review-site tropes as potential AI imports.

61. Summarizes the survey question text unnecessarily
Because chain-of-thought tries to be explicit, the model paraphrases the prompt: “You asked me to describe…” Humans jump right in, saving keystrokes. Echoed questions waste characters but signal aligned-prompt reasoning.

62. Stilted metaphors: “like a lighthouse guiding my journey”
High-falutin metaphors bubble up from speeches and inspirational blogs. They sound forced in snack reviews. When you see grand nautical imagery for a cereal bar, suspect algorithmic flourish.

63. Avoids natural hesitation markers: “uh,” “well,” “hmm”
LLMs strip fillers to read professionally. Humans type as they think, especially on mobile: “umm I guess…” Lack of any hesitation across many open ends suggests synthetic cleansing.

64. Never trailing sentences, always “complete thoughts”
LLMs dislike unfinished syntax; they’re fine-tuned on corpora that penalize it. Real feedback often ends mid-idea when the timer dings. Perfectly closed punctuation at every turn is a machine hallmark.

65. Fabricated “common knowledge”: “Most people think that…”
To sound authoritative, the model invokes vague consensus without evidence. Real respondents claim ownership: “I think.” Collective assertions without a source are a hallucination pattern.

66. Weird rhetoricals: “What more could a consumer want?”
Copywriters of yore loved rhetorical flourishes; the model copies them. In real surveys, rhetorical questions are rare because participants answer, not orate. Spotting such theatrical devices helps flag automation.

67. Invents overly specific user personas when not prompted
Ask, “Who is this product for?” and the model crafts “time-pressed urban millennials seeking protein on-the-go.” Humans say “busy people.” Over-segmented personas reveal marketing-text training.

68. Phrases like “the concept of…” even in casual settings
The model abstracts nouns into “concepts” because that phrase appears in academic critique. A caffeine-seeking commuter doesn’t discuss “the concept of an energy drink.” Flag unnecessary conceptualization.

69. Summary phrases: “This product can be considered…”
LLMs love formal conclusion clauses learned from essays. Respondents seldom write mini-abstracts ending with “therefore.” If each answer contains a wrap-up phrase, the pattern is algorithmic.

70. Echoes exact brand terminology without cueing
When asked about “soda,” the model dutifully references “SparkleFizz™ Carbonation Technology,” mirroring press-release phrasing it saw in training. Real users don’t memorize trademarked sub-brands.

Okay, so as much as I’d like to end nice and clean on #70… I’ve got one more!

71. Writes about emotions: “This may cause frustration…”
LLMs analytically label emotions instead of conveying them: they say “users may feel delight,” not “I was stoked!” The distance between named emotion and felt voice hints at synthetic empathy.

What’s Next!

At the end of the day (Whoops! That was #16 from the list above!), these patterns aren’t hard rules or red flags on their own, but together, they form a fingerprint. The more of them you spot stacked in a single open end, the more confidently you can question authorship. They’re not meant to be weaponized in a strict pass/fail test, but rather to sharpen your intuition and deepen your eye for data quality.
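If you want to triage large batches of open ends before reading them by hand, a handful of the purely mechanical signals above (elevated linkers, stacked semicolons, stock boilerplate, Markdown artifacts, curly punctuation) can be counted programmatically. The sketch below is purely illustrative: the function name, phrase lists, and thresholds are our own hypothetical choices for this post, and they are emphatically not Research Defender's detection rules, which are confidential.

```python
import re

# Hypothetical mini-screener: counts a few of the mechanical signals
# described above. Phrase lists and weights are illustrative only.
LINKERS = {"moreover", "furthermore", "additionally"}  # signal 1
STOCK_PHRASES = [                                      # signals 13, 16, 17
    "in today's fast-paced world",
    "at the end of the day",
    "it's important to note",
]

def fingerprint_score(text: str) -> int:
    """Return a rough count of stacked AI-style signals in one open end."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    score = 0
    score += sum(1 for w in words if w in LINKERS)        # elevated linkers
    score += max(0, lowered.count(";") - 1)               # signals 3 and 19
    score += sum(1 for p in STOCK_PHRASES if p in lowered)  # boilerplate
    if re.search(r"[*_`]{1,2}\w", text):                  # signal 41: Markdown
        score += 1
    if "---" in text:                                     # signal 43: ASCII rule
        score += 1
    if "\u201c" in text or "\u2019" in text:              # signal 48: curly quotes
        score += 1
    return score

# Flag responses where several signals stack up; no single hit is conclusive.
sample = ("Moreover, in today's fast-paced world, the product is reliable; "
          "affordable; and adaptable. Furthermore, it is absolutely essential.")
print(fingerprint_score(sample))  # prints 4
```

A score of zero proves nothing, and neither does a score of four on its own; as with the eyeball version of this exercise, the value is in comparing scores across a whole batch and pulling the outliers for human review.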
At Rep Data, we rely on Research Defender to filter out LLM-authored responses before they ever reach the dataset, and that’s still your best move.

Happy researching!

Sincerely,
The Rep Data Team