
Artificial Intelligence as Author: Can Scientific Reviewers Recognize GPT-4o–Generated Manuscripts?

by Pinky Sharma • November 4, 2025


CLINICAL QUESTION


Can peer reviewers and editors reliably detect AI-generated manuscripts in scientific publishing?

BOTTOM LINE

Most reviewers could not tell that a manuscript was written entirely by GPT-4o. This highlights an urgent need for clear disclosure policies, reviewer training, and robust AI detection tools.

BACKGROUND: Generative AI tools such as ChatGPT are increasingly used to support academics in scientific writing. While they may streamline drafting, data analysis, and editing, concerns persist regarding plagiarism, fabricated data, ethical issues, and inaccurate content. Although AI detection tools exist, no definitive mechanism ensures the detection of AI-generated data. The ability (or inability) of reviewers to recognize AI-generated text has direct implications for the credibility of peer review.

STUDY DESIGN: The study was conducted between November 1 and December 1, 2024. GPT-4o was instructed to generate a full retrospective study manuscript on predictors of survival and return of spontaneous circulation in out-of-hospital cardiac arrest. The model created a synthetic dataset of nearly 1,000 patients, performed statistical analyses, and drafted a manuscript (~1,500 words, 20 references) that was revised through multiple AI-prompted iterations to ensure CONSORT compliance. Fourteen experienced SCI-E journal reviewers (H-index ≥5) assessed the manuscript as if serving as editors and reviewers. They were told in advance that the manuscript might have been AI-generated and were later asked whether they had recognized this.
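
The paper does not reproduce its prompts or code, but the workflow it describes (dataset synthesis, statistical analysis, drafting, and iterative revision, all performed by GPT-4o) can be sketched. The sketch below is a minimal illustration using the OpenAI Python SDK; the prompt wording, the ask() helper, and the number of revision rounds are assumptions for illustration, not the authors' actual protocol.

    # Hypothetical sketch of the study's workflow; prompts are illustrative.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def ask(prompt: str, history: list) -> str:
        """Send one prompt in a running conversation and return the reply."""
        history.append({"role": "user", "content": prompt})
        reply = client.chat.completions.create(model="gpt-4o", messages=history)
        text = reply.choices[0].message.content
        history.append({"role": "assistant", "content": text})
        return text

    history = [{"role": "system",
                "content": "You are a medical researcher preparing a retrospective study."}]
    ask("Create a synthetic dataset of ~1,000 out-of-hospital cardiac arrest "
        "patients and analyze predictors of survival and return of "
        "spontaneous circulation.", history)
    draft = ask("Draft a ~1,500-word manuscript with 20 references reporting "
                "those results.", history)
    for _ in range(3):  # number of revision rounds is an assumption
        draft = ask("Revise the manuscript for CONSORT compliance.", history)
    print(draft)

Because each call re-sends the full conversation history, the model keeps its fabricated dataset and earlier draft in context across revisions, mirroring the iterative prompting the authors describe.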

SETTING: Hitit University Erol Olçok Education and Research Hospital (Turkey) and Hamad Medical Corporation (Qatar), with international reviewers.

SYNOPSIS: At the editorial stage, 42.9% rejected the manuscript, 42.9% forwarded it to review, and 14.3% accepted it outright. As peer reviewers, 42.9% recommended rejection, 28.6% suggested major revisions, and 28.6% recommended acceptance after minor revisions (no outright acceptances). Notably, 78.6% did not recognize the manuscript as AI-generated. Those who did cited template-like phrasing, superficial discussion, repetitive language, and unusual statistical formatting. Limitations of this study included a small sample size, an English-only context, reviewers’ variable familiarity with AI, and a possible Hawthorne effect: Knowing AI might be involved may have made reviewers more cautious. Lastly, the authors emphasized vulnerabilities in peer review and recommended mandatory AI disclosure policies, reviewer training, and deployment of AI detection tools such as “Gotcha GPT,” which has reported 97% to 99% accuracy.
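
The review does not detail how Gotcha GPT works internally. As a rough illustration of the general technique behind such detectors (supervised classification of text as human- or AI-written), the toy scikit-learn pipeline below trains on placeholder labeled sentences. It is a minimal sketch, not Gotcha GPT, and its output implies nothing about real detector accuracy.

    # Toy sketch of supervised AI-text detection: TF-IDF features feeding a
    # logistic-regression classifier. This is NOT Gotcha GPT; real detectors
    # are trained on large corpora of verified human and model-written text.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Placeholder training data: 1 = AI-generated, 0 = human-written.
    texts = [
        "In conclusion, these findings underscore the multifaceted implications of the results.",
        "We pulled the charts by hand because half the records were on paper.",
        "Furthermore, it is important to note that the aforementioned analysis demonstrates robustness.",
        "The night shift nurse flagged two cases we would otherwise have missed.",
    ]
    labels = [1, 0, 1, 0]

    detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    detector.fit(texts, labels)

    # Score a new passage; with toy data this number is illustrative only.
    prob_ai = detector.predict_proba(
        ["Moreover, the results highlight significant implications."])[0, 1]
    print(f"P(AI-generated) = {prob_ai:.2f}")

Production detectors apply the same idea at scale, with far larger training corpora and richer features than word n-grams, which is how tools in this class can report the high accuracies cited above.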

CITATION: Öztürk A, et al. Artificial intelligence as author: can scientific reviewers recognize GPT-4o-generated manuscripts? Am J Emerg Med. 2025;97:216-219. doi: 10.1016/j.ajem.2025.07.034.

COMMENT: This study examined the ability of peer reviewers to detect AI-generated content. The findings revealed that 78.6% of the reviewers did not realize the manuscript had been generated by an artificial intelligence model, and many of them passed the manuscript on toward acceptance. This study suggests that editors need to look into AI detection, but studies of AI detection software show that it can be easily evaded.—Eric Gantwerker, MD

Filed Under: AI, Literature Reviews, Technology. Tagged With: recognizing the use of AI. Issue: November 2025

You Might Also Like:

  • TRIO Meeting: Recognizing Excellence in Otolaryngology
  • ChatGPT-Generated “Fake” References in Academic Manuscripts Is a Problem
  • Artificial Intelligence Helps Otolaryngologists Give Excellent Patient Care
  • Can ChatGPT Be Used for Patient Education?


ENTtoday is a publication of The Triological Society.


