A Critical Examination of AI Detectors in Academic Integrity Enforcement 

Abstract: As generative artificial intelligence (AI) tools like ChatGPT become increasingly integrated into academic environments, institutions have turned to AI detection technologies to preserve academic integrity. However, the reliability, fairness, and ethical implications of using AI detectors to identify cheating remain deeply contested. This white paper critically examines the performance of AI writing detectors and their impact, particularly on marginalized student populations such as non-native English speakers and students with autism. Drawing on empirical studies and institutional reports, this paper argues that AI detectors, in their current form, are not suitable as standalone tools for determining academic misconduct and should not be used as definitive proof of cheating. 

Introduction  

The advent of large language models (LLMs), such as OpenAI’s ChatGPT, has raised concerns about potential academic dishonesty, prompting schools and universities to adopt AI detection tools. These tools claim to distinguish AI-generated text from human writing, offering educators a means to flag suspected cheating. However, the effectiveness and fairness of these tools are increasingly under scrutiny. Reports of false positives, particularly among neurodivergent students and those with limited English proficiency, raise doubts about their reliability (Davalos & Yin, 2024; Liang et al., 2023). As academic institutions consider the future of assessments in an AI-enabled world, it is vital to critically assess whether current AI detectors meet the necessary standards of accuracy, equity, and ethical soundness. 

Technical Limitations of AI Detectors  

AI detectors operate by analyzing linguistic features such as text perplexity, token distribution, and syntactic regularity. While these methods can identify patterns characteristic of AI-generated content, they are easily fooled. Sadasivan et al. (2023) show that a simple recursive paraphrasing technique can reduce the detection accuracy of leading tools from 99% to under 10%, with minimal degradation in text quality. Even watermark-based detection methods, once considered robust, can be spoofed through repeated querying or obfuscation. 
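To make the mechanism concrete, the following is a minimal sketch of a perplexity-based detector of the kind these tools build on. It is illustrative only: the scoring model (GPT-2 via Hugging Face transformers) and the flagging threshold are assumptions, not the internals of any commercial detector.

```python
# Minimal sketch of a perplexity-based detector, assuming GPT-2 as the
# scoring model via Hugging Face transformers. The threshold is illustrative;
# no commercial detector publishes its internals.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the scoring model (lower = more predictable)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

def flag_as_ai(text: str, threshold: float = 40.0) -> bool:
    # Detectors in this family flag low-perplexity (highly predictable) text.
    # A fixed threshold is exactly what recursive paraphrasing defeats:
    # rewriting raises perplexity while preserving meaning.
    return perplexity(text) < threshold
```

Recursive paraphrasing attacks this design directly: rewriting the text until its perplexity rises above the threshold preserves meaning while erasing the statistical signature, which is consistent with the accuracy collapse Sadasivan et al. (2023) report.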

Feizi and Huang (2023) note a theoretical limit on the reliability of AI detectors: as the difference (total variation distance) between human-written and machine-generated text shrinks with advances in LLMs, the performance of even the best possible detector approaches that of random guessing. This inherent limitation suggests that as AI tools become more sophisticated, reliable detection may become mathematically impossible. 
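This limit can be stated precisely. As a sketch of the result underlying Feizi and Huang's argument (formalized in Sadasivan et al., 2023), the area under any detector's ROC curve is bounded by the total variation distance between the two text distributions:

```latex
% M = distribution of machine-generated text, H = distribution of human text,
% TV(M, H) = their total variation distance, D = any detector.
\[
  \mathrm{AUROC}(D) \;\le\; \frac{1}{2} + \mathrm{TV}(M, H) - \frac{\mathrm{TV}(M, H)^{2}}{2}
\]
% As LLMs improve, TV(M, H) -> 0 and the bound tends to 1/2,
% the AUROC of a detector that guesses at random.
```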

Disproportionate Impact on Marginalized Students 

3.1. Non-Native English Speakers

Research consistently shows that AI detectors are biased against non-native English writers. Liang, Yuksekgonul, Mao, Wu, and Zou (2023) found that over 61% of TOEFL essays written by non-native speakers were falsely classified as AI-generated. Detectors penalize the simpler grammatical structures and more limited vocabulary that often characterize the writing of English learners, producing a significantly higher false positive rate for this population and reinforcing structural inequities in education. 

3.2. Students with Autism

Students on the autism spectrum often exhibit structured, literal, or repetitive writing styles that may resemble AI-generated text. The case of Moira Olmsted, detailed by Davalos and Yin (2024), highlights this risk. Olmsted, a college student with autism, was falsely accused of cheating based solely on AI detector output. Despite explaining her communication style, which is shaped by her neurodivergence, she received a zero and a disciplinary warning. Such cases underscore how AI detectors can criminalize disability-related differences in expression. 

Institutional Pushback and Ethical Considerations  

In response to mounting evidence of detector unreliability, several major universities have discontinued the use of AI detection tools. Vanderbilt University, Michigan State University, and the University of Texas at Austin have publicly turned off Turnitin’s AI detection feature, citing concerns over accuracy and the risk of harming students through false accusations (Ghaffary, 2023). 

The ethical issues extend beyond technical performance. As the Anthology (2023) white paper emphasizes, AI detectors currently lack the transparency and accountability required for ethical use in educational settings. They can undermine trust, disproportionately impact vulnerable students, and shift the burden of proof onto individuals accused without substantive evidence. 

Recommendations for Institutions 

1. Suspend Use as Sole Evidence. AI detectors should not be used as standalone evidence in academic integrity proceedings. Any flag from a detector must be corroborated with pedagogical, contextual, and student-specific information.

2. Train Faculty in Neurodiversity and Linguistic Inclusion. Faculty should receive training on how autism and English language learning may shape student writing. Recognizing the diversity of human expression is essential to preventing bias in academic evaluations.

3. Redesign Assessments. Educational institutions should prioritize assessment designs that minimize opportunities for dishonesty while valuing individual expression. Oral defenses, project-based work, and iterative feedback are more robust methods of evaluating learning outcomes.

4. Monitor and Evaluate Detection Tools Transparently. If AI detectors are used at all, institutions must disclose their error rates, particularly false positive rates, and report how these tools affect different student populations, as sketched below.
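As an illustration of what such transparent monitoring could look like, the function below computes per-group false positive rates from an audit log. The record fields and group labels are hypothetical, chosen only for illustration.

```python
# Hypothetical audit sketch: per-group false positive rates for a detector.
# Field names ('group', 'flagged', 'human_written') are assumptions, not a
# prescribed schema; a real audit would also need privacy safeguards.
from collections import defaultdict

def false_positive_rates(records):
    """records: dicts with 'group' (e.g. 'non-native English speaker'),
    'flagged' (detector verdict, bool), 'human_written' (ground truth, bool)."""
    counts = defaultdict(lambda: {"fp": 0, "human": 0})
    for r in records:
        if r["human_written"]:  # only human-written work can be falsely flagged
            counts[r["group"]]["human"] += 1
            counts[r["group"]]["fp"] += int(r["flagged"])
    return {g: c["fp"] / c["human"] for g, c in counts.items() if c["human"]}

# A disparity like the one Liang et al. (2023) document would surface as a
# markedly higher rate for one group.
sample = [
    {"group": "non-native English speaker", "flagged": True,  "human_written": True},
    {"group": "non-native English speaker", "flagged": True,  "human_written": True},
    {"group": "native English speaker",     "flagged": False, "human_written": True},
]
print(false_positive_rates(sample))
# {'non-native English speaker': 1.0, 'native English speaker': 0.0}
```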

Conclusion  

AI writing detectors are, at best, unreliable and biased tools in the fight against academic dishonesty. Their susceptibility to manipulation, their failure to accommodate neurodivergent and non-native writing styles, and their potential to cause unjust harm render them unsuitable as definitive indicators of cheating. Rather than leaning on these flawed technologies, educational institutions should pursue inclusive, equitable, and pedagogically sound methods of upholding academic integrity. 

References 

Anthology. (2023). AI, academic integrity, and authentic assessment: An ethical path forward for education. Anthology Inc. 

Davalos, J., & Yin, L. (2024, October 18). Do AI detectors work? Students face false cheating accusations. Bloomberg Businessweek. https://www.bloomberg.com/ 

Feizi, S., & Huang, F. (2023, May 30). Is AI-generated content detectable? University of Maryland College of Computer, Mathematical, and Natural Sciences. https://cmns.umd.edu/ 

Ghaffary, S. (2023, September 21). Universities rethink using AI writing detectors to vet students’ work. Bloomberg News. https://www.bloomberg.com/ 

Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., & Zou, J. (2023). GPT detectors are biased against non-native English writers. Patterns, 4(7), 100779. https://doi.org/10.1016/j.patter.2023.100779 

Sadasivan, V., Kumar, A., Balasubramanian, S., Wang, W., & Feizi, S. (2023). Can AI-generated text be reliably detected? arXiv preprint arXiv:2303.11156. https://arxiv.org/abs/2303.11156 
