Aditi Jha | 2026 I.S. Symposium

Name: Aditi Jha
Title: Detecting AI-Generated Text: An Interpretable Multi-Pipeline Analysis Across Stylometric, TF-IDF, and Transformer Representations
Major: Computer Science
Minor: Statistical and Data Sciences; Mathematics
Advisor: John Musgrave
The rapid rise of ChatGPT and other large language models has made it increasingly difficult to distinguish human-written text from AI-generated text. This project explores a central question: what actually makes human writing different from machine-generated writing, and how can we reliably detect that difference? To answer this, I developed and compared multiple approaches to text detection, ranging from traditional machine learning models built on stylometric (writing style) features and TF-IDF word patterns to modern deep learning models such as RoBERTa. Using a curated dataset of 10,000 texts spanning essays, stories, and question-answering tasks, I evaluated how well each method could classify text as human-written or AI-generated. Transformer-based models achieved the highest accuracy (over 92%), while simpler models based on word patterns and writing style performed moderately well but struggled to clearly separate the two categories. What excites me most about this work is not just improving accuracy, but understanding why models make their decisions. By applying interpretability techniques, I found that many traditional features capture only surface-level patterns, while deeper models rely on more subtle semantic and structural cues. This shifts the focus from detection as a black-box problem to one grounded in explainable evidence. This research highlights that effective AI-text detection requires both strong models and transparent reasoning. Future work could extend this analysis to multilingual data, newer AI models, and real-world applications such as education and misinformation detection, helping build more trustworthy and accountable AI systems.
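As a rough illustration of what "writing style features" can mean in a stylometric pipeline, the sketch below computes a few common surface statistics from a text using only the Python standard library. The function name `stylometric_features` and the specific features chosen are assumptions made for illustration; the abstract does not list the project's actual feature set, and this is not the author's implementation.

```python
import re
from statistics import mean, pstdev


def stylometric_features(text: str) -> dict:
    """Compute a few illustrative surface-level style statistics.

    Hypothetical feature set: the abstract does not specify which
    stylometric features the project actually used.
    """
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens (apostrophes kept for contractions).
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Number of word tokens in each sentence.
    sent_lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]

    return {
        "word_count": len(words),
        "avg_word_length": mean(len(w) for w in words) if words else 0.0,
        # Vocabulary diversity: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        "avg_sentence_length": mean(sent_lengths) if sent_lengths else 0.0,
        # Variation in sentence length (population standard deviation).
        "sentence_length_std": pstdev(sent_lengths) if sent_lengths else 0.0,
    }
```

Feature dictionaries like this could then be fed to a conventional classifier. Statistics of this kind describe only the surface form of a text, which is consistent with the finding above that traditional features capture surface-level patterns.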
Posted in Symposium 2026 on May 1, 2026.