Every faculty member has had the same uneasy moment: an assignment comes back clean, fluent, and oddly hollow, and there is no way to prove how it was written.
Let's start with the honest part. You cannot make an assignment AI-proof. Any vendor who promises a cheat-proof assessment is selling something that does not exist, because a determined student with a capable model and enough time will find a path. So the useful question is not how to catch AI use. It is how to design assessment so that faking it takes more effort than learning, and so that even an attempt to game it teaches something.
The concern behind that question is not abstract. In College Board's 2025 faculty research, 92% of faculty said they are concerned about plagiarism or dishonesty facilitated by AI, and more than 84% said AI weakens students' critical thinking and original work. Student adoption is moving just as fast: in the UK, HEPI found that student AI use jumped from 66% to 92% in a single year, and 88% of undergraduates have now used generative AI on assessments. Whatever the exact number on your campus, the tools are already in every student's hands.
The instinct is to fight that with detection. That instinct is the trap.
Detection Is a Losing Game
AI detectors promise to tell you whether a model wrote something. They cannot do it reliably, and the people closest to the models know it. When OpenAI tested its own AI text classifier, the classifier correctly flagged only 26% of AI-written text, and the company pulled the tool in 2023, citing its low rate of accuracy.
The failures are not random. They land hardest on the students least able to defend themselves. A Stanford study found that detectors misclassified more than 61% of essays written by non-native English speakers as AI-generated, at false-positive rates several times higher than for native writers, because the plainer, lower-perplexity writing of an ESL student reads to a detector like a machine. A tool that turns "your English is too simple" into "you cheated" is not an integrity tool. It is a liability.
Even a perfect detector would solve the wrong problem. It would tell you a model was involved. It would not tell you whether the student learned anything. Integrity is not the absence of AI. It is evidence that a person did the thinking.
Design Assessments So Faking Costs More Than Learning
This is where assessment design does what detection cannot. A one-shot essay or a fixed problem set is trivially easy to outsource: paste the prompt, copy the answer, submit. The format invites the shortcut.
Interactive Learning Experiences change the format. An ILE is not a single prompt with a single answer. It is a multi-turn, Socratic exchange that adapts to what the student demonstrates, asking follow-ups that build on the last answer and pressing on the places a student stays vague.
That shift matters for integrity in a concrete way. A student can still try to route the conversation through an outside model. But to keep up, they have to paste each adaptive turn back and forth, feed the model the running context, and answer follow-up questions shaped by their own previous replies. By the time they have shuttled a real Socratic exchange through another tool, the shortcut is no longer frictionless: they have to follow the reasoning, manage the context, and stay with a conversation that keeps moving. Engagement that involved is hard to fake without absorbing some of the material along the way. The design does not make cheating impossible. It raises the cost of faking it, and it makes a gamed attempt hard to finish without some of the learning leaking in.
Faculty also see what a finished essay hides. Faculty and instructional designers set the learning objectives, evaluation criteria, and mastery thresholds for each ILE, and the instructor can see how a student moved through the conversation: where they hesitated, where they recovered, and what they actually demonstrated. A hollow performance looks different from genuine understanding, and that difference is finally visible.
The boundary is real. This does not eliminate misconduct, because no assessment design can. What it changes is what the system rewards: instead of spending faculty energy trying to prove misconduct after the fact, it moves that energy into assessment structures that make genuine engagement easier to see.
Find the Assessments AI Can Already Beat
You cannot redesign what you cannot see. Most catalogs are full of assessments written before generative AI existed, and no one has time to re-examine all of them by hand.
The Course Modernizer scores existing courses for AI vulnerability as part of its gap analysis, flagging how easily a graded assessment could be completed by a generative model. That gives faculty and instructional designers a triage list: the assessments most exposed to AI, ranked, so the riskiest ones get redesigned first instead of every assessment getting policed after the fact. It turns "everything is vulnerable" into "here are the handful that need attention first."
Faculty Decide What Counts as Learning
Designing for integrity is a governance posture before it is a product feature. The judgment about what a student must demonstrate, what evidence counts, and where the line sits between help and substitution belongs to the faculty member or instructional designer who owns the course, not to a model and not to a detector.
That is the same principle that runs through how Axio keeps academic judgment with faculty across the platform: the domain expert sets the objectives and the standards, reviews and overrides AI-assisted scores, and approves what reaches students. Academic integrity by design is that principle applied to assessment. Faculty decide what counts as learning, and the assessment is built so that meeting that standard is easier than faking it.
Design, Don't Police
You cannot detect your way out of AI cheating, and you cannot forbid your way out either. Both are arms races you will lose, and both spend faculty trust to do it.
What you can do is design. Build assessments where the path of least resistance is the learning itself, where faking the work costs at least as much as doing it, where even a gamed attempt teaches something, and where faculty can see what actually happened. That is not a promise that no one will ever cheat. It is a more honest and more durable answer than pretending a detector will save you.
See how Axio helps institutions design AI-resilient assessments with faculty-defined criteria, adaptive evidence of learning, and faculty override on every AI-assisted score, or compare the AI-Native and AI-Augmented tracks to see how it fits your institution.



