Re-designing assessment for ethical use of AI

06/08/2026

by Jasper Roe and Cory Scurr

 

1) How has the way you approach assessment design in relation to generative artificial intelligence (GenAI) changed, or not changed, over the past few years?

Jasper: If we are being honest, I think that even though we’re about four years into the widespread use of GenAI models, not a lot has changed. This is because models keep getting better, new capabilities keep being developed, and technologies such as detectors promise to help but then fail to be useful in the real world. The reality is that redesigning assessments is challenging – especially in contexts that have been running the same types of assessment for decades. So, taking a bird’s-eye view, I think things have not changed a lot, but they are now slowly beginning to.

Cory: My approach to assessment design has shifted from focusing on what students produce to how and why they are producing it. Prior to the rise of GenAI, we often relied on final artifacts (essays, reports, case write-ups, etc.) as sufficient evidence of learning. That assumption has been fundamentally disrupted. As a result, I’ve shifted toward promoting structural changes in assessment (Corbin et al., 2025). In many ways, the core principles of good assessment design have not changed; rather, the urgency to operationalize them has.

 

2) How have you seen the discourse about GenAI and assessment design evolve, or not, over the past few years?

Jasper: The field is still quite polarised, and there are different takes on what is most important in maintaining the value of university education. Some see a strong need for control and security to assure the quality of qualifications, whereas others are wary of a detect-and-punish system emerging, especially if it’s based on fallible evidence. I think both of these positions are correct simultaneously, and we should be wary of binary thinking. I’d like to see the field focus on productive, pragmatic conversations – recognising that there are no easy answers.

Cory: From my vantage point, the discourse has evolved from an initial focus on detection and prohibition toward a more nuanced conversation about assessment validity and design. Early conversations were dominated by concerns about cheating, with institutions and instructors looking for ways to restrict or monitor AI use (like policy statements, detection tools, or disclosure requirements). Over time, there’s been a growing recognition that these approaches are largely discursive, and they rely on student compliance rather than fundamentally changing how learning is demonstrated.

 

3) What kind of intervention or approach to student use or misuse of GenAI have you used that worked well? Have you tried anything that did not work well?

Jasper: The approach I focus on developing (the AI Assessment Scale – AIAS) has had its fair share of both challenges and benefits. We’ve recently completed several empirical research projects on the wide-scale implementation of the AIAS in different contexts, and what really jumped out was just how variable the results were. What works well depends on so much – context, resources, relationships, support, and other external factors.

Cory: What has worked most effectively has been shifting from a focus on policing AI use to creating space for intentional, structured conversations with students about learning outcomes. Of course, this isn’t a panacea for preventing AI use when students shouldn’t be using it. But when students are brought into that conversation, their use of GenAI becomes less about “getting away with something” and more about making decisions in relation to the purpose of the task. What has been less effective are approaches that rely on top-down directives without conversation. When expectations are not grounded in a shared understanding of why the assignment exists and what it is trying to measure, students often interpret them as procedural hurdles rather than meaningful guidance.

 

4) What has most challenged or frustrated you?

Jasper: The thing that has challenged me the most is the pace of change. It is a bit of a cliché to say this, but it seems that as soon as we get a handle on what AI can do, and what it is or is not useful for in education, a new development happens and we must start again from scratch. I remember that many people (myself included) thought we’d hit a ceiling, or at least a slowdown, very quickly after GPT-4 was released. That doesn’t seem to have happened yet, so we must start to think about what the future looks like if problems such as errors and hallucinations continue to diminish and context windows continue to grow.

Cory: One of the biggest challenges has been the persistent tendency to frame GenAI primarily as a disciplinary or behavioural problem, rather than an assessment design problem. We often look for quick fixes (detection tools, stricter policies, clearer instructions) when, realistically, the solution requires more complex and time-intensive work, like rethinking learning outcomes, redesigning assessments, and reconsidering what counts as evidence of learning.

 

5) What has given you hope?

Jasper: The sheer volume of effort, interest, and productive debate going on in the field has given me hope. The fact that people feel so impassioned about this subject tells me that they are truly interested, and care deeply about our educational systems. So whatever the situation is, I feel reassured that we have great minds who are committed to this issue.  

Cory: What gives me the most hope is the increasing willingness among educators to rethink long-standing assumptions about assessment. I’ve seen growing interest in things like process-based assessment, authentic and context-rich tasks, evaluative judgment as a core learning outcome, and using GenAI as an object of critique rather than a shortcut to answers. There’s also a growing recognition that the goal isn’t to “win” against AI, but to design learning environments where uncritical reliance on it is insufficient for success. The shift from policing to designing for validity is where I see the most promise.

 

Reference

Corbin, T., Dawson, P., & Liu, D. (2025). Talk is cheap: Why structural assessment changes are needed for a time of GenAI. Assessment & Evaluation in Higher Education. https://doi.org/10.1080/02602938.2025.2503964

 


Dr Jasper Roe is Assistant Professor in Education (Digital Literacies and Pedagogies) at Durham University, UK, and a researcher of educational technology, artificial intelligence and academic integrity, best known as co-creator of the AI Assessment Scale (AIAS).

Dr Cory Scurr is the Associate Director of Academic Integrity & Student Advising at Conestoga College, Canada, and the Chair of the Academic Integrity Council of Ontario (AICO).

 

The authors' views are their own.

Thank you for being a member of ICAI. Not a member of ICAI yet? Check out the benefits of membership and consider joining us by visiting our membership page. Be part of something great!

 

EDITOR’S NOTE:

This is the second of our blogs accompanying the ICAI Summer Series of webinars. Jasper and Cory will be delivering the second webinar on June 11 at 12 pm EST.

Here is some pre-reading for the webinar:

Required reading:

Scurr, C. D. (2025). Neutralizing the ‘Threat’ of Technology: A practical guide for re-evaluating assessments to maintain academic integrity. Canadian Perspectives on Academic Integrity, 8(4), 1-9. https://journalhosting.ucalgary.ca/index.php/ai/article/view/81748/58599

Perkins, M., Roe, J., & Furze, L. (2025). How (not) to use the AI Assessment Scale. Journal of Applied Learning & Teaching, 8(2), 14-23. https://search.informit.org/doi/abs/10.3316/informit.T2025111100018001255702829

Furze, L., Perkins, M., Roe, J., & MacVaugh, J. (2024). The AI Assessment Scale (AIAS) in action: A pilot implementation of GenAI-supported assessment. Australasian Journal of Educational Technology, 40(4), 38-55. https://ajet.org.au/index.php/AJET/article/view/9434

Recommended further reading:

Scurr, C. D. (2025). Assessment Toolkit. Available at: https://structural-guide-to-assessment-redesign.replit.app/

Curtis, G. J. (2025). The two-lane road to hell is paved with good intentions: why an all-or-none approach to generative AI, integrity, and assessment is insupportable. Higher Education Research & Development, 44(8), 2151–2158. https://doi.org/10.1080/07294360.2025.2476516

 

 
