6.2 - Theme 2. Harnessing Artificial Intelligence, Technology and Digital Innovations in Guideline Development and Implementation
Thursday, September 18, 2025 | 1:35 PM - 2:35 PM
Speaker
Dr Saphia Mokrane
Département de Médecine Générale, Faculté de Médecine, Université Libre de Bruxelles
Contributing to GRADE Ontology Development: Harmonizing Technology and Clarity
Abstract
Background: The GRADE approach offers a “common, sensible and transparent approach to grading quality (or certainty) of evidence and strength of recommendations”, but are GRADE terms used the same way across organizations? Do technological tools such as GRADEpro and MAGICapp report the same concepts? The GRADE Ontology Working Group is developing a standard vocabulary, built on thorough consensus, to serve as both a technical standard and user-friendly guidance. The Working Group meets weekly in open virtual meetings, and contributors may comment or vote on terms at any time at https://fevir.net/gradeontology. The process curates terms from published GRADE guidance and establishes preferred and alternative terms for key concepts, definitions, and guidance for application. Deliberation continues until a globally agreed approach is reached across a diverse community.
Objective: To provide the immersive experience of participating in GRADE Ontology consensus development.
Format (including interactive elements) for workshops
Conceptual introduction to the GRADE Ontology (10 minutes)
Introduction to the FEvIR Platform as needed for participation, technical support for signing up to participate in the workshop (10 minutes)
Presentation of a term that is open for voting (10 minutes)
Anonymous voting and comments by audience participants, with support from facilitators as needed (5 minutes)
Review of de-identified voting results and feedback comments, then open consensus-development discussion to address all comments (40 minutes)
Discussion, feedback, and questions and answers about the process experienced by the participants (10 minutes)
Review of how to participate after the workshop (5 minutes)
Paper Number
316
Biography
Brian Alper is project lead for EBMonFHIR, GINTech Chair, CEO of Computable Publishing LLC, and has founded systems for decision support and guideline adaptation. Joanne Dehnbostel is research and analysis manager for Computable Publishing LLC and project administrator for Health Evidence Knowledge Accelerator. Brian and Joanne have extensive experience with the development of the FEvIR Platform.
Saphia Mokrane is a general practitioner (GP), a guideline developer for Belgian primary health care providers, and a teaching master at the University of Brussels, providing methodological support to medical students and GPs in training for their master's theses.
Dr Saphia Mokrane
Département de Médecine Générale, Faculté de Médecine, Université Libre de Bruxelles
The GRADE Ontology Project
Abstract
Background: GRADE is a methodological framework for certainty-of-evidence and evidence-to-decision judgements. Despite several guidance papers, key terms are still understood heterogeneously, which reduces the computability of evidence summaries. Computer applications (e.g. GRADEpro, MAGICapp) require harmonised and well-defined terms. Formalising the definitions of GRADE terms into a harmonised ontology will provide a technical infrastructure that enhances the efficiency of creating and reusing GRADE data. The standard terminology can also be used in other systems, such as the Medical Subject Headings (MeSH), which includes a term for the GRADE approach.
Objective: To develop and maintain a standardized, computable vocabulary of terms, called the GRADE Ontology.
Methods: Using the 13-step term definition protocol developed for the Scientific Evidence Code System (SEVCO), a group of GRADE experts meets weekly in virtual meetings to curate terms from published GRADE guidance, establish preferred and alternative terms for included concepts, define concepts, and write guidance for application. Any dissenting comments or votes are discussed until agreement is reached.
Results: The first version of the GRADE Ontology consists of 77 terms covering certainty-of-evidence and evidence-to-decision judgements. Fifty-five experts from multiple countries have participated. Twenty-two terms have reached 100% agreement, with an average of 4.6 voting rounds and 50.0 total votes per term, and an average of 10.4 votes in the final round that resulted in 100% agreement.
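The per-term summary statistics reported above (average voting rounds and total votes per term) can be derived from per-round vote records. The sketch below is purely illustrative: the function name and the sample data are invented, not the project's actual vote records.

```python
# Hypothetical sketch: summarizing consensus-voting records per term,
# in the style of the GRADE Ontology results reported above.
# The vote data below is invented for illustration.

def summarize_terms(vote_records):
    """vote_records maps term -> list of vote counts, one entry per round.
    Returns (number of terms, average rounds per term, average total votes per term)."""
    rounds = [len(r) for r in vote_records.values()]   # rounds needed per term
    totals = [sum(r) for r in vote_records.values()]   # total votes cast per term
    n = len(vote_records)
    return n, sum(rounds) / n, sum(totals) / n

votes = {
    "certainty of evidence": [12, 9, 11],  # three voting rounds
    "risk of bias":          [15, 10],     # two voting rounds
}
print(summarize_terms(votes))  # (2, 2.5, 28.5)
```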
Discussion for scientific abstracts: By clarifying key terms of the GRADE approach, the GRADE Ontology will improve coherence across GRADE reports and their computability, facilitating dissemination and a shared understanding.
Paper Number
405
Biography
Joanne Dehnbostel is research and analysis manager for Computable Publishing LLC and project administrator for Health Evidence Knowledge Accelerator. Joanne has extensive experience with the development of the FEvIR Platform.
Saphia Mokrane is a general practitioner, a guideline developer for Belgian primary health care providers, and a teaching master at the University of Brussels, providing methodological support to medical students and general practitioners in training for their master's theses.
Dr Prashanti Eachempati
Consultant Senior Researcher
MAGIC Evidence Ecosystem Foundation
Justifying Uncertainty Decisions for GRADEing Evidence: The JUDGE-AI Approach for Automating Transparent and Consistent Certainty Assessments
Abstract
Background:
The GRADE approach guides systematic reviews, HTAs, and guideline development, but ensuring transparency and consistency in certainty assessments remains challenging. We developed the JUDGE tool, a structured checklist to standardize certainty ratings in randomized trials. To improve efficiency, we explored AI’s role in automating GRADE evaluations using the JUDGE checklist.
Objectives:
To develop and test the JUDGE tool for improved GRADE application, focusing on AI’s potential to enhance efficiency, consistency, and transparency in certainty assessments.
Methods:
We developed JUDGE based on expert input and GRADE literature, refining it through iterative feedback. The tool underwent multiple revisions to improve clarity, usability, and alignment with GRADE principles. To explore AI applications, we tested three large language models—ChatGPT, Claude, and DeepSeek—by prompting them to apply the JUDGE checklist in certainty assessments across systematic reviews. Each model evaluated risk of bias, inconsistency, indirectness, imprecision, and publication bias. We compared AI-generated ratings to expert assessments for consistency, justification, and reproducibility. Response stability was assessed by repeating prompts.
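The comparison step described above, checking AI-generated ratings against expert assessments across the five GRADE domains, can be sketched as a simple per-domain agreement check. This is a hypothetical illustration: the function, domain labels, and ratings are invented, not the study's actual data or scoring method.

```python
# Hypothetical sketch of comparing an AI model's per-domain GRADE
# judgements to an expert's. Domain names follow the abstract; the
# ratings shown are invented for illustration.

DOMAINS = ["risk_of_bias", "inconsistency", "indirectness",
           "imprecision", "publication_bias"]

def agreement(ai_ratings, expert_ratings):
    """Fraction of GRADE domains where the AI rating matches the expert rating."""
    matches = sum(ai_ratings[d] == expert_ratings[d] for d in DOMAINS)
    return matches / len(DOMAINS)

ai = {"risk_of_bias": "serious", "inconsistency": "not serious",
      "indirectness": "not serious", "imprecision": "serious",
      "publication_bias": "undetected"}
expert = {"risk_of_bias": "serious", "inconsistency": "serious",
          "indirectness": "not serious", "imprecision": "serious",
          "publication_bias": "undetected"}
print(agreement(ai, expert))  # 0.8
```

Repeating the same prompt and comparing successive AI outputs with the same function would give a simple stability measure, in the spirit of the reproducibility check described above.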
Results:
AI models demonstrated structured reasoning. ChatGPT provided detailed justifications, while Claude overlooked key contextual factors. All models produced variable assessments when given the same evidence and the same prompts, and none fully aligned with expert ratings.
Conclusion:
While AI shows promise in structuring certainty assessments using JUDGE, variability in responses and suboptimal judgements remain. Future work will focus on optimizing AI for GRADE decision-making and integrating JUDGE-based AI support into GRADE authoring platforms.
Paper Number
326
Biography
Prashanti Eachempati is a Consultant Senior Researcher with the MAGIC Evidence Ecosystem Foundation, specializing in Cochrane systematic reviews and GRADE assessments. On behalf of MAGIC, she contributes to the MARC-SE consortium, which focuses on antimalarial drug resistance in Southeast Africa. Her work aims to enhance the transparency and systematic application of GRADE assessments, and she is collaborating with Gordon Guyatt to develop a novel tool that improves the clarity and consistency of evidence evaluations. Her expertise spans evidence synthesis, methodological innovation, and decision-making frameworks.
Prof Holger Schünemann
Humanitas University
Advancing GRADE: Updates to the GRADE Handbook and a New Mapping Approach for Accessing Up-to-Date GRADE Guidance
Abstract
Background: The Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach is used by over 100 organizations in evidence-based decision-making. However, users often face challenges in locating the most up-to-date guidance across multiple GRADE sources. To enhance clarity and accessibility, GRADE recently released its official, free, online, artificial intelligence-supported resource: the GRADE Book.
Objectives: This presentation covers the latest updates to GRADE as well as its core basics, and introduces a novel mapping approach through the book interface, incorporating artificial intelligence (AI), to streamline access to GRADE guidance.
Methods: GRADE systematically reviews and integrates methodological advancements into the GRADE Book. A structured mapping system categorizes and cross-links guidance across the Book, published articles, and the online resource GRADEpro. Additionally, AI-driven but validated querying capabilities are being integrated to facilitate efficient navigation and retrieval of relevant guidance.
Results: Updates to GRADE include refinements in rating certainty, evidence-to-decision frameworks, considerations for multiple interventions, and improved approaches to decision thresholds. The mapping approach, developed through careful consultation, allows users to locate guidance by topic, methodological category, or publication source, and lists only up-to-date resources. AI-powered search functionality further enhances usability by enabling precise and context-aware queries.
Conclusions: This official GRADE resource, combined with the new mapping approach and AI-assisted querying, provides a coherent and accessible pathway for applying GRADE methods, from basic to advanced. These enhancements should improve the uptake and consistency of GRADE methods in guideline development, systematic reviews, and health policy decision-making.
Paper Number
414
Biography
Holger J. Schünemann is chair of the GIN Board of Trustees and a tenured professor at Humanitas University in Milan, Italy, where he is responsible for internationalization. He trained in respiratory and exercise physiology, lung biology, epidemiology, internal medicine and preventive medicine/public health.
Since 2000, he has helped reshape methodology for guideline development, spanning clinical medicine to public health, and has contributed methodologically and practically to knowledge synthesis research, foremost through the GRADE working group (www.gradeworkinggroup.org), which he co-chairs, and by developing the INGUIDE program (INGUIDE.org). He is an author of over 900 peer-reviewed publications.
