Advancements in artificial intelligence (AI) are rapidly transforming the clinical trial industry, leading IRBs to increasingly review projects that involve AI. In response, WCG and the Multi-Regional Clinical Trials Center of Brigham and Women’s Hospital and Harvard (MRCT Center) developed a framework to provide tools and resources to IRBs and investigators to facilitate efficient and thorough ethical review of AI-related projects.
In the recent webinar, “Framework for AI Adoption in Clinical Trials: A Case Study Perspective,” WCG, in collaboration with the MRCT Center, explored a real-world case study and introduced the Framework for Review of Clinical Research Involving AI, examining the questions, challenges, and solutions surrounding AI implementation in clinical research.
During the session, panelists used a case example to illustrate how a project involving AI may progress from the discovery stage to deployment. They discussed the unique considerations that IRBs and investigators face at each phase of this process, emphasizing how the framework offers critical guidance and tools to support effective review and oversight of AI-driven projects in clinical research. The case, presentations, and discussion can be found here. For convenience, we summarize the case below:
Case Example Summary:
- A research team is developing and testing a large language model-powered chatbot app to support adults with mild to moderate depression and anxiety through Cognitive Behavioral Therapy (CBT)-based interactions and mood tracking.
- The discovery stage involves training the chatbot using recordings from CBT sessions, without real patient involvement.
- The translation stage validates the chatbot’s responses using scripted interactions evaluated by clinicians.
- In the deployment stage, qualified personnel screen participants for suicidality and clearly communicate that the chatbot is not a replacement for clinical care; the app incorporates crisis language detection and helpline referrals (a minimal sketch of this safeguard follows the list).
- Key concerns include the absence of clinician oversight, the potential for inappropriate responses, data privacy, and the need for transparent informed consent regarding the chatbot’s limitations.
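To make the deployment-stage safeguards in the case concrete, the minimal sketch below shows one way crisis-language detection and a helpline referral could sit in front of a chatbot’s reply generation. The keyword list, helpline text, and function names are illustrative assumptions rather than details from the case; a deployed system would rely on a validated classifier rather than simple keywords.

```python
# Hypothetical sketch: crisis-language screening before any chatbot reply is sent.
# The keyword list, helpline text, and function names are illustrative assumptions.

CRISIS_TERMS = {"suicide", "kill myself", "end my life", "self-harm"}

HELPLINE_MESSAGE = (
    "I'm not able to provide crisis care. If you are thinking about harming "
    "yourself, please contact your local emergency number or a crisis helpline now."
)

def contains_crisis_language(message: str) -> bool:
    """Very rough keyword screen; a real system would use a validated classifier."""
    text = message.lower()
    return any(term in text for term in CRISIS_TERMS)

def handle_participant_message(message: str, generate_cbt_reply) -> str:
    """Route a participant message: crisis content short-circuits to a referral."""
    if contains_crisis_language(message):
        return HELPLINE_MESSAGE  # referral instead of a model-generated reply
    return generate_cbt_reply(message)
```

In practice, a reviewing IRB would want to know how any such screen was validated and what follow-up occurs after a referral is issued.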
In this blog post, the panelists continue the conversation by addressing key questions submitted by webinar participants. The audience questions are presented here exactly as they were received. Read on for insightful answers from experts, shedding further light on applying the framework to projects undergoing IRB review.
AI Safety, Ethics, and Oversight
Will the AI chatbot used in the example be trained to challenge the patient instead of agreeing with the patient?
While this specific case did not specify the tone of the chatbot responses, optimally, the chatbot would be fair and accurate, neither recommending nor dissuading participation. One of the stated concerns is that AI may provide erroneous information. This concern can be mitigated by appropriate training and validation. It is important to ensure that the data used to train the AI algorithm is of good quality and inclusive of specific information relevant to the protocol. It is also important that the AI algorithm is appropriately validated before it is used with a research participant (or a patient). The responsible IRB should be comfortable with the information provided and the simulated conversation before it approves a chatbot for use.
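As one illustration of the kind of pre-deployment validation described above, the sketch below tallies hypothetical clinician ratings of scripted chatbot responses. The rating scale, acceptance threshold, and record layout are assumptions for illustration and are not part of the framework.

```python
# Hypothetical sketch: summarizing clinician ratings of scripted chatbot responses.
# The 1-5 rating scale, acceptance threshold, and record layout are assumptions.

from statistics import mean

def summarize_validation(ratings: list[dict], acceptable: int = 4) -> dict:
    """ratings: [{"prompt_id": str, "clinician_id": str, "score": int}, ...]"""
    scores = [r["score"] for r in ratings]
    return {
        "n_ratings": len(scores),
        "mean_score": mean(scores),
        "pct_acceptable": sum(s >= acceptable for s in scores) / len(scores),
    }

example = [
    {"prompt_id": "p1", "clinician_id": "c1", "score": 5},
    {"prompt_id": "p1", "clinician_id": "c2", "score": 4},
    {"prompt_id": "p2", "clinician_id": "c1", "score": 3},
]
print(summarize_validation(example))
# An IRB might ask to see summaries like this alongside the validation plan.
```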
Would it be considered an ethical concern if monitoring activities were performed by a validated AI tool in a research proposal?
It is highly likely that AI is already being used, or will be used, to monitor research and, more broadly, in other aspects of research administration. Concerns are minimized if the AI tool is validated, and the methods for validation should be reviewed. At this stage in the deployment of AI for monitoring, however, we recommend that human oversight be appropriately incorporated in the monitoring process. Ethical concerns are mitigated by validation and human oversight.
I am naïve about AI, but why and who is pushing for AI interaction within human research? What do sponsors and health care providers really hope to accomplish by using AI, especially given a recent story about how AI prompted an individual to commit suicide? Is AI truly needed, as it will likely eliminate human interaction?
This is a good question, and one that many people are asking. In our collective view, AI should complement human interaction as appropriate and replace it only when doing so is demonstrated to be preferable. AI can perform many functions more quickly and accurately than humans, and its use can be limited or expanded as the technology continues to advance. Human beings have expertise and empathy, but we also make mistakes, get sick, need to sleep, and can access only a limited amount of information. AI will, of course, also have limitations. Importantly, there is at present no established structure or guidance for assessing the risks and benefits of using AI in place of human interaction.
AI in Protocol and Study Design
Do you think we should use the technology background in the protocol? How does the technology interpret the data?
We assume this question is related to how to evaluate information on the technology included in the protocol. This may be a place to start, but the reviewers may want more information on the data used to train and inform the algorithm. We suggest using the questions in the framework to determine if more information is needed.
Does the AI in an early cancer detection device, still in development and not yet in use in humans, need to be disclosed in the informed consent form (ICF), even though its potential to improve reliability is unknown?
More information is needed to answer this question comprehensively. First, if the AI algorithm is not approved by the appropriate regulatory body for use, the results should not be relied upon. Any evaluation of the reliability of the AI algorithm should be a research endpoint, not used to direct clinical decision-making, and there should be human oversight of the output and where and how the data is shared. The IRB reviewing the protocol should ensure that the limitations of use are clear. Second, if patient data is being used to train or validate the algorithm, the research may qualify for a waiver of consent. If the research does not qualify for a waiver of consent, the participants should be informed and have the option to opt out of participation.
In a true study, wouldn’t a human conducting CBT rather than a chatbot be able to act (e.g., police intervention) in cases of suicidal threat or ideation? The AI chatbot only offering options does not seem adequate.
Yes, you are correct. Reliance on the chatbot alone will place the responsibility for reaching out to the investigator on the participant. The IRB would need to decide whether additional actions are necessary to mitigate risk given the specifics of the research question, the population recruited, and other factors. The IRB would want to ask how the risk is intended to be mitigated and determine whether those strategies are sufficient. One mitigation strategy would be to notify members of the study team in real time so they can independently assess the situation and follow up with the participant as appropriate.
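The real-time notification strategy mentioned above could take many forms. The sketch below is a minimal, assumed example in which a flagged exchange triggers an alert to the study team in addition to the in-chat referral; the alert transport (simple logging here) and the function names are hypothetical.

```python
# Hypothetical sketch: escalating a flagged message to the study team in real time.
# The alert transport (here, just logging) and names are illustrative assumptions.

import logging
from datetime import datetime, timezone

logger = logging.getLogger("study_alerts")

def alert_study_team(participant_id: str, message: str) -> None:
    """Record an escalation; a real system might page on-call staff or send email."""
    logger.warning(
        "ESCALATION %s participant=%s message=%r",
        datetime.now(timezone.utc).isoformat(),
        participant_id,
        message,
    )

def on_crisis_flag(participant_id: str, message: str) -> str:
    """Called when crisis language is detected: notify the team, return referral text."""
    alert_study_team(participant_id, message)
    return (
        "A member of the study team has been notified. If you are in immediate "
        "danger, please call your local emergency number now."
    )
```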
AI Validation, Data Management, and Compliance
In the validation stage, are clinicians who evaluate chatbot responses considered human subjects?
While in many cases the clinicians would be considered participants in the research, in certain circumstances, study teams may propose “expert raters” about whom no identifiable information is retained and who would not then serve as participants or human subjects in the research. The IRB would need to determine how the clinicians are engaged and what information about them is collected, retained, and used. The IRB will then be able to determine whether the clinicians are human subjects and whether and how consent should be obtained.
What training sources are available for algorithms?
The training sources for the algorithm are specific to the intended use of the AI algorithm and will vary. Depending on the state of AI development and the research objectives, the IRB may want to ask about how the training data was acquired, whether identifiable data is being used (and its source), whether the data is representative of the intended population likely to use the AI product or device, and other relevant questions.
How would you address the unpredictability of AI models when encountering unfamiliar trial data?
The quality, representativeness, and amount of data used to train an AI model are important. It is unlikely that the IRB will be reviewing the actual data, but the IRB might question what data sources are or were used to inform the model and whether the data are appropriate for the proposed use of the AI in the research itself. We suggest using the questions under the “Data Sources and Collection” section of the framework for reference.
Who owns the data and who can access it? What are the checks for nefarious submissions?
These are important questions to ask when developing an AI chatbot. The answers will depend on the specific project. Unfortunately, this case example did not include sufficient details to permit us to answer.
How do you manage, interpret, or report if you observe something unexpected?
This is an important question for the IRB to ask during its review. The principal investigator should consider reasonably foreseeable risks and unexpected information and propose mitigation strategies, including how unexpected events will be identified and evaluated, who will be informed, and how each event will be treated or resolved. The principal investigator or a designee should be knowledgeable and able to describe the range of expected and unexpected behaviors. An unexpected event, particularly one that is potentially harmful, should be identified in real time during the study, and a timely response initiated. Like any serious and unanticipated adverse event, it should be reported to the IRB in a timely fashion.
It is the principal investigator’s responsibility to ensure that someone on the research team, if not the principal investigator, is knowledgeable about the AI algorithm. Reciprocally, it is the responsibility of the IRB to ensure that it has appropriate expertise, or access to that expertise, to evaluate the risks of the algorithm and the sophistication and completeness of the written protocol.
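Purely as an illustration of how unexpected chatbot behavior might be captured for timely review and reporting, the snippet below queues flagged responses with timestamps so the study team can evaluate them and report to the IRB where appropriate. The flagging criteria, field names, and in-memory storage are assumptions.

```python
# Hypothetical sketch: capturing unexpected chatbot responses for human review.
# Flagging criteria, field names, and storage (an in-memory list) are assumptions.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FlaggedEvent:
    participant_id: str
    chatbot_response: str
    reason: str
    flagged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

review_queue: list[FlaggedEvent] = []

def flag_unexpected_response(participant_id: str, response: str, reason: str) -> None:
    """Queue a response for the study team to assess (and report to the IRB if serious)."""
    review_queue.append(FlaggedEvent(participant_id, response, reason))

flag_unexpected_response("P-017", "off-topic medical advice", "outside CBT scope")
```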
What compliance issues might a site face when using AI for database recruitment?
Possible compliance issues could include breaches of confidentiality and HIPAA violations. Recruitment activities are the start of human subjects research, so sites should be sure to provide their recruitment practices for review by their IRB. Additionally, until there is sufficient validation of AI algorithm efficacy, human oversight would be appropriate.
Frameworks, Templates, and Regulatory Review
Do you have a ballpark estimate of how many sponsors/investigators are implementing the MRCT Center’s framework compared to those with AI studies not using it?
Since the framework has only recently become available on the MRCT Center website for use by the general public, we do not have that information.
Do you have a protocol and ICF template for retrospective clinical data we can share with a principal investigator developing a study using AI?
We recommend checking with the reviewing IRB for the proposed project to determine whether they have specific templates or guidance for submitting protocols involving AI.
For “site-made” administrative AI tools, is IRB review needed (e.g., QR code reader updating compliance logs)?
If a device is commercially available and used within its labeled indication in the administration of research, there would be no need for IRB review.
AI Functionality, Exclusion Criteria, and Real-World Application
Do requirements change for a chatbot that does not recommend actions but only analyzes data for professional evaluation?
In the scenario you describe, there are no actual human subjects or reasonably identifiable data, so the review considerations are likely to be similar to those raised in the discussion of the ‘translation’ phase during the webinar, and no IRB review would be required.
How would investigators be alerted if a previously ‘stable’ subject now needs immediate attention? My concern is about missing something important.
As mentioned earlier, the answer depends on the state of the AI algorithm’s development (and approval status) and will change with the specifics of the research question, the population recruited, and other factors. Using the chatbot will place the responsibility for reaching out to the investigator on the participant unless appropriate measures to mitigate risk are introduced. The IRB should review the proposal with those considerations in mind and decide whether additional actions are necessary.
Should less restrictive exclusion criteria be considered during trial stages, since it is hard to exclude certain users in real-world AI tool use?
This is important to consider when evaluating whether an AI chatbot therapy application is ready for validation and deployment. The objectives, endpoints, and outcomes of the research should anticipate the areas that the regulatory authorities will wish to review to determine safety and efficacy.
Explore More
Watch the webinar recording for more review considerations highlighted across the different AI development stages.
Read the full framework, Framework for Review of Clinical Research Involving AI, a groundbreaking resource co-developed by a diverse, multi-stakeholder task force of experts in bioethics, AI, clinical research, and regulatory affairs.