February 11, 2026

The AI Adherence Trap: Better Data, Not Fewer Humans

A blog on: the adherence data quality problem, the risks of AI-driven coaching, and what responsible AI actually looks like

In clinical development and research, the integration of machine learning and custom large language models, or, let's be honest, companies building wrappers around Claude's API, has moved from concept to infrastructure, perhaps faster than it should have. Companies are developing, pitching and hyping AI systems for risk-based monitoring, protocol deviation detection, dropout prediction and the generation of entire trial documentation. Several major clinical data platforms now incorporate advanced analytics to identify anomalous patterns in trial data, flag site-level risks and prioritise monitoring activity. However, if non-deterministic generative AI has become part of the infrastructure running trials, has it been built on solid ground?

To be clear, this isn't an anti-AI argument. At Pill Connect, LLMs have greatly improved several aspects of our workflows. This is a discussion about how responsible, impactful AI starts with good data and a respect for what's happening under the hood. Our focus here is adherence monitoring, a fundamentally human aspect of trial design that requires more nuance than most AI solutions acknowledge.

A lot of the talk around AI in clinical trials focuses on algorithms, dashboards and predictive modelling. Far less attention is paid to the quality of the data feeding those systems. Machine learning models find patterns in data; if the data is invalid, so are the insights [1][2]. Historically, trial teams acknowledged that adherence data from pill counts and self-reports was unreliable, so they used it cautiously. Now, AI systems are being trained on that same unreliable data and used to drive real-time interventions and predictive models. Unfortunately, garbage in, garbage out. Sticking with adherence data: if exposure is assumed rather than verified, if adherence is overestimated by pill counts or self-report, or if dose timing variability is invisible, those distortions flow straight into the models. Models amplify signal, but they also amplify noise. It is therefore critical that data is well labelled, representative of what you actually need to measure, and as free as possible of unwarranted generalisation and bias.

As these systems become more integrated into trial analytics, adherence data (whether and when the participant actually took the drug) becomes a core input variable. Exposure-response modelling, subgroup detection, safety signal interpretation and adaptive trial decisions all depend on an accurate understanding of what participants actually took, and when. Whether that data is used to train predictive models or fed into LLM prompts for real-time intervention, if exposure is assumed rather than verified, the insights will be flawed. An AI system may misattribute non-response to pharmacology rather than behaviour, or conclude that a drug doesn't work in certain patients when the real issue is that those patients weren't taking it consistently. In that sense, adherence is no longer merely a behavioural challenge; it becomes a data integrity issue with algorithmic consequences.
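To make that failure mode concrete, here is a minimal, purely illustrative simulation (not based on any real trial or on Pill Connect data; the effect size, dosing schedule and 50% adherence rate are invented). When exposure is assumed to equal the prescribed regimen, a drug that works looks far less potent in a low-adherence subgroup:

```python
# Illustrative only: how assuming full adherence distorts a naive exposure-response estimate.
import random

random.seed(1)

TRUE_EFFECT_PER_DOSE = 0.5   # hypothetical response gained per dose actually taken
PRESCRIBED_DOSES = 28        # doses prescribed over the analysis window

def simulate_participant(adherence_rate):
    """Simulate one participant's verified intake and noisy observed response."""
    doses_taken = sum(random.random() < adherence_rate for _ in range(PRESCRIBED_DOSES))
    response = TRUE_EFFECT_PER_DOSE * doses_taken + random.gauss(0, 1)
    return {"doses_taken": doses_taken, "response": response}

def effect_per_dose(cohort, assume_full_adherence):
    """Naive exposure-response estimate: mean response divided by mean exposure."""
    exposure = [PRESCRIBED_DOSES if assume_full_adherence else p["doses_taken"] for p in cohort]
    mean_response = sum(p["response"] for p in cohort) / len(cohort)
    return mean_response / (sum(exposure) / len(exposure))

# A subgroup that takes only around half its doses (complex regimen, side effects, etc.).
low_adherence_cohort = [simulate_participant(0.5) for _ in range(500)]

print("effect per dose, exposure assumed :",
      round(effect_per_dose(low_adherence_cohort, assume_full_adherence=True), 2))   # ~0.25
print("effect per dose, exposure verified:",
      round(effect_per_dose(low_adherence_cohort, assume_full_adherence=False), 2))  # ~0.50
```

On this toy cohort the "assumed exposure" estimate comes out at roughly half the verified one; fed into a real analysis pipeline, that is exactly the kind of distortion a model will happily learn as a subgroup effect.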

This has led to growing interest in whether AI systems themselves can be used to manage adherence.    

You can see the appeal. Systems that analyse dosing patterns in real time and deploy conversational AI to nudge participants who appear to be drifting. An AI assistant detects three missed doses in a week and sends a gentle prompt. It spots erratic timing and suggests ways to build better habits. On the surface, it seems like an obvious way to close the loop.    

This is where it gets very messy. In regulated clinical trials, AI-driven behavioural coaching isn't just a tech upgrade; it likely constitutes an intervention, and an automated one at that, making real-time decisions about participant care without a clinician in the loop. If an algorithm nudges some participants more than others, you've introduced bias. If it changes behaviour in ways not outlined in the protocol, you've compromised endpoint integrity. For instance, an AI detecting inconsistent timing might suggest a participant switch from evening to morning dosing. Seemingly helpful, but now you've changed exposure patterns in ways that could affect pharmacokinetics and weren't specified in the protocol. If something goes wrong, who's accountable? The sponsor? The site? The vendor? Regulators are rightly cautious about systems that autonomously influence participant behaviour without human oversight.

There is another problem: large language models are inherently non-deterministic [3, 4, 5]. The same input can produce different outputs across runs. While systems can be constrained and monitored, outputs cannot be fully pre-specified in the way traditional rule-based software can. That makes prospective validation challenging. It becomes difficult to define, in advance, exactly what participants may be told or to map out a consistent participant journey when the system is generating responses dynamically in real time.
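A toy illustration of the point, with no real model involved. The snippet below mimics a generative system by sampling a nudge message from a probability distribution (here just four hand-written candidates; a real LLM samples from a vastly larger space of possible texts) and contrasts it with a rule-based template whose exact wording could be written into the protocol and validated in advance. All of the messages and probabilities are hypothetical:

```python
# Toy illustration (no real LLM involved): sampled generation cannot be pre-specified
# the way a deterministic, rule-based template can.
import random

# Hypothetical distribution a generative system might place over candidate nudges.
CANDIDATE_NUDGES = {
    "You've missed a few doses this week. Is anything getting in the way?": 0.40,
    "Gentle reminder: try pairing your dose with breakfast.": 0.30,
    "Your dosing times have been varying a lot. Would a fixed alarm help?": 0.20,
    "Consider switching your dose to the morning.": 0.10,  # protocol-relevant advice!
}

def sample_nudge(rng):
    """Sample one message from the distribution, as a generative system would."""
    r, cumulative = rng.random(), 0.0
    for message, prob in CANDIDATE_NUDGES.items():
        cumulative += prob
        if r < cumulative:
            return message
    return message  # fallback for floating-point edge cases

def rule_based_nudge(missed_doses):
    """Deterministic template: exact wording can be pre-specified and validated."""
    return f"Our records show {missed_doses} missed doses this week. Please contact your site team."

# Same 'input', three independent runs: the sampled output varies, the template never does.
for run in range(3):
    print(f"run {run}: sampled    -> {sample_nudge(random.Random())}")
print("any run: rule-based ->", rule_based_nudge(3))
```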

Beyond this, the reasons for non-adherence are so widely varied that interpreting them from a set of rigid inputs, without actually talking to the participant, is going to be very difficult. There are also the well-known issues of hallucination and sycophancy [6][7].

There are further practical human factors to consider. AI chatbots and digital coaching tools may not resonate equally across demographics, therapeutic areas or cultural contexts. Even when they do, sustained engagement is far from guaranteed[8]. The mHealth literature is full of digital interventions that start strong and fade fast. Designing systems that depend on frequent participant interaction introduces behavioural fragility into what should be a robust data capture process.    

There is a credible view that, for adherence monitoring, AI is overkill. Traditional statistical methods, such as control charts, threshold alerts and simple deviation detection, would be more transparent, deterministic and reliable, provided the underlying data were accurate. The rush to deploy machine learning often serves to obscure data quality problems rather than solve them. A fancy algorithm analysing garbage data doesn't produce better insights than a simple calculation on garbage data; it just makes the garbage harder to see.
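To give a sense of scale, the kind of deterministic check being described needs only a few lines of code. The sketch below is illustrative only; the once-daily schedule, the data structure and both thresholds are hypothetical choices, not a recommendation:

```python
# A simple missed-dose threshold alert plus a control-chart-style rule on dose timing.
# Deterministic and transparent: the same inputs always produce the same flags.
from datetime import datetime, timedelta
from statistics import mean, pstdev

def missed_dose_alert(dose_times, window_days=7, expected_per_day=1, max_missed=3):
    """Flag if at least `max_missed` expected doses were skipped in the recent window."""
    window_start = max(dose_times) - timedelta(days=window_days)
    taken = sum(t >= window_start for t in dose_times)
    return (window_days * expected_per_day - taken) >= max_missed

def timing_alert(dose_times, sigma_limit=3.0):
    """Control-chart-style rule: flag if the latest inter-dose interval falls outside
    the participant's own baseline mean +/- sigma_limit standard deviations."""
    intervals = [(b - a).total_seconds() / 3600         # hours between consecutive doses
                 for a, b in zip(dose_times, dose_times[1:])]
    if len(intervals) < 3:
        return False                                    # not enough history yet
    baseline, latest = intervals[:-1], intervals[-1]
    return abs(latest - mean(baseline)) > sigma_limit * pstdev(baseline)

# Illustrative data: once-daily doses through 6 Feb, then a multi-day gap before a late dose.
doses = [datetime(2026, 2, d, 8 + (d % 3)) for d in range(1, 7)] + [datetime(2026, 2, 10, 21)]
print("missed-dose alert:", missed_dose_alert(doses))  # True
print("timing alert     :", timing_alert(doses))       # True
```

Both rules are fully pre-specifiable, auditable and cheap to validate; the only thing they genuinely depend on is whether the timestamps reflect real dosing events.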

So how do we harness one of the most powerful technological advances in recent memory without creating regulatory chaos or fragile systems? The answer may not be direct coaching, but intelligent detection combined with human oversight. Instead of replacing clinical judgment, AI can augment it. Models could analyse large amounts of objective, timestamped dosing data to quickly identify emerging patterns of risk: increasing timing variability; extended sync gaps; clusters of missed doses. These can then be flagged to site staff. Human professionals remain in the loop, deciding whether and how to intervene. This preserves regulatory clarity while still benefiting from predictive analytics.    
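Sketched in code, that detection-plus-review pattern might look something like the outline below. This is a hypothetical sketch rather than Pill Connect's implementation: the record fields, thresholds and flag wording are all invented, and the key design choice is that the output is a review queue for site staff, not a message sent to the participant:

```python
# Hypothetical sketch of analytics as a detection layer with humans in the loop:
# compute risk indicators from timestamped dosing data and queue flags for site staff.
from dataclasses import dataclass
from datetime import datetime
from statistics import pstdev
from typing import Dict, List

@dataclass
class ParticipantRecord:
    participant_id: str
    dose_times: List[datetime]   # verified, device-captured dose timestamps
    last_sync: datetime          # last time the device uploaded data

def risk_flags(rec: ParticipantRecord, now: datetime) -> List[str]:
    """Return human-readable flags; site staff decide whether and how to intervene."""
    flags = []

    # 1. Extended sync gap: no data uploaded for several days.
    if (now - rec.last_sync).days >= 3:
        flags.append("No device sync for 3+ days")

    # Inter-dose intervals in hours, used by the remaining checks.
    hours = [(b - a).total_seconds() / 3600
             for a, b in zip(rec.dose_times, rec.dose_times[1:])]

    # 2. Increasing timing variability: recent intervals spread out far more
    #    than the participant's own earlier baseline.
    if len(hours) >= 8:
        baseline, recent = hours[:len(hours) // 2], hours[len(hours) // 2:]
        if pstdev(recent) > 2 * pstdev(baseline):
            flags.append("Dose timing variability increasing")

    # 3. Cluster of missed doses: any gap spanning two or more expected doses.
    if any(h > 48 for h in hours):
        flags.append("Cluster of missed doses (gap > 48h)")

    return flags

def review_queue(records: List[ParticipantRecord], now: datetime) -> Dict[str, List[str]]:
    """Build the list site staff actually see: only participants with active flags."""
    queue = {}
    for rec in records:
        flags = risk_flags(rec, now)
        if flags:
            queue[rec.participant_id] = flags
    return queue
```

Nothing in this layer talks to the participant; it only surfaces patterns, and a human decides what, if anything, to do next.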

In this model, AI becomes an analytical layer rather than a behavioural authority. It's a co-pilot, not the pilot. It enhances visibility, not control. It supports early detection of exposure-related risks without autonomously modifying participant behaviour. Crucially, the effectiveness of such systems depends entirely on the quality and reliability of the underlying data stream. Objective, low-burden, longitudinal adherence capture provides a far more stable foundation for analytics than self-reported or sporadically verified measures.    

So, what does good AI use in clinical trials actually look like? It's not about replacing human decision-making or building fancier chatbots. It's about ensuring that models are trained on accurate, well-labelled, high-fidelity data. It's about recognising that how you capture data matters as much as how you analyse it. And it's about maintaining appropriate oversight, transparency and regulatory alignment as these systems evolve.

As AI reshapes clinical development, success is unlikely to come from pushing humans out of the loop. It will come from ensuring the underlying algorithms have something worth analysing. In the context of adherence, the future isn't just smarter analytics, it's smarter, lower-burden data capture that keeps humans in control while giving AI something real to work with.    

[1] FDA (2025). Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products. Draft Guidance, January 2025.

[2] Anthropic (2024). Avoiding Hallucinations. Prompt Engineering Interactive Tutorial. https://github.com/anthropics/courses/blob/master/prompt_engineering_interactive_tutorial/Anthropic%201P/08_Avoiding_Hallucinations.ipynb

[3] Baldwin, A., et al. (2025). Non-Determinism of "Deterministic" LLM Settings. https://arxiv.org/abs/2408.04667

[4] Nazer, L.H., Zatarah, R., Waldrip, S., et al. (2023). Bias in artificial intelligence algorithms and recommendations for mitigation. PLOS Digital Health, 2(6):e0000278.

[5] Chen, I.Y., Pierson, E., Rose, S., Joshi, S., Ferryman, K., Ghassemi, M. (2021). Ethical Machine Learning in Healthcare. Annual Review of Biomedical Data Science, 4:123-144.

[6] Sharma, M., et al. (2024). Towards Understanding Sycophancy in Language Models. ICLR 2024.

[7] Sharma, M., et al. (2026). Who's in Charge? Disempowerment Patterns in Real-World LLM Usage. Anthropic & University of Toronto.

[8] Pratap, A., Neto, E.C., Snyder, P., et al. (2020). Indicators of retention in remote digital health studies: a cross-study evaluation of 100,000 participants. NPJ Digital Medicine, 3:21.