Abstract
Introduction Cognitive–behavioural therapy (CBT) works—but not equally well for all patients. Less than 50% of patients with internalising disorders achieve clinically meaningful improvement, with negative consequences for patients and healthcare systems. The research unit (RU) 5187 seeks to improve this situation by an in-depth investigation of the phenomenon of treatment non-response (TNR) to CBT. We aim to identify bio-behavioural signatures associated with TNR, develop predictive models applicable to individual patients and enhance the utility of predictive analytics by collecting a naturalistic cohort with high ecological validity for the outpatient sector.
Methods and analysis The RU is composed of nine subprojects (SPs), spanning from clinical, machine learning and neuroimaging science and service projects to particular research questions on psychological, electrophysiological/autonomic, digital and neural signatures of TNR. The clinical study SP 1 comprises a four-centre, prospective-longitudinal observational trial where we recruit a cohort of 585 patients with a wide range of internalising disorders (specific phobia, social anxiety disorder, panic disorder, agoraphobia, generalised anxiety disorder, obsessive–compulsive disorder, post-traumatic stress disorder, and unipolar depressive disorders) using minimal exclusion criteria. Our experimental focus lies on emotion (dys)-regulation as a putative key mechanism of CBT and TNR. We use state-of-the-art machine learning methods to achieve single-patient predictions, incorporating pretrained convolutional neural networks for high-dimensional neuroimaging data and multiple kernel learning to integrate information from various modalities. The RU aims to advance precision psychotherapy by identifying emotion regulation-based biobehavioural markers of TNR, setting up a multilevel assessment for optimal predictors and using an ecologically valid sample to apply findings in diverse clinical settings, thereby addressing the needs of vulnerable patients.
Ethics and dissemination The study has received ethical approval from the Institutional Ethics Committee of the Department of Psychology at Humboldt-Universität zu Berlin (approval no. 2021-01) and the Ethics Committee of Charité-Universitätsmedizin Berlin (approval no. EA1/186/22).
Results will be disseminated through peer-reviewed journals and presentations at national and international conferences. Deidentified data and analysis scripts will be made available to researchers within the RU via a secure server, in line with ethical guidelines and participant consent. In compliance with European and German data protection regulations, patient data will not be publicly available through open science frameworks but may be shared with external researchers on reasonable request and under appropriate data protection agreements.
Trial registration number DRKS00030915.
- Machine Learning
- MENTAL HEALTH
- Methods
- PSYCHIATRY
- PUBLIC HEALTH
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
STRENGTHS AND LIMITATIONS OF THIS STUDY
Prospective design specifically tailored for predicting psychotherapy outcomes.
Large, naturalistic sample from multiple outpatient clinics enhances ecological validity.
Multimodal data collection (clinical, ambulatory, neuroimaging and psychophysiological) and state-of-the-art algorithms provide comprehensive insights.
Extensive assessments may increase participant burden, potentially affecting adherence.
Exclusion of hard-to-reach patient groups that do not present at outpatient clinics may limit the generalisability of the results.
Introduction
Internalising disorders, such as anxiety disorders (ADs), unipolar depression, obsessive–compulsive disorder (OCD) and post-traumatic stress disorder (PTSD), are the most widespread and economically burdensome mental health conditions, often characterising the typical outpatient care recipient.1–3 Cognitive–behavioural therapy (CBT) stands as an effective first-line treatment for these disorders,4 yet less than 50% of patients show clinically meaningful improvement.5–9 This gap highlights the urgent need for early identification of individuals likely to show treatment non-response (TNR) to CBT and subsequent stratification towards more personalised treatments, in other words, precision psychotherapy.10–16 A thorough understanding of TNR is essential for designing personalised treatments. However, the limited insights into the moderators and mechanisms behind TNR call for an in-depth examination. Traditional group-based research sheds light on potential mechanisms but falls short of offering single-patient predictions, which are crucial for an a priori patient stratification.
Predictive analytics, which is based on artificial intelligence (AI) and machine learning (ML) algorithms, offers a promising approach to bridge the translational gap from group-based evidence (‘what is effective?’) to individual case predictions (‘… for whom?’). Big data and AI are increasingly integrated into mental health research, but initial optimistic results overestimated their predictive accuracy.17 18 The reliance on small sample sizes from clinical trials, prioritising internal over external validity, provides a biased foundation for developing algorithms meant for broader healthcare application. In addition, efficacy studies do not necessarily translate into predictive information for treatment outcome. Moreover, the emerging field of predictive analytics is rapidly evolving, and earlier methodologies have been criticised for producing overly optimistic or inconsistent predictions, highlighting the need for caution and rigour in this area.19 20 Rather than using randomised controlled trial (RCT) data for secondary predictive analytics, we advocate for study designs that are designed from the outset for predictive purposes, including a representative, ecologically valid sample, an a priori defined predictive use case (eg, outcome prediction) and a hypotheses-based data collection focusing on the predictive question of interest. The research unit (RU) 5187 is a collaborative effort to better understand the signature of TNR based on a multimodal deep phenotyping approach and develops predictive analytics within a naturalistic cohort of patients receiving outpatient care. 
In line with contemporary developments towards a neurobiologically based, dimensional classification of mental disorders, such as the Research Domain Criteria by Insel et al, and the Hierarchical Taxonomy of Psychopathology by Kotov et al, our approach is transdiagnostic, multimodal and multilevel.21 22 We aim to address current limitations in the field of precision psychotherapy by pursuing the following overarching research questions:
What is the biobehavioural signature of TNR towards CBT?
Our approach integrates a multimodal deep phenotyping method to identify the signature of TNR to CBT across various levels of analysis. We focus on various aspects and components of emotion regulation (ER), a crucial process targeted by CBT techniques, as depicted in figure 1. Given ER’s relevance across a wide range of mental disorders within the internalising spectrum, understanding its role in TNR could lead to more customised therapeutic interventions.23 24 We hypothesise that initial configurations of ER are linked with TNR, anticipating that these signatures will—in line with our transdiagnostic approach—exhibit considerable overlap in different units of analysis across internalising disorders. To characterise ER, we employ a combination of methods including ecological momentary assessment (EMA), passive sensing, psychological assessments, psychophysiological measures (EEG and cardiac measures—ECG) and neuroimaging (brain structure, intrinsic connectivity, task-based brain activation and connectivity). We intend to investigate a spectrum of ER strategies, from implicit to explicit and automatic to controlled.23 These behavioural probes are embedded within tailored experimental paradigms, targeting the putative signature of TNR.25 26
The neurodynamics of emotion regulation in internalising disorders, targeted by CBT techniques. CBT, cognitive–behavioural therapy; dACC, dorsal anterior cingulate cortex; dlPFC, dorsolateral prefrontal cortex; InfPAR, inferior parietal lobe; OFC, orbitofrontal cortex; vACC, ventral ACC; vmPFC, ventro-medial PFC.
Can we predict TNR on the single patient level?
Building a translational bridge from group-based information to single-case predictions, we hypothesise that we can predict TNR on the single-case level with a prediction accuracy that is above chance so that expert decisions on the treatment of choice can be supported. To achieve this, we design our study specifically for the purpose of predictive analytics in the outpatient sector by recruiting a naturalistic cohort with minimal inclusion and exclusion criteria reflecting typical patient populations. We furthermore expect that prediction accuracies can be significantly improved by combining multiple data levels. State-of-the-art ML models will be employed to learn from both unimodal and multimodal data across various granularities (eg, imaging-derived phenotypes vs raw MRI data). Data will be integrated at different levels using techniques such as multimodal fusion. Employing appropriate cross-validation protocols will prevent data leakage and provide reliable estimates of prediction accuracies, thereby addressing the limitations of previous work.20 27 We will furthermore search for the best combination of predictors. Finally, we hypothesise that we can identify a cost-efficient set of predictors (EMA, EEG, ECG) that may serve as proxy measures for cost-intensive (eg, neuroimaging) assessments. Health-economic considerations will guide our investigation to find cost-efficient combinations of markers or proxies thereof.
Methods and analysis
Composition of the RU
The RU 5187 PREACT (‘Towards precision psychotherapy for non-respondent patients: from signatures to predictions to clinical utility’) is a collaborative initiative grounded in the rich academic landscape of Berlin (Germany) and is carried out at Humboldt-Universität zu Berlin (HU Berlin), Freie Universität Berlin (FU Berlin), Charité – Universitätsmedizin Berlin (Charité) and Psychologische Hochschule Berlin (PHB). The RU brings together a diverse team of 13 principal investigators (PIs) alongside 36 additional members, including postdoctoral researchers, doctoral candidates and student assistants. Recruitment is based on four academic outpatient clinics (in German: ‘Hochschulambulanz’, HSA). These HSAs are based at three of the participating sites (HU Berlin: HSA-HU and ZPHU, ‘Zentrum für Psychotherapie der Humboldt-Universität zu Berlin/Center for psychotherapy of the Humboldt-Universität zu Berlin’; FU Berlin: HSA-FU; PHB: HSA-PHB). The RU 5187 is funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG, project number: 442075332) with a grant spanning 4 years, from 1 July 2022 to 30 June 2026. Recruitment started in May 2023 and is expected to continue until May 2025. The RU’s structure is intricate, consisting of nine specialised subprojects (SPs), all orchestrated under a comprehensive coordination project (table 1 and figure 2).
Scientific subprojects
Structure and content of the RU and major collaborations among the subprojects (SPs). RU, research unit.
The coordination project’s mandate extends beyond administrative oversight, embedding within its core objectives the promotion of young researchers, the advancement of gender equality, the management of network funds and the enhancement of scientific communication. This multidimensional approach not only fosters a nurturing environment for research but also cultivates a framework for inclusive and interdisciplinary collaboration (table 2).
Talent promotion and public outreach
The RU encompasses three core science and service projects (SP1, SP2 and SP3). Additionally, six specific science projects target narrower research questions (as detailed in table 1). These include EEG/ECG assessment (SP4), psychological measures (SP5), ambulatory assessment (AA, SP6) and neuroimaging (SP7, SP8, SP9). These projects are tightly integrated to increase the added value and answer research questions reaching beyond particular SPs. The rationale, hypotheses and analytical plans for all projects are preregistered with the open science framework, ensuring transparency and rigour. For additional information about the projects, refer to the preregistrations listed in table 1.
Summary of SP
SP1 sets up the naturalistic, prospective-longitudinal observational clinical study dedicated to patient treatment, recruitment, clinical characterisation and documentation of treatments and investigates the extent to which TNR can be predicted from basic clinical information. SP1 will be supported by our methods (SP2) and neuroimaging platform (SP3). As such, SP1 contains both a research and a service component. Research-wise, we will test whether treatment (non-)response can be predicted with sufficient accuracy (75%) based on clinical routine data when state-of-the-art ML methods are applied. Service-wise, SP1 will coordinate the clinical trial by serving as a recruitment hub for the other SPs, implementing diagnostic assessments, documenting and quality-controlling treatment contents, and calculating primary and secondary outcomes.
SP2 also has a service and a science component. Within the service component, we will (1) securely store and organise retrospective and prospective data in a way that all participating PIs have access and (2) provide hardware for processing complex biomedical data with state-of-the-art ML algorithms including deep learning. Within the science component, we will develop a methods toolbox relying on predictive analytics and ML algorithms for identifying (non-)responders on a single-subject level. This will be done separately for the different disorders as well as across disorders to investigate if treatment response operates on an individual or cross-diagnostic scale. We will systematically compare the predictive value of different data domains, such as clinical data, AA, electrophysiological data and neuroimaging data and will evaluate different methods for data fusion. Specifically, we will experiment with sophisticated ML techniques such as convolutional neural networks for analysing neuroimaging data and will employ transfer learning for dealing with comparatively low sample sizes.
SP3 has three work packages (WPs). In the first service WP, we will pilot and optimise our neuroimaging battery. It consists of three ER tasks, spanning the spectrum from automatic to voluntary ER, resting state fMRI as well as structural MRI. The second service WP is to guarantee the proper conduct of the main study by providing training in data acquisition, to perform continuous quality checks of acquired data, and to deliver preprocessed and basic first-level data for task-based and resting state fMRI. WP1 and WP2 are service components with no hypotheses to be tested. WP3 is the scientific component, where we will use resting state fMRI in all cases and task-based fMRI in some cases. Within the third WP, we will perform data-driven connectomics analyses using resting state fMRI for prediction of TNR using our newly developed network-based statistics (NBS)-predict tool. NBS-predict combines the data-driven construction of sparse networks that can be biologically interpreted and ML algorithms for prediction. In contrast to SP7, we do not focus on specific regions or hypotheses but use a data-driven method which will identify any network that proves to be useful for prediction. In contrast to SP9, we will not focus on specific disorders or the differences in prediction but on transdiagnostic prediction averaging over 4 and 3 disorders or 8 disorders, respectively.
SP4 aims to investigate whether previously established psychophysiological indicators of ER, such as the late positive potential (LPP), relative left frontal activity or ECG indices like HRV, can predict which patients are at risk of TNR.28 29 We hypothesise that low emotional reactivity as indicated by relatively small amplitudes of the LPP to pleasant or threatening pictures, respectively, as well as low capacity of ER as indicated by lower modulation of the LPP under ER instructions (ie, lower reduction of LPP under reappraisal or lower increase of LPP under savouring conditions) is predictive of TNR. Further, low relative left frontal activity under resting conditions and low increase under ER conditions, as well as low HRV generally and low increase under ER conditions can be used as indicators of aberrant emotional reactivity and deficient ER and are hypothesised to be predictive of TNR. Moreover, we hypothesise that compared with predictions based on single psychophysiological predictors, the combination of all available predictors, including their interactions, will yield models with higher overall prediction accuracy in identifying TNR on a single-subject level.
SP5 seeks to examine the relevance of specific psychological mechanisms and processes (so-called transdiagnostic factors) for TNR in internalising disorders. We will focus on ER strategies as a putative key mechanism of change underlying CBT and assess psychological ER processes that match and enrich, in terms of the targeted constructs, the prediction tools of the other SPs. Moreover, we assess further transdiagnostic skills and factors such as illness perceptions and treatment expectancies as well as therapeutic alliance. The relevance of these transdiagnostic factors will be examined above and beyond routinely available clinical diagnostic and sociodemographic data. Further, an initial mediation model accounting for the interdependence of transdiagnostic factors will be examined. Moreover, this SP will focus on early changes in transdiagnostic factors in CBT, thereby acknowledging that early changes can predict treatment outcomes in internalising disorders. Lastly, we adopt a cost-effectiveness perspective and deliver estimates on the relation of costs of the RU assessments (ie, resources spent on the input for the prediction) to their contribution to increase prediction accuracy. Further, we model implications of a better prediction of TNR in ‘standard CBT’ in terms of cost-effectiveness when patients with a poor prognosis would be assigned to alternative or modified treatment options (which might address the psychological processes and mechanisms that have been identified to be associated with risk of TNR).
SP6 uses digital phenotyping to derive proxies that transfer the predictive value of neural TNR-signatures and related ER-(dys)functions of internalising disorders into clinical settings and can be assessed repeatedly with minimal effort for therapists and patients during different treatment periods (eg, before, during and after therapy). We use data actively and passively collected by personal electronic devices such as smartphones and wearables to derive phenotypes including ER markers of individuals at risk of TNR based on indicators of ER as the overarching core construct. Research questions are: Can data actively and passively generated by personal electronic devices such as smartphones and wearables predict TNR for the individual patient at T20 (ie, after 20 sessions) and post/12 months? Which smartphone-collected data modality best predicts TNR for the individual patient at T20 and post/12 months, and how does predictive power depend on the time lag? Do EMA and sensor-based ER markers link to ER’s neural and biobehavioural markers assessed in the other SPs?
SP7 focuses on the anterior cingulate cortex (ACC), a crucial hub region for ER given its integration in several core brain networks, which has been demonstrated to be central to both psychopathology and CBT. We propose that a dedicated systems-driven perspective will enable us to develop neuroimaging-derived biomarkers capable of predicting TNR to CBT for internalising disorders. In particular, SP7 will optimise ACC predictor information by employing three sophisticated methods: (1) an efficient gradient of brain connectivity encoding alterations in the neural basis of explicit-controlled and implicit-automatic ER, (2) a graph-theoretical characterisation of ACC networks and (3) a Bayesian hierarchical modelling approach to networks focusing on the ACC hub. Additionally, as in all other SPs, results from this project will be fed back into the RU by providing an optimal composition of ACC-centred system characteristics to SP2, by comparing our predictive performance with other neuroimaging-based predictors from SP8 and SP9 and by investigating whether our predictive model is associated with markers based on ER strategies and psychophysiological markers of ER from SP4 and SP5.
SP8 aims to develop and validate neurofunctional markers of ER as predictors of TNR. We will develop effective connectivity models of fronto-limbic modulation during three different ER strategies in order to establish a set of models with the best predictive values for TNR in retrospective data sets of patients with depressive, obsessive–compulsive and social AD (SAD). We will then validate our models in the transdiagnostic prospective sample of the proposed RU cohort, also testing for generalisation and specificity across tasks and diagnoses. We will refine our models by including behavioural markers of ER from clinical measures (SP5) and digital phenotyping (SP6), exchange results with SP4 to develop psychometric measurement models of ER using BOLD, EEG and ECG data, and compare task-based effective connectivity measures with resting state connectivity (SP7). Finally, we will share our data with SP2 for single-case prediction.
SP9 aims to investigate the extent to which neural structure and function related to treatment response can be generalised from specific phobia (SPH) as a ‘model’ disorder for pathological fear to other, more complex ADs and related conditions (SAD, panic disorder (PD), agoraphobia (AG), generalised AD (GAD) and PTSD). Second, empirical evidence for the separate classification of OCD will be tested by comparing signatures and predictions between OCD and the anxiety spectrum. SP9 will use two retrospective datasets in order to train predictive models, which in turn will be replicated/cross-validated in the prospective cohort from the RU. All datasets share a common neuroimaging backbone comprising neuroanatomic data, resting-state networks (EEG, fMRI) and task-based activity and connectivity, which will be used to investigate generalisation gradients.
Naturalistic prospective-longitudinal observational clinical trial: study design, participants, eligibility criteria and current status
We include patients with a primary diagnosis of an internalising disorder receiving CBT at one of the four participating academic outpatient clinics. These university-based clinics are integral to Germany’s psychotherapy landscape, serving dual roles in research and education within the German healthcare system, hence ensuring the high ecological validity and representativeness of our sample. The inclusion criteria of the study are as follows: (1) age of 18 years or older, (2) a primary diagnosis based on the Diagnostic and Statistical Manual of Mental Disorders (DSM)-5 criteria for either SPH, SAD, PD, AG, GAD, OCD, PTSD or unipolar depressive disorder (major depression or dysthymia, DD), (3) an indication for outpatient CBT treatment, (4) a minimum treatment plan of 12 sessions (excluding mandatory probatory sessions required in Germany before treatment), (5) a symptom severity indicated by a General Severity Index of the Brief Symptom Inventory (BSI-GSI30) greater than 0.56 (except for SPH31), (6) a Clinical Global Impression–Severity Scale (CGI-S32) score of 3 or higher (indicating at least mild illness) and (7) a willingness to participate in the study. We abandoned the BSI-GSI inclusion criterion for SPH because too many patients had to be excluded at the beginning of recruitment due to this criterion, making it impossible to reach our recruitment goal. Exclusion criteria were held to a minimum in order to foster ecological validity of the sample. They mainly encompass contraindications for outpatient treatment, including (1) a current secondary diagnosis of moderate to severe substance use disorder (including regular use of benzodiazepines), (2) current psychosis, (3) current bipolar disorder, (4) more than moderate suicidality and (5) medical conditions that contraindicate CBT. Patients who require inpatient treatment during the course of the outpatient treatment will be excluded. However, the presence of other comorbid disorders will not lead to exclusion.
Concurrent pharmacological treatment is permitted and will be thoroughly documented.
Patients opting for the experimental assessments (EMA, fMRI and EEG/ECG) will receive detailed information about possible risks; additional eligibility criteria, including specific safety-related exclusion criteria, will be applied.
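The core inclusion and exclusion criteria listed above can be expressed as a simple screening check. The following is a minimal sketch only: the `Screening` record and its field names are hypothetical and are not taken from the study database, and the five exclusion criteria are collapsed into a single flag for brevity.

```python
from dataclasses import dataclass

# Diagnosis abbreviations follow those used in the protocol text.
ELIGIBLE_DIAGNOSES = {"SPH", "SAD", "PD", "AG", "GAD", "OCD", "PTSD", "DD"}
BSI_GSI_CUTOFF = 0.56       # must be exceeded, except for SPH
CGI_S_MINIMUM = 3           # at least mild illness
MIN_PLANNED_SESSIONS = 12   # excluding probatory sessions


@dataclass
class Screening:
    """Hypothetical screening record; field names are illustrative only."""
    age: int
    primary_diagnosis: str
    cbt_indicated: bool
    planned_sessions: int
    bsi_gsi: float
    cgi_s: int
    consents: bool
    has_exclusion: bool = False  # True if any of the five exclusion criteria applies


def is_eligible(s: Screening) -> bool:
    """Return True if inclusion criteria 1-7 are met and no exclusion applies."""
    # The BSI-GSI criterion was dropped for specific phobia (SPH).
    severity_ok = s.primary_diagnosis == "SPH" or s.bsi_gsi > BSI_GSI_CUTOFF
    return (
        s.age >= 18
        and s.primary_diagnosis in ELIGIBLE_DIAGNOSES
        and s.cbt_indicated
        and s.planned_sessions >= MIN_PLANNED_SESSIONS
        and severity_ok
        and s.cgi_s >= CGI_S_MINIMUM
        and s.consents
        and not s.has_exclusion
    )
```

For example, a patient with SPH and a BSI-GSI of 0.30 would pass the severity check in this sketch, whereas a patient with GAD and the same score would not.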
In our sample (28 November 2024), the distribution of diagnoses was as follows: ADs 35.5%, OCD 18.6%, DD 41.0% and PTSD 4.9%. Compared with a nationwide sample from the KODAP consortium with 6098 patients (ADs 29.29%, OCD 5.76%, DD 58.53% and PTSD 6.42%), the distributions are largely similar.33 34 However, the proportion of patients with OCD is notably higher in our sample. This can be attributed to the specialisation of one of our outpatient clinics in OCD treatment, which led to an intentional enrichment of the sample.
For current study progress, see figure 3.
Study CONSORT flow chart, 28 November 2024. 1Termination during probatory sessions. 2Mandatory preliminary sessions (up to four) required in the German healthcare system before initiating treatment. CONSORT, Consolidated Standards of Reporting Trials.
Sample size calculation
We estimate effect size from two perspectives. First, we aim to achieve or surpass the best ML model prediction accuracy reported in existing literature, which is around 75%. We expect that improving prediction accuracy by 25 percentage points above the chance level (50%) will have clinical and health-economic significance.
To accurately estimate the necessary sample size, two key factors are considered: (1) prediction accuracy is expected to increase with a larger training dataset, up to the maximal achievable accuracy based on the underlying effect and (2) fluctuations in detected prediction accuracy around the true value will decrease as the test set size increases. The exact training set size needed to achieve a 75% prediction accuracy is uncertain and depends on both predictor characteristics and the chosen algorithm. Our simulations, consistent with previous research,35 suggest that prediction accuracies plateau at around 300 samples in the training set across various models (figure 4). Further simulations indicate that a test set of about 100 patients will likely yield a measured prediction accuracy of 75%–85%.
Simulation of association between sample size and prediction accuracy. Association of train and test set sizes with prediction accuracy across simulations. (A) Across increasing size of n train, prediction accuracy is increasing until reaching a plateau (ie, the maximum prediction accuracy possible based on the underlying effect). Starting point and gain in prediction accuracy depend on data and model parameters. (B) Across increasing size of n test, the measured prediction accuracy varies less and less around the true prediction accuracy for the underlying effect.
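The narrowing of measured accuracy around the true value with growing test set size (panel B) can be illustrated with a simple binomial approximation. This sketch is not the simulation used for figure 4: it assumes independent test cases, a normal approximation to the binomial distribution, and a hypothetical true accuracy of 0.80 chosen purely for illustration.

```python
import math


def accuracy_spread(true_accuracy: float, n_test: int, z: float = 1.96):
    """Return (standard error, lower bound, upper bound) of measured accuracy.

    Uses the normal approximation to the binomial distribution; z = 1.96
    corresponds to an approximate 95% range.
    """
    se = math.sqrt(true_accuracy * (1.0 - true_accuracy) / n_test)
    return se, true_accuracy - z * se, true_accuracy + z * se


# Hypothetical true accuracy of 0.80 (an assumption, not a study result).
for n_test in (25, 50, 100, 200):
    se, lower, upper = accuracy_spread(0.80, n_test)
    print(f"n_test={n_test:4d}  SE={se:.3f}  range=({lower:.3f}, {upper:.3f})")
```

Because the standard error shrinks with the square root of the test set size, quadrupling the test set only halves the spread, which is one reason a test set of about 100 patients is a pragmatic compromise.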
Based on these insights, a fivefold cross-validation method appears suitable. This involves assigning 400 patients to the training set and 100 to the test set, with the analysis repeated five times to estimate variability.
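The 400/100 partition described above can be sketched with a plain fivefold split. This is an illustrative sketch under stated assumptions, not the RU's actual validation pipeline: the function name `five_fold_splits` is hypothetical, and stratification (eg, by site, diagnosis or outcome) is omitted for brevity.

```python
import random


def five_fold_splits(n_patients: int = 500, n_folds: int = 5, seed: int = 0):
    """Yield (train_indices, test_indices) for each fold.

    With 500 patients and 5 folds, every split has 400 training and
    100 test patients, and each patient is tested exactly once.
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        # To avoid data leakage, any preprocessing (scaling, feature
        # selection, hyperparameter tuning) should be fitted on `train` only.
        yield train, test


for fold_number, (train, test) in enumerate(five_fold_splits(), start=1):
    print(f"fold {fold_number}: n_train={len(train)}, n_test={len(test)}")
```

The spread of the five fold-wise accuracies then gives a rough indication of the variability of the estimate.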
Additionally, for comparative purposes with traditional studies, we also present an effect size calculation for group-based linear multiple regression analyses, despite not using this method in our study. Considering an alpha level of 5%, a power (1−β) of 80% and 19 predictors, small effects (R²=0.07) can be detected with a sample size of 500 patients. This calculation is based on the power analysis conducted using G*Power V.3.1.9.2 for linear multiple regression. The recruitment plan for the study is designed to ensure a steady and sufficient number of participants throughout the trial. Initially, at baseline, 585 patients will be recruited. By the fifth session (S5), it is anticipated that the patient number will decrease to 555 due to an expected attrition rate of 5%. As the study progresses to the 20th session (S20), the number of participants is projected to be 527, accounting for a cumulative attrition rate of 10%. Finally, at the postassessment, the study expects to have 500 patients, reflecting a cumulative attrition rate of 15%. This recruitment strategy considers potential dropouts and aims to maintain a robust sample size for the study’s duration.
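The projected patient numbers can be reproduced with a short calculation. One reading that matches the stated figures is that roughly 5% of the remaining patients are lost between successive assessments, with results rounded down; this reading is an assumption made for illustration and is not stated explicitly in the protocol. Under it, the cumulative attrition rates of 10% and 15% are approximate.

```python
import math


def projected_sample(n_baseline: int = 585,
                     stage_attrition: float = 0.05,
                     stages=("S5", "S20", "post")) -> dict:
    """Project remaining patients, assuming the same attrition at each stage.

    ASSUMPTION: 5% of the remaining patients drop out between successive
    assessments and counts are rounded down.
    """
    remaining = n_baseline
    projection = {"baseline": remaining}
    for stage in stages:
        remaining = math.floor(remaining * (1.0 - stage_attrition))
        projection[stage] = remaining
    return projection


projection = projected_sample()
for stage, n in projection.items():
    cumulative = 1.0 - n / projection["baseline"]
    print(f"{stage:>8}: n={n}  cumulative attrition={cumulative:.1%}")
```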
Treatment setting, contents and procedure
The treatment follows the same basic CBT approach at all sites and is based on established manuals,36–39 but not strictly manualised to mirror the ecological approach. We aim to predict TNR in a naturalistic setting, where CBT is usually tailored (based on clinician expertise, ie, ‘eminence-based personalisation’) to the specific requirements of each patient. However, despite this form of personalisation, TNR remains a frequent outcome. Therefore, this naturalistic CBT-based treatment addresses individual variation in clinical phenomenology, including comorbid disorders. These elements are not only addressed through individualised therapy plans but are also analysed as potential predictors of TNR.
Standard manuals36–39 typically encompass psychoeducational components, various intervention forms like exposure techniques, behavioural activation or cognitive restructuring and relapse prevention strategies. Research40 41 indicates that despite their differences, these techniques collectively enhance basic ER skills. The regimen and duration of treatment adhere to the specific regulations of the German public healthcare system, markedly distinct from those in other countries such as the UK or USA. The legal framework permits up to 80 sessions of 50 min each. Insurance companies approve varying numbers of sessions, categorised into ‘short-term therapy part 1’ for sessions S1–12, short-term therapy part 2 for sessions S13–24, and extensions for long-term treatment usually up to 60 or in some cases up to 80 sessions. Typically, treatments involve one session per week, with longer sessions for exposure therapy.
The number and length of sessions at our outpatient clinics vary with the type and severity of the disorder, averaging 40–50 sessions over 1–1.5 years. Nonetheless, treatment can deviate to as few as 10–20 sessions (short-term treatment) over 4 months or extend to 80 sessions (long-term treatment) over 2 years. Considering these variations and trial time constraints, we decided to evaluate treatment outcomes after a predetermined number of sessions (‘treatment dose’ at S20) and at the end of treatment with a maximum duration of 12 months (‘treatment time’; the naturalistic treatment may continue beyond this point, but outside the study regimen). Additionally, we will monitor early psychological changes after the fifth session (S5).
Patients seeking treatment will first undergo routine screening, receive a detailed explanation of the study protocol and provide informed consent before participating. The initial five preliminary/probatory sessions, designated as the diagnostic interval, serve as the baseline period for eligibility assessment based on inclusion and exclusion criteria, and for collecting multimodal predictors such as clinical data, AA including EMA and passive sensing, fMRI and EEG/ECG data. All clinical data are electronically compiled in a central database via ‘Research Electronic Data Capture’ (REDCap), which will be integrated into a shared and secured server alongside our multimodal datasets.42 43
For an in-depth overview of the study flow, see figure 5.
Study workflow. AA, ambulatory assessments; EEG, electroencephalogram; HRV, heart rate variability; S, treatment session.
The study was approved by the Institutional Ethics Committee of the Department of Psychology at Humboldt Universität zu Berlin (approval no. 2021-01) and the Ethics Committee of Charité-Universitätsmedizin Berlin (approval no. EA1/186/22). Participants in the experimental sessions, including AA, EEG/ECG and neuroimaging studies, will receive financial compensation. The trial has been registered with the German Clinical Trials Register (DRKS00030915) and on the open science framework platform of the Center for Open Science, available at https://osf.io/bcgax.
Measures and outcomes
Clinical assessments
The clinical interview, demographics and the questionnaires outlined in table 3 will be conducted by trained clinical psychologists who hold at least a bachelor’s degree. These professionals will operate under the supervision of board-certified psychotherapists.
Study measures
Primary outcome
In defining treatment response, we adhere to the concept of clinically significant change as introduced by Jacobson & Truax (1991).44 In line with the guidelines for the German academic outpatient sector,31 the shift from a dysfunctional to a functional mental condition is operationalised by achieving a BSI-GSI score of less than 0.56 at T20/post-treatment assessment. Additionally, the change observed from baseline to T20/post-treatment must be reliable, as defined by the Reliable Change Index. Considering the reasons mentioned above, the primary outcome does not apply to SPH. Here, we will apply the secondary outcome as primary (50% reduction of symptom load in the Hamilton Anxiety Rating Scale (HAM-A)).
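The two-part response criterion described above (reliable change plus a post-treatment score in the functional range) can be sketched as follows. This is a minimal illustration of the Jacobson & Truax logic, not the study's analysis code: the 0.56 cut-off is taken from the protocol, whereas the baseline standard deviation, the reliability coefficient and the 1.96 criterion used in the example are assumptions chosen for illustration only.

```python
import math

GSI_CUTOFF = 0.56  # functional-range cut-off stated in the protocol


def reliable_change_index(pre, post, sd_baseline, reliability):
    """RCI after Jacobson & Truax: change divided by the SE of the difference."""
    se_measurement = sd_baseline * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0) * se_measurement
    return (post - pre) / se_difference


def is_responder(pre, post, sd_baseline, reliability,
                 cutoff=GSI_CUTOFF, z_crit=1.96):
    """Response = reliable symptom reduction AND post score below the cut-off.

    sd_baseline, reliability and z_crit are illustrative assumptions;
    real values would come from norm or sample data.
    """
    rci = reliable_change_index(pre, post, sd_baseline, reliability)
    reliable_improvement = rci < -z_crit   # lower scores mean fewer symptoms
    in_functional_range = post < cutoff
    return reliable_improvement and in_functional_range


# Hypothetical patients (pre and post BSI-GSI scores)
print(is_responder(1.40, 0.40, sd_baseline=0.70, reliability=0.90))
print(is_responder(0.70, 0.50, sd_baseline=0.70, reliability=0.90))
```

In this sketch, a patient who ends below the cut-off but whose change is smaller than the measurement error would not count as a responder, which mirrors the requirement that both conditions hold.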
Secondary outcomes
Secondary outcomes will be assessed using disorder-specific severity measures derived from clinical interviews. In line with the approach of other multicentre clinical studies,45 a 50% reduction in symptoms on the HAM-A46 will signify treatment response for PD, AG, SAD, SPH and GAD. For DD, a 50% symptom reduction on the Montgomery-Åsberg Depression Rating Scale47 is required. For OCD, a 35% symptom reduction is needed, as measured by the Yale-Brown Obsessive Compulsive Scale.48 49 For PTSD, a 30% symptom reduction is needed, as measured by the Clinician-Administered PTSD Scale for DSM-5.50 51
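The disorder-specific thresholds above amount to a simple lookup plus a percentage-reduction check. The sketch below is illustrative only: the scale abbreviations (MADRS, Y-BOCS, CAPS-5) are the common short names of the instruments named in the text, and treating a reduction exactly equal to the threshold as a response is an assumption.

```python
# Minimum proportional symptom reduction counted as response, per disorder
RESPONSE_CRITERIA = {
    "PD": ("HAM-A", 0.50),
    "AG": ("HAM-A", 0.50),
    "SAD": ("HAM-A", 0.50),
    "SPH": ("HAM-A", 0.50),
    "GAD": ("HAM-A", 0.50),
    "DD": ("MADRS", 0.50),
    "OCD": ("Y-BOCS", 0.35),
    "PTSD": ("CAPS-5", 0.30),
}


def percent_reduction(baseline, followup):
    """Proportional symptom reduction relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - followup) / baseline


def meets_secondary_response(disorder, baseline, followup):
    """True if the reduction reaches the disorder-specific threshold.

    Using >= (rather than >) at the boundary is an assumption.
    """
    _scale, threshold = RESPONSE_CRITERIA[disorder]
    return percent_reduction(baseline, followup) >= threshold


# Hypothetical scores
print(meets_secondary_response("OCD", baseline=20, followup=12))
print(meets_secondary_response("PTSD", baseline=40, followup=30))
```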
Additionally, baseline ER capacities and their change processes, which are key mechanisms of interest, will be thoroughly examined pretreatment and after session 5. If a patient withdraws from treatment, the reasons for dropout will be documented and, where feasible, outcome measure proxies will be gathered post-dropout.
For a detailed overview of further clinical characteristics and questionnaires concerning ER, skills and resources, expectations/illness perception, and therapeutic alliance, refer to table 3.
Ambulatory assessment
Participants registering for this SP are provided with a wearable (Withings France SA Scanwatch; https://www.withings.com/de/de/scanwatch) and the respective app for passive data collection, as well as a study app for EMA and location tracking. Participants can decide if they want to take part in (1) three active EMA phases at baseline, S20 and post and continuous passive data sharing for up to 365 days or (2) one active EMA phase at baseline and passive data sharing for 15 days. Each EMA phase lasts 14 days with 8 questionnaires per day. The full set of assessed features is provided in box 1.
Box 1 Ambulatory assessment
Ecological momentary assessment (EMA)
Affect (8× per day; positive and negative affect scale).
Emotion regulation (8× per day; adapted from RESS-EMA).
Situation/context (8× per day; self-constructed).
Social contact (8× per day; self-constructed).
Significant events (8× per day; self-constructed).
Physical fitness (1× per day; self-constructed).
Therapeutic agency (1× per day; adapted from TAI).
Reminder manual ECG (2× per day; self-constructed).
Passive sensing
Physical activity.
Sleep.
Heart rate.
Location (pseudonymised; event-based trigger).
RESS, Regulation of Emotion Systems Survey; TAI, Therapeutic Agency Inventory.
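As a small worked example of the EMA schedule, the number of scheduled questionnaires follows directly from 14 days per phase and 8 questionnaires per day. The compliance ratio in the sketch below is a hypothetical summary statistic added for illustration; it is not a measure defined in the protocol.

```python
DAYS_PER_PHASE = 14     # from the protocol
PROMPTS_PER_DAY = 8     # from the protocol


def scheduled_prompts(n_phases):
    """Total number of scheduled EMA questionnaires for n_phases phases."""
    return n_phases * DAYS_PER_PHASE * PROMPTS_PER_DAY


def compliance(completed, n_phases):
    """Share of scheduled questionnaires that were completed (illustrative)."""
    return completed / scheduled_prompts(n_phases)


print(scheduled_prompts(1))   # option with one EMA phase
print(scheduled_prompts(3))   # option with three EMA phases
print(round(compliance(84, 1), 2))
```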
Biobehavioural assessments
Psychophysiology
The EEG/ECG measurement employs three consecutive experimental paradigms: a resting-state paradigm, a picture viewing task (including passive viewing and ER subtasks) and a feedback processing paradigm (including a guessing ‘doors task’ phase, followed by a probabilistic reversal learning phase). Detailed specifications of the measurement protocol are provided in table 4. More information on the experimental tasks is given in the preregistrations (see table 1).
EEG/ECG session
Neuroimaging
MRI assessments in this study are conducted using two 3-Tesla Siemens Prisma MRI scanners, located at two sites (Center for Cognitive Neuroimaging Berlin, at FU Berlin and Berlin Center for Advanced Neuroimaging, BCAN, at Charité–Universitätsmedizin Berlin). The MRI protocol comprises an anatomical scan, a resting-state functional MRI and three distinct ER tasks that span a spectrum from automatic to voluntary ER mechanisms. Owing to the bicentric data collection and the associated requirements for harmonising quality control measures, we strictly adhere to clearly defined standard operating procedures. We continuously cooperate and exchange feedback between sites, conduct regular site visits and perform quality checks on the acquired data as soon as they are available. Standardised preprocessing protocols from the ENIGMA fMRI group will be applied.52 Detailed specifications of the scanning session are provided in table 5. Details on the experimental tasks and scanning protocols can be found in the preregistration (see table 1).
Neuroimaging session
Data analysis: ML preprocessing and validation pathway
Here, we focus on the ML models used for predictive analytics. For detailed analyses of the individual SPs, see the preregistrations in table 1.
For predicting TNR, we will initially develop separate models for each data modality—including clinical, behavioural, psychophysiological and neuroimaging data. Subsequently, we will integrate these modalities to enhance the predictive accuracy.
For tabular data, which include demographic, clinical and psychological data as well as features derived from EMA data, we will primarily use classical ML algorithms such as gradient boosting and random forests, known for their efficacy with structured data.53 We will also explore elastic nets, support vector machines and experimental deep learning techniques,54 although cautiously, given their unproven benefits for non-hierarchical data and small sample sizes. To interpret the models and assess the importance of individual features, we will use model-agnostic Shapley values (SHAP55). This will also allow us to compare the relative importance of the different data sources (eg, demographic vs EMA data).
For neuroimaging data, we will systematically compare classical ML methods relying on extracted features (ie, image-derived phenotypes) with deep learning methods applied directly to minimally processed (ie, harmonised and linearly registered to MNI space) MRI data. The starting point will be a comparatively simple CNN model that we originally trained to differentiate between patients with Alzheimer’s disease and healthy controls based on T1-weighted (MPRAGE) data,56 and that we have since used as a baseline model for other use cases.57 Starting from this baseline model, we will experiment with different network settings on the training data (eg, numbers of layers or filters, different loss functions, different regularisation schemes, short-cuts or dense connections) to adapt optimally to the acquired data while retaining the ability to generalise.
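A minimal sketch of how such a tabular comparison might look with scikit-learn is given below. It uses synthetic data as a stand-in for the study's predictors, and it swaps in scikit-learn's permutation importance as a simple model-agnostic importance measure; this is not SHAP and is used here only to keep the example free of additional dependencies. All hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)

# Synthetic stand-in for tabular predictors (hypothetical, not study data)
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           random_state=0)

models = {
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Stratified 5-fold cross-validation with balanced accuracy for each model
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
    for name, model in models.items()
}
for name, scores in cv_scores.items():
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")

# Model-agnostic feature importance on a held-out split
# (permutation importance, used here as a stand-in for SHAP)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)
forest = models["random_forest"].fit(X_tr, y_tr)
importance = permutation_importance(forest, X_te, y_te,
                                    n_repeats=10, random_state=0)
ranked = np.argsort(importance.importances_mean)[::-1]
print("Features ranked by importance:", ranked.tolist())
```

Grouping the importance values by data source (eg, summing over all EMA-derived columns) would be one simple way to compare sources, although that grouping is not shown here.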
To manage small sample sizes in deep learning models based on MRI data, we will implement two key strategies. First, data augmentation will be employed, where modified versions of the input data are incorporated into the dataset. This approach not only serves as regularisation but has also proven to enhance performance in MRI studies.58 Second, we will use transfer learning.59 Here, models are initially trained on a source dataset (eg, UK Biobank or other retrospective datasets) for a specific task (like disease classification) and subsequently fine-tuned on a target dataset for a different task (such as outcome prediction). This method leverages pretrained weights to refine the model for the target task, a technique demonstrated to significantly boost performance in our previous studies, including across different tasks and MRI sequences.57
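The two strategies can be illustrated with a short PyTorch sketch. The network below is a toy 3D CNN and not the baseline model referred to in the text; the random weights standing in for a pretrained model, the choice to freeze the whole feature extractor, the flip axis and the input size are all assumptions made for illustration.

```python
import torch
from torch import nn


class SmallCNN3D(nn.Module):
    """Toy 3D CNN; a hypothetical stand-in for a pretrained MRI model."""

    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(16, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def augment(volume):
    """Random flip along the last axis (assumed here to be left-right)."""
    if torch.rand(1).item() < 0.5:
        volume = torch.flip(volume, dims=[-1])
    return volume


torch.manual_seed(0)

# Pretend this model was trained on a source task (eg, disease classification)
source_model = SmallCNN3D(n_classes=2)

# Transfer: copy the weights, freeze the feature extractor, use a fresh head
target_model = SmallCNN3D(n_classes=2)
target_model.load_state_dict(source_model.state_dict())
for param in target_model.features.parameters():
    param.requires_grad = False
target_model.classifier = nn.Linear(16, 2)  # new head for outcome prediction

optimizer = torch.optim.Adam(
    (p for p in target_model.parameters() if p.requires_grad), lr=1e-3)

# One fine-tuning step on random tensors standing in for MRI volumes
x = torch.randn(4, 1, 16, 16, 16)
y = torch.tensor([0, 1, 0, 1])
loss = nn.functional.cross_entropy(target_model(augment(x)), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"fine-tuning loss: {loss.item():.3f}")
```

Freezing all convolutional layers is only one possible setting; unfreezing later layers or fine-tuning the whole network with a small learning rate are alternatives that a comparison with and without pretraining could cover.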
Additionally, we will explore various scenarios for effective transfer learning, such as the adaptation of models from general disease classification to specific outcome prediction within patient groups. This process not only helps in identifying key areas for differentiation between patient groups but also in assessing the transferability and efficacy of models across different tasks. Our approach will include a comparative analysis of models with and without pretraining to determine the most effective settings for transfer learning. Successful model transfer to clinical practice will be a key consideration. To elucidate the decision-making process of these neural networks, we will use advanced visualisation techniques such as SHAP and layer-wise relevance propagation.60 These post hoc methods will help in understanding the neurobiological correlates of TNR as identified by neuroimaging techniques, and in assessing the impact of different transfer learning models.
After analysing each data modality separately to evaluate its stand-alone predictive performance, we will focus on integrating diverse data types, including neuroimaging, psychophysiological and clinical data, psychological test scores and EMA data. Combining these varied data sources presents unique challenges, especially given the unknown interactions between them and the uncertainty regarding the informativeness of specific features.61 We will explore both intermediate and late data fusion techniques. In intermediate fusion, features from different modalities are combined into a single model after separate processing and extraction. In contrast, late fusion uses these features as inputs for different classifiers in an ensemble approach. For MRI data, we will use either image-derived phenotypes or CNN-based representations. Our aim is to determine the relevance of different data domains, including emerging putative biomarker candidates from various sources. We will also pursue a cost-sensitive strategy, focusing on minimal yet effective feature sets, applying dimensionality reduction and feature selection techniques such as recursive feature elimination. Data fusion methods will range from simple concatenation to more complex techniques such as ensemble learning, boosting, attention-based methods and multiple kernel learning, the latter of which allows for consideration of interdependencies between data sources.61 62 For deep learning-based multimodal data integration, we will rely on the recently published toolbox fusilli (https://iopscience.iop.org/article/10.1088/2516-1091/acc2fe).
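The difference between intermediate and late fusion can be sketched with scikit-learn on synthetic data. The split of columns into a 'clinical' and an 'imaging' block, the use of PCA for the imaging block and the averaging of predicted probabilities are illustrative assumptions, not the study's pipeline.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data; columns are arbitrarily assigned to two "modalities"
X, y = make_classification(n_samples=400, n_features=30, n_informative=8,
                           random_state=0)
clinical_cols = list(range(0, 10))    # hypothetical clinical block
imaging_cols = list(range(10, 30))    # hypothetical imaging block
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# Intermediate fusion: process each modality separately, then concatenate
# the resulting features and feed them into a single classifier
per_modality = ColumnTransformer([
    ("clinical", StandardScaler(), clinical_cols),
    ("imaging", make_pipeline(StandardScaler(),
                              PCA(n_components=5, random_state=0)),
     imaging_cols),
])
intermediate = make_pipeline(per_modality, LogisticRegression(max_iter=1000))
acc_intermediate = intermediate.fit(X_tr, y_tr).score(X_te, y_te)


# Late fusion: one classifier per modality, predictions combined afterwards
def modality_model(cols):
    keep = ColumnTransformer([("keep", StandardScaler(), cols)])
    return make_pipeline(keep, LogisticRegression(max_iter=1000))


experts = [modality_model(cols).fit(X_tr, y_tr)
           for cols in (clinical_cols, imaging_cols)]
mean_proba = np.mean([m.predict_proba(X_te) for m in experts], axis=0)
acc_late = float((mean_proba.argmax(axis=1) == y_te).mean())

print(f"intermediate fusion accuracy: {acc_intermediate:.2f}")
print(f"late fusion accuracy: {acc_late:.2f}")
```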
To maintain comparability between methods and data types (tabular vs neuroimaging, unimodal vs multimodal analysis), we will apply a consistent validation strategy. Data will be randomly divided into training, validation and test sets. To account for the relatively small sample size, we will perform these splits multiple times (the number depending on computational feasibility and prediction variance) and calculate average performance metrics as well as variability indices to estimate the reliability of prediction. Where the data are imbalanced with respect to a relevant label (age, sex or TNR), stratified splitting will be used. The effectiveness of the models will be assessed based on their ability to predict treatment outcomes for individual patients and across various disorders. This evaluation will also consider the interpretability of the models by clinical or domain experts. Further, we will examine models for age-specific or sex-specific effects by separately analysing performance across subgroups, such as women vs men or younger vs older individuals, and by using our deep representation visualiser, which allows the effects of different confounding variables in a trained ML model to be estimated.63 The significance of each model’s performance will be verified using permutation tests, in which labels are randomly shuffled. This approach is intended to support rigorous evaluation and comparability of the models across different data types and research scenarios.
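Repeated stratified splitting and a label-permutation test can be sketched with scikit-learn as follows. The example uses synthetic data and a plain train/test split without a separate validation set (no hyperparameters are tuned here); the classifier, the number of splits and the number of permutations are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import (StratifiedShuffleSplit,
                                     permutation_test_score)

# Synthetic, mildly imbalanced data as a stand-in for TNR labels
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           weights=[0.6, 0.4], random_state=0)
model = LogisticRegression(max_iter=1000)

# Repeated stratified random splits: mean performance and its variability
splitter = StratifiedShuffleSplit(n_splits=20, test_size=0.2, random_state=0)
scores = []
for train_idx, test_idx in splitter.split(X, y):
    model.fit(X[train_idx], y[train_idx])
    predictions = model.predict(X[test_idx])
    scores.append(balanced_accuracy_score(y[test_idx], predictions))
mean_score = float(np.mean(scores))
sd_score = float(np.std(scores, ddof=1))
print(f"balanced accuracy: {mean_score:.2f} (SD {sd_score:.2f})")

# Permutation test: refit on shuffled labels to obtain a null distribution
observed, null_scores, p_value = permutation_test_score(
    model, X, y, cv=splitter, scoring="balanced_accuracy",
    n_permutations=50, random_state=0)
print(f"observed {observed:.2f}, permutation p = {p_value:.3f}")
```

With 50 permutations, the smallest attainable p value is 1/51, so a real analysis would typically use many more permutations, subject to computational feasibility.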
We are committed to using Python for model development, leveraging libraries like sklearn64 and PyTorch, and making our code publicly available for broader use and collaboration. ML experiments will be conducted at the computing facility of Charité–Universitätsmedizin Berlin.
Data handling and dissemination
In compliance with the stringent requirements of the General Data Protection Regulation in Germany (‘Datenschutz-Grundverordnung’, DSGVO), we have set up a data protection concept for prospective clinical studies, and all participating institutions have signed a Joint Controller Agreement for data handling. These documents have been approved by all respective data protection officers. Given the sensitivity of these health data, they are subject to strict protection protocols and are not shared with third parties as long as they remain identifiable. To preserve the privacy of the patients, all data are safeguarded by appropriate technical measures (audit trails and user rights management in access groups) and organisational measures (standard operating procedures and standardised personnel training) throughout the study phase. One year after the completion of the trial, all data will be irreversibly anonymised. MRI data will be defaced to remove identifiable features. Only after this anonymisation process will the data be shared with other consortia such as ENIGMA (Enhancing neuro imaging genetics through meta-analysis; https://enigma.ini.usc.edu/).
Patient and public involvement
Implementing a participatory research approach from the outset together with the DZPG (Deutsches Zentrum für psychische Gesundheit, German Research Center for Mental Health) PPI (patient participation involvement), we developed the research objectives and study design of the renewal proposal jointly with a patient advisory board comprising five experts with lived experience. These experts were invited to join our consortium and advised us in several meetings. Importantly, principal investigators wrote lay summaries of each study plan, which were uploaded to a joint document that was reviewed and commented on by those five experts.
Discussion and outlook
TNR to CBT poses a substantial challenge for affected individuals, mental healthcare resources and societies. The research consortium RU 5187 aims to enhance the understanding of this phenomenon by revealing its biobehavioural signature, ranging from deep to digital phenotypes. Going beyond mechanistic understanding, we aim to provide predictive models of TNR that can be applied to single patients. Both goals are pursued within a clinical sample that has high ecological validity for the outpatient sector, thus offering potential to close the translational gap to clinical utility. As such, we primarily focus on internalising disorders, which represent the majority of patients in this treatment setting. By employing innovative ML techniques, we intend to develop theranostic markers tailored to the individual needs of patients. If successful, these predictors will provide scientifically informed markers of TNR, potentially aiding future decision-making on personalised treatments in clinical practice.
Methods-wise, our approach also emphasises improving predictive analytics by creating a comprehensive dataset that was a priori set up for this purpose and which has a high degree of ecological validity, hence overcoming current shortcomings in the field.17 This dataset will enable testing of various models and algorithms to determine which are most suitable for our data. Health-economic considerations are integral to our study, guiding us towards identifying cost-efficient combinations of markers or their proxies. The deeply phenotyped and ecologically valid sample will complement large-scale initiatives for external validation, contribute to the development of a data model warehouse for public exchange and align with national and international consortia such as ENIGMA and others.
However, in order to alleviate high rates of TNR, early identification of vulnerable patients by the use of predictive analytics is not enough. Rather, we need to know what works for whom and how we can augment current (psychotherapeutic) first-line treatments. Let us imagine that in the future we could implement an algorithm that is able to predict the risk of TNR to a first-line treatment with high precision. How can we optimise existing treatments for the respective patient in a next step? While clinicians often personalise psychotherapy based on their clinical judgement and experience, there is a growing need to enhance this process through data-driven insights to optimise treatment outcomes for patients at risk of TNR. Clinicians’ predictions are poor and outperformed by ML algorithms.65–67 Our approach aims to move beyond traditional clinician-based personalisation by incorporating predictive analytics to identify biobehavioural markers associated with TNR. This allows for the tailored adaptation of treatment strategies based on individual patient profiles.
While pharmacological treatment augmentation (eg, adding medication from a different class) has already made its way into current treatment guidelines, non-pharmacological augmentation strategies are an emerging field of research, presenting a rich body of strategies—from neuroscientifically informed approaches such as neurofeedback or non-invasive brain stimulation68–74 to behavioural approaches such as exercise, mind-body practices and high-ventilation breathwork75 76 to more psychologically based add-on interventions.77–80
The long-term aim of RU 5187 for the next funding phase is to implement non-pharmacological augmentation of psychotherapy and to enrich clinical study designs with predictive modelling as laid out by individualised treatment rule models such as the Personalised Advantage Index (PAI78 79).
For example, neurofeedback interventions can modulate neural circuits involved in ER, thereby enhancing the effectiveness of CBT for patients with deficits in this area.71 Similarly, non-invasive brain stimulation techniques, such as transcranial magnetic stimulation, transcranial direct current stimulation and transcutaneous auricular vagus nerve stimulation, have been shown to facilitate neuroplasticity and improve cognitive functions essential for therapeutic progress.72 Behavioural interventions like aerobic exercise can increase neurogenesis and release neurotrophic factors, which may augment the therapeutic effects of psychotherapy.76
By identifying specific deficits through predictive markers—such as impaired ER or cognitive control—we can select appropriate augmentation strategies to target these processes. This precision augmentation aims to enhance the patient’s responsiveness to psychotherapy, potentially leading to better outcomes.
Future research, integrating clinical augmentation studies with predictive analytics, should investigate the optimal timing, type and combination of augmentation strategies. Understanding when to introduce augmentation (eg, before, during or after psychotherapy sessions) and which interventions are most effective for specific patient profiles will be crucial for integrating these approaches into clinical practice. Moving towards data-driven personalisation and incorporating enriched treatment protocols holds promise for improving treatment outcomes for patients at risk of TNR. By enhancing our understanding of the mechanisms underlying TNR on the one hand and treatment augmentation on the other, we can develop targeted interventions that complement psychotherapy and address the specific needs of each patient.
The design and methods developed in this study for predicting TNR also offer promising avenues for clinical application. One viable implementation strategy is the integration of these models into digital therapy monitoring and feedback systems. The Trier Treatment Navigator serves as an excellent example of best practice in this domain.81 It uses data-driven individual case predictions to estimate the risk of a patient being ‘not on track’ for therapy success, providing clinicians with actionable feedback. This includes tailored recommendations to adapt therapeutic strategies, focusing on overarching change mechanisms. The prospective validation of the Trier Treatment Navigator in an RCT demonstrated its utility, with patients monitored by the system achieving significantly greater symptom reduction (effect size d=0.3) when therapists adhered to the feedback within the first 10 sessions.81 This underscores the potential of feedback systems to enhance treatment outcomes, contingent on clinician engagement and adherence to recommended adjustments.
Another compelling methodological approach for selecting the right treatment from the multitude of possible options is the PAI.82 It is based on an algorithm that is trained on a data set of patients who have been assigned to one of two or more treatment alternatives. In a first step, two predictive models are developed that predict response to treatment A or B. In a second step, the outcome of the treatment received (the so-called factual treatment) and the potential outcome of the treatment alternative (the so-called counterfactual treatment) are predicted for each individual patient in the test data set. This is possible when the same potential predictors have been measured in both treatment groups. If both models come to a very similar result, then treatment options A and B appear to work comparably well for this patient. However, the greater the difference between the two models, the more this patient would benefit from treatment stratification. Initial studies support the usefulness of the PAI for the treatment stratification of patients with a depressive disorder83 or PTSD.84 Interestingly, data-driven stratification using the PAI appears to be superior to clinical decision-making.67 However, a recent study with external validation in two independent samples shows that the generalisability of PAI-based prediction is still limited85 and thus points to a general problem of generalisability in predictive analytics.17 In addition, early PAI analysis pipelines appeared to be affected by data leakage and limited cross-validation procedures.86 Nevertheless, the PAI offers an interesting approach to the question of ‘What works for whom?’ in order to arrive at evidence-based stratification recommendations for individual patients.
Thus, the PAI might be an interesting approach for selecting an optimal add-on treatment for those who have been predicted to be at risk for TNR.
In conclusion, TNR to psychotherapy is a substantial problem in mental healthcare, despite the well-documented general effectiveness of psychotherapy.6 This situation can be improved by (1) better understanding the phenomenon of TNR in order to inspire novel treatment approaches or augmentation strategies, (2) early identification of patients at high risk for TNR and (3) evidence-based recommendations for (non-pharmacological) augmentation strategies. Predictive analytics offers suitable methodological approaches to arrive at a data-based prediction or stratification recommendation. Big data and AI have undoubtedly arrived in psychotherapy research. However, limitations such as insufficient generalisability need to be addressed before predictive analytics models can be put into practice. In addition to aspects of data protection for sensitive health data, the technical implementation, manageability and acceptance of corresponding digital assistance systems must be considered.10 Despite all the possibilities that digital innovations in the field of predictive analytics can offer to support prognosis, treatment selection and process monitoring, the clinical decision will always lie with the practitioner. However, with a balanced consideration of its risks and opportunities, AI-supported predictive analytics has the potential to optimise the effectiveness of psychotherapy for the benefit of our patients. The RU aims to make a significant contribution to further developing these areas.
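The two-step PAI logic described above can be sketched on synthetic data. This is a simplified illustration under strong assumptions: simulated outcomes in which lower values are better, linear models for both treatment arms and a single train/test split. It is not a validated PAI pipeline, and, in line with the caveats above, a real analysis would need nested cross-validation and external validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                 # predictors measured in both arms
treatment = rng.integers(0, 2, size=n)      # 0 = treatment A, 1 = treatment B
# Simulated outcome (lower = better); predictor 0 moderates the effect of B
outcome = (10 + X[:, 1]
           + np.where(treatment == 1, -2.0 * X[:, 0], 0.0)
           + rng.normal(scale=0.5, size=n))

train = np.arange(n) < 150
test = ~train

# Step 1: one prognostic model per treatment arm, fitted on training data only
in_a = train & (treatment == 0)
in_b = train & (treatment == 1)
model_a = LinearRegression().fit(X[in_a], outcome[in_a])
model_b = LinearRegression().fit(X[in_b], outcome[in_b])

# Step 2: predict both potential outcomes for every held-out patient
pred_a = model_a.predict(X[test])
pred_b = model_b.predict(X[test])
factual = np.where(treatment[test] == 1, pred_b, pred_a)
counterfactual = np.where(treatment[test] == 1, pred_a, pred_b)

# PAI: predicted difference between the two options for each patient;
# positive values mean B is predicted to give the lower (better) outcome
pai = pred_a - pred_b
recommended = np.where(pai > 0, "B", "A")
print("mean absolute PAI:", round(float(np.abs(pai).mean()), 2))
print("share recommended B:", float((recommended == "B").mean()))
```

Patients with a PAI close to zero are predicted to do similarly well under either option, whereas large absolute values mark the patients for whom stratification would matter most.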
Ethics and dissemination
The study has received ethical approval from the Institutional Ethics Committee of the Department of Psychology at Humboldt Universität zu Berlin (approval no. 2021-01) and the Ethics Committee of Charité-Universitätsmedizin Berlin (approval no. EA1/186/22).
Results will be disseminated through peer-reviewed journals and presentations at national and international conferences. Deidentified data and analysis scripts will be made available to researchers within the RU via a secure server, in line with ethical guidelines and participant consent. In compliance with European and German data protection regulations, patient data will not be publicly available through open science frameworks but may be shared with external researchers upon reasonable request and under appropriate data protection agreements.
All participants provide written informed consent after being fully informed about the study’s aims, procedures, potential risks and benefits and are aware of their right to withdraw at any time without consequence.
To ensure confidentiality and data protection, participant data are pseudonymised, with personal information securely stored on encrypted servers accessible only to authorised personnel. We have protocols in place to monitor data integrity, participant safety and to address any adverse events throughout the study.
One year after the last patient out, data will be fully anonymised to allow for open science practices and data sharing, while at the same time securing patients’ data protection rights. Summaries of findings will also be shared with participants and the public through institutional websites and, where appropriate, public events.
Footnotes
Collaborators Humboldt-Universität zu Berlin (HU Berlin), Freie Universität Berlin (FU Berlin), Charité–Universitätsmedizin Berlin (Charité) and Psychologische Hochschule Berlin (PHB).
Contributors All authors made significant contributions to the conception and design of the study protocol. The protocol was written by TL, CU and UL and critically reviewed by FB, SE, LF, J-DH, SH, KH, FJ, NK, CK, BR, KR, NS and HW. Guarantor is TL. AI technology used: We used OpenAI’s ChatGPT, an advanced AI language model, to assist in the revision of our manuscript. Reason for use: The primary purpose of employing ChatGPT was to enhance the linguistic quality of the text and to identify and correct grammatical, typographical, and stylistic errors, thereby improving overall clarity and readability. How AI was used: Specific sections of the manuscript were input into ChatGPT, which provided revised versions aimed at refining sentence structure, coherence and flow. The AI was solely used for language editing and did not influence the study’s design, analysis, interpretation of results or conclusions. Summary of input, output and review: Input: Portions of the manuscript text were submitted to ChatGPT for revision. Output: Revised text with improved grammar, style and clarity. Review process: All AI-generated revisions and suggestions were meticulously reviewed and approved by the authors to ensure accuracy and maintain the integrity of the content. The authors retained full control over all substantive aspects of the manuscript, ensuring that the original research and findings remained unaffected by the AI-assisted edits. The authors confirm that the final manuscript reflects their own work and that all AI-assisted modifications have been thoroughly vetted for correctness and appropriateness. Any substantive changes to the content, analysis or conclusions were made solely by the authors without AI intervention.
Funding This study is funded by German Research Foundation (DFG project number: 442075332).
Competing interests KH is a scientific advisor to and received virtual stock options of Mental Tech, which develops an AI-based chatbot providing mental health support. All other authors have no competing interests to declare.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.