Original research
Accuracy of machine learning methods in predicting prognosis of patients with psychotic spectrum disorders: a systematic review
  1. Jing Ling Tay1,
  2. Yun Ling Ang2,
  3. Wilson W S Tam3,
  4. Kang Sim1,4,5
  1. West Region, Institute of Mental Health, Singapore
  2. Department of Nursing, Institute of Mental Health, Singapore
  3. Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  4. Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  5. Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
  Correspondence to Dr Jing Ling Tay; tay.jing.ling{at}aic.sg

Abstract

Objectives We aimed to examine the accuracy of machine learning methods in predicting functioning, relapse or remission among patients with psychotic disorders. We also identified specific features that were associated with these clinical outcomes.

Design The methodology of this review was guided by the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy.

Data sources CINAHL, EMBASE, PubMed, PsycINFO, Scopus and ScienceDirect were searched for relevant articles from database inception until 21 November 2024.

Eligibility criteria Studies were included if they involved the use of machine learning methods to predict functioning, relapse and/or remission among individuals with psychotic spectrum disorders.

Data extraction and synthesis Two independent reviewers screened the records from the database search. Risk of bias was evaluated using the Quality Assessment of Diagnostic Accuracy Studies tool from Cochrane. Synthesised findings were presented in tables.

Results 23 studies were included in the review, most of which were conducted in Western countries (91%). Summary area under the curve (AUC) values for predicting functioning, relapse and remission were 0.63–0.92 (poor to outstanding), 0.45–0.95 (poor to outstanding) and 0.70–0.79 (acceptable), respectively. Logistic regression and random forest were the best performing algorithms. Factors influencing outcomes included demographic variables (age, ethnicity), illness variables (duration of untreated illness, types of symptoms), functioning (baseline functioning, interpersonal relationships and activity engagement), treatment variables (use of higher doses of antipsychotics, electroconvulsive therapy), data from passive sensors (call logs, distance travelled, time spent in certain locations) and online activities (time of use, use of certain words, changes in search frequency and query length).

Conclusion Machine learning methods show promise in predicting the prognosis (specifically functioning, relapse and remission) of psychotic spectrum disorders based on relevant collected variables. Future machine learning studies could include a broader range of variables, including ecological momentary assessments, draw on larger volumes of good-quality big data covering longer longitudinal illness courses, and externally validate their findings.

PROSPERO registration number The review was registered on PROSPERO, ID: CRD42023441108.

  • MENTAL HEALTH
  • PSYCHIATRY
  • Schizophrenia & psychotic disorders

Data availability statement

Data are available upon reasonable request. Data are provided as supplementary information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

STRENGTHS AND LIMITATIONS OF THIS STUDY

  • This review is limited by the heterogeneity of included studies.

  • Few of the included studies had large samples, used the same machine learning models or conducted external validation, limiting analysis.

  • This is the first systematic review that examined the prediction of relapse and remission of psychotic spectrum disorders.

Introduction

Psychotic spectrum disorders such as schizophrenia are debilitating disorders that affect about 1% of the population.1 People with psychotic disorders experience disturbances in their perception, thoughts, behaviours, speech and emotions. Worldwide, schizophrenia is a leading cause of disability,2 3 and people with schizophrenia have a lifespan shortened by about 10 years, partly owing to suicide and homicide.4 5 They may also experience impairments in psychological and social functioning, leading to poor self-esteem and poorer overall well-being.6 Although psychotic disorders can be treated, their varying courses (episodic, remitting or chronic) make illness prognosis challenging to predict.7 Treatment responses can also be diverse and heterogeneous.8–10 As such, while some patients achieve complete remission after an initial episode of deterioration,11 12 others continue to have residual symptoms. More than half of people with an episode of psychotic disorder will go on to experience disruptive symptoms, persistent morbidity, functional deficits and relapses requiring hospitalisation, exacting a major toll on both patients and their carers.12 13

Machine learning methods (which include conventional statistical methods such as logistic regression) can process enormous amounts of data and seek to provide greater accuracy in diagnosing and predicting the treatment response and course of medical conditions such as myocardial infarction,14 heart failure,15 cancers16 17 and organ transplantation outcomes.18 19 Given their ability to handle large quantities of data, to improve with training and to perform predictive analytics,20 machine learning methods could be beneficial in psychiatry21 as a complement to existing clinical assessments, including comprehensive history taking, mental status examination, physical examination, acquisition of corroborative information from reliable informants and relevant investigations. Accurate evaluation of the outcomes of psychotic disorders can potentially allow for the optimisation of holistic treatment, which can further reduce the intensity and duration of morbidity and disability, promote rehabilitation and enhance overall recovery.22 The different trajectories of psychotic disorders are associated with various modifiable and non-modifiable socio-demographic, clinical and treatment factors,22 23 can begin early in the illness course and may be apparent even in childhood.24 Machine learning methods can analyse the interplay of these factors and potentially identify and predict outcomes within psychotic spectrum conditions.

To date, no systematic review has evaluated the predictive accuracy of machine learning methods for specific outcomes of psychotic disorders such as functioning, relapse and remission. We therefore conducted a systematic review to examine the predictive accuracy of machine learning methods for functioning, relapse and remission, and the associated factors, among patients with psychotic spectrum disorders.

Methodology

Protocol registration and review reporting

The methodology of this systematic review was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)25 and the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy,26 covering the definition of review questions, study selection, data collection, evaluation of risk of bias and data synthesis. The review was registered on PROSPERO, ID: CRD42023441108. This study did not require institutional review board approval as there was no recruitment of human participants.

Definition of review questions

Our review questions included:

  1. What is the predictive accuracy for functioning, relapse and/or remission among patients with psychotic disorders using machine learning methods?

  2. What are the relevant clinical variables identified in the prediction of functioning, relapse and/or remission among patients with psychotic disorders?

Search strategy and study selection

For the search strategy, PubMed was searched using the keywords ‘(artificial intelligence)’ AND ‘schizophrenia’ AND (‘hospitali*’ OR ‘relapse’ OR ‘remission’). The initial search included all relevant keywords: ((artificial intelligence) OR (machine learning) OR (logistic regression) OR (natural language processing) OR (neural network) OR (data science) OR (deep learning) OR (digital biomarker*) OR (digital phenotyping)) AND ((schizophren*) OR (psycho*) OR (delusional disorder*)) AND ((hospitali*) OR relapse OR remission OR recovery OR function*) (see online supplemental appendix I). Six electronic databases (CINAHL, EMBASE, PubMed, PsycINFO, Scopus and ScienceDirect) were searched for relevant articles from database inception until 21 November 2024. Reference lists of included articles and review papers were also reviewed.
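The full search follows a common Boolean pattern: synonyms within one concept are joined with OR, and the concepts (technique, population, outcome) are joined with AND. A minimal Python sketch of that pattern, assuming the three keyword groups listed above; the constant and function names are hypothetical, and the database-specific syntax (field tags, truncation rules) given in online supplemental appendix I is not reproduced.

```python
from typing import Sequence

# Hypothetical constants transcribed from the keyword groups in the text.
TECHNIQUE_TERMS = [
    "(artificial intelligence)", "(machine learning)", "(logistic regression)",
    "(natural language processing)", "(neural network)", "(data science)",
    "(deep learning)", "(digital biomarker*)", "(digital phenotyping)",
]
POPULATION_TERMS = ["(schizophren*)", "(psycho*)", "(delusional disorder*)"]
OUTCOME_TERMS = ["(hospitali*)", "relapse", "remission", "recovery", "function*"]


def or_group(terms: Sequence[str]) -> str:
    """Join synonyms with OR and wrap them in parentheses, so the result does
    not depend on a database's default AND/OR precedence."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups: Sequence[str]) -> str:
    """AND the OR-groups together: a record must match every concept."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(TECHNIQUE_TERMS, POPULATION_TERMS, OUTCOME_TERMS)
print(query)
```

Explicit parentheses around each group matter because an ungrouped string such as `A AND B OR C` can be parsed differently by different search engines.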

Two authors independently reviewed the articles from the database search and the reference lists of relevant articles. Studies were included if (1) they involved the empirical use of machine learning methods to predict functioning, relapse and/or remission among individuals with psychotic spectrum disorders and (2) they were published in English. Exclusion criteria were (1) studies that did not include psychotic spectrum conditions, (2) studies that did not use machine learning methods and (3) studies that only examined the effects of specific pharmacological or non-pharmacological interventions with no relationship to clinical outcomes such as functioning, relapse or remission.

Data collection

Data collection was performed by evaluating and summarising the findings from each included paper. Details that were documented included country, clinical setting, participant characteristics, predictive variables, outcome measurements and relevant findings (see online supplemental table I). The outcome measurements included were functioning, relapse and remission. Functioning referred to vocational and social functioning.27 Relapse was defined as hospitalisation, or worsening of symptoms that may require pharmacological or non-pharmacological treatment without the need for hospitalisation.28 Remission was defined as significant improvement of psychotic symptomatology for at least 6 months.29

Predictive accuracy was established using the measures reported in the relevant studies, including area under the curve (AUC), accuracy, balanced accuracy, mean squared error (MSE) and root mean squared error (RMSE), wherever appropriate and available. AUC reflects how well a model discriminates between positive and negative cases across all classification thresholds30 and was scored as ≥0.9 (outstanding), 0.80 to <0.90 (excellent), 0.7 to <0.8 (acceptable) and 0.50 to <0.70 (poor).31 Accuracy is a predictive measure for machine learning classification models, calculated as the number of correct predictions divided by the total number of predictions. Balanced accuracy is a related measure that is better suited to imbalanced data, calculated as (sensitivity+specificity)/2. Accuracies were scored as >90% (very good), 70–90% (good), 60–70% (acceptable) and <60% (poor).32 33
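The classification measures described above can be made concrete with a short Python sketch. It computes accuracy and balanced accuracy from confusion-matrix counts, estimates AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation), and maps values onto the qualitative bands used in this review. All function names are illustrative assumptions, not code from the included studies; assigning the shared 70% boundary of the accuracy scale to "good" is also an assumption.

```python
from typing import Sequence


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct predictions divided by the total number of predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(sensitivity + specificity) / 2, less affected by class imbalance."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive case outscores a random negative
    case; ties count as half a win (Mann-Whitney formulation of AUC)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_band(value: float) -> str:
    """Qualitative AUC bands used in this review."""
    if value >= 0.9:
        return "outstanding"
    if value >= 0.8:
        return "excellent"
    if value >= 0.7:
        return "acceptable"
    if value >= 0.5:
        return "poor"
    return "below 0.50 (not covered by the scale)"


def accuracy_band(value: float) -> str:
    """Qualitative accuracy bands; value is a proportion (0.75 = 75%).
    Assumption: the shared boundary at 70% is assigned to 'good'."""
    if value > 0.9:
        return "very good"
    if value >= 0.7:
        return "good"
    if value >= 0.6:
        return "acceptable"
    return "poor"


# Toy example: scores from a hypothetical model for four patients.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]  # 1 = outcome (e.g. relapse) observed
print(auc(scores, labels), auc_band(auc(scores, labels)))  # 0.75 acceptable
print(balanced_accuracy(tp=40, tn=30, fp=20, fn=10))
```

Because AUC is computed from the ranking of scores rather than from a single cut-off, it summarises discrimination across all thresholds, whereas accuracy and balanced accuracy depend on the threshold used to turn scores into predicted classes.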

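Regression machine learning models are evaluated using MSE and RMSE. MSE is the average of the squared differences between observed and predicted values, that is, the squared distances between the data points and the regression line. RMSE, the square root of MSE, can be interpreted as the SD of the prediction errors (residuals) and indicates how concentrated the data are around the line of best fit.34 The smaller the MSE or RMSE value, the better the model.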
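A minimal Python sketch of the two regression measures, with hypothetical function names and toy values. It shows that RMSE is the square root of MSE, so RMSE is expressed in the same units as the outcome, and that a model with larger errors receives larger values on both measures.

```python
import math
from typing import Sequence


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the squared differences between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Square root of MSE, in the same units as the outcome."""
    return math.sqrt(mse(observed, predicted))


# Toy outcome scores and predictions from two hypothetical models.
observed = [2.0, 4.0, 6.0, 8.0]
model_a = [3.0, 3.0, 7.0, 7.0]  # every prediction off by 1
model_b = [4.0, 2.0, 8.0, 6.0]  # every prediction off by 2
print(mse(observed, model_a), rmse(observed, model_a))  # 1.0 1.0
print(mse(observed, model_b), rmse(observed, model_b))  # 4.0 2.0
```

Doubling every error quadruples MSE but only doubles RMSE, which is one reason RMSE is often easier to interpret on the scale of the original outcome measure.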

Data synthesis

Synthesised data were presented in two tables. The first table detailed the characteristics and findings of each study. The second table showed the quantitative values of the predictive accuracies of the machine learning methods, wherever available.

Risk of bias

Risk of bias was evaluated using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool from Cochrane.35 The QUADAS-2 tool was developed through a four-stage process36 of scope definition, literature review, consensus discussion and tool refinement via piloting.35 The tool consists of seven questions and evaluates studies on patient selection (recruitment process), index test, reference standard, and flow and timing, with each rated as (1) low, (2) high or (3) unclear. Studies rated low in all domains were appraised as having an overall low risk of bias, whereas studies rated high or unclear in at least one domain were appraised as being at risk of bias. Two authors independently assessed the methodological quality of each study and presented the findings using RevMan. Discrepancies were resolved through discussion until consensus was reached.
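The overall rating rule described above can be written as a short Python sketch. It assumes the four QUADAS-2 risk-of-bias domains (patient selection, index test, reference standard, and flow and timing); the text does not state whether the rule was also applied to the three applicability judgements, so those are omitted. All names here are hypothetical.

```python
from typing import Dict

# The four QUADAS-2 risk-of-bias domains (applicability concerns omitted).
RISK_OF_BIAS_DOMAINS = (
    "patient selection",
    "index test",
    "reference standard",
    "flow and timing",
)
ALLOWED = {"low", "high", "unclear"}


def overall_risk_of_bias(judgements: Dict[str, str]) -> str:
    """'low risk of bias' only if every domain is rated low; a single
    'high' or 'unclear' rating makes the study 'at risk of bias'."""
    for domain in RISK_OF_BIAS_DOMAINS:
        if judgements.get(domain) not in ALLOWED:
            raise ValueError(f"missing or invalid judgement for {domain!r}")
    if all(judgements[d] == "low" for d in RISK_OF_BIAS_DOMAINS):
        return "low risk of bias"
    return "at risk of bias"


# Hypothetical studies: one rated low throughout, one with an unclear domain.
study_a = {d: "low" for d in RISK_OF_BIAS_DOMAINS}
study_b = {**study_a, "flow and timing": "unclear"}
print(overall_risk_of_bias(study_a))  # low risk of bias
print(overall_risk_of_bias(study_b))  # at risk of bias
```

This "all domains low" rule is deliberately conservative: one uncertain domain is enough to flag a study.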

Patient and public involvement

There was no patient and public involvement in the conduct and production of this paper.

Results

General features of studies

A total of 3083 articles were identified and screened. 61 full-text articles were assessed for eligibility, and 23 articles were finally included in the review37–58 (see figure 1 for the PRISMA flowchart). Reasons for exclusion after full-text review are listed in online supplemental appendix II. Quality assessment scores ranged from 3 to 7, with most studies scoring about 5 (see online supplemental appendix III).

Figure 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flowchart to show the process of identification and screening to determine the included studies.

Study characteristics

The characteristics of the included studies are summarised in online supplemental table I. Most studies were conducted in Western countries (n=21, 91%); the remaining two were conducted in Korea (n=1, 4%)49 and Taiwan (n=1, 4%).52 The largest number were conducted in the USA (n=11).37 39–43 46–48 54 58 Other studies were conducted in France (n=2),45 53 the UK (n=1),51 Greece (n=1)57 and Spain (n=1).38 Four studies recruited participants from more than one country: (1) the USA and India,59 (2) England, Scotland and Denmark,50 (3) Belgium and the Netherlands60 and (4) Greece and Denmark.55 One study used MRI data from an online repository.56 Participants were patients with psychotic spectrum disorders, comprising first episode psychosis, schizophrenia spectrum disorders and psychotic mood disorders. Sample sizes ranged from 10 to 1298 participants. The numbers of entries from electronic health records, Facebook or Google search history ranged from 5076 to 52 815.

Logistic regression was examined by five studies,43 49–51 59 support vector machine by five studies,41–43 55 60 random forest by three studies,40 42 45 lasso algorithm by two studies56 60 and anomaly detection by four studies.39 46 54 59 Two studies each examined gradient boosting,42 43 naïve Bayes47 54 and clustering models.38 58 One study each examined pruned forward selection generalised regression,53 long short-term memory neural network,48 bagging ensemble,52 transformer,57 fully connected neural network,57 convolutional neural network57 and gated recurrent unit.57 One study evaluated a clustering-based local outlier factor model and a neural network model.37

Functioning,40 50–52 60 relapse37–49 53 57–59 and remission50 51 54 55 were examined by 5, 18 and 4 studies, respectively. Predictive variables that were examined included demographic data, clinical and cognitive variables, neuropsychological variables, MRI data, social history, substance use, treatment, side effects and passive sensing data from wearables or phone applications that included information about distance travelled, call and text log, screen activity and sleep patterns.

Accuracy of artificial intelligence methods in predicting functioning, relapse and remission among patients experiencing psychotic symptoms

Functioning

Five studies examined functioning as an outcome measure40 50–52 60 (see online supplemental tables I and II). All studies examined socio-demographic and clinical data, while one study also examined cognitive functioning.52

Predictive accuracy

The AUC values of the three machine learning models for internal validation were 0.79–0.92 (random forest),40 0.73–0.74 (logistic regression)50 and 0.63–0.67 (support vector machines).60 Random forest thus performed the best in internal validation.

For external validation, logistic regression performed the best (AUC=0.88), followed by random forest (0.68–0.88) and support vector machine (0.57–0.67). Of note, a study conducted in Taiwan found that models using clinical symptoms as predictors achieved better (lower) RMSE values than those using cognitive function.52 The study examined multiple algorithms and found that the bagging ensemble using clinical symptoms performed the best, followed by neural network, support vector machine and linear regression.

Predictors of good functioning

Positive predictors of good functioning included white ethnicity,50 good baseline functioning,40 50 51 better cognitive functioning,52 having education,50 51 having a job,50 living with a spouse and children,51 living in one’s own or parents’ house,50 good social support50 and physical activity.50

Predictors of poor functioning

Predictors of poor functioning included having no religion,50 rental housing,50 financial dependence on state benefits,50 social withdrawal, especially during late adolescence,50 use of alcohol51 and drugs,50 having a forensic history,50 grandiosity,60 delusion,50 suspiciousness,50 hostility,50 tension,60 hallucinations,50 60 stereotyped thinking,60 difficulty in abstract thinking,60 lack of spontaneity,50 flat affect,60 motor retardation,60 unusual thought content60 and poor judgement and insight.60

Relapse

18 studies examined relapse as an outcome. Eight studies examined relapse using passive sensing data from a phone application37 39 46–48 58 59 or wearable device,57 five studies examined data including demographic and clinical variables,38 40 43 45 60 and one study each examined (1) data posted on Facebook by participants,41 (2) participants’ Google search history,42 (3) clinical notes,49 (4) MRI data56 and (5) serum kynurenine metabolites and cytokines.53

Predictive accuracy

The best-performing model was logistic regression (AUC=0.95) using data from electronic health records.49 Models built on different variations of electronic health record data achieved acceptable to outstanding AUC values of 0.78–0.95.49 The use of cytokines and peripheral kynurenine metabolites to predict relapse with pruned forward selection generalised regression achieved an AUC of 0.71.53 Evaluation of Google search content and behaviour yielded AUC values of 0.66 (support vector machine) to 0.74 (random forest).42

Socio-demographic and clinical data achieved poor to acceptable AUC values of 0.60–0.63 (support vector machine),60 63.8% (random forest)45 and 0.63–0.73 (gradient boosting, support vector machine, logistic regression and multilayer perceptrons).43 Lastly, evaluation of facial cues, physiological and audio data with various algorithms achieved poor AUC values of 0.45–0.62.57

Predictors of relapse

One study compared the classification performance of machine learning algorithms with that of humans, including experienced psychiatrists, and found that the algorithms performed better.43

Passive sensing data from wearable devices and phone applications detected unique and personalised changes in heart rate,57 sleep,37 distance travelled,47 57 time spent in certain locations,37 39 frequency of calls made39 47 and call duration.37 Passive sensing data were found to provide better predictive value than self-reported clinical data,46 although another study highlighted the importance of self-reported clinical data.47 More recent data yielded more accurate predictions than data collected further back in time.39 40

Predictors of relapse included older age,60 lower education level,45 impaired social functioning,59 pregnancy or the pre- or postpregnancy period,43 prior psychiatric admission,43 more hospitalisations,45 comorbidities,43 metabolic syndrome,45 lower quality of life,59 alcohol use,38 substance use,43 59 sleep changes,59 greater anger and aggression,45 emotional withdrawal,60 suicidal symptoms,60 negative symptoms,60 depressive symptoms,43 45 59 60 anxiety symptoms,43 59 poor insight and judgement,60 electroconvulsive therapy43 and greater use of antipsychotics.38

Two US studies examining online activity found that predictors of relapse included greater use of Facebook during typical sleeping hours and reduced search frequency and shorter query length in Google search.42 There was also increased use of second- and third-person pronouns,41 indefinite pronouns,42 vulgarities,41 emotive and angry words41 42 and words related to death and negative affect.41 There was decreased use of words related to the body, health,42 friends, work, belonging and achievement.41 One study found that patients who relapsed tended to have deranged levels of certain serum kynurenine metabolites and cytokines, specifically higher levels of interleukin-6, tumour necrosis factor-alpha and picolinic acid and lower levels of interferon-gamma and quinolinic acid.53

Predictors of non-relapse

Predictors of non-relapse included better functioning,38 good family relationships38 and fewer negative symptoms.38

Remission

Four studies examined remission or significant symptomatic improvement.50 51 54 55 Three studies evaluated socio-demographic and clinical data,50 51 55 while one study evaluated MRI data of frontal-parietal brain functions.54

Predictive accuracy

For internal validation, the support vector machine (AUC=0.79)55 outperformed logistic regression (AUC=0.70).50 However, this may reflect the shorter follow-up period in the support vector machine study55 compared with the longer follow-up period in the logistic regression study.50 Likewise, in external validation, the study with the shorter follow-up period achieved greater predictive accuracy with support vector machine (AUC=0.68, accuracy=63.53%)55 than the studies with longer follow-up periods that used logistic regression (AUC=0.63–0.65, accuracy=61–63%).50 51 A study conducted in the USA found that deep learning achieved the best result (accuracy=70%) over many other algorithms (accuracy=60.8–67.4%).54

Predictors of remission

Predictors of remission included white ethnicity,51 living in one’s own or parents’ house,50 living with a spouse and children,51 good baseline functioning,50 55 good social support,50 51 having education,50 51 personal income,50 55 depressive symptoms,50 shorter duration of untreated illness,55 voluntary admission,50 having insight,50 no prior self-harm50 and PANSS (Positive and Negative Syndrome Scale) item score for excitement.51 In terms of MRI data, activation of the dorsolateral prefrontal cortex was most predictive of significant symptomatic improvement.54

Predictors of non-remission

Predictors of non-remission included living in private or rented accommodation,51 social withdrawal,50 51 spending time doing leisure activities,50 longer duration of untreated psychosis,50 55 self-harm,50 substance use,50 psychiatric nurse contact in the previous 3 months,50 involuntary treatment,50 high PANSS item scores for hallucinations,50 delusions,55 somatic concerns,51 abnormal thought content,50 51 difficulty in abstract thinking,51 emotional blunting and withdrawal,55 social withdrawal,50 lack of spontaneity,55 poor rapport55 and lack of judgement and insight.55

Discussion

Predictive accuracies of functioning, relapse and remission

The best performing algorithms for (1) functioning, (2) relapse and (3) remission were (a) random forest (internal validation) and logistic regression (external validation), (b) logistic regression and (c) support vector machine, respectively. Logistic regression was likewise found to be the best machine learning model for predicting postpartum depression in another study.61 Several studies found better performance and sensitivity with the random forest algorithm,62–66 while gradient boosting was found to produce higher specificity.62

An earlier study by Smucny et al found that deep learning achieved better accuracy than a variety of other algorithms.54 Conversely, Zlatintsi et al used several deep learning algorithms but achieved poor AUC values of 0.45–0.62.57 This is probably related to the different variables examined in the two studies: Zlatintsi et al examined facial cues, physiological and audio data,57 whereas Smucny et al evaluated MRI data on frontoparietal brain region activation.54

The ranges of AUC for (1) functioning, (2) relapse and (3) remission were (a) 0.63–0.92, (b) 0.45–0.95 and (c) 0.70–0.79.

These values were comparable with those of other machine learning studies, albeit studies that aimed to diagnose depression (AUC=0.64–0.96),67 postpartum depression (AUC=0.78–0.93)68 and schizophrenia (AUC=0.71–0.90),69 or to predict the prognosis of non-psychiatric conditions such as obstructive coronary artery disease after single-photon emission CT myocardial perfusion imaging (AUC=0.71–0.83).70 Of note, they were generally lower than those of recent machine learning studies that predicted mortality after medical ward admission (AUC=0.92)71 or among patients with COVID-19 (AUC=0.95).72

The wide range of predictive accuracies across studies could be due to their heterogeneity, including the nature and number of included variables, duration of follow-up, sample size, machine learning algorithm used, imputation methods and outcome measurements.73 For instance, certain predictive variables (electronic health record data)49 yielded greater predictive accuracy than others (facial cues, audio data).57 Studies that used 10-fold cross-validation43 50 56 60 tended to achieve lower predictive accuracies than studies that used 5-fold cross-validation,40 42 possibly because 10-fold cross-validation could introduce bias by excluding divergent data scenarios. A longer follow-up could uncover more relapses or other changes in outcome, and clinical relapses in turn affect the level of psychosocial functioning. Additionally, some studies were vulnerable to imbalanced classification because the sizes of their outcome groups differed greatly, leading to skewed class distribution, biased model training, impaired generalisation and misleading evaluation metrics.74
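To illustrate how imbalanced outcome groups can make evaluation metrics misleading, here is a minimal Python sketch with a hypothetical cohort of 100 patients, 5 of whom relapse. A degenerate model that always predicts "no relapse" scores high accuracy while its balanced accuracy stays at chance level. The numbers and function names are invented for illustration and do not come from any included study.

```python
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels where 1 = relapse."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    # Mean of sensitivity and specificity; each class counts equally.
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# Hypothetical cohort: 5 relapses among 100 patients (5% minority class).
y_true = [1] * 5 + [0] * 95
# A degenerate "model" that never predicts relapse.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The model misses every relapse (sensitivity 0), yet plain accuracy looks very good; balanced accuracy exposes that it is no better than chance.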

The finding that predictive accuracy was higher for relapse than for remission could be related to the granularity of the clinical and non-clinical variables examined with relevance to the illness,75 their sensitivity to illness changes over time, the presence of other intervening factors such as treatment adherence76 and life events,77 and the machine learning algorithm adopted.

Predictors of good functioning or remission

In terms of demographic factors, white ethnicity predicted good functioning, which was consistent with earlier studies.78 79 Possible explanations include poorer help-seeking among ethnic minority groups related to distrust of mental health services,80 lack of mental health literacy,81 82 and language and cultural barriers.83 Congruent with current literature, good baseline functioning, such as having some years of education, a job and personal income, was a relevant predictive factor,75 84–86 possibly because patients with higher functioning are better able to cope with their psychotic symptoms.86 Another predictor of better functioning was stable accommodation, such as living in one’s own or parents’ house, and living with a spouse and children. This agrees with literature showing that shelter, social support, and social and physical activities are vital for good functioning,84 87 88 and that social support can enhance resilience to adverse situations (including poor health) and improve quality of life.89–91

In terms of clinical factors, positive predictors included shorter duration of untreated illness,92 93 voluntary admission, affective symptoms,94 absence of self-harm behaviours and insight into one’s condition.84 95 A shorter duration of untreated psychosis allows patients to retain the cognitive ability and psychological capacity to make meaning from their psychiatric experiences, which can promote earlier recovery, better functioning and better quality of life.92 Good insight was associated with treatment compliance, symptom reduction and better functioning, but also with shame, depression, hopelessness and suicidal behaviours, which require relevant interventions.96 Insight is multidimensional, spanning clinical, psychological and neurological domains, and is subject to change in response to personal or contextual events.96 Thus, insight needs to be addressed sensitively and in a culturally appropriate manner within the clinical management context.

Predictors of poor functioning or non-remission

Having no religion was a demographic factor that predicted poor functioning or non-remission, consistent with an earlier review that found religious faith to be associated with mental health benefits,97 via a sense of gratitude, forgiveness and better social support within the faith community.98

Social factors associated with poor functioning or non-remission included social withdrawal,99 living in rented accommodation,100 financial dependence on government benefits,100 101 less engagement in leisure activities102 and use of drugs and alcohol.103 104

Clinical factors included longer duration of untreated psychosis,92 93 involuntary admission105 and poor rapport.106 Longer duration of untreated psychosis was linked with cognitive impairments and consequent psychosocial difficulties and delayed recovery.92 Involuntary admissions, which could be a consequence of poor treatment compliance,107 108 could cause negative feelings and experiences among patients and negatively impact therapeutic rapport, treatment compliance and prognosis.105 In turn, poor rapport could also be affected by perceived stigma, lack of insight, psychiatric symptoms including delusions and social withdrawal, which then resulted in poorer outcomes.106

As supported by earlier literature, psychiatric symptoms that were associated with poor functioning included positive symptoms (grandiosity, delusion, suspiciousness, hallucinations),109 negative symptoms (social withdrawal, emotional withdrawal, lack of spontaneity, bluntedness, flat affect, motor retardation),110 thought disturbance (stereotyped thinking, difficulty in abstract thinking, unusual thought content),111 self-harm,112 aggression,113 poor judgement and insight.114

Predictors of relapse

Demographic factors that predicted relapse in this review included older age115 116 and being less educated.117 Lower education was possibly related to cognitive impairments118 or negative symptoms119 which confer greater predisposition for relapse.120 121

Social factors that predicted relapse included poor social functioning,111 alcohol use104 and substance use.104 122 Poor social functioning coupled with the use of recreational substances could trigger psychotic symptoms and experiences and result in illness relapse.123 Alcohol use has been associated with relapse, concomitant suicidal and aggressive behaviours, and disruption of interpersonal functioning.124

Implicated clinical factors included psychiatric comorbidities,125 metabolic syndrome,126 psychiatric hospitalisation127 and a greater number of hospitalisations.128–130 Of note, longer hospitalisation may prevent future relapses,131 132 whereas a greater number of hospitalisations was associated with cognitive decline, likely related to the duration of illness, which can in turn affect treatment adherence and worsen overall functioning.130 133

Psychiatric symptoms that predicted relapse in this review were poor sleep,134 135 depressive symptoms,136 anxiety symptoms,137 suicidal behaviours,112 aggression/anger,138 negative symptoms (emotional withdrawal),100 139 and poor judgement and insight.140 Sleep impairments not only worsen cognitive and negative symptoms but can also exacerbate cognitive processes related to hallucinations by disrupting inhibitory control mechanisms.134 135 Depression and anxiety were associated with worsening positive psychopathology, such as delusions and hallucinations,137 141 as well as with prolonged social isolation,142 admissions, self-harm behaviours, neurocognitive deficits and lower quality of life.137 143 144 Of note, the causes and course of depressive symptoms among patients with psychotic spectrum disorders are heterogeneous;136 hence, affective symptoms have been associated with both a better prognosis145 and relapse of psychotic spectrum disorders.146

Aggressive behaviours were associated with relapse, possibly reflecting underlying substance use, psychotic symptoms, treatment non-adherence and poor social support.138 147 Negative symptoms were associated with a greater risk of relapse because they are less amenable to treatment and correlate with poorer functioning, lower productivity and poorer insight.110

Treatment factors associated with relapse included the use of electroconvulsive therapy and higher antipsychotic doses. Although electroconvulsive therapy is also effective in preventing relapse,148 149 its use signifies greater illness severity,150 and discontinuation or a reduced frequency of electroconvulsive therapy may therefore trigger relapse.149

Internet activities that predicted schizophrenia relapse included impaired sleep patterns, greater use of negative-emotion, anger and swear words,42 greater use of second-person, third-person and indefinite pronouns, and less use of positive words.41 42 Few studies have examined internet activities or passive sensing data in predicting relapse. Among the studies in this review that used passive sensing data, relapse was predicted by personalised changes in sleep, distance travelled and call history. The limited studies on passive sensing found that collecting self-report data through phone applications showed good acceptability among patients.151

Overall, common predictors across functioning, relapse and remission included clinical factors pertaining to the nature of psychopathology (such as negative symptoms, affective features and level of insight),60 risk features (self-harm tendencies, aggression)45 60 and psychiatric comorbidities (such as alcohol/substance abuse).38 51 These factors should be considered in any machine learning model that predicts clinical outcomes in psychotic spectrum conditions. Beyond these common predictors, different factors were associated with functioning and remission (such as duration of untreated psychosis and voluntary admission)50 55 and with relapse (such as poor sleep and increased antipsychotic dose).45 59 These variables need to be examined in detail so that different outcomes can be delineated within a predictive model.

Limitations and future research directions

This review had several limitations. First, the included studies were heterogeneous in the predictive variables evaluated, the outcome measures used and the duration of follow-up. Second, few studies reported predictive accuracy on external validation. Third, few studies compared different machine learning models within the same study, which would offer further insight into the utility and feasibility of these models applied singly and/or in combination. Fourth, many studies had small sample sizes, which could lead to substantial instability in the models and the resulting predictions.152

We suggest several directions for future research. First, passive sensing data collected through phone applications appear promising as a cost-effective source of first-hand relapse information. Future studies could include a broader swathe of variables, including ecological momentary assessments, to evaluate predictive accuracy for relapse and remission. Second, a greater variety of machine learning algorithms, including deep learning models, could be used to predict prognosis. Third, larger sample sizes can improve the stability and predictive performance of machine learning models. Models built on smaller samples need additional model-building steps to prevent model instability.152 Fourth, wider adoption of external validation, using large datasets that cover longer illness courses and multiple episodes, can enhance the robustness of the analytical models.

Conclusion

In conclusion, this review of existing machine learning studies found that predictive accuracy using clinical and non-clinical variables was better for relapse than for remission in psychotic spectrum disorders, and that relevant variables included socio-demographic, illness and neuro-behavioural factors, as well as social activities including online and phone activities. Logistic regression and random forest were the best performing algorithms. Future machine learning studies could include a broader swathe of variables, including ecological momentary assessments, and draw on larger volumes of good-quality data covering longer longitudinal illness courses, coupled with external validation of study findings.

Data availability statement

Data are available upon reasonable request and have been uploaded as supplementary information.

Ethics statements

Patient consent for publication

Not applicable.

Ethics approval

Not applicable.

References


Footnotes

  • Contributors Study conception and methodology: JLT, KS. Data collection: JLT, YLA. Analysis and interpretation of results: JLT, YLA, WWST, KS. Writing (original draft): JLT. Writing (review and editing): JLT, WWST, KS. Funding acquisition: KS. Supervision: KS. All authors reviewed the results and approved the final version of the manuscript. JLT is the guarantor.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.