Original research

ADNEX risk prediction model for diagnosis of ovarian cancer: systematic review and meta-analysis of external validation studies

Abstract

Objectives To conduct a systematic review of studies externally validating the ADNEX (Assessment of Different Neoplasias in the adnexa) model for diagnosis of ovarian cancer and to present a meta-analysis of its performance.

Design Systematic review and meta-analysis of external validation studies

Data sources Medline, Embase, Web of Science, Scopus, and Europe PMC, from 15 October 2014 to 15 May 2023.

Eligibility criteria for selecting studies All external validation studies of the performance of ADNEX, with any study design and any study population of patients with an adnexal mass. Two independent reviewers extracted the data. Disagreements were resolved by discussion. Reporting quality of the studies was scored with the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) reporting guideline, and methodological conduct and risk of bias with PROBAST (Prediction model Risk Of Bias Assessment Tool). Random effects meta-analyses were performed for the area under the receiver operating characteristic curve (AUC) and, at the 10% risk of malignancy threshold, for sensitivity, specificity, net benefit, and relative utility.

Results 47 studies (17 007 tumours) were included, with a median study sample size of 261 (range 24-4905). On average, 61% of TRIPOD items were reported. Handling of missing data, justification of sample size, and model calibration were rarely described. 91% of validations were at high risk of bias, mainly because of the unexplained exclusion of incomplete cases, small sample size, or no assessment of calibration. The summary AUC to distinguish benign from malignant tumours in patients who underwent surgery was 0.93 (95% confidence interval 0.92 to 0.94, 95% prediction interval 0.85 to 0.98) for ADNEX with the serum biomarker, cancer antigen 125 (CA125), as a predictor (9202 tumours, 43 centres, 18 countries, and 21 studies) and 0.93 (95% confidence interval 0.91 to 0.94, 95% prediction interval 0.85 to 0.98) for ADNEX without CA125 (6309 tumours, 31 centres, 13 countries, and 12 studies). The estimated probability that the model is clinically useful in a new centre was 95% (with CA125) and 91% (without CA125). When restricting analysis to studies with a low risk of bias, summary AUC values were 0.93 (with CA125) and 0.91 (without CA125), and estimated probabilities that the model is clinically useful were 89% (with CA125) and 87% (without CA125).

Conclusions The results of the meta-analysis indicated that ADNEX performed well in distinguishing between benign and malignant tumours in populations from different countries and settings, regardless of whether the serum biomarker, CA125, was used as a predictor. A key limitation was that calibration was rarely assessed.

Systematic review registration PROSPERO CRD42022373182.

What is already known on this topic

  • The optimal management of ovarian tumours depends on tumour type, and therefore diagnosis before surgery is important to determine an appropriate treatment plan

  • ADNEX (Assessment of Different Neoplasias in the adnexa) is a diagnostic prediction model that estimates the probability that a tumour is benign, borderline, stage I primary invasive, stage II-IV primary invasive, or secondary metastatic

  • ADNEX has two versions: one with and one without the serum biomarker, cancer antigen 125 (CA125)

What this study adds

  • Both versions of ADNEX differentiated between benign and malignant tumours, with an area under the receiver operating characteristic curve of ≥0.90 for different populations, countries, and settings

  • ADNEX with CA125 was superior to ADNEX without CA125 for distinguishing between malignant subtypes, with moderate heterogeneity for the models' performance

  • The models had a 95% (with CA125) and 91% (without CA125) chance of being clinically useful in new centres when used at the 10% risk of malignancy threshold

  • Many studies included in this review and meta-analysis had poor reporting and unjustified exclusion of patients with missing data, and calibration performance was not assessed

How this study might affect research, practice, or policy

  • ADNEX can be used to support clinical decisions

  • ADNEX without CA125 is sufficient to help decide whether conservative follow-up, surgery in a local centre, or referral to an oncology centre is appropriate

  • To help decide on optimal management of a tumour suspected to be malignant, ADNEX with CA125 is superior to ADNEX without CA125, because it differentiates better between malignant subtypes

  • The methodological quality of external validation studies needs to improve in terms of justification of sample size, handling of missing data, and assessing calibration of the model

Introduction

The optimal management of patients with an ovarian mass depends on the histology of the mass. Patients with a benign mass can be managed without surgery, with clinical and ultrasound follow-up, or with conservative surgical techniques.1 2 Malignant tumours benefit from management in specialised oncology centres, but borderline malignancies, stage I primary invasive tumours, and advanced primary invasive tumours might require different surgical approaches.3 4 To optimise patient triage without operating on all masses, diagnostic models can be used to estimate the likelihood of malignancy and hence to plan treatment for patients.

Given the potential advantages of accurately predicting the risk of malignancy, the International Ovarian Tumour Analysis (IOTA) group developed the Assessment of Different Neoplasias in the adnexa (ADNEX) risk prediction model, based on three clinical and six ultrasound predictor variables.5 The clinical variables are age, serum levels of the biomarker, cancer antigen 125 (CA125), and type of centre (oncology centre v other). An oncology centre is defined as a tertiary referral centre with a specific gynaecology oncology unit. The ultrasound variables are the maximum diameter of the lesion, proportion of solid tissue (defined as the largest diameter of the largest solid component divided by the largest diameter of the lesion), number of papillary projections, presence of >10 cyst locules, presence of acoustic shadows, and ascites. The ADNEX multinomial logistic regression model estimates the risk of five tumour types: benign, borderline, stage I primary invasive, stage II-IV primary invasive, and secondary metastatic.

The total risk of malignancy calculated by ADNEX is the sum of the risks for each malignant subtype. ADNEX has two versions: one with and one without CA125 as a predictor (the ADNEX formulas are provided in online supplemental material S1).5 When we refer to the ADNEX model or ADNEX, we refer to both versions of the model. The model was developed on data from 5909 patients with an adnexal mass who subsequently underwent surgery, recruited at 24 centres in 10 countries (Belgium, Italy, Czech Republic, Poland, Sweden, China, France, Spain, UK, and Canada). Although developed on data from patients who underwent surgery, the performance of ADNEX has also been evaluated in cohorts that included patients managed without surgery.6–9
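The relation between the five estimated risks and the total risk of malignancy can be illustrated with a minimal Python sketch. It assumes a standard multinomial logistic formulation with benign as the reference category. The linear predictor values are hypothetical and are not derived from the published ADNEX coefficients (which are in online supplemental material S1), and the function names are ours.

```python
import math

# Outcome categories predicted by ADNEX; benign is assumed to be the reference.
CATEGORIES = ["benign", "borderline", "stage I", "stage II-IV", "metastatic"]


def multinomial_risks(linear_predictors):
    """Convert four linear predictors (one per malignant subtype, each relative
    to benign) into five probabilities that sum to 1."""
    if len(linear_predictors) != 4:
        raise ValueError("expected one linear predictor per malignant subtype")
    # The reference category (benign) has a linear predictor of 0, so exp(0) = 1.
    exps = [1.0] + [math.exp(lp) for lp in linear_predictors]
    total = sum(exps)
    return {cat: e / total for cat, e in zip(CATEGORIES, exps)}


def total_malignancy_risk(risks):
    """Total risk of malignancy: the sum of the four malignant subtype risks."""
    return sum(p for cat, p in risks.items() if cat != "benign")


# Hypothetical linear predictors for one patient (illustration only).
example_lp = [-2.0, -2.5, -3.0, -4.0]
risks = multinomial_risks(example_lp)
print({cat: round(p, 3) for cat, p in risks.items()})
print("total risk of malignancy:", round(total_malignancy_risk(risks), 3))
```

Because the five probabilities sum to 1, the total risk of malignancy is equivalently 1 minus the estimated risk of a benign tumour.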

ADNEX is included in national guidelines (eg, in Belgium, the Netherlands, and Sweden),10–12 and recommended by scientific societies, such as the International Society of Ultrasound in Obstetrics and Gynecology, European Society of Gynaecological Oncology, European Society for Gynaecological Endoscopy, and the American College of Radiology.4 13 Also, manufacturers of ultrasound machines have begun to incorporate ADNEX directly into their machines.

Several external validation studies of ADNEX have been carried out. So far, five published systematic reviews and meta-analyses of ADNEX have summarised 3-22 external validation studies.14–18 All of the systematic reviews evaluated ADNEX only as a diagnostic test, reporting a summary sensitivity and specificity at a threshold for the estimated risk of malignancy of 10% or 15%.14–18 However, the ADNEX model not only classifies masses as benign or malignant; it can also be used as a risk prediction model, providing probability estimates for five different tumour types at the individual patient level. Hence focusing only on its performance in classifying tumours risks losing useful information.19 When ADNEX is validated as a diagnostic test at a 10% threshold, the performance metrics (ie, sensitivity and specificity) do not take into account the individual predicted risks: a misclassified patient with an 11% risk is given the same weight as a misclassified patient with a 99% risk. This approach means that these meta-analyses have not fully validated the diagnostic performance of ADNEX; for example, pooling discrimination performance (area under the receiver operating characteristic curve, AUC) allows determination of the ability of the model to differentiate between patients with and without the outcome across different thresholds. Guidelines on how to evaluate the quality and risk of bias of external validation studies of risk prediction models have been produced.20–22 These guidelines should be used in meta-analyses of validation studies. Hence the objectives of this study were to perform a systematic review of studies that externally validated ADNEX, to describe reporting completeness and risk of bias of the validation studies, and to conduct meta-analyses of measures of performance of the model.
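The difference between evaluating a model as a diagnostic test and as a risk model can be shown with a small Python sketch. The data are hypothetical: two sets of estimated risks that differ only in one benign tumour, which receives an 11% risk in one set and a 99% risk in the other. Sensitivity and specificity at the 10% threshold are identical for both, whereas the AUC is not.

```python
def sensitivity_specificity(risks, malignant, threshold=0.10):
    """Classify as test positive when the estimated risk is >= threshold."""
    pairs = list(zip(risks, malignant))
    tp = sum(1 for r, y in pairs if y and r >= threshold)
    fn = sum(1 for r, y in pairs if y and r < threshold)
    tn = sum(1 for r, y in pairs if not y and r < threshold)
    fp = sum(1 for r, y in pairs if not y and r >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(risks, malignant):
    """Probability that a random malignant tumour has a higher estimated risk
    than a random benign tumour (ties count as one half)."""
    pos = [r for r, y in zip(risks, malignant) if y]
    neg = [r for r, y in zip(risks, malignant) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = malignant, 0 = benign) and estimated risks.
malignant = [1, 1, 1, 0, 0, 0, 0, 0]
risks_a = [0.99, 0.80, 0.05, 0.11, 0.02, 0.03, 0.04, 0.01]  # benign tumour at 11%
risks_b = [0.99, 0.80, 0.05, 0.99, 0.02, 0.03, 0.04, 0.01]  # same tumour at 99%

for name, risks in (("A", risks_a), ("B", risks_b)):
    sens, spec = sensitivity_specificity(risks, malignant)
    print(name, "sensitivity", round(sens, 2), "specificity", round(spec, 2),
          "AUC", round(auc(risks, malignant), 2))
```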

Methods

Protocol registration

We report this study according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses, online supplemental file 2) and TRIPOD-SRMA (Transparent Reporting of multivariable prediction models for Individual Prognosis Or Diagnosis: checklist for Systematic Reviews and Meta-Analyses, online supplemental file 3) checklists.22 23

Eligibility criteria

Any study that carried out an external validation to evaluate the performance of the ADNEX model, based on any study design and any study population, was eligible for inclusion in the systematic review. Exclusion criteria were studies that did not evaluate the performance of the ADNEX model; studies that only evaluated the predictive performance of updated versions of ADNEX; studies where only an abstract was available, or the full text could not be obtained; and case studies presenting the performance of ADNEX for individual patients (this criterion was not prespecified in the protocol but was added post hoc on review of the search results). Updating can refer to recalibration, refitting, or extension with additional predictors.24 Studies that conducted comparisons of ADNEX with other models, reporting performance metrics, were eligible for inclusion.

Information sources and search strategy

We created a search string and overall search strategy with the help of biomedical reference librarians from the KU Leuven Libraries. We searched the electronic databases Medline (through PubMed), Embase, Web of Science, and Scopus for published articles, and Europe PMC for preprints. The search dates were from the publication of the first ADNEX paper (15 October 2014) to 15 May 2023 (date when the final search was run). We also screened all articles citing the original ADNEX paper.5 The reference lists of relevant review and opinion articles retrieved by the search strings were checked for other potentially eligible articles. Forward and backward snowballing (forward and back cross reference checking) of the included articles was performed to identify additional publications.25 Language was not restricted, but for papers in languages other than English, Spanish, Dutch, French, or Swedish, we used an automatic translation tool (deepl.com) to decide whether to include a paper and to extract information. Online supplemental material S2 shows the full search strategy.

Study selection

The studies we identified in our search were imported into Zotero reference manager, where they were automatically deduplicated. The deduplicated records were then imported into the Rayyan web application for manual deduplication (by LB) and subsequent screening of the title and abstract by two independent authors. Disagreements were resolved by discussion between the two authors (LB and AL).26

Three of the authors (BVC, LV, and DT) were members of the IOTA group that developed ADNEX, so we divided the studies into those that were linked or not linked to IOTA. A study was linked to IOTA if it was coauthored by a member of the IOTA steering committee (online supplemental material S3). IOTA linked papers, as well as a few others with a potential conflict of interest (ie, including authors that are or were IOTA collaborators), were independently assessed by two of the authors (PD and GSC, medical statisticians with expertise in prediction modelling and unrelated to IOTA). All other studies were independently assessed (by LB and AL). Disagreements were resolved by discussion between reviewers, and for the non-IOTA papers, by discussion with authors BVC, LV, and JYV.

Data extraction and data items

Data were extracted and entered into a standardised data extraction form in Microsoft Excel. Data extraction focused on the general and design characteristics of the studies, target population, reference standard, sample size, performance results, reporting quality, methodological quality, and risk of bias (online supplemental table S1). The extraction form was adapted from the CHARMS (CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies) and TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) tools, and PROBAST (Prediction model Risk Of Bias Assessment Tool).21 27 28

To describe the performance of the model, we extracted information on any reported measure related to discrimination, calibration, diagnostic accuracy, or clinical utility. The reference standard could be binary (eg, benign v malignant) or multinomial (eg, the five tumour types predicted by ADNEX). Performance data were extracted for all reported validations (ie, for ADNEX with and without CA125), subgroup analyses, sensitivity analyses, and for multicentre studies, results specific to each centre. For each study, we assessed the reporting of all TRIPOD items that were applicable to the external validation studies (online supplemental table S2). We also checked PROBAST’s signalling questions and evaluated risk of bias for each subdomain (participants, predictors, outcome, and analysis) and overall. We included our rationale for classification of the risk of bias.

We contacted study authors to obtain further information or results when centre specific results were not reported in multicentre studies, type of centre was not explicitly reported (if no response, the clinical coauthors, JYV, DT, and LV, classified the centre), overall performance was reported but not performance by menopausal status, or performance was not reported for patients who underwent surgery separately in studies that included patients who were managed surgically and non-surgically. Online supplemental table S1 and Open Science Framework repository (extraction sheet, https://osf.io/jtsvd/) have details on all extracted items.29

Statistical analysis and quantitative data synthesis

Data were summarised with descriptive statistics and data visualisations. Meta-analysis of performance, based on specific results for each centre in multicentre studies, where possible, was conducted with the random effects meta-analysis method. Meta-analysis was done separately for ADNEX with and without CA125. We used 95% confidence intervals for the summary performance and assessed heterogeneity with τ2 and 95% prediction intervals. Online supplemental material S4-6 have details on the statistical methodology, including meta-analysis methods, explanations of net benefit and relative utility, and assessment of publication bias.

Meta-analysis of the AUC for benign versus malignant tumours was done on the logit scale. Meta-analysis for sensitivity and specificity was also performed on the logit scale with a random effects meta-analysis.30 Because the meta-analysis was conducted only for the 10% threshold for the risk of malignancy, we did not need the 95% confidence ellipse in receiver operating characteristic curve space, so we did not use the bivariate random effects model as specified in the protocol. The 10% threshold was most commonly used in the articles included in our systematic review, and is a commonly recommended threshold.4 Meta-analysis of net benefit31 and relative utility32 at the 10% risk of malignancy threshold was performed with bayesian trivariate random effects meta-analyses of sensitivity, specificity, and prevalence of malignancy.33 For bayesian methods, 95% credible intervals are reported instead of 95% confidence intervals. With the bayesian approach, the probability that the model is useful in a new centre can be estimated (ie, the probability that relative utility is >0). To deal with multinomial discrimination performance, we conducted a meta-analysis of AUC values between pairs of tumour outcomes (pairwise AUC values) on a logit scale. We only included studies that used the conditional risk method to calculate pairwise AUC values.34
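Pooling on the logit scale can be sketched as follows. This is a minimal Python illustration, not the R code used for the analyses: it assumes the DerSimonian-Laird estimator of τ2 and a normal quantile for the prediction interval (a t quantile is more usual), and the centre specific AUC values and standard errors are hypothetical.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_effects_logit(estimates, standard_errors):
    """Random effects pooling (DerSimonian-Laird) of estimates bounded in (0, 1),
    such as AUC values, on the logit scale. Standard errors are on the original scale."""
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two estimates")
    # Delta method: se(logit(p)) = se(p) / (p * (1 - p)).
    y = [logit(p) for p in estimates]
    v = [(se / (p * (1.0 - p))) ** 2 for p, se in zip(estimates, standard_errors)]

    # Fixed effect weights and Cochran's Q.
    w = [1.0 / vi for vi in v]
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))

    # Between centre variance tau^2, truncated at 0.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights, summary estimate, and its standard error.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))

    z = NormalDist().inv_cdf(0.975)  # simplification: normal rather than t quantile
    half_pi = z * math.sqrt(tau2 + se_mu ** 2)
    return {
        "summary": expit(mu),
        "tau2": tau2,
        "ci": (expit(mu - z * se_mu), expit(mu + z * se_mu)),
        "pi": (expit(mu - half_pi), expit(mu + half_pi)),
    }


# Hypothetical centre specific AUC values and standard errors (not from the review).
aucs = [0.95, 0.91, 0.93, 0.89, 0.96]
ses = [0.012, 0.020, 0.015, 0.025, 0.010]
result = random_effects_logit(aucs, ses)
print({key: value for key, value in result.items()})
```

The confidence interval describes uncertainty about the average performance, whereas the prediction interval also includes the between centre variance τ2 and so indicates the range of performance to be expected in a new centre.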

Subgroups were defined based on geographical location, type of centre, and menopausal status. Sensitivity analyses were based on judgment of the risk of bias and on whether the study was linked to IOTA. As prespecified in the protocol, we only conducted a meta-analysis of performance if at least three estimates in a specific analysis could be retrieved from the included studies. To assess the association between prevalence of malignancy and AUC, and sensitivity and specificity at the 10% risk of malignancy threshold, we used meta-regression.35

Reporting bias and small study effects were visually explored with funnel plots adapted for the AUC. The body of evidence was assessed with an adapted version of GRADE (Grading of Recommendations Assessment, Development and Evaluation).36 All analyses were performed in R version 4.2.2 with the package metamisc for AUC, meta and mada for sensitivity and specificity, and rjags for net benefit and relative utility.37–40 Bayesian methods were computed with JAGS version 4.3.1.40

Patient and public involvement

Patients and the public were not involved in the design, conduct, reporting, or dissemination plans of this research. Preliminary results of the research were presented at the ISUOG World Congress in Seoul (October 2023). As this study was a systematic review, no patient data were collected, and we used only information publicly available in the published papers.

Results

We identified 1843 records and screened 490 after duplicates were removed. Forty seven studies met our inclusion criteria and were included in this systematic review (figure 1 and online supplemental table S3).6–9 41–83 Three studies were excluded because the same data were used in another included study, and one study was excluded because a preliminary formula of ADNEX was used.84–87 The data of three studies that were linked to IOTA and three other studies with a potential conflict of interest were extracted by authors PD and GSC.6 50 63 72 74 75

Figure 1
PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart of study inclusions and exclusions

Table 1 summarises the key study characteristics and online supplemental table S3 shows the specific details for each study. The unit of analysis was the patient in 42 (89%) studies and the tumour in five (11%) studies. When tumour was the unit of analysis, multiple tumours for the same patient could be included. The 47 studies reported on 17 007 tumours, with a median study sample size of 261 tumours (range 24-4905). Validations were conducted in 28 countries, with most studies conducted in Asia (51%) and Europe (38%). Online supplemental material S7 gives a list of reporting inconsistencies. All studies classified borderline tumours as malignant tumours.

Table 1
Characteristics of 47 included studies

ADNEX with CA125 was validated in 37 (79%) studies, and ADNEX without CA125 in 19 (40%) studies (16 studies evaluated both versions). Three (6%) studies conducted a mixed validation; ADNEX with CA125 was used when CA125 was available. For four (9%) studies, the ADNEX version was unclear. In total, 63 validations of ADNEX were performed after distinguishing between the ADNEX versions used (online supplemental table S4). When reported results for subgroups (eg, by menopausal status or by centre in multicentre studies) were also included, the total number of validations reported for the 47 studies was 159.

Five (11%) studies focused only on a specific clinical subgroup, such as pregnant patients or tumours for which the clinician's subjective assessment of the outcome was uncertain.9 47 61 65 75 Eight (17%) studies selected patients based on histology (online supplemental table S3). Thirty six (77%) studies did not focus on a specific clinical subgroup and did not select tumours based on histology. In these 36 studies that were eligible for the meta-analysis, median sample size was 284 tumours (range 50-4905), median number of malignant tumours was 68 (range 7-1041), and median prevalence of malignancy was 28% (range 3-57%). Fourteen of the 36 (39%) studies had ≥100 benign and ≥100 malignant tumours.

The target population of the studies was patients who were managed with surgery, or patients who were managed surgically and non-surgically. The reference standard for determining the type of tumour in patients who underwent surgery was always histopathology. Four (9%) studies included patients who were managed with and without surgery. In patients who did not undergo surgery, the outcome determination was the clinician's subjective assessment of the tumour as benign or malignant, or spontaneous resolution of the tumour during follow-up. The required follow-up time to determine the outcome was 3-4 months, one year, or two years, depending on the study.6–9

The most commonly reported performance measure was AUC for benign versus malignant tumours (72%) (online supplemental table S5). About two thirds of studies (66%) presented a receiver operating characteristic curve, 31 (66%) reported sensitivity and specificity performance at the 10% threshold for the risk of malignancy, 12 (26%) reported measures for multinomial discrimination, and four (9%) studies reported calibration performance.

Critical appraisal: reporting completeness and risk of bias

Completeness of reporting the TRIPOD items was assessed for the 63 validations. Adherence to TRIPOD items was, on average, 61%: studies reported a mean of 16.5 out of 27 items (figure 2 and online supplemental figure S1). The least commonly reported items were comparison of demographics, predictors, and outcome between the model development and external validation data (item 13c; 5%), reporting of performance measures with confidence intervals (item 16; 11%), specification of all performance measures (item 10d; 11%), rationale for study sample size (item 8; 13%), and description of how missing data were handled (item 9; 22%).

Figure 2
Adherence to reporting the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) items in 63 validations

Fifty seven (90%) of 63 validations were rated as having a high risk of bias, two (3%) as unclear risk of bias, and four (6%) as low risk of bias (figure 3, online supplemental figures S2-5, and Open Science Framework repository, extraction sheet https://osf.io/jtsvd/).29 Forty three (68%) validations had a high risk of bias for the participant domain, mostly because incomplete data were an exclusion criterion. Fifty seven (90%) validations had a high risk of bias for the analysis domain, mostly because of small sample size (69%; ie, <100 tumours in the smallest group), not including all participants in the analysis (85%), inappropriate handling of missing data (82%), and incomplete evaluation of model performance (89%, in most instances by not reporting an assessment of calibration).

Figure 3
Risk of bias overall and by subdomain in 63 validations, assessed by PROBAST (Prediction model Risk Of Bias Assessment Tool)

Adherence to TRIPOD items in 36 studies without a focus on selected histologies or clinical subgroups was, on average, 65% (17.47 of 27 items). In these studies, two had a low risk of bias, one had an unclear risk of bias, and 33 had a high risk of bias.

Meta-analyses of performance

The meta-analysis included studies without post hoc selection based on histology and without a focus on a clinical subgroup only (n=36) that reported the meta-analysed metrics. Only two studies included in the meta-analysis used tumour as the unit of study.42 51 The AUC for benign versus malignant tumours in patients who underwent surgery was reported in 12 studies (6309 tumours, 31 centres, and 13 countries) for ADNEX without CA125 (online supplemental table S4 and table S6), and the summary AUC was 0.93 (95% confidence interval 0.91 to 0.94, 95% prediction interval 0.85 to 0.98) (table 2 and figure 4).6 45 49 53 54 60 66–69 72 83 Twenty one studies (9202 tumours, 43 centres, and 18 countries) reported the AUC for benign versus malignant tumours in patients who underwent surgery for ADNEX with CA125 (online supplemental table S4 and table S6), and the summary AUC was 0.93 (95% confidence interval 0.92 to 0.94, 95% prediction interval 0.85 to 0.98) (table 2 and figure 5).6 41 42 46 49 53 58 60 63 64 67–69 72 74–76 78 80 82 83

Table 2
Meta-analysis of area under the receiver operating characteristic curve to differentiate between benign and malignant tumours, including sensitivity and subgroup analyses

Figure 4
Forest plot of area under the receiver operating characteristic curve in studies where the ADNEX (Assessment of Different Neoplasias in the adnexa) model was used without CA125 (cancer antigen 125).6 45 49 53 54 60 66–69 72 83 Results in the forest plot are centre specific results, so studies with more than one centre can appear multiple times. CI=confidence interval

Figure 5
Forest plot of area under the receiver operating characteristic curve in studies where the ADNEX (Assessment of Different Neoplasias in the adnexa) model was used with CA125 (cancer antigen 125).6 41 42 46 49 53 58 60 63 64 66–69 72 74 76 78 80 82 83 Results in the forest plot are centre specific results, so studies with more than one centre can appear multiple times. CI=confidence interval

Sensitivity and specificity at the 10% risk of malignancy threshold in patients who underwent surgery were reported in 10 studies for ADNEX without CA125 (online supplemental table S4 and table S7).6 45 49 54 60 66 68 69 72 83 The summary sensitivity and specificity were 0.93 (95% confidence interval 0.90 to 0.95, 95% prediction interval 0.73 to 0.99) and 0.75 (95% confidence interval 0.70 to 0.79, 95% prediction interval 0.46 to 0.91), respectively (figure 6 and online supplemental table S8). For ADNEX with CA125, sensitivity and specificity at the 10% risk of malignancy threshold in patients who underwent surgery were reported in 23 studies (online supplemental table S4 and table S7).6 8 41 45 49 53 58–60 63 66–69 71 72 74 76 78 80–83 The summary sensitivity and specificity were 0.94 (95% confidence interval 0.92 to 0.95, 95% prediction interval 0.80 to 0.98) and 0.77 (95% confidence interval 0.73 to 0.81, 95% prediction interval 0.47 to 0.93), respectively (figure 7 and online supplemental table S8).

Figure 6
Forest plot of sensitivity and specificity at the 10% risk of malignancy threshold in studies where the ADNEX (Assessment of Different Neoplasias in the adnexa) model was used without CA125 (cancer antigen 125).6 45 49 54 60 66 68 69 72 83 Results in the forest plot are centre specific results, so studies with more than one centre can appear multiple times. CI=confidence interval

Figure 7
Forest plot of sensitivity and specificity at the 10% risk of malignancy threshold in studies where the ADNEX (Assessment of Different Neoplasias in the adnexa) model was used with CA125 (cancer antigen 125).6 8 41 45 49 53 58–60 63 66–69 71 72 74 76 78 80–83 Results in the forest plot are centre specific results, so studies with more than one centre can appear multiple times. CI=confidence interval

Net benefit and relative utility were calculated based on studies that presented sensitivity and specificity at the 10% risk of malignancy threshold in patients who underwent surgery (online supplemental table S7). For ADNEX without CA125, the summary net benefit was 0.28 (95% credible interval 0.21 to 0.35, 95% prediction interval 0.05 to 0.68) and the summary relative utility was 0.50 (95% credible interval 0.37 to 0.62, 95% prediction interval −0.44 to 0.79). The probability that the model is clinically useful in a random new centre was estimated to be 91% (online supplemental table S9, figure S6, and figure S7). For ADNEX with CA125, the summary net benefit was 0.28 (95% credible interval 0.22 to 0.33, 95% prediction interval 0.05 to 0.65), and the summary relative utility was 0.54 (95% credible interval 0.45 to 0.61, 95% prediction interval −0.12 to 0.78). The probability that the model is clinically useful in a random new centre was estimated to be 95% (online supplemental table S9, figure S8, and figure S9).
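How net benefit and relative utility follow from sensitivity, specificity, and prevalence at a given risk threshold can be sketched in Python with the standard decision curve definitions. The sketch is illustrative only and does not reproduce the bayesian trivariate meta-analysis: it plugs in the summary sensitivity (0.94) and specificity (0.77) for ADNEX with CA125 together with the median prevalence of malignancy (28%), and that pairing is our assumption. The function names are ours.

```python
def net_benefit(sensitivity, specificity, prevalence, threshold):
    """Net benefit: true positives minus false positives weighted by the odds
    of the risk threshold, both expressed per patient."""
    odds = threshold / (1.0 - threshold)
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos - false_pos * odds


def relative_utility(sensitivity, specificity, prevalence, threshold):
    """Gain in net benefit over the best default strategy (treat all or treat
    none), as a fraction of the gain achieved by a perfect model."""
    nb_model = net_benefit(sensitivity, specificity, prevalence, threshold)
    nb_treat_all = net_benefit(1.0, 0.0, prevalence, threshold)
    nb_default = max(nb_treat_all, 0.0)  # treat none has a net benefit of 0
    nb_perfect = prevalence              # sensitivity = specificity = 1
    return (nb_model - nb_default) / (nb_perfect - nb_default)


# Illustrative inputs: summary sensitivity and specificity with CA125, and the
# median prevalence across eligible studies (combining them is an assumption).
nb = net_benefit(0.94, 0.77, 0.28, 0.10)
ru = relative_utility(0.94, 0.77, 0.28, 0.10)
print("net benefit:", round(nb, 3), "relative utility:", round(ru, 2))
```

A relative utility above 0 means that the model has a higher net benefit than both default strategies, which is the definition of clinical usefulness applied here. These plug-in values are not expected to match the reported summary estimates, which were pooled across centres with varying prevalence.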

Pairwise AUC values in patients who underwent surgery were reported in four studies for ADNEX without CA125 and in five studies for ADNEX with CA125 (online supplemental table S10).6 68 72 80 83 The summary pairwise AUC values for ADNEX without CA125 ranged from 0.66 (stage II-IV primary invasive v metastatic) to 0.97 (benign v stage II-IV primary invasive) (online supplemental table S11). For ADNEX with CA125, the summary pairwise AUC values ranged from 0.72 (borderline v stage I primary invasive) to 0.98 (benign v stage II-IV primary invasive).
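A pairwise AUC based on conditional risks can be sketched as follows. This Python illustration reflects our reading of the conditional risk method: for a pair of tumour types, only patients with one of the two outcomes are used, and each patient's risk of the first type is rescaled by the sum of the risks for the two types. The predicted risks and outcomes are hypothetical, and only three categories are shown for brevity.

```python
def pairwise_auc(predicted, outcomes, cat_i, cat_j):
    """AUC for category cat_i versus cat_j using conditional risks.

    predicted: list of dicts mapping category name to estimated risk.
    outcomes: list of observed category names, in the same order.
    """
    cond_i, cond_j = [], []
    for probs, outcome in zip(predicted, outcomes):
        if outcome not in (cat_i, cat_j):
            continue  # patients with other outcomes do not contribute
        conditional = probs[cat_i] / (probs[cat_i] + probs[cat_j])
        (cond_i if outcome == cat_i else cond_j).append(conditional)
    # Proportion of (cat_i, cat_j) patient pairs ranked correctly; ties count half.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in cond_i for b in cond_j)
    return wins / (len(cond_i) * len(cond_j))


# Hypothetical estimated risks and observed outcomes (illustration only).
predicted = [
    {"benign": 0.70, "borderline": 0.20, "stage I": 0.10},
    {"benign": 0.50, "borderline": 0.10, "stage I": 0.40},
    {"benign": 0.60, "borderline": 0.25, "stage I": 0.15},
    {"benign": 0.30, "borderline": 0.45, "stage I": 0.25},
    {"benign": 0.90, "borderline": 0.05, "stage I": 0.05},
]
outcomes = ["borderline", "stage I", "borderline", "stage I", "benign"]
print(pairwise_auc(predicted, outcomes, "borderline", "stage I"))
```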

The AUC for benign versus malignant tumours in patients managed with and without surgery combined was reported in two studies (5167 tumours, 18 centres, and eight countries) with a summary estimate of 0.94 (95% confidence interval 0.93 to 0.96, 95% prediction interval 0.88 to 0.99) for ADNEX with CA125 (table 2).6 7 ADNEX without CA125 was assessed in only one study (4905 tumours, 17 centres, and seven countries).6 This study reported a summary AUC of 0.94 (95% confidence interval 0.91 to 0.95, 95% prediction interval 0.82 to 0.98).

Table 2 and online supplemental tables S6-9 present the sensitivity and subgroup results for AUC, specificity, sensitivity, net benefit, and relative utility. These results showed that the findings were robust (AUC values ranged from 0.90 to 0.95 across all analyses) and clinical utility was suggested in all subgroups. When limiting the analyses to studies with a low risk of bias, summary AUC values were 0.93 (with CA125) and 0.91 (without CA125). Sensitivity was higher and specificity lower in oncology versus non-oncology centres and in patients who were postmenopausal versus premenopausal. In line with these findings, meta-regression suggested that the prevalence of malignancy was not related to AUC but was related positively to sensitivity and negatively to specificity (online supplemental figures S10-12).

Meta-analysis of calibration only in patients who underwent surgery was not feasible because only one study reported calibration slope and intercept.6 Four studies presented a calibration plot; in three studies72 80 83 the estimated risks were close to the observed risks and in one study6 the risk of malignancy was slightly underestimated (online supplemental material S8).

Based on the subdomains of the GRADE assessment, we found that the risk of bias in the studies included in this meta-analysis was a substantial limitation affecting the certainty of our meta-analysis results. Only two studies (representing 5511 tumours or 32% of the total tumours included in this review) were not classified as having a high risk of bias,6 72 but the sensitivity analysis according to risk of bias showed consistent findings (table 2, and online supplemental table S8 and table S9). Funnel plots for AUC did not suggest publication bias (online supplemental figure S13).

Discussion

Principal findings

The ADNEX model performed well in classifying tumours and in differentiating between benign and malignant tumours across various settings and populations. Our results indicated that ADNEX was clinically useful at the 10% risk of malignancy threshold (eg, to help decide whether a patient should be referred for assessment to a gynaecological oncology centre). We found deficiencies in study reporting, and most studies were judged to have a high risk of bias, but our sensitivity analyses indicated that performance was almost identical in studies with a low risk and high risk of bias.
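To make the notion of clinical utility at a risk threshold concrete, the sketch below computes net benefit and relative utility from sensitivity, specificity, and prevalence, following the standard definitions from decision curve analysis.31 32 It is a simplified illustration, not the random effects meta-analysis used in this review, and the input values in the usage example are hypothetical rather than results of our analysis.

```python
def net_benefit(sensitivity, specificity, prevalence, threshold=0.10):
    """Net benefit of classifying patients as high risk at the given threshold.

    True positives per patient minus false positives per patient,
    with false positives weighted by the odds of the risk threshold.
    """
    odds = threshold / (1.0 - threshold)
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos - false_pos * odds


def net_benefit_treat_all(prevalence, threshold=0.10):
    """Net benefit of the default strategy of treating everyone as high risk."""
    return net_benefit(1.0, 0.0, prevalence, threshold)


def relative_utility(sensitivity, specificity, prevalence, threshold=0.10):
    """Gain over the best default strategy, as a fraction of the gain of a perfect model."""
    nb_model = net_benefit(sensitivity, specificity, prevalence, threshold)
    # Best default: treat all, or treat none (net benefit 0).
    nb_default = max(net_benefit_treat_all(prevalence, threshold), 0.0)
    # A perfect model has net benefit equal to the prevalence.
    return (nb_model - nb_default) / (prevalence - nb_default)


# Hypothetical inputs: sensitivity 0.90, specificity 0.80, prevalence 0.30.
nb = net_benefit(0.90, 0.80, 0.30)
ru = relative_utility(0.90, 0.80, 0.30)
```

A model is considered useful at the threshold when its net benefit exceeds that of both default strategies, which corresponds to a relative utility above 0.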

For ADNEX with CA125, the AUC was 0.93 based on all studies, versus 0.93 when based only on studies with a low risk of bias. For ADNEX without CA125, the AUC values were 0.93 (all studies) and 0.91 (low risk of bias only). High risk of bias was mainly caused by small sample size, no assessment of calibration performance, and unjustified use of complete case analysis. Small sample size implies that the estimated AUC value is less precise but it does not systematically affect the AUC, unless publication bias exists. The funnel plots did not suggest publication bias. Absence of calibration does not affect the AUC. Using complete cases, in terms of CA125 or other predictors, might lead to underestimation of AUC because missing values tend to be associated with the examiner's subjective impression that the tumour is benign.84 Complete case analysis would then tend to exclude clearly benign tumours, which would make the sample more homogeneous and reduce the AUC. Our results, however, suggest that the effect of complete case analysis on performance might have been minimal. The effect on calibration could not be assessed. Taken together, we believe that the results of our meta-analysis are reliable.
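The argument that complete case analysis can lower the AUC by removing clearly benign tumours can be checked with a small constructed example. The estimated risks below are hypothetical and chosen only to illustrate the mechanism; the AUC is computed as the proportion of benign-malignant pairs in which the malignant tumour has the higher estimated risk.

```python
def auc(benign_risks, malignant_risks):
    """Probability that a random malignant tumour has a higher estimated risk
    than a random benign tumour (ties count as one half)."""
    wins = 0.0
    for m in malignant_risks:
        for b in benign_risks:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_risks) * len(benign_risks))


# Hypothetical estimated risks of malignancy.
benign = [0.01, 0.02, 0.03, 0.20, 0.40]
malignant = [0.30, 0.60, 0.90]

# Assumption for illustration: predictors are missing for the clearly benign
# tumours (risk below 0.05), so complete case analysis drops them.
benign_complete_cases = [r for r in benign if r >= 0.05]

auc_all = auc(benign, malignant)                      # 14 of 15 pairs
auc_complete = auc(benign_complete_cases, malignant)  # 5 of 6 pairs
```

Here the AUC falls from 14/15 (about 0.93) to 5/6 (about 0.83) after the easily classified benign tumours are excluded, although the model and the remaining patients are unchanged.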

Strengths and limitations of this study

Strengths of our systematic review include meta-analysis of ADNEX both as a risk model and as a diagnostic test, and the thorough critical appraisal of risk of bias and reporting quality with recommended checklists.21 28 Our study also had limitations. Some of the authors of this study had a conflict of interest because of their involvement in developing ADNEX or in some of the included external validation studies. To deal with this conflict of interest, independent researchers with expertise in study methodology and prediction modelling evaluated the IOTA related studies. A limitation of our findings (but not of our study) is that calibration performance was reported in only four studies and therefore meta-analysis of calibration was not possible.

Comparison with other studies

Previous systematic reviews have conducted meta-analyses of the diagnostic performance of ADNEX with CA125.14–18 These studies used the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) tool88 to assess risk of bias, and found that 0-64% of studies had a high risk of bias in at least one domain. We identified 45 of 47 studies with a high risk of bias with PROBAST, designed for the appraisal of risk prediction models. None of the previous meta-analyses of ADNEX included clinical utility, calibration, or AUC. Our results align with those of other systematic reviews of risk prediction modelling studies in various domains. These studies consistently indicated that reporting in the original studies was poor and that many studies had a high risk of bias.89–95 The results in terms of sensitivity and specificity of ADNEX in our meta-analysis, however, were similar to those in the other meta-analyses of ADNEX performance.

Study implications

ADNEX is intended for use by gynaecologists to help them decide on the most appropriate management of an adnexal mass detected on ultrasound. Our findings support the use of ADNEX in choosing between surgery and conservative follow-up. Conservative follow-up might be appropriate in patients with a low risk of malignancy (eg, <1%, based on meta-analysis of patients managed with and without surgery). Our results also support the use of ADNEX in deciding on the most appropriate management if conservative management is not suitable (eg, surgery at a local hospital if the risk of malignancy is <10% or referral to an oncology centre if the risk is >10%; based on meta-analysis of patients who underwent surgery).4

ADNEX can also be helpful in deciding on the management of a suspected malignancy (ie, investigations to find the primary tumour if a metastasis in the ovary is likely, or fertility sparing surgery if a borderline tumour is likely; based on meta-analysis in patients who underwent surgery). Because the AUC values of ADNEX with and without CA125 were similar, and because adding CA125 mainly helps to distinguish between different types of malignant tumours, we argue that the main use of ADNEX without CA125 is to help decide whether conservative follow-up, surgery in a local centre, or referral to an oncology centre is appropriate. The main use of ADNEX with CA125 is to help decide on the optimal management of a tumour suspected to be malignant, because it differentiates better between malignant subtypes than ADNEX without CA125.

Although our findings suggest that ADNEX is clinically useful, well conducted validations of any model are always of value to monitor its performance in diverse regions and clinical settings, and over time.96 To improve the performance of ADNEX further, efforts to update the ADNEX formula are of interest.96 97 If further validation studies are conducted, we recommend including a validation of ADNEX without CA125, using a sufficiently large sample that allows calibration and multinomial discrimination to be assessed, and including patients irrespective of whether they are managed surgically or non-surgically (despite the challenges with the reference standard for patients managed without surgery).34 98 Methodological recommendations for validation studies include using available tools to guarantee adequate sample size, describing missing data in detail and using methods such as imputation when needed, and assessing calibration performance.24 97 99–101 Adherence to the TRIPOD reporting checklists is important to maximise the value of the validation study (www.tripod-statement.org).

Conclusions

ADNEX has been validated in many studies, with AUC values >0.90 in differentiating between benign and malignant tumours in various settings, and with strong results for its clinical utility at the 10% risk of malignancy threshold. Because of the lack of assessment of calibration in most studies, evaluating the accuracy of the estimated risks in a substantial way was not possible in this study.

Ethics approval

Ethics approval was not required for the systematic review and meta-analysis.

  • X: @BarrenadaLasai

  • Contributors: Contributions were based on the CRediT taxonomy. LB contributed to conceptualisation, project administration, methodology, resources, investigation, validation, data curation, software, formal analysis, visualisation, writing of the original draft of the manuscript, and review and editing of the manuscript; AL contributed to resources, investigation, validation, data curation, and review and editing of the manuscript; PD contributed to validation, and review and editing of the manuscript; GC contributed to methodology, validation, and review and editing of the manuscript; LW contributed to methodology, software, formal analysis, and review and editing of the manuscript; JYV contributed to methodology, data curation, and review and editing of the manuscript; DT contributed to funding acquisition, and review and editing of the manuscript; LV contributed to investigation, data curation, and review and editing of the manuscript; BVC contributed to conceptualisation, funding acquisition, supervision, methodology, investigation, validation, data curation, software, formal analysis, writing of the original draft of the manuscript, and review and editing of the manuscript. All authors have read, share final responsibility for the decision to submit for publication, and agree to be accountable for all aspects of the work. BVC is the guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. Transparency statement: The lead author affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

  • Funding: This research was supported by the Research Foundation Flanders (FWO, grant G097322N) with BVC and DT as supervisors. LW and BVC were supported by Internal Funds KU Leuven (grant C24M/20/064). JYV was supported by the National Institute for Health and Care Research (NIHR) Community Healthcare MedTech and In Vitro Diagnostics Co-operative at Oxford Health NHS Foundation Trust. The views expressed are those of the authors and not necessarily those of the NHS, NIHR, or Department of Health and Social Care. GC is supported by Cancer Research UK (programme grant C49297/A27294). PD is supported by CRUK (project grant PRCPJT-Nov21\100021). The funders had no role in considering the study design or in the collection, analysis, interpretation of data, writing of the report, or decision to submit the article for publication.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: support from Research Foundation Flanders (FWO), Internal Funds KU Leuven, National Institute for Health and Care Research Community Healthcare MedTech, In Vitro Diagnostics Co-operative at Oxford Health NHS Foundation Trust, Cancer Research UK, and CRUK for the submitted work; BVC, LV, and DT are members of the steering committee of the International Ovarian Tumour Analysis (IOTA) consortium and were involved in the development of the ADNEX model; BVC and DT report consultancy work done by KU Leuven to help implement and test the ADNEX model in ultrasound machines by Samsung Medison, GE Healthcare, Canon Medical Systems Europe, and Shenzhen Mindray Bio-medical Electronics, outside the submitted work; GC is a statistics editor for the BMJ; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • Provenance and peer review: Not commissioned; externally peer reviewed.

  • Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Data availability statement

Data are available in a public, open access repository. All data relevant to the study are included in the article or uploaded as supplementary information. Some data were provided by the authors and were not public information; therefore this information was omitted from the public repository.

Acknowledgements

We thank Thomas Vandendriessche, Chayenne Van Meel, and Krizia Tuand, the biomedical reference librarians of the KU Leuven Libraries (2Bergen, Learning Centre Désiré Collen, Leuven, Belgium) for their help in conducting the systematic literature search. We gratefully acknowledge the authors of the external validation studies of ADNEX for providing their valuable data.

  1. Carley ME, Klingele CJ, Gebhart JB, et al. Laparoscopy versus laparotomy in the management of benign unilateral adnexal masses. J Am Assoc Gynecol Laparosc 2002; 9:321–6.
  2. Froyman W, Landolfo C, De Cock B, et al. Risk of complications in patients with conservatively managed ovarian tumours (IOTA5): a 2-year interim analysis of a multicentre, prospective, cohort study. Lancet Oncol 2019; 20:448–58.
  3. Woo YL, Kyrgiou M, Bryant A, et al. Centralisation of services for gynaecological cancers - a Cochrane systematic review. Gynecol Oncol 2012; 126:286–90.
  4. Timmerman D, Planchamp F, Bourne T, et al. ESGO/ISUOG/IOTA/ESGE consensus statement on preoperative diagnosis of ovarian tumors. Ultrasound Obstet Gynecol 2021; 58:148–68.
  5. Van Calster B, Van Hoorde K, Valentin L, et al. Evaluating the risk of ovarian cancer before surgery using the ADNEX model to differentiate between benign, borderline, early and advanced stage invasive, and secondary metastatic tumours: prospective multicentre diagnostic study. BMJ 2014; 349.
  6. Van Calster B, Valentin L, Froyman W, et al. Validation of models to diagnose ovarian cancer in patients managed surgically or conservatively: multicentre cohort study. BMJ 2020; 370.
  7. Hack K, Gandhi N, Bouchard-Fortier G, et al. External validation of O-RADS US risk stratification and management system. Radiology 2022; 304:114–20.
  8. Jeong SY, Park BK, Lee YY, et al. Validation of IOTA-ADNEX model in discriminating characteristics of adnexal masses: a comparison with subjective assessment. J Clin Med 2020; 9.
  9. Quaranta M, Nath R, Mehra G, et al. Surgery of benign ovarian masses by a gynecological cancer surgeon: a cohort study in a tertiary cancer centre. Cureus 2020; 12.
  10. Regionala cancercentrum i samverkan. Äggstockscancer MED Epitelial Histologi. 2023 [Accessed 14 Jun 2023].
  11. Ignace V, Joan V, Pauline H, et al. Eierstokkanker: diagnose, Behandeling en follow-up. Brussel, Federaal Kenniscentrum voor de Gezondheidszorg (KCE) 2016.
  12. Geomini P. A. ACCEPT (accuracy, cost effectivenss of prediction models for ovarian tumors): a study on the cost-effectiveness of risk scoring models for the discrimination between benign or malignant ovarian tumors, ZonMw Proj. 2021 [Accessed 3 Jul 2023].
  13. Andreotti RF, Timmerman D, Strachowski LM, et al. O-RADS US risk stratification and management system: a consensus guideline from the ACR ovarian-adnexal reporting and data system committee. Radiology 2020; 294:168–85.
  14. Cui L, Xu H, Zhang Y, et al. Diagnostic accuracies of the ultrasound and magnetic resonance imaging ADNEX scoring systems for ovarian adnexal mass: systematic review and meta-analysis. Acad Radiol 2022; 29:897–908.
  15. Davenport C, Rai N, Sharma P, et al. Menopausal status, ultrasound and biomarker tests in combination for the diagnosis of ovarian cancer in symptomatic women. Cochrane Database Syst Rev 2022; 7.
  16. Huang X, Wang Z, Zhang M, et al. Diagnostic accuracy of the ADNEX model for ovarian cancer at the 15% cut-off value: a systematic review and meta-analysis. Front Oncol 2021; 11:684257.
  17. Westwood M, Ramaekers B, Lang S, et al. Risk scores to guide referral decisions for people with suspected ovarian cancer in secondary care: a systematic review and cost-effectiveness analysis. Health Technol Assess 2018; 22:1–264.
  18. Yue X, Zhong L, Wang Y, et al. Value of assessment of different neoplasias in the adnexa in the differential diagnosis of malignant ovarian tumor and benign ovarian tumor: a meta-analysis. Ultrasound Med Biol 2022; 48:730–42.
  19. Wynants L, van Smeden M, McLernon DJ, et al. Three myths about risk thresholds for prediction models. BMC Med 2019; 17.
  20. Debray TPA, Damen JAAG, Snell KIE, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ 2017; 356.
  21. Wolff RF, Moons KGM, Riley RD, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019; 170:51–8.
  22. Snell KIE, Levis B, Damen JAA, et al. Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses (TRIPOD-SRMA). BMJ 2023; 381.
  23. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021; 372.
  24. Steyerberg EW. Clinical prediction models. Stat Health Sci 2009.
  25. Wohlin C. Guidelines for snowballing in systematic literature studies and a replication in software engineering. Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering - EASE ’14. London, ACM Press 2014.
  26. Ouzzani M, Hammady H, Fedorowicz Z, et al. Rayyan-a web and mobile app for systematic reviews. Syst Rev 2016; 5.
  27. Moons KGM, de Groot JAH, Bouwmeester W, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med 2014; 11.
  28. Collins GS, Reitsma JB, Altman DG, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015; 350.
  29. Barreñada L, Ledger A, Collins G, et al. The ADNEX risk model for ovarian cancer diagnosis: a systematic review and meta-analysis of external validation studies. Dataset 2023.
  30. Riley RD, Abrams KR, Sutton AJ, et al. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol 2007; 7.
  31. Vickers AJ, Elkin EB. Decision curve analysis: A novel method for evaluating prediction models. Med Decis Making 2006; 26:565–74.
  32. Baker SG. Putting risk prediction in perspective: relative utility curves. JNCI 2009; 101:1538–42.
  33. Wynants L, Riley RD, Timmerman D, et al. Random-effects meta-analysis of the clinical utility of tests and prediction models. Stat Med 2018; 37:2034–52.
  34. Van Calster B, Vergouwe Y, Looman CWN, et al. Assessing the discriminative ability of risk models for more than two outcome categories. Eur J Epidemiol 2012; 27:761–70.
  35. Thompson SG, Higgins JPT. How should meta-regression analyses be undertaken and interpreted? Stat Med 2002; 21:1559–73.
  36. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008; 336:924–6.
  37. Debray TP, Damen JA, Riley RD, et al. A framework for meta-analysis of prediction model studies with binary and time-to-event outcomes. Stat Methods Med Res 2019; 28:2768–86.
  38. Schwarzer G, Carpenter JR, Rücker G, et al. Meta-Analysis with R. Cham, Springer International Publishing 2015.
  39. Doebler P, Holling H. Meta-analysis of diagnostic accuracy with Mada. 2017.
  40. Plummer M. Rjags: Bayesian graphical models using MCMC. 2022.
  41. Araujo KG, Jales RM, Pereira PN, et al. Performance of the IOTA ADNEX model in preoperative discrimination of adnexal masses in a gynecological oncology center. Ultrasound Obstet Gynecol 2017; 49:778–83.
  42. Behnamfar F, Esmaeilian F, Adibi A, et al. Comparison of ultrasound and tumor marker CA125 in diagnosis of adnexal mass malignancies. Adv Biomed Res 2022; 11.
  43. Budiana ING, Suwiyoga K, Suwardewa TGA, et al. Skor assessment of different neoplasias in the adnexa (ADNEX) untuk memprediksi keganasan ovarium di RSUP Sanglah Denpasar. Intisari Sains Medis 2022; 13:197–201.
  44. Butureanu T, Socolov D, Matasariu DR, et al. Ovarian masses-applicable IOTA ADNEX model versus morphological findings for accurate diagnosis and treatment. Applied Sciences 2021; 11:10789.
  45. Chen G-Y, Hsu T-F, Chan I-S, et al. Comparison of the O-RADS and ADNEX models regarding malignancy rate and validity in evaluating adnexal lesions. Eur Radiol 2022; 32:7854–64.
  46. Chen H, Qian L, Jiang M, et al. Performance of IOTA ADNEX model in evaluating adnexal masses in a gynecological oncology center in China. Ultrasound Obstet Gynecol 2019; 54:815–22.
  47. Czekierdowski A, Stachowicz N, Smoleń A, et al. Sonographic assessment of complex ultrasound morphology adnexal tumors in pregnant women with the use of IOTA simple rules risk and ADNEX scoring systems. Diagnostics 2021; 11:414.
  48. Czekierdowski A, Stachowicz N, Smolen A, et al. Performance of IOTA simple rules risks, ADNEX model, subjective assessment compared to CA125 and HE4 with ROMA algorithm in discriminating between Benign, Borderline and Stage I Malignant Adnexal Lesions. Diagnostics (Basel) 2023; 13.
  49. Díaz L, Santos M, Zambrano B, et al. Ovarian tumors: risk of malignancy and IOTA ADNEX model indexes. No technology Doppler diagnostic options [Tumores de Ovario: Índices de Riesgo de Malignidad Y Modelo ADNEX de IOTA. Opciones Diagnósticas sin Tecnología Doppler]. Rev Obstet Ginecol Venez 2017; 77:181–95.
  50. Epstein E, Van Calster B, Timmerman D, et al. Subjective ultrasound assessment, the ADNEX model and ultrasound-guided tru-cut biopsy to differentiate disseminated primary ovarian cancer from metastatic non-ovarian cancer. Ultrasound Obstet Gynecol 2016; 47:110–6.
  51. Esquivel Villabona AL, Rodríguez JN, Ayala N, et al. Two-step strategy for optimizing the preoperative classification of adnexal masses in a university hospital, using international ovarian tumor analysis models: simple rules and assessment of different neoplasias in the adnexa model. J Ultrasound Med 2022; 41:471–82.
  52. Gaurilcikas A, Gedgaudaite M, Cizauskas A, et al. Performance of the IOTA ADNEX model on selected group of patients with borderline ovarian tumours. Medicina 2020; 56:690.
  53. He P, Wang J-J, Duan W, et al. Estimating the risk of malignancy of adnexal masses: validation of the ADNEX model in the hands of nonexpert ultrasonographers in a gynaecological oncology centre in China. J Ovarian Res 2021; 14.
  54. Hiett AK, Sonek JD, Guy M, et al. Performance of IOTA simple rules, simple rules risk assessment, ADNEX model and O-RADS in differentiating between benign and malignant adnexal lesions in North American women. Ultrasound Obstet Gynecol 2022; 59:668–76.
  55. Hu Y, Chen B, Dong H, et al. Comparison of ultrasound-based ADNEX model with magnetic resonance imaging for discriminating adnexal masses: a multi-center study. Front Oncol 2023; 13:1101297.
  56. Jiang M-J, Le Q, Yang B-W, et al. Ovarian sex cord stromal tumours: analysis of the clinical and sonographic characteristics of different histopathologic subtypes. J Ovarian Res 2021; 14.
  57. Jianhong S, Lei T, Wu L, et al. Comparison of performance between O-RADS, IOTA simple rules risk assessment and ADNEX model in the discrimination of ovarian brenner tumors. In Review 2022.
  58. Joyeux E, Miras T, Masquin I, et al. Before surgery predictability of malignant ovarian tumors based on ADNEX model and its use in clinical practice [Prédictibilité préopératoire de la malignité des tumeurs ovariennes à partir du score ADNEX et utilisation en pratique clinique]. Gynecol Obstet Fertil 2016; 44:557–64.
  59. Lai H-W, Lyu G-R, Kang Z, et al. Comparison of O-RADS, GI-RADS, and ADNEX for diagnosis of adnexal masses: an external validation study conducted by junior sonologists. J Ultrasound Med 2022; 41:1497–507.
  60. Lam Huong L, Thi Phuong Dung N, Hoang Lam V, et al. The optimal cut-off point of the ADNEX model for the prediction of the ovarian cancer risk. Asian Pac J Cancer Prev 2022; 23:2713–8.
  61. Lee SJ, Kim Y-H, Lee M-Y, et al. Ultrasonographic evaluation of ovarian mass for predicting malignancy in pregnant women. Gynecol Oncol 2021; 163:385–91.
  62. Liu B, Liao J, Gu W, et al. ADNEX model-based diagnosis of ovarian cancer using MRI images. Contrast Media Mol Imaging 2021; 2021.
  63. Meys EMJ, Jeelof LS, Achten NMJ, et al. Estimating risk of malignancy in adnexal masses: external validation of the ADNEX model and comparison with other frequently used ultrasound methods. Ultrasound Obstet Gynecol 2017; 49:784–92.
  64. Nam G, Lee SR, Jeong K, et al. Assessment of different neoplasias in the Adnexa model for differentiation of benign and malignant adnexal masses in Korean women. Obstet Gynecol Sci 2021; 64:293–9.
  65. Nohuz E, De Simone L, Chêne G, et al. Reliability of IOTA score and ADNEX model in the screening of ovarian malignancy in postmenopausal women. J Gynecol Obstet Hum Reprod 2019; 48:103–7.
  66. Pelayo M, Pelayo-Delgado I, Sancho-Sauco J, et al. Comparison of ultrasound scores in differentiating between benign and malignant adnexal masses. Diagnostics 2023; 13:1307.
  67. Peng X-S, Ma Y, Wang L-L, et al. Evaluation of the diagnostic value of the ultrasound ADNEX model for benign and malignant ovarian tumors. Int J Gen Med 2021; 14:5665–73.
  68. Poonyakanok V, Tanmahasamut P, Jaishuen A, et al. Preoperative evaluation of the adnex model for the prediction of the ovarian cancer risk of adnexal masses at Siriraj Hospital. Gynecol Obstet Invest 2021; 86:132–8.
  69. Qian L, Du Q, Jiang M, et al. Comparison of the diagnostic performances of ultrasound-based models for predicting malignancy in patients with adnexal masses. Front Oncol 2021; 11.
  70. Rashmi N, Singh S, Begum J, et al. Diagnostic performance of ultrasound-based international ovarian tumor analysis simple rules and assessment of different neoplasias in the adnexa model for predicting malignancy in women with ovarian tumors: a prospective cohort study. Womens Health Rep (New Rochelle) 2023; 4:202–10.
  71. Sandal K, Polat M, Yassa M, et al. Comparision of ‘risk of malignancy indices’ and ‘assesment of different neoplasia in the adnexa’ (ADNEX) model as preoperative malignancy evaluation methods for adnexal masses. Zeynep Kamil Tip Bul 2018; 49:324–9.
  72. Sayasneh A, Ferrara L, De Cock B, et al. Evaluating the risk of ovarian cancer before surgery using the ADNEX model: a multicentre external validation study. Br J Cancer 2016; 115:542–8.
  73. Stukan M, Alcazar JL, Gębicki J, et al. Ultrasound and clinical preoperative characteristics for discrimination between ovarian metastatic colorectal cancer and primary ovarian cancer: a case-control study. Diagnostics 2019; 9:210.
  74. Stukan M, Badocha M, Ratajczak K, et al. Development and validation of a model that includes two ultrasound parameters and the plasma D-dimer level for predicting malignancy in adnexal masses: an observational study. BMC Cancer 2019; 19.
  75. Szubert S, Szpurek D, Wójtowicz A, et al. Performance of selected models for predicting malignancy in ovarian tumors in relation to the degree of diagnostic uncertainty by subjective assessment with ultrasound. J Ultrasound Med 2020; 39:939–47.
  76. Szubert S, Wojtowicz A, Moszynski R, et al. External validation of the IOTA ADNEX model performed by two independent gynecologic centers. Gynecol Oncol 2016; 142:490–5.
  77. Tavoraitė I, Kronlachner L, Opolskienė G, et al. Ultrasound assessment of adnexal pathology: standardized methods and different levels of experience. Medicina 2021; 57:708.
  78. Tug N, Yassa M, Sargin MA, et al. Preoperative discriminating performance of the IOTA-ADNEX model and comparison with Risk of Malignancy Index: an external validation in a non-gynecologic oncology tertiary center. EJGO 2020; 41:200.
  79. Velayo CL, Reforma KN, Sicam RVG, et al. Diagnostic performances of ultrasound-based models for predicting malignancy in patients with adnexal masses. Healthcare 2022; 11:8.
  80. Viora E, Piovano E, Baima Poma C, et al. The ADNEX model to triage adnexal masses: an external validation study and comparison with the IOTA two-step strategy and subjective assessment by an experienced ultrasound operator. Eur J Obstet Gynecol Reprod Biol 2020; 247:207–11.
  81. Wang R, Yang Z. Evaluating the risk of malignancy in adnexal masses: validation of O-RADS and comparison with ADNEX model, SA, and RMI. Ginekol Pol 2023; 94:799–806.
  82. Yang S, Tang J, Rong Y, et al. Performance of the IOTA ADNEX model combined with HE4 for identifying early-stage ovarian cancer. Front Oncol 2022; 12:949766.
  83. Zhang Y, Zhao Y, Feng L, et al. External validation of the assessment of different neoplasias in the adnexa model performance in evaluating the risk of ovarian carcinoma before surgery in China: a tertiary center study. J Ultrasound Med 2022; 41:2333–42.
  84. Landolfo C, Bourne T, Froyman W, et al. Benign descriptors and ADNEX in two-step strategy to estimate risk of malignancy in ovarian tumors: retrospective validation in IOTA5 multicenter cohort. Ultrasound Obstet Gynecol 2023; 61:231–42.
  85. Poonyakanok V, Tanmahasamut P, Jaishuen A, et al. Prospective comparative trial comparing O-RADS, IOTA ADNEX model, and RMI score for preoperative evaluation of adnexal masses for prediction of ovarian cancer. J Obstet Gynaecol Res 2023; 49:1412–7.
  86. Froyman W, Wynants L, Landolfo C, et al. Validation of the performance of international ovarian tumor analysis (IOTA) methods in the diagnosis of early stage ovarian cancer in a non-screening population. Diagnostics 2017; 7:32.
  87. Wynants L, Timmerman D, Verbakel JY, et al. Clinical utility of risk models to refer patients with adnexal masses to specialized oncology care: multicenter external validation using decision curve analysis. Clin Cancer Res 2017; 23:5082–90.
  88. Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011; 155:529–36.
  89. Collins GS, de Groot JA, Dutton S, et al. External validation of multivariable prediction models: a systematic review of methodological conduct and reporting. BMC Med Res Methodol 2014; 14.
  90. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 2020; 369.
  91. Helmrich IRAR, Mikolić A, Kent DM, et al. Does poor methodological quality of prediction modeling studies translate to poor model performance? An illustration in traumatic brain injury. Diagn Progn Res 2022; 6.
  92. Dhiman P, Ma J, Navarro CA, et al. Reporting of prognostic clinical prediction models based on machine learning methods in oncology needs to be improved. J Clin Epidemiol 2021; 138:60–72.
  93. Andaur Navarro CL, Damen JAA, Takada T, et al. Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. BMJ 2021; 375.
  94. Allotey J, Whittle R, Snell KIE, et al. External validation of prognostic models to predict stillbirth using International Prediction of Pregnancy Complications (IPPIC) Network database: individual participant data meta-analysis. Ultrasound Obstet Gynecol 2022; 59:209–19.
  95. Bouwmeester W, Zuithoff NPA, Mallett S, et al. Reporting and methods in clinical prediction research: a systematic review. PLoS Med 2012; 9.
  96. Van Calster B, Steyerberg EW, Wynants L, et al. There is no such thing as a validated prediction model. BMC Med 2023; 21.
  97. Van Calster B, McLernon DJ, van Smeden M, et al. Calibration: the Achilles heel of predictive analytics. BMC Med 2019; 17.
  98. Van Hoorde K, Vergouwe Y, Timmerman D, et al. Assessing calibration of multinomial risk prediction models. Stat Med 2014; 33:2585–96.
  99. Pate A, Riley RD, Collins GS, et al. Minimum sample size for developing a multivariable prediction model using multinomial logistic regression. Stat Methods Med Res 2023; 32:555–71.
  100. Riley RD, Debray TPA, Collins GS, et al. Minimum sample size for external validation of a clinical prediction model with a binary outcome. Stat Med 2021; 40:4230–51.
  101. Sisk R, Sperrin M, Peek N, et al. Imputation and missing indicators for handling missing data in the development and deployment of clinical prediction models: a simulation study. Stat Methods Med Res 2023; 32:1461–77.

  • Received: 17 November 2023
  • Accepted: 25 January 2024
  • First published: 17 February 2024