Table 4

Rating of concepts by Delphi respondents in rounds 1 and 2

Domains	Concepts related to risk of bias in network meta-analysis	Round	Included (70% consensus)	Strongly disagree	Disagree	Neither agree nor disagree	Agree	Strongly agree	Unable to score	Group median (Q1, Q3)
Network characteristics/ geometry	1: Whether all interventions in the network (including comparators) were potentially suitable for all eligible study respondents	1	Y	0/27	0/27	2/27 (7%)	10/27 (37%)	15/27 (56%)	1	5 (4, 5)
	2: Whether any interventions were inappropriately excluded from the network (eg, through eligibility criteria or after seeing the results)	1	Y	0/27	1/27 (4%)	3/27 (11%)	11/27 (41%)	12/27 (44%)	1	4 (4, 5)
	3: Whether importantly different intervention strategies were kept as distinct nodes in the network (ie, whether appropriate groupings were made of interventions—lumping vs splitting)	1	Y	0/28	2/28 (7%)	5/28 (18%)	11/28 (39%)	9/28 (32%)	0	4 (3, 5)
Effect modifiers	4: Whether effect-modifying participant characteristics are sufficiently similar across the whole network	1	Y	0/28	0/28	2/28 (7%)	11/28 (39%)	15/28 (54%)	0	4.5 (4, 5)
	5: Whether outcomes and time points are sufficiently similar across the whole network	1	Y	0/28	0/28	4/28 (14%)	14/28 (50%)	10/28 (36%)	0	4 (4, 5)
	6: Whether study-level risks of bias are sufficiently similar across the whole network	1	N	0/28	9/28 (32%)	5/28 (18%)	8/28 (29%)	5/28 (18%)	0	3 (2, 4)
		2	N	2/22 (9%)	8/22 (36%)	5/22 (23%)	5/22 (23%)	2/22 (9%)	0	3 (2, 4)
	7: Whether other trial characteristics are sufficiently similar across the whole network	1	N	0/24	4/24 (17%)	9/24 (38%)	7/24 (29%)	4/24 (17%)	4	3 (3, 4)
		2	N	0/21	4/21 (19%)	8/21 (38%)	8/21 (38%)	1/21 (5%)	1	3 (3, 4)
Statistical synthesis	8: Whether an appropriate prespecified approach was used in node making	1	Y	1/25 (4%)	1/25 (4%)	5/25 (20%)	13/25 (52%)	5/25 (20%)	3	4 (3, 4)
	9: Whether a process was used to define nodes in the network (eg, undertaken independently by two reviewers, following a preplanned node-making process)	1	N	0/25	2/25 (8%)	11/25 (44%)	9/25 (36%)	3/25 (12%)	3	3 (3, 4)
		2	N	1/20 (5%)	1/20 (5%)	5/20 (25%)	11/20 (55%)	2/20 (10%)	2	4 (3, 4)
	10: Whether effect metric(s) for each outcome (eg, ORs, risk difference) in the network were presented with CIs/credible intervals	1	N	3/28 (11%)	4/28 (14%)	2/28 (7%)	6/28 (21%)	13/28 (46%)	0	4 (2.25, 5)
		2	N	3/22 (14%)	7/22 (32%)	2/22 (9%)	2/22 (9%)	8/22 (36%)	0	3 (2, 5)
	11: If disconnected networks were connected to perform the analysis, whether methods to do this were appropriate	1	Y	0/25	2/25 (8%)	4/25 (16%)	13/25 (52%)	6/25 (24%)	3	4 (3.5, 4)
	12: Whether methods used to represent multi-arm studies in the dataset and in the analysis are appropriate	1	Y	0/28	4/28 (14%)	3/28 (11%)	10/28 (36%)	11/28 (39%)	0	4 (3.25, 5)
	13: Whether assumptions across the network about homogeneity/heterogeneity of effects within comparisons are appropriate	1	Y	0/27	1/27 (4%)	6/27 (22%)	13/27 (48%)	7/27 (26%)	1	4 (3, 4)
	14: Whether a valid approach was used to determine whether there was conflict between direct and indirect sources of evidence on the same comparisons (often called inconsistency or incoherence)	1	Y	0/28	2/28 (7%)	0/28	8/28 (29%)	18/28 (64%)	0	5 (4, 5)
	15: If inconsistency detected, then whether methods such as re-evaluation of the choice of scale, effect modification and similarity of the contributing randomised controlled trials were investigated	1	N	0/28	4/28 (14%)	5/28 (18%)	14/28 (50%)	5/28 (18%)	0	4 (3, 4)
		2	N	0/22	4/22 (18%)	4/22 (18%)	9/22 (41%)	5/22 (23%)	0	4 (3, 4)
	16: If a Bayesian analysis was conducted, whether the selection of prior distributions was justified	1	Y	0/28	3/28 (11%)	2/28 (7%)	13/28 (46%)	10/28 (36%)	0	4 (4, 5)
	17: Whether the analysis appropriately addressed any differences in effect modifiers across different parts of the network	1	Y	0/28	2/28 (7%)	3/28 (11%)	18/28 (64%)	5/28 (18%)	0	4 (4, 4)
	18: Whether there was evidence of conflicting results between direct and indirect evidence (often called inconsistency or incoherence in results)	1	Y	0/28	3/28 (11%)	0/28	10/28 (36%)	14/28 (50%)	0	4 (4, 5)
	19: If there were conflicting results between direct and indirect evidence, whether this was addressed appropriately (eg, meta-regression, data extraction errors, redefining the network)	1	Y	0/27	2/27 (7%)	1/27 (4%)	13/27 (48%)	11/27 (41%)	1	4 (4, 5)
	20: Evidence that the statistical model, as it was used to get the key results, was not suitable for the data (eg, from analysis of residuals or information criteria such as DIC)	1	N	0/28	7/28 (25%)	6/28 (21%)	9/28 (32%)	6/28 (21%)	0	3.5 (2.25, 4)
		2	N	0/21	1/21 (5%)	9/21 (43%)	7/21 (33%)	4/21 (19%)	1	4 (3, 4)
	21: Whether sensitivity analyses demonstrate that findings were robust to the statistical model and estimation methods (including prior distributions if Bayesian methods were used)	1	N	0/28	2/28 (7%)	9/28 (32%)	8/28 (29%)	9/28 (32%)	0	4 (3, 4.75)
		2	N	0/22	1/22 (5%)	8/22 (36%)	7/22 (32%)	6/22 (27%)	0	4 (3, 4.75)
	22: Whether limitations at study and outcome level (eg, risk of bias), and at review level (eg, incomplete retrieval of identified research, reporting bias) were discussed	1	Y	0/27	1/27 (4%)	4/27 (15%)	8/27 (30%)	13/27 (48%)	1	4 (4, 5)

*Blank responses and ‘unable to score’ responses were not counted in the denominator.
DIC, deviance information criterion; N, no; Y, yes.
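
The "Included (70% consensus)" column can be illustrated with a short calculation. The sketch below assumes that consensus means at least 70% of scored responses were "agree" or "strongly agree"; the table does not state the rule in these terms, although this reading matches the Y/N values shown. The function names are hypothetical, and the denominator is taken as the sum of the five response counts, with "unable to score" responses left out as the footnote describes.

```python
from fractions import Fraction

# Assumed reading of "70% consensus": share of agree + strongly agree
CONSENSUS_THRESHOLD = Fraction(70, 100)


def agreement_share(counts):
    """Share of scored responses that were 'agree' or 'strongly agree'.

    counts: five integers ordered strongly disagree, disagree,
    neither agree nor disagree, agree, strongly agree.
    'Unable to score' responses are not part of counts.
    """
    total = sum(counts)
    return Fraction(counts[3] + counts[4], total)


def reaches_consensus(counts):
    """True if the agreement share meets the assumed 70% threshold."""
    return agreement_share(counts) >= CONSENSUS_THRESHOLD


# Counts copied from the table (round 1)
concept_8 = [1, 1, 5, 13, 5]    # marked Y in the table
concept_15 = [0, 4, 5, 14, 5]   # marked N in the table

for name, counts in [("concept 8", concept_8), ("concept 15", concept_15)]:
    share = agreement_share(counts)
    print(name, f"{float(share):.0%}", reaches_consensus(counts))
```

Exact fractions are used so that values close to the threshold (for example 19/28, about 68%) are compared without floating-point rounding.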