TY - UNPB
T1 - Same data, different analysts
T2 - variation in effect sizes due to analytical decisions in ecology and evolutionary biology
AU - Gould, Elliot
AU - Fraser, Hannah
AU - Parker, Timothy
AU - Nakagawa, Shinichi
AU - Griffith, Simon C.
AU - Vesk, Peter
AU - Fidler, Fiona
AU - Abbey-Lee, Robin
AU - Abbott, Jessica
AU - Aguirre, Luis
AU - Alcaraz, Carles
AU - Aloni, Irith
AU - Altschul, Drew
AU - Arekar, Kunal
AU - Atkins, Jeff W.
AU - Atkinson, Joe
AU - Baker, Christopher M.
AU - Barrett, Meghan
AU - Bell, Kristian
AU - Bello, Suleiman Kehinde
AU - Berauer, Bernd
AU - Bertram, Michael Grant
AU - Billman, Peter
AU - Blake, Charlie
AU - Blake, Shannon
AU - Bliard, Louis
AU - Bonisoli-Alquati, Andrea
AU - Bonnet, Timothée
AU - Bordes, Camille Nina Marion
AU - Bose, Aneesh
AU - Botterill-James, Thomas
AU - Boyd, Melissa Anna
AU - Boyle, Sarah A.
AU - Bradfer-Lawrence, Tom
AU - Bradham, Jennifer
AU - Brand, Jack
AU - Brengdahl, Martin I.
AU - Bulla, Martin
AU - Bussière, Luc
AU - Camerlenghi, Ettore
AU - Campbell, Sara
AU - Campos, Leonardo L. F.
AU - Caravaggi, Anthony
AU - Cardoso, Pedro
AU - Carroll, Charles J. W.
AU - Catanach, Therese A.
AU - Chen, Xuan
AU - Chik, Heung Ying Janet
AU - Choy, Emily Sarah
AU - Christie, Alec Philip
AU - Chuang, Angela
AU - Chunco, Amanda
AU - Clark, Bethany
AU - Contina, Andrea
AU - Covernton, Garth A.
AU - Cox, Murray P.
PY - 2023/10/3
Y1 - 2023/10/3
AB - Although variation in effect sizes and predicted values among studies of similar phenomena is inevitable, such variation far exceeds what might be produced by sampling error alone. One possible explanation for variation among results is differences among researchers in the decisions they make regarding statistical analyses. A growing array of studies has explored this analytical variability in different (mostly social science) fields, and has found substantial variability among results, despite analysts having the same data and research question. We implemented an analogous study in ecology and evolutionary biology, fields in which there has been no empirical exploration of the variation in effect sizes or model predictions generated by the analytical decisions of different researchers. We used two unpublished datasets, one from evolutionary ecology (blue tit, Cyanistes caeruleus, to compare sibling number and nestling growth) and one from conservation ecology (Eucalyptus, to compare grass cover and tree seedling recruitment), and the project leaders recruited 174 analyst teams, comprising 246 analysts, to investigate the answers to prespecified research questions. Analyses conducted by these teams yielded 141 usable effects for the blue tit dataset, and 85 usable effects for the Eucalyptus dataset. We found substantial heterogeneity among results for both datasets, although the patterns of variation differed between them. For the blue tit analyses, the average effect was convincingly negative, with less growth for nestlings living with more siblings, but there was near-continuous variation in effect size from large negative effects to effects near zero, and even effects crossing the traditional threshold of statistical significance in the opposite direction. In contrast, the average relationship between grass cover and Eucalyptus seedling number was only slightly negative and not convincingly different from zero, and most effects ranged from weakly negative to weakly positive, with about a third of effects crossing the traditional threshold of significance in one direction or the other. However, there were also several striking outliers in the Eucalyptus dataset, with effects far from zero. For both datasets, we found substantial variation in the variable selection and random effects structures among analyses, as well as in the ratings of the analytical methods by peer reviewers, but we found no strong relationship between any of these and deviation from the meta-analytic mean. In other words, analyses with results that were far from the mean were no more or less likely to have dissimilar variable sets, use random effects in their models, or receive poor peer reviews than those analyses that found results that were close to the mean. The existence of substantial variability among analysis outcomes raises important questions about how ecologists and evolutionary biologists should interpret published results, and how they should conduct analyses in the future.
KW - credibility revolution
KW - heterogeneity
KW - meta-analysis
KW - metascience
KW - replicability
KW - reproducibility
U2 - 10.32942/x2gg62
DO - 10.32942/x2gg62
M3 - Preprint
BT - Same data, different analysts
PB - EcoEvoRxiv
ER -