Design and Analysis of Replication Studies

A workshop by the Center for Reproducible Science (CRS) in Zurich.

January 22 - 24, 2020

Venue: University of Zurich, Epidemiology, Biostatistics and Prevention Institute (EBPI), Hirschengraben 82, room HIT H 03. You can reach this building by walking around Hirschengraben 84. Contact us anytime by email at crs@ebpi.uzh.ch

About the workshop

The goal of this international workshop is a thorough methodological discussion of the design and analysis of replication studies, bringing together specialists from fields such as clinical research, psychology, and economics.

All interested researchers are invited to participate. Seating is limited; please register below.

Program Info

22 Jan, 2020

12:45 - 13:00

Welcome

By Leonhard Held, CRS Director

13:00 - 16:00

Tutorial on the R package ReplicationSuccess

By Charlotte Micheloud, Samuel Pawel, Leonhard Held
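
As a taste of the tutorial, here is a minimal sketch in R with made-up inputs. It assumes that the CRAN release of ReplicationSuccess exposes pSceptical() with arguments zo, zr and c, as in the package documentation; argument names and defaults may differ between versions.

    # install.packages("ReplicationSuccess")  # from CRAN, if needed
    library(ReplicationSuccess)

    po <- 0.01   # two-sided p-value of the original study (made-up)
    pr <- 0.02   # two-sided p-value of the replication (made-up)
    no <- 50     # sample size of the original study
    nr <- 100    # sample size of the replication

    # Convert the two-sided p-values to z-values; the package works with
    # z-values and the relative sample size c = nr / no.
    zo <- qnorm(1 - po / 2)
    zr <- qnorm(1 - pr / 2)

    # Sceptical p-value: small values indicate replication success.
    pSceptical(zo = zo, zr = zr, c = nr / no)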

23 Jan, 2020

09:00 - 09:05

Welcome

By Leonhard Held, CRS Director

09:05 - 09:50

What should we "expect" from reproducibility?

By Stephen Senn, Consultant Statistician, Edinburgh

Is there really a reproducibility crisis, and if so, are P-values to blame? Choose any statistic you like, carry out two identical independent studies, and report this statistic for each. In advance of collecting any data, you ought to expect that it is just as likely that statistic 1 will be smaller than statistic 2 as vice versa. Once you have seen statistic 1, things are not so simple, but if they are not so simple, it is because you have other information in some form. It is at least instructive that you need to be careful in jumping to conclusions about what to expect from reproducibility. Furthermore, the forecasts of good Bayesians ought to obey a martingale property. On average you should be in the future where you are now but, of course, your inferential random walk may lead to some peregrination before it homes in on “the truth”. But you certainly can’t generally expect that a probability will get smaller as you continue. P-values, like other statistics, are a position, not a movement. Although often claimed, there is no such thing as a trend towards significance. Using these and other philosophical considerations, I shall try to establish what it is we want from reproducibility. I shall conclude that we statisticians should probably be paying more attention to checking that standard errors are calculated appropriately, and rather less to the inferential framework.
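
The opening claim is easy to check by simulation. A minimal sketch in R, with made-up settings: two identical, independent two-sample studies, each reporting its t-statistic.

    set.seed(1)
    nsim  <- 10000
    n     <- 20    # per-group sample size in each study
    delta <- 0.5   # true standardized effect, identical in both studies

    one_study <- function() {
      x <- rnorm(n, mean = delta)  # treatment group
      y <- rnorm(n, mean = 0)      # control group
      t.test(x, y)$statistic       # the reported statistic
    }

    t1 <- replicate(nsim, one_study())
    t2 <- replicate(nsim, one_study())
    mean(t1 < t2)  # close to 0.5: a priori, either ordering is equally likely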

09:50 - 10:20

Replicability as generalizability: Revisiting external validity with specification curve analysis

By Johannes Ullrich, Department of Psychology, University of Zurich

Healthy scientific discourse involves scientists questioning one another's findings. The question "Why should I believe you?" is essentially a question about internal validity. As originally defined by the psychologist Don Campbell, internal validity deals with the question "did in fact the experimental stimulus make some significant difference in this specific instance?" (Campbell, 1957). In recent years, the revival of good scientific practices such as replication and hypothesizing before the results are known has arguably improved internal validity. However, scientists should not only wonder whether a given effect exists at all, but also to what extent the effect generalizes across populations, settings, and variables. In other words, they should be concerned about external validity as well. We reinterpret the old terms of internal and external validity by drawing on the notions of the 'garden of forking paths' and 'researcher degrees of freedom', i.e., the fact that for one and the same dataset there often exist hundreds or thousands of different ways of testing the same hypothesis. We illustrate the use of specification curve analysis (Simonsohn, Simmons, & Nelson, 2015) for assessing internal and external validity in a common framework. Specification curve analysis in the service of validity testing involves estimating a test statistic (e.g., an effect size) conditional on variations in model specification (e.g., analytic decisions, populations, settings, and variables). We conclude with a discussion of the value of specification curve analysis for the evolution of a line of research.
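
As an illustration of the idea (not the authors' analysis), a minimal sketch of a specification curve in R, on simulated data with two made-up analytic decisions: whether to adjust for a covariate and whether to trim extreme outcomes.

    set.seed(1)
    d   <- data.frame(x = rnorm(200), z = rnorm(200))
    d$y <- 0.3 * d$x + 0.2 * d$z + rnorm(200)

    # All combinations of the two analytic decisions.
    specs <- expand.grid(adjust_z = c(FALSE, TRUE), trim_tails = c(FALSE, TRUE))

    est <- apply(specs, 1, function(s) {
      di <- d
      if (s["trim_tails"])  # decision 2: drop the 5% most extreme outcomes
        di <- di[abs(di$y) < quantile(abs(di$y), 0.95), ]
      f <- if (s["adjust_z"]) y ~ x + z else y ~ x  # decision 1: adjust for z
      coef(lm(f, data = di))["x"]  # same hypothesis, different specification
    })

    # The specification curve: ordered estimates across all specifications.
    plot(sort(est), type = "b", xlab = "specification (ordered)",
         ylab = "estimated effect of x")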

Break

10:20 - 10:50

Coffee break

10:50 - 11:35

Direct and conceptual replications

By Anna Dreber, Johan Björkman Professor of Economics, Stockholm School of Economics

Abstract coming soon

11:35 - 12:05

The role of replication studies in navigating epistemic landscapes

By Filip Melinščak, Postdoctoral Fellow, University of Zurich

Abstract coming soon

Break

12:05 - 13:30

Flying lunch

13:30 - 14:00

Experimental replications in animal trials

By Florian Frommlet, CEMSIIS, Section of Medical Statistics, Medical University Vienna

The recent discussion on the reproducibility of scientific results is particularly relevant for preclinical research with animal models. Within that research community there is a tradition of repeating an experiment three times to demonstrate replicability. However, there are hardly any guidelines on how to plan such an experimental design, or on how to report the results obtained. This talk provides a thorough statistical analysis of the 'three-times' rule as it is currently applied in practice, and gives recommendations on how to improve the study design and statistical analysis of replicated animal experiments.
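
The statistical point can be illustrated by simulation. A sketch in R with made-up numbers: requiring all three of three identical, modestly powered experiments to reach p < 0.05 succeeds far less often than a single such experiment.

    set.seed(1)
    n     <- 10    # animals per group, per experiment (made-up)
    delta <- 1     # true standardized effect
    alpha <- 0.05

    pval <- function() t.test(rnorm(n, delta), rnorm(n))$p.value

    single <- replicate(10000, pval() < alpha)
    triple <- replicate(10000, all(replicate(3, pval()) < alpha))

    mean(single)  # power of one experiment
    mean(triple)  # probability that the 'three-times' rule succeeds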

14:00 - 14:30

Identifying boundary conditions in confirmatory preclinical animal studies to increase value and foster translation

By Meggie Danziger, QUEST Center for Transforming Biomedical Research at the Berlin Institute of Health

Abstract coming soon

Break

14:30 - 15:00

Coffee break

15:00 - 15:45

Evaluating statistical evidence in biomedical research, meta-studies, and radical randomization

By Don van Ravenzwaaij, Associate Professor, University of Groningen

For the endorsement of new medications, the US Food and Drug Administration requires replication of the main effect in randomized clinical trials. Typically, this replication comes down to observing two trials, each with a p-value below 0.05. In the first part of this talk, I discuss work from a simulation study (van Ravenzwaaij & Ioannidis, 2017) that shows what it means to have exactly two trials with a p-value below 0.05 in terms of the actual strength of evidence, quantified by Bayes factors. Our results show that different cases in which two trials have a p-value below 0.05 can have wildly differing Bayes factors. In a non-trivial number of cases, the evidence actually points to the null hypothesis. We recommend the use of Bayes factors as a routine tool for assessing the endorsement of new medications, because Bayes factors consistently quantify the strength of evidence. In the second part of this talk, I will propose a different way to go about replication: the use of meta-studies with radical randomization (Baribault et al., 2018).
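
To see how two significant trials can nonetheless carry modest evidence, here is a sketch in R under a deliberately simple normal model: z ~ N(0, 1) under H0 and, under H1, a N(0, tau^2) prior on the noncentrality, with the two trial effects treated as independent draws from that prior. This is a stand-in for the default Bayes factors of the cited paper, not their method.

    # Bayes factor H1 vs H0 for a single z-value under the simple model.
    bf10 <- function(z, tau = 1) {
      dnorm(z, 0, sqrt(1 + tau^2)) / dnorm(z, 0, 1)
    }

    # Two trials, each just significant at p = 0.049 (two-sided):
    z <- qnorm(1 - 0.049 / 2)
    bf10(z)^2  # combined BF: surprisingly weak evidence

    # Two clearly significant trials, p = 0.001 each:
    bf10(qnorm(1 - 0.001 / 2))^2  # much stronger evidence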

15:45 - 16:30

A new standard for the analysis and design of replication studies

By Leonhard Held, CRS Director

Abstract coming soon

Dinner

19:00 - 22:00

Conference dinner

Location to be announced

24 Jan, 2020

09:00 - 09:45

TBD

By E.J. Wagenmakers, Professor, University of Amsterdam

Abstract coming soon

09:45 - 10:15

The sufficiently skeptical intrinsic prior

By Guido Consonni, Professore Ordinario, Università Cattolica del Sacro Cuore

Abstract coming soon

Break

10:15 - 10:45

Coffee break

10:45 - 11:15

A novel approach to meta-analysis testing under heterogeneity

By Judith ter Schure, PhD student, Centrum Wiskunde & Informatica Amsterdam

Scientific knowledge accumulates and therefore always has a (partly) sequential nature. As a result, the exchangeability assumption of conventional meta-analysis cannot be met if the existence of a replication (or, more generally, of later studies in a series) depends on earlier results. Such dependencies arise at the study level, but also at the meta-analysis level if new studies are informed by a systematic review of existing results in order to reduce research waste. Fortunately, study series with such dependencies can be meta-analyzed with Safe Tests. These tests preserve type I error control even if the analysis is updated after each new study. Moreover, they introduce a novel approach to handling heterogeneity, a bottleneck in sequential meta-analysis. The strength of Safe Tests for composite null hypotheses lies in controlling the type I error over the entire set of null distributions, by specifying the test statistic under a worst-case prior on the null. If such a (study-specific) test statistic is provided for each study, the combined test controls the type I error even if each study is generated by a different null distribution. These properties are optimized in so-called GROW Safe Tests, which maximize the ability to reject the null hypothesis and make intermediate decisions in a growing series, without the need to model heterogeneity.
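
A minimal sketch in R of the anytime-valid idea behind such tests, under strong simplifying assumptions: a simple null and a simple alternative, so that the e-value of a study is a plain likelihood ratio. The actual GROW Safe Tests for composite nulls are considerably more involved.

    set.seed(1)
    alpha <- 0.05
    mu1   <- 0.3   # effect under the alternative (made-up)
    n     <- 30    # observations per study

    # Likelihood ratio of N(mu1, 1) vs N(0, 1) for one study's data.
    e_value <- function(x) prod(dnorm(x, mu1, 1) / dnorm(x, 0, 1))

    # Multiplying e-values across studies yields a test martingale under H0;
    # rejecting once the running product exceeds 1/alpha controls the type I
    # error at alpha, no matter after how many studies the series stops.
    running <- 1
    for (study in 1:10) {
      x <- rnorm(n, mean = 0.3)  # data generated under the alternative here
      running <- running * e_value(x)
      cat("after study", study, ": running e-value =", round(running, 2), "\n")
      if (running >= 1 / alpha) {
        cat("reject H0; the series may stop here\n")
        break
      }
    }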

11:15 - 12:00

Efficient designs under uncertainty: Guarantee compelling evidence with Sequential Bayes Factor designs

By Felix Schönbrodt, Principal Investigator, Ludwig-Maximilians-Universität München

Abstract coming soon

Lunch

12:00 - 13:30

Flying lunch

13:30 - 14:15

Shrinkage for reproducible research

By E.W. van Zwet, Associate Professor, Leiden University Medical Center

The pressure to publish or perish undoubtedly leads to the publication of much poor research. However, the fact that significant effects tend to be smaller and less significant upon attempts to reproduce them is also due to selection bias. I will discuss this "winner's curse" in some detail and show that it is largest in low-powered studies. To correct for it, it is necessary to apply some shrinkage. To determine the appropriate amount of shrinkage, I propose to embed the study of interest in a large area of research, and then to estimate the distribution of effect sizes across that area of research. Using this estimated distribution as a prior, Bayes' rule provides an amount of shrinkage that is well calibrated to the chosen area of research. I demonstrate the approach with data from the OSC project on reproducibility in psychology, and with data from 100 phase 3 clinical trials.
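
The shrinkage step has a closed form under a simple normal-normal assumption (true effects N(0, tau^2) across the area of research, unbiased estimates with standard error se); the talk's empirical calibration is more refined. A sketch in R with made-up numbers, showing both the winner's curse and its correction:

    shrink <- function(b, se, tau) {
      b * tau^2 / (tau^2 + se^2)  # posterior mean under the normal prior
    }

    set.seed(1)
    tau  <- 0.3                     # spread of true effects (made-up)
    se   <- 0.25                    # standard error of a low-powered study
    beta <- rnorm(50000, 0, tau)    # true effects across the research area
    b    <- rnorm(50000, beta, se)  # their unbiased estimates
    sig  <- b / se > 1.96           # selected: positive and 'significant'

    mean(b[sig] - beta[sig])                   # winner's curse: overestimation
    mean(shrink(b[sig], se, tau) - beta[sig])  # after shrinkage: near zero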

14:15 - 15:00

TBD

By Robert Matthews, Professor, Aston University Birmingham

Abstract coming soon

Wrap-up

15:00 - 16:00

Final discussion with coffee and tea

Registration is open

Please register by December 15, 2019

Register

Limited seating available.

Practical Information

Suggestions for hotels

Hotel St. Josef
Hirschengraben 64

www.st-josef.ch

Hotel Marta
Zähringerstrasse 36

www.hotelmarta.ch

Hotel Kafischnaps
Kornhausstrasse 57

www.kafischnaps.ch

Hotel Bristol
Stampfenbachstrasse 34

www.hotelbristol.ch

Contact information

UZH CRS
EBPI, Hirschengraben 84

Archive of earlier events

Please find information about previous CRS events here