- Muralidharan, Karthik, Abhijeet Singh, and Alejandro J. Ganimian. 2019. "Disrupting Education? Experimental Evidence on Technology-Aided Instruction in India." *American Economic Review*, 109 (4): 1426–60.
- The overall task is to replicate all exhibits from the paper and online appendix other than Figures E3 and E4.
- This is an AER paper with a reproducibility score of 4/10.
- The exercise will show you how failing to adhere to best reproducibility practices leads to an unnecessarily strenuous replication process.
- All exhibits are perfectly reproducible after making the right changes (with the one exception of Table A8).
- This is the original reproducibility package downloaded from the AEA Data and Code repository.
Edit the code so that it works on your computer. This includes:
- Setting your working directory and changing file paths
  - You may need to create folders (for exhibits) that don't exist.
- Installing packages
  - Look at this blogpost and this code template for examples of how to share exact versions of community-contributed commands.
- Re-setting globals that control which parts of the script are run
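The globals typically act as switches at the top of the master script. A minimal sketch of the pattern (the global and file names here are illustrative placeholders, not the package's actual ones):

```stata
* Switches controlling which parts of the replication run.
* Names are illustrative, not the package's actual globals.
global run_tables  1
global run_figures 0

if ${run_tables} == 1 {
    do "code/make_tables.do"
}
if ${run_figures} == 1 {
    do "code/make_figures.do"
}
```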
Unlike the R code we saw yesterday, this code includes many bugs. Creating issues to identify bugs and share solutions will help you get through this part of the exercise faster.
- Common mistakes may include:
- misnaming a variable
- creating a variable that already exists
- misnaming a dataset
- incorrect merges
- failure to save an exhibit
- Many bugs can be pre-empted by using `assert` and `isid` statements, e.g., to assert which variable uniquely identifies observations in a particular dataset (especially useful for merging). You are encouraged to use them as you try to run the code.
- If you find a bug, remember to document it as an issue, as it may occur again.
- It may be useful to consult the paper and the file `data/Readme.pdf`, and to explore the datasets used in the script (e.g., using the `fre` command).
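For instance, a quick exploratory check before working with a dataset might look like this (the file and variable names below are hypothetical placeholders, not the package's actual ones):

```stata
* Explore a dataset before using it. File and variable names
* are illustrative placeholders.
use "data/student_scores.dta", clear
describe                     // list variables, types, and labels
fre treatment                // frequency table (requires: ssc install fre)
isid student_id              // errors out if student_id does not
                             // uniquely identify observations
assert !missing(student_id)  // errors out if any id is missing
```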
We recommend you work on the code in this order:
- Replicate Tables 1-9
- Replicate Tables A1-A5
- Replicate Tables A6-A7
  - This may be challenging, so create a GitHub issue and collaborate with others on it.
- Replicate Table A8
- Replicate Tables A9-A10 and Figs 1-4
- Replicate Fig 5
- Replicate Figs 7, 6, and A1-A4
  - Exploring the distribution of the variable `ms_id` may help you solve bugs here.
- Replicate Fig A5
- Replicate Fig E1
Once you are able to run the full code, check whether the outputs match the paper and whether they are stable.
- Run the full code once and commit any changes
- Re-run the whole codebase and look at the diff to identify any instabilities in the outputs
  - One exhibit does not replicate. Which exhibit is it? Document it in the issues.
- Stabilize the code.
- Describe which statistical objects of the unstable exhibit differ from the table in the paper.
- Provide three candidate explanations for this difference and rank them by how plausible you find them.
How easy is it to compile the paper itself once the code runs?
- Save the tables as `.tex` rather than `.xls`.
- You want to produce Table A8 with a stricter confidence interval: 99% rather than 95%. Estimate how long it would take you to update this table in the paper.
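In Stata, most estimation commands accept the `level()` option to change the reported confidence level. A hedged sketch, not the paper's actual specification (variable names are illustrative):

```stata
* Report 99% rather than the default 95% confidence intervals.
* Variable names are illustrative placeholders.
regress endline_score treatment baseline_score, level(99)
```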
How easy is it to understand what the code is doing? How easy is it to make changes to it?
- Understand Table 3 in the paper.
- You want to run two robustness checks:
  - Re-run each specification (i.e., each column of Table 3) without controlling for `Baseline math score` for math-related competencies and for `Baseline Hindi score` for language-related competencies.
  - Re-run each specification (i.e., each column of Table 3) controlling for both `Baseline math score` and `Baseline Hindi score` in each specification.
- Save the exhibits so that each exhibit name's suffix indicates the robustness check performed.
- Look at the `git diff` to count how many lines of code you needed to change to achieve this fully.
- Can you think of a way to re-structure the code so that such changes would be easier? (If you've already done this above, great!)
How can you build checks into the code to prevent changes from introducing errors?
- Identify three operations performed in the code that are "risky". Create issues pointing to the lines of code that perform these operations.
  - For example, operations that change the number of observations in the data, combine different datasets, aggregate data points, or that may be sensitive to the presence of missing values.
- Discuss with your group how to build checks into the code to prevent errors from being introduced when these operations are performed.
  - Two particularly useful commands here are `assert` and `isid`. You may also want to explore the help file for `merge` to look for options that test the results of this command.
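For example, `merge` has `assert()` and `keep()` options that make the command fail loudly when the match pattern is not what you expect. A defensive-merge sketch (the file and variable names are hypothetical placeholders):

```stata
* Defensive merge: fail loudly if expectations are violated.
* File and variable names are illustrative placeholders.
use "data/students.dta", clear
isid student_id                       // master must be unique on the key
merge 1:1 student_id using "data/test_scores.dta", ///
    assert(match master) keep(match)  // error if any using-only rows appear
assert _merge == 3                    // all kept rows matched
drop _merge
```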