Synthetic RCT

Improving Sampling and Generalizability in Field Experiments Using Targeted Multi-Mode Convenience Samples and MRP
Randomized controlled trials (RCTs) have the potential to significantly advance policy goals and our understanding of human behavior. Our contribution to this enterprise is to suggest strategies for improving the generalizability of these experimental trials. The approach we propose integrates six crucial elements: 1) a digital stratification frame of Ghana; 2) a geo-coded corpus of human survey respondents; 3) an initial non-representative online RCT conducted in Ghana; 4) multilevel regression and post-stratification (MRP); 5) an RCT with synthetic subjects generated by ChatGPT; and 6) an active learning algorithm that guides supplemental data collection. The method allows researchers to target the specific geolocations where supplemental data collection will significantly improve their ability to generalize the estimated experimental treatment effects. This is a work in progress, and we expect to refine the method as we complete a series of RCTs over the course of the year.
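The interplay between the MRP step and the targeting of supplemental data collection can be illustrated with a toy calculation. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: a normal-normal partial-pooling model stands in for the multilevel regression, the between-cell effect SD `tau` is assumed rather than estimated, and the "active learning" step is replaced by a simple uncertainty-sampling heuristic that ranks cells by their contribution to the variance of the post-stratified estimate. The cell labels, population counts, and uptake rates are hypothetical, and the helper names (`Cell`, `pooled_effects`, `poststratify`, `next_targets`) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One post-stratification cell (hypothetical labels and numbers)."""
    name: str            # e.g. region x settlement type from the stratification frame
    population: int      # N_j, population count in the cell
    n_treat: int         # treated respondents observed in the cell
    n_control: int       # control respondents observed in the cell
    mean_treat: float    # observed uptake rate among treated respondents
    mean_control: float  # observed uptake rate among control respondents
    sd: float = 0.5      # assumed outcome SD (the maximum for a binary outcome)


def pooled_effects(cells, tau):
    """Shrink raw cell effects toward a precision-weighted grand mean.

    Normal-normal stand-in for the multilevel regression step; tau is an
    assumed between-cell SD of the treatment effect. Returns a list of
    (posterior mean, posterior variance) pairs, one per cell.
    """
    raw = [(c.mean_treat - c.mean_control,
            c.sd ** 2 * (1 / c.n_treat + 1 / c.n_control)) for c in cells]
    weights = [1 / v for _, v in raw]
    grand = sum(w * e for w, (e, _) in zip(weights, raw)) / sum(weights)
    out = []
    for effect, var in raw:
        shrink = tau ** 2 / (tau ** 2 + var)  # precise cells are shrunk less
        out.append((grand + shrink * (effect - grand), shrink * var))
    return out


def poststratify(cells, pooled):
    """Population-weighted average of cell effects and its approximate variance.

    The variance ignores the correlation induced by the shared grand mean.
    """
    total = sum(c.population for c in cells)
    est = sum(c.population * m for c, (m, _) in zip(cells, pooled)) / total
    var = sum((c.population / total) ** 2 * v for c, (_, v) in zip(cells, pooled))
    return est, var


def next_targets(cells, pooled, k=1):
    """Rank cells by their contribution to the post-stratified variance.

    A simple uncertainty-sampling heuristic, not the authors' active learning rule.
    """
    total = sum(c.population for c in cells)
    scores = [((c.population / total) ** 2 * v, c.name)
              for c, (_, v) in zip(cells, pooled)]
    return [name for _, name in sorted(scores, reverse=True)[:k]]


# Made-up illustrative numbers, not data from the study.
cells = [
    Cell("rural_north", 400_000, 10, 10, 0.55, 0.40),
    Cell("urban_accra", 300_000, 200, 200, 0.62, 0.50),
    Cell("peri_urban", 100_000, 50, 50, 0.58, 0.50),
]
pooled = pooled_effects(cells, tau=0.1)
estimate, variance = poststratify(cells, pooled)
print(f"post-stratified effect: {estimate:.3f} (SE {variance ** 0.5:.3f})")
print("collect more data in:", next_targets(cells, pooled, k=1))
```

In this toy setup, the cell with a large population share but only a handful of respondents dominates the variance ranking, which captures the intuition behind directing supplemental collection to particular geolocations; a real implementation would fit the multilevel model on individual-level covariates from the geo-coded corpus.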

Artificially Intelligent RCT Pilot: Afro-Barometer and Candour II
This essay reports the results of preliminary efforts to understand how well large language models (LLMs) predict the behavior of human subjects in experimental settings. In particular, we focus on samples of human subjects who are likely to be under-represented in the training corpora, and on experiments that are extremely time-consuming and expensive and that have large implications for social welfare, such as trials of vaccine uptake in the Global South. On balance, the results are encouraging, but there is room for considerable progress. In the Afro-Barometer pilot study, the distributions of LLM and human responses were similar, but the LLM was unable to replicate the RCT treatment effects that we observed with human subjects in rural Ghana. The LLM performed well in predicting vaccination uptake for the persona created from respondents to the 16-country CANDOUR survey. Surprisingly, given concerns that LLMs are biased toward populations that are well represented in their training data, the LLM did poorly in predicting vaccination uptake by the American persona.
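The Afro-Barometer comparison asks two separate questions: whether synthetic subjects reproduce the distribution of human responses, and whether they reproduce the human treatment effect. A minimal sketch of how each might be quantified, assuming independent human and LLM samples and binary uptake outcomes: total variation distance for response distributions, and a difference-in-means effect with a Welch (unpooled) standard error for treatment effects. These metrics are our illustrative choice, not necessarily those used in the pilot; the function names, response categories, and numbers are hypothetical.

```python
import math
from statistics import mean, variance


def ate(treated, control):
    """Difference-in-means treatment effect with a Welch (unpooled) standard error."""
    est = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return est, se


def effect_gap(human, synthetic):
    """Synthetic-minus-human effect gap and a z-score assuming independent samples."""
    (h_est, h_se), (s_est, s_se) = human, synthetic
    gap = s_est - h_est
    return gap, gap / math.sqrt(h_se ** 2 + s_se ** 2)


def total_variation(p_counts, q_counts):
    """Total variation distance between two categorical distributions (0 = identical)."""
    n_p, n_q = sum(p_counts.values()), sum(q_counts.values())
    cats = set(p_counts) | set(q_counts)
    return 0.5 * sum(abs(p_counts.get(c, 0) / n_p - q_counts.get(c, 0) / n_q)
                     for c in cats)


# Hypothetical binary uptake outcomes (1 = would vaccinate); not study data.
human = ate([1, 1, 0, 1, 1, 0, 1, 1], [0, 1, 0, 0, 1, 0, 0, 1])
synthetic = ate([1, 0, 1, 0, 1, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0])
gap, z = effect_gap(human, synthetic)

# Hypothetical response counts for a single survey item.
tv = total_variation({"agree": 60, "neutral": 25, "disagree": 15},
                     {"agree": 55, "neutral": 30, "disagree": 15})
print(f"human ATE {human[0]:.3f}, synthetic ATE {synthetic[0]:.3f}, "
      f"gap z = {z:.2f}, TV = {tv:.3f}")
```

A small total variation distance combined with a large effect gap would correspond to the pattern described above: LLM and human response distributions that look alike, alongside a treatment effect the LLM fails to reproduce.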