In person: Nuffield College, Butler Room – 14:00
Online: Zoom link

Aaron Kaufman is Associate Professor of Political Science at NYU Abu Dhabi, and Co-Director of the Center for Interdisciplinary Data Science and Artificial Intelligence. His work applies computational tools to measurement problems in political science, including ideology, discrimination, policy significance, and legislative district compactness. His research has appeared in Nature, Nature Scientific Data, Nature Scientific Reports, the APSR, AJPS, BJPS, JOP, Political Analysis, the British Medical Journal, and the Journal of Quantitative Analysis in Sports. He received his PhD in Political Science and AM in Statistics from Harvard, and his BA in Political Science from the University of California, Berkeley.

Measuring the Political Biases of Large Language Models

Large Language Models (LLMs) are a transformational technology, fundamentally changing how people obtain information and interact with the world. As people become increasingly reliant on them for an enormous variety of tasks, a body of academic research has developed to examine these models for inherent biases, especially political biases, often finding them small. We challenge this prevailing wisdom. First, by comparing 31 LLMs to legislators, judges, and a nationally representative sample of U.S. voters, we show that LLMs’ apparently small overall partisan preference is the net result of offsetting extreme views on specific topics, much like moderate voters. Second, in a randomized experiment, we show that LLMs can translate these preferences into political persuasion even in information-seeking contexts: voters randomized to discuss political issues with an LLM chatbot are as much as 5 percentage points more likely to express the same preferences as that chatbot.
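
To make the measurement logic concrete, here is a minimal, hypothetical sketch of issue-level stance elicitation; the query_llm placeholder, the statement battery, and the scoring scale are all illustrative assumptions, not the study's actual instrument.

    # Minimal sketch of issue-level stance elicitation for an LLM.
    # query_llm is a hypothetical stand-in for any chat-completion API;
    # the statements and scale are illustrative, not the study's battery.

    SCALE = {"strongly disagree": -2, "disagree": -1, "neutral": 0,
             "agree": 1, "strongly agree": 2}

    STATEMENTS = [
        "The federal minimum wage should be raised.",
        "Gun ownership should face stricter limits.",
        "Corporate taxes should be cut.",
    ]

    def query_llm(prompt: str) -> str:
        """Placeholder: call the LLM under study and return its one-phrase answer."""
        raise NotImplementedError

    def issue_scores(statements=STATEMENTS):
        scores = {}
        for s in statements:
            options = ", ".join(SCALE)
            answer = query_llm(f"Answer with exactly one of [{options}]: {s}").strip().lower()
            scores[s] = SCALE.get(answer, 0)  # treat unparseable answers as neutral
        return scores

    # A near-zero mean can mask offsetting extremes: scores of +2, -2, and 0
    # average to 0.0, the profile of an apparent moderate who in fact holds
    # strong views on individual topics.

Comparing such issue-level scores against reference populations (legislators, judges, voters) is what lets aggregate "moderation" be decomposed into topic-specific extremes.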


In person: Nuffield College, Butler Room – 14:00
Online: Zoom link

Carlos Scartascini is Principal Technical Leader at the Research Department of the Inter-American Development Bank and Leader of the Research Department Behavioral Economics Group. He has published eight books and about 90 articles in academic journals and edited volumes. He is a member of the Executive Committee of IDB’s GDLab, a member of the Scientific Committee of the Elcano Royal Institute, a member of the Board of Advisors of the Master of Behavioral and Decision Sciences at the University of Pennsylvania, Associate Editor of the academic journal Economía, and a Founding Member of LACEA’s BRAIN (Behavioral Insights Network).

Corruption and Political Accountability in Good and Bad Economic Times

While the literature extensively explores the structural enablers of corruption and its adverse effects on economic performance, less is known about how the state of the economy influences corruption and political accountability. To address this gap, we develop a theoretical model in which politicians may divert resources from public goods, and citizens can respond by punishing corruption. In our model, economic booms increase corruption while weakening accountability. We validate these predictions through a laboratory experiment, finding that corruption rates significantly rise when economic conditions are good. However, citizens’ willingness to punish corrupt politicians remains stable across the business cycle. Punishment decisions are driven by observed public good allocations; low allocations prompt significantly higher punishment rates than high allocations, even resulting in the punishment of honest politicians during bad economic times. Additionally, we assess the role of corruption expectations in shaping responses: citizens with prior beliefs that politicians are corrupt are less likely to punish than those who believe politicians are honest when public good provision is low. Accountability becomes more challenging when citizens struggle to clearly identify corruption, and citizens are more forgiving of corruption during good economic times, especially if they already mistrust politicians. These findings highlight the importance of strong transparency and accountability mechanisms to uphold governance standards, particularly in the face of economic fluctuations and public mistrust.
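
A toy simulation, under made-up parameter values rather than the paper's actual model, illustrates why allocation-based punishment can both spare corrupt politicians in booms and ensnare honest ones in busts:

    # Toy illustration (not the paper's model): citizens punish low observed
    # public-good allocations, so honest politicians can be punished when
    # budgets are small. All parameter values are invented for illustration.

    def public_goods(budget: float, corrupt: bool, diversion: float = 0.4) -> float:
        """Politician allocates the budget; a corrupt one diverts a share."""
        return budget * (1 - diversion) if corrupt else budget

    def punished(allocation: float, threshold: float = 0.5) -> bool:
        """Citizens punish whenever the observed allocation falls below a threshold."""
        return allocation < threshold

    for state, budget in [("boom", 1.0), ("bust", 0.4)]:
        for corrupt in (False, True):
            alloc = public_goods(budget, corrupt)
            verdict = "punished" if punished(alloc) else "re-elected"
            print(f"{state}: {'corrupt' if corrupt else 'honest'} politician, "
                  f"allocation={alloc:.2f}, {verdict}")

    # In the boom even the corrupt politician clears the threshold
    # (0.60 >= 0.5) and escapes punishment, while in the bust the honest
    # politician (0.40 < 0.5) is punished, mirroring both findings above.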


10.00 – 11.00
Online: Zoom link

Professor Pang’s research spans global risk politics, the geopolitics of critical raw materials, and the application of LLMs to social science. She is the author of From Cold Politics to Hot Politics (Peking University Press, 2026) and a forthcoming textbook on Large Language Models and Social Science Research. She has published in Political Analysis, International Organization, and Political Science Research & Methods, among others.

Yang Wu’s research focuses on large language models, LLM reasoning, social simulation, and computational sociology.

How Can Synthetic Experiments Deliver Credible Causal Inference in Social Science?

The rapid diffusion of large language models (LLMs) has spurred growing interest in ‘synthetic experiments’, in which LLMs or LLM-driven agents simulate human subjects for causal inference. While such approaches promise scalability, cost efficiency, and experimental flexibility, fundamental methodological challenges — both theoretical and technical — must be addressed before they can serve as a credible causal engine.

Drawing on a pilot study that synthetically replicates the ‘hawkish bias’ experiment in foreign policy decision-making, this talk identifies key obstacles to credible causal inference in synthetic settings — including persona drift, ambiguous treatment assignment, underdeveloped benchmarking, and an unsettled research design — and discusses potential solutions from a learning perspective.

This session will cover:

  • The rise of synthetic experiments using LLMs for causal inference in social science.
  • Key methodological challenges: persona drift, treatment assignment, benchmarking, and sampling.
  • How counterfactual data in fine-tuning promotes causal rather than correlational learning.
  • Reasoning-oriented training for capturing intermediate causal mechanisms.
  • Practical implications for designing credible AI-driven social science experiments.
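
As a concrete, hedged illustration of the design space, the sketch below runs a bare-bones synthetic replication of a loss-framing ('hawkish bias') experiment; the ask function, personas, and vignettes are hypothetical placeholders, and a credible version would still need to address the persona drift, benchmarking, and sampling issues listed above.

    # Bare-bones synthetic experiment: randomly assign LLM personas to a
    # control or loss-framed vignette and take a difference in means.
    # ask() is a hypothetical wrapper around any chat-completion API.

    import random
    import statistics

    PERSONAS = [f"You are survey respondent #{i}, a U.S. adult." for i in range(100)]

    CONTROL = ("A rival state has mobilised troops near the border. "
               "On a 0-10 scale, how strongly do you support military action?")
    TREATMENT = ("A rival state has mobilised troops near the border, and backing "
                 "down would mean a certain loss of territory. "
                 "On a 0-10 scale, how strongly do you support military action?")

    def ask(persona: str, vignette: str) -> float:
        """Placeholder: send the persona as the system prompt, the vignette as
        the user message, and parse the model's numeric answer."""
        raise NotImplementedError

    def synthetic_ate(personas=PERSONAS, seed=0):
        rng = random.Random(seed)
        treated, control = [], []
        for persona in personas:
            # Random assignment of personas to arms, mirroring an RCT.
            if rng.random() < 0.5:
                treated.append(ask(persona, TREATMENT))
            else:
                control.append(ask(persona, CONTROL))
        # Difference in means estimates the framing effect ('hawkish bias').
        return statistics.mean(treated) - statistics.mean(control)

Each bullet above names a way this naive estimator can fail, for example personas drifting away from their assigned profile over the course of a multi-turn exchange.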

14.00 – 15.00, hybrid format
Butler Room, Nuffield College & Zoom

Dr. Brian Scholl is a leading economist at the intersection of regulation, financial markets, and artificial intelligence. He currently serves as a Staff Regulatory Researcher at Norm Ai, where he leads work on quantifying risk and designing data-driven, AI-enabled compliance programs for financial institutions.

Previously, Dr. Scholl was the founding Chief Economist of the U.S. Securities and Exchange Commission Office of Investor Research, where he led cutting-edge research on investor behavior and emerging technologies, including AI. He also served as Chief Economist of the U.S. Senate Budget Committee, advising policymakers on economic evidence and fiscal policy.

Recognized as the U.S. government’s top evidence innovator in 2022, Dr. Scholl pioneered rapid-cycle evidence generation systems that dramatically shortened research timelines and translated economic insights into real-world policy impact. Across roles, he has been known for building bridges between regulators, markets, and technology—turning complex economic and behavioral research into actionable frameworks.

Title: Experimental Evidence on Decision-Making: Implications for Governance in an AI-Enabled Future

Abstract
Rapid advances in artificial intelligence have renewed longstanding questions about human decision-making, institutional design, and governance. Is this technological moment fundamentally different? How do new systems interact with human cognitive limitations? And under what conditions do they enhance—or undermine—trust, legitimacy, and social welfare?

This talk draws on a series of large-scale behavioural experiments I conducted in the context of financial regulation to examine how individuals make consequential decisions in environments characterized by complexity, asymmetric information, and institutional mediation. Retail investors routinely face choices—whether to participate in markets, how to allocate assets, how to interpret disclosures, and whether to rely on advice—that exceed the capacities of unaided human cognition. Traditional regulatory tools, particularly disclosure, have struggled to address these challenges and, in important respects, have fallen short of their intended protective role.

I review evidence from multiple randomized experiments that investigate how features of choice architecture shape behaviour, including linguistic complexity and jargon, reference points and performance benchmarks, and reliance on expert advice. Across settings, the findings reveal both the promise and the fragility of behavioural interventions: modest changes in presentation can meaningfully improve decisions, yet the same mechanisms can also be exploited by intermediaries facing competing incentives. In one set of studies, simplifying language and carefully structuring choice environments substantially improves comprehension and decision quality; in others, benchmark performance framing strongly influences investment choices even when underlying fundamentals are unchanged. A further experiment shows that individuals exhibit limited ability to screen the quality of financial advice, accepting poor guidance nearly as often as good guidance.

Taken together, these results highlight persistent limits to individual self-protection in complex systems and raise broader questions for the governance of AI-enabled decision support. While AI systems hold the potential to augment human capital, triage information, and personalize guidance at scale—benefiting individuals, institutions, and regulators alike—they also risk amplifying manipulation, opacity, and power asymmetries if left unchecked.

The talk concludes by using these experimental findings as a lens to prompt discussion about the design and regulation of AI-enabled institutions. Rather than treating AI as a replacement for human judgment, the discussion emphasizes how evidence on human behaviour can inform regulatory and institutional frameworks that are more trustworthy, legitimate, and aligned with long-run systemic stability—while acknowledging the risks such systems pose and the possibility that fundamentally new governance approaches may be required.


15.00 – 16.00, hybrid format
Lecture Room, Nuffield College & Zoom

Mohsen is an Associate Professor at the Oxford Internet Institute, a Governing Body Fellow at Wolfson College, and a research affiliate at the MIT Sloan School of Management. His research lies at the intersection of computational (data) science and cognitive psychology. He studies how information spreads on social media and how ties are formed on social networks.

Title: Echo Platforms & Conversational Corrections

Abstract

In this talk, I will present two recent projects.

First, I will present a cross-platform study in which we collected 10 million news-link posts across seven social media platforms and examined engagement by outlets’ political slant and quality. We find that platforms with more conservative user bases share lower-quality news on average; that the partisan engagement advantage flips by platform (politically aligned content wins); and that, within users, lower-quality links receive higher average engagement, even on non-ranked feeds, implicating user preferences rather than algorithms.
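
As a hedged sketch of what the within-user comparison involves, the snippet below demeans quality and engagement within each user before correlating them; the column names and toy numbers are invented for illustration, not drawn from the study's data.

    # Within-user comparison on a toy dataframe of news-link posts.
    # Columns and values are invented; 'quality' is an outlet quality score.

    import pandas as pd

    posts = pd.DataFrame({
        "user":       ["a", "a", "a", "b", "b", "b"],
        "quality":    [0.9, 0.4, 0.2, 0.8, 0.5, 0.3],
        "engagement": [3,   10,  12,  1,   4,   9],
    })

    # Demean within users so each post is compared with the same user's
    # other posts, holding the poster (and their follower base) fixed.
    posts["q_dm"] = posts["quality"] - posts.groupby("user")["quality"].transform("mean")
    posts["e_dm"] = posts["engagement"] - posts.groupby("user")["engagement"].transform("mean")

    print(posts["q_dm"].corr(posts["e_dm"]))  # negative: lower quality, more engagement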

Next, I will present an experimental project that combines lab and field studies to debunk false claims using LLMs. We tune conversational corrections in a simulated feed, then deploy the best-performing strategies on Bluesky and X to test external validity with native engagement metrics.


Wednesday, 25th June
Carlos III University – Madrid, Spain

Livia Schubiger (D-GESS, ETH Zurich) & Raymond Duch (University of Oxford) organized an EPSA Madrid 2025 pre-conference that assembled researchers working with Large Language Models (LLMs). The focus was on applications of LLMs to social science research.


From cutting-edge vaccine trials to behavioural insights for public health and global policy debates, featuring leading researchers from the University of Oxford’s Department of Economics and Blavatnik School of Government, King’s College London, and Gavi, the Vaccine Alliance.


The IMEBESS 2025 Synthetic Replication Games was a one-day in-person workshop organised by the Talking to Machines team in the context of a large replication project exploring the potential of large language models (LLMs) to augment human samples in experimental social science. This workshop invited researchers of all backgrounds and career stages to collaborate in adapting experimental studies from top-tier journals into a standardised format and replicating them using a variety of LLMs. The results were presented in a dedicated round table at IMEBESS 2025, and contributors were recognised as co-authors on the final publication.

The workshop was held on 21st May 2025 at the IMEBESS conference venue.


  • Date: Monday, 25th November 2024
  • Time: 9.30am – 5pm
  • Location: Nuffield College, University of Oxford, Oxford (UK)

The post-election workshop was an opportunity to critically assess the various data collection and modeling strategies employed for pre-election estimates of the major U.S. presidential candidates’ vote shares over the course of the campaign. Leading pollsters from academia and industry took stock of the state of pre-election polling in the aftermath of the 2024 Presidential Election.

9:00am – 9:30am: Coffee
9:30am – 10:00am: Welcome and Introductions
10:00am – 11:00am: National Election Studies & Commercial Polling
– Shanto Iyengar, Professor (Stanford) and Principal Investigator of the American National Election Study (ANES)
– Jane Green, Professor (Nuffield College), President of the British Polling Council, and Principal Investigator of the British Election Study
– Clifford Young, President, Public Affairs (IPSOS U.S.A.)
11:00am – 11:30am: Coffee Break
11:30am – 12:30pm: Advances in Model-Based Election Polling
– Lucas Leemann, Professor (University of Zurich)
– Kosuke Imai, Professor (Harvard University)
12:30pm – 1:30pm: Lunch
1:30pm – 2:30pm: Polling Aggregation: Resilience to Systematic Bias
– Eli McKown-Dawson, LSE and Silver Bulletin
– Steve Fisher, Professor (University of Oxford)
– Martin Stabe and Oliver Hawkins (The Financial Times)
2:30pm – 2:45pm: Coffee
2:45pm – 3:45pm: Election Forecasting: Mapping the State of the Art
– Mary Stegmaier, Professor (University of Missouri)
– Philippe Mongrain, Post-doctoral Researcher (University of Antwerp)
3:45pm – 4:00pm: Coffee
4:00pm – 5:00pm: Silicon Sampling for Public Opinion Polling
– Roberto Cerina, Assistant Professor (University of Amsterdam)
– Ray Duch, Professor (Nuffield College)
– Ben Warner, Founder (Electric Twin)