Real signals or artificial stereotypes?

A concerning look at how LLMs can project their own internal biases onto datasets when prompted to perform analyses. This is a problem with using plausible responses as a proxy for actual responses

First, I’d created 2000 free-text responses and labelled them ‘UK’. Then I copied and pasted the exact same 2000 responses but labelled these ‘US’. Finally, I combined them to create a dataset of 4000 total responses, and jumbled them up.

Despite the responses being identical for the UK and US, Copilot produced a rich, detailed summary of how US and UK respondents differed.