Can randomised control trials deliver a more equal world, one coin toss at a time?

Federal Shadow Assistant Treasurer Andrew Leigh, a former professor of economics at the Australian National University, has just published a new book: Randomistas: How radical researchers changed our world.

He argues that across medicine, business and government, there’s no simpler or more powerful tool for finding out what works than a randomised experiment. Yet, he says, when it comes to social policy, "the vast majority of programs designed to help the most vulnerable are grounded more in greybeard beliefs than empirical evidence".

Leigh recently outlined the book's argument in a speech to the Presbyterian Support Northern Seminar Series on Child Wellbeing in Wellington, New Zealand. It's a long read, packed with Australian and international examples, and it also responds to concerns that randomised trials are unfair. He said:

"Conducting more randomised evaluations isn’t an excuse to give up on the problem. We don’t abandon the search for a cure for cancer just because most cancer drugs to emerge from the laboratory don’t make it through clinical trials."

***

I acknowledge the Māori people, the traditional owners of the lands on which we meet, and thank our hosts, Presbyterian Support Northern, for inviting me to deliver these lectures.

Seeing New Zealand now has also turned out to be a pretty good predictor of what’s likely to happen next in Australia.

New Zealand women won the right to vote nine years earlier than Australian women.

Your country enacted same sex marriage four years before we did.

You even gave Barnaby Joyce citizenship before we did.

So to be in New Zealand isn’t just a chance to see the sun rise a couple of hours earlier – it’s also an opportunity to get a sneak peek into some of the things that might shape Australia’s future.

And I have to say that as a member of the Labor opposition in Australia, I’m keenly hoping that this year or next will see Australia’s voters follow your lead in electing a progressive government.

Yesterday, I had the pleasure of speaking in Auckland on the topic of what Australia and New Zealand can learn from one another about reducing inequality.

Today, I want to focus on a specific policy area – randomised policy trials – and discuss how they might help us narrow the gap between rich and poor.

My talk today will draw upon a new book, Randomistas: How Radical Researchers Changed Our World, published in Australia last month by Black Inc and in the United States in August by Yale University Press.

Now, you might think some things are so obvious we don’t need randomised trials to prove them.

To discourage early pregnancy, ask teenage girls to care for a baby doll that’s programmed to demand attention at all hours.
Juvenile delinquents can be ‘scared straight’ by spending a day in jail, and seeing how tough prison really is.
What young unemployed men most need is job training.

Each of these statements sounds completely reasonable, doesn’t it? Unfortunately, all three claims are wrong. In randomised trials, girls who cared for an infant simulator for a week were twice as likely to become teenage mothers. Scared Straight programs increased crime. And many job training programs for unemployed youths have produced disappointing results when rigorously evaluated.

When it comes to tackling disadvantage, randomised trials don’t just spotlight failure, they can also shine a light down new paths for addressing poverty.

* * *

And their parents had hope

In 1958, psychologist David Weikart took up the job of being director of special education in Ypsilanti, Michigan. At that time, schools were segregated, and all the African-American students in the town attended one primary school – the Perry School. Weikart noticed that the school was run down. Instead of a playground, it had a field filled with thistles. Many of the African-American students ended up repeating grades, entering special education or leaving school early.

Yet when Weikart gave a presentation to school principals about these problems, they responded defensively. One sat with arms tightly folded; others stood by the window smoking; a few left the room. When he pressed them to act, they said there was nothing they could do. Black students were just born that way. So Weikart came up with an alternative solution: ‘Because I couldn’t change the schools . . . well, obviously you do it before school.’

In the late 1950s the only institutions that looked anything like preschools were nursery schools, focused purely on play. By contrast, Weikart was interested in the work of psychologists such as Jean Piaget, which suggested that young children’s minds are actively developing from the moment they are born. But when it came to early intervention, Weikart noted, ‘There was no evidence that it would be helpful. There wasn’t data.’ So he decided to put Piaget’s theories to their first rigorous test.

In 1962 the Perry Preschool opened, for children aged three and four. About 100 children applied to enrol. Half were admitted, while half remained as a control group. The selection was random – literally made by the toss of a coin.
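As a rough illustration of the kind of chance-based allocation described above, the sketch below splits a list of applicants into two equal groups at random. The applicant labels, the function name `assign_groups` and the fixed seed are assumptions made for this example; it is not a reconstruction of the procedure Weikart actually used.

```python
import random


def assign_groups(applicants, seed=None):
    """Randomly split applicants into a treatment and a control group.

    If the number of applicants is odd, the control group gets the extra person.
    """
    rng = random.Random(seed)      # a seeded generator makes the draw reproducible
    shuffled = list(applicants)    # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical applicant labels standing in for roughly 100 children.
applicants = [f"child_{i}" for i in range(1, 101)]
treatment, control = assign_groups(applicants, seed=1962)
print(len(treatment), len(control))
```

Because chance alone decides who lands in each group, later differences in outcomes between the two groups can reasonably be attributed to the program rather than to who was chosen.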

Former Perry Preschool teacher Evelyn Moore remembers how the program pushed back against the prevailing wisdom that a child’s intelligence was fixed, and that many of the children in the community were ‘retarded’. She saw something different – these children knew the names of baseball players. They recalled the words to songs. And their parents had hope. When Moore visited the families at home, she saw that almost all had pictures on the wall of two men – John F. Kennedy and Martin Luther King.

The preschool curriculum was highly verbal. Children visited a farm, a fire station and an apple orchard, where they picked apples and cooked them into apple sauce. Months later, in winter, they went back to the orchard to see the seasonal change. When Evelyn Moore asked the children where the apples had gone, one child reflexively replied, ‘Teacher, I didn’t take ’em.’

The Perry Preschool program lasted only two years, but over the coming decades researchers tracked the outcomes for those who had participated, and for the randomly selected control group. By the time they were in their twenties, those who had been to preschool were more likely to own a car, own a home and have a steady job. They were also less likely to use drugs and less likely to be on welfare. By age forty, a quarter of those in the preschool group had been to jail, compared with half of the control group.

The leading economic analysis of the program estimates that for every $1 spent on Perry Preschool, the community gained between $7 and $12. By far the biggest benefit came from reduced crime, showing that if you target early intervention at people with a fifty-fifty chance of going to prison, you can change the lives of participants at a reasonable cost to the broader community.

Great schools can transform lives

But while randomised evaluations have underpinned significant intervention in early years programs, they have also shown that it’s not ‘game over’ after the first 1000 days of a child’s life. Schools matter – indeed, great schools can transform lives.

One randomised evaluation looked at schooling in New York’s Harlem district. Outcomes for young people in Harlem were dreadful: a study once found that life expectancy for young men born in Harlem was lower than for those born in Bangladesh. Cocaine, guns, unemployment and family breakdown created an environment where disadvantage was perpetuated from one generation to the next.

Founded in 2004, Harlem’s Promise Academy is no ordinary school. It has an extended school day, with classes starting at 8 am, and after-school activities often continuing until 7 pm. There are remedial classes on Saturdays, and the summer break is shorter than in most schools. The school operates on a ‘no excuses’ model, emphasising grit and perseverance. It is assumed that every child will go on to university. Both students and teachers are heavily monitored, with a strong focus on test score gains. With up to twenty applicants per place, the Promise Academy uses lotteries to allocate spots, an approach that allows researchers to compare outcomes across the two groups.

What difference did they find? One way to benchmark the impact is to note that the average black high school student in the United States is two to four years behind his or her white counterparts. Yet the mostly black students who won a lottery to attend the Promise Academy improved their performance by enough to close the black–white test score gap. As lead researcher Roland Fryer points out, this overturns the fatalistic view that poverty is entrenched, and schools are incapable of making a transformational difference. He claims that the achievements of the Harlem Children’s Zone are ‘the equivalent of curing cancer for these kids’.

The randomistas are also endeavouring to improve teaching. For example, the Bill and Melinda Gates Foundation recently conducted a randomised trial of coaching programs for teachers. Each month, teachers sent videos of their lessons to an expert coach, who worked with them to eliminate bad habits and try new techniques. By the end of the year, teachers in the coaching program had seen gains in their classroom equivalent to several additional months of learning.

The British Education Endowment Foundation has so far commissioned over a hundred evaluations, many of them randomised, to test what works in the classroom. Among those randomised evaluations that produced positive results are personal academic coaching, individual reading assistance, a Singaporean-designed mathematics teaching program, and a philosophy-based intervention encouraging students to become more engaged in classroom discussion.

With so many evaluations, they can readily compare the size of the results. To get a one-month improvement for one student, personal academic coaching cost £280, individual reading assistance cost £209, the mathematics teaching program cost £60, and the philosophy-based intervention cost £8. So while all the programs ‘worked’, some were a whopping thirty-five times more cost-effective than others.
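The thirty-five-fold gap can be checked with a few lines of arithmetic. The sketch below uses the per-student costs quoted above, taking the philosophy-based figure as £8 (the value the thirty-five-times comparison implies); the variable names are illustrative only.

```python
# Cost in pounds of one month of extra progress for one student, as quoted in the text.
cost_per_month_gbp = {
    "personal academic coaching": 280,
    "individual reading assistance": 209,
    "mathematics teaching program": 60,
    "philosophy-based intervention": 8,  # assumed to be £8, per the 35x comparison
}

cheapest = min(cost_per_month_gbp.values())

# Express each program's cost as a multiple of the cheapest program's cost.
relative_cost = {
    name: cost / cheapest for name, cost in cost_per_month_gbp.items()
}

for name, ratio in sorted(relative_cost.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.1f}x the cost of the cheapest option")
```

On these figures, personal academic coaching costs 280 / 8 = 35 times as much per month of progress as the philosophy-based intervention.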

In some cases, the Education Endowment Foundation trialled programs that sounded promising, but failed to deliver. The Chatterbooks program was created for children who were falling behind in English. Hosted by libraries on a Saturday morning and led by trained reading instructors, the program gave primary school students a chance to read and discuss a new children’s book. Chatterbooks is the kind of program that warms the cockles of your heart. Alas, a randomised trial found that it produced zero improvement in reading abilities.

Another Education Endowment Foundation trial tested the claim that learning music makes you smarter. Students were randomly assigned either to music or drama classes, and then tested for literacy and numeracy. The researchers found no difference between the two groups, suggesting either that learning music isn’t as good for your brain as we’d thought, or that drama lessons are equally beneficial.

In a similar vein, a recent randomised trial of free school breakfast programs in New Zealand schools found that they reduced hunger rates (by 8.6 units on the ‘Freddy satiety scale’, in case you’re curious).[1] However, free breakfasts did not improve school attendance or academic achievement for low-income children.

Educational randomistas are even evaluating how to get more low-income children to university.

In Ohio and North Carolina, researchers worked with tax preparation company H&R Block to identify low-income families with a child just about to finish high school. Half of these families were randomly offered assistance in completing a university financial aid application, a process that took about eight minutes. Two years later, the children of those who had received help applying for financial aid were one-quarter more likely to be enrolled at university.

Because children whose parents did not attend university often lack basic information about the college application process, modest interventions can have large impacts. In Ontario, a three-hour workshop for Year 12 students raised college attendance rates by one-fifth, relative to a randomised control group. In regional Massachusetts, peer support provided by text message raised the odds that Year 12 students would enrol in college.

* * *

For the most affluent, it doesn’t matter much whether government works. They can rely on private healthcare, private education, and private security. They are less likely to be unemployed, and have family resources to draw upon in hard times. For the top 1 percent, dysfunctional government is annoying, but not life-threatening.

But for the most vulnerable, government can mean the difference between getting a good education or struggling through life unable to read and write. Those who depend on government depend on knowing that the programs government is delivering actually work.

Better to select on need, not chance?

In Melbourne, the Sacred Heart Mission has been working closely with long-term homeless people since 1982. A few years ago, the organisation proposed to trial a new intensive casework program, targeted at people who had been sleeping rough for at least a year. When they pitched the idea to their philanthropic partners, one donor urged that it be evaluated through a randomised trial.

Guy Johnson, who worked in community housing and would eventually help conduct the research, was pretty sceptical at first. People in the community sector, he told me, ‘freak out at the word experimental’, and prefer to select participants based on need, not chance. But Johnson came to regard randomisation not only as the most rigorous method for evaluating the program, but also the fairest way of deciding who got the service.

The ‘Journey to Social Inclusion’ experiment was Australia’s first randomised trial of a homelessness program. For the forty or so people in the treatment group, it provided intensive support from a social worker, who was responsible for only four clients. This caseworker might help them find housing, improve their health, reconnect with family and access job training. Another forty people in the control group did not receive any extra support.

What might we expect from the program? If you’re like me, you’d have hoped that three years of intensive support would see all participants healthy, clean and employed. But by and large, that’s not what the program found. Those who were randomly selected into the program were indeed more likely to have housing, and less likely to be in physical pain. But Journey to Social Inclusion had no impact on reducing drug use or improving mental health. In fact, those who received intensive support were more likely to be charged with a crime. At the end of three years, just two people in the treatment group had a job – the same number as in the control group.

While it’s disappointing that the program didn’t bring most participants back into mainstream society, it’s less surprising once you begin to learn about the people it seeks to assist. In many cases, they were abused in childhood (the mother of one participant used to put Valium in the child’s breakfast cereal). Most had used drugs for decades, and they were used to sleeping rough. Few had completed school or possessed the skills to hold down a regular job. If they had children of their own, more often than not they had been taken away by child protection services.

The Journey to Social Inclusion program is a reminder of how hard it is to turn around the living standards of the most disadvantaged. If you’ve been doing drugs for decades, your best hope is probably a stable methadone program. If you’re in your late forties with no qualifications and no job history, a stable volunteering position is a more realistic prospect than a steady paycheck.

Unless we properly evaluate programs designed to help the long-term homeless, there’s a risk that people of goodwill – social workers, public servants and philanthropists – will fall into the trap of thinking it’s easy to change lives. There are plenty of evaluations of Australian homelessness programs that have produced better results than this one. But because none of those evaluations was as rigorously conducted as this one, there’s a good chance they’re overstating their achievements.

Be sceptical of those 'peddling panaceas'

Blockbuster movies are filled with white knights and magic bullets, moon shots and miracles. Yet in reality most positive change doesn’t happen suddenly. From social reforms to economic change, our best systems have evolved gradually. Randomised trials put science, business and government on a steady path to improvement. Like a healthy diet, the approach succeeds little by little, through a series of good choices. The incremental approach won’t remake the world overnight, but it will over a generation.

Randomised trials flourish where modesty meets numeracy. As British randomista David Halpern puts it: ‘We need to turn public policy from an art to a science.’ This means paying more attention to measurement, and admitting that our intuition might be wrong. One of the big thinkers of US social policy, Senator Daniel Patrick Moynihan, recognised that evaluations can often produce results which are solid rather than stunning. When faced with a proposed new program, Moynihan was fond of quoting Rossi’s Law (named after sociologist Peter Rossi), which states: ‘The better designed the impact assessment of a social program, the more likely is the resulting estimate of net impact to be zero.’ Rossi’s Law does not mean we should give up hope of changing the world for the better. But we ought to be sceptical of anyone peddling panaceas. The belief that some social programs are flawed should lead to more rigorous evaluation and patient sifting through the evidence until we find a program that works.

The best randomistas are passionate about solving a social problem, yet sceptical about the ability of any particular program to achieve its goals. Launching an evaluation of her organisation’s flagship program, Read India, Rukmini Banerji told the audience: ‘And of course [the researchers] may find that it doesn’t work. But if it doesn’t work, we need to know that. We owe it to ourselves and the communities we work with not to waste their and our time and resources on a program that does not help children learn. If we find that this program isn’t working, we will go and develop something that will.’

* * *

Randomised trials don’t have to be expensive or time-consuming.

One firm in the United States offered employees up to $750 if they could quit smoking for a year. Those randomly chosen for the program were 10 percentage points more likely to quit. It turned out that an effect this large means that it would be worth firms with plenty of smokers offering the program even if they did not care about the health of their employees. That’s because smokers take more breaks during the day, and more days off during the year.

Another simple randomised trial was conducted by the German government in 2010. They posted out a cheerful blue brochure to over 10,000 people who had recently lost their jobs. ‘Bleiben Sie aktiv!’ (‘Stay active!’), the leaflet urged unemployed people. The leaflet boosted employment rates among those who received it. Each leaflet cost less than €1 to print and post, but boosted earnings among the target group by an average of €450. If you know another government intervention with a payoff ratio of 450 to 1, I want to hear about it.

In 2013 the Obama White House, working with a number of major foundations, announced a competition for low-cost randomised trials. The aim was to show that it was possible to evaluate social programs without spending millions of dollars. From over fifty entries, the three winners included a federal government department planning to carry out unexpected workplace health and safety inspections, and a Boston non-profit providing intensive counselling to low-income youth hoping to be the first in their family to graduate from college. Each evaluation cost less than $200,000. The competition continues to operate through the Laura and John Arnold Foundation, which has announced that it will fund all proposals that receive a high rating from its review panel.

* * *

Why don’t politicians commission more randomised trials?

When parliamentarians are probed on their misgivings, the chief concern is fairness. Half of Australian politicians and one-third of British politicians worry that randomised trials are unfair. As medical writer Ben Goldacre points out: ‘We need to get better at helping them to learn more about how randomised controlled trials work . . . Many members of parliament say they’re worried that randomised controlled trials are “unfair”, because people are chosen at random to receive a new policy intervention: but this is exactly what already happens with “pilot studies”, which have the added disadvantage of failing to produce good quality evidence on what works, and what does harm.’

Rejecting randomised trials on the grounds of unfairness also seems at odds with the fact that lotteries have been used in advanced countries to allocate school places, housing vouchers and health insurance, to determine ballot order, and to decide who gets conscripted to fight in war.

One way of thinking about the ethical issue in randomisation is that it turns on what we know about a program’s effectiveness. Adam Gamoran, a sociologist at the University of Wisconsin–Madison, agrees that if you are confident that a program works, then it is unethical to conduct a randomised trial. But if you are ignorant about whether the program works, and a randomised trial is feasible, Gamoran argues that it is unethical not to conduct one.

Just as modesty is a great ally of randomised trials, overconfidence can be their enemy. The more certain experts are of their skill and judgement, the less likely they are to use data. And yet we know from a range of studies that overconfidence is a common trait. Eighty-four per cent of Frenchmen think that they are above-average lovers. Ninety-three per cent of Americans think they are better-than-average drivers. Ninety-seven per cent of Australians rate their own beauty as average or better than average. In human evolution, overconfidence has proven to be a successful strategy. In our own lives, excess confidence can provide a sense of resilience – allowing us to take credit for successes while avoiding blame for failures.

The problem is that we live in a world in which failure is surprisingly common. In medicine, only one in ten drugs that looks promising in lab tests ends up getting approval. In education, only one-tenth of the randomised trials commissioned by the US What Works Clearinghouse produced positive effects. In business, just one-fifth of Google’s randomised experiments helped them improve the product. Rigorous social policy experiments find that only a quarter of programs have a strong positive effect. Once you raise the evidence bar, a consistent finding emerges: most ideas that sound good don’t actually work in practice.[2]

How do we institutionalise randomised trials?

In 2010 the British government became the first to establish a so-called ‘Nudge Unit’, to bring the principles of psychology and behavioural economics into policymaking. The interventions were mostly low-cost – such as tweaking existing mailings – and were tested through randomised trials wherever possible. In some cases they took only a few weeks. Since its creation, the tiny Nudge Unit has carried out more randomised experiments than the British government had conducted in that country’s history. Following the British model, Nudge Units have been established by governments in Australia, Germany, Israel, the Netherlands, Singapore and the United States, and are being actively considered in Canada, Finland, France, Italy, Portugal and the United Arab Emirates.

In federal systems, another practical way that governments have encouraged randomised trials is by the national government building randomised trials into state grants programs. For example, the US Second Chance Act, dealing with strategies to facilitate prisoner re-entry into the community, sets aside 2 per cent of program funds for evaluations that ‘include, to the maximum extent possible, random assignment . . . and generate evidence on which re-entry approaches and strategies are most effective’. In a unitary system like New Zealand’s, a similar approach could be taken where grants are being distributed to local government or to non-government organisations.

* * *

The exercise is called ‘The Fist’. The young men are split into pairs. One is given a golf ball. The other is told he has thirty seconds to get the ball.

Immediately, students start grabbing, hitting and wrestling.

After the time is up, the teacher asks why no one simply asked for the ball. ‘He wouldn’t have given it,’ says one. ‘He would have thought I was a punk,’ replies another.

Then the teacher turns to those with the ball, and asks how they would have responded to a polite request. ‘I would have given it; it’s just a stupid ball,’ one replies.

The young men – from rough inner-city neighbourhoods – are participating in a crime prevention program called ‘Becoming a Man’. The goal is to shift teenagers from acting automatically to thinking deliberately, recognising that the right strategy on the street might be the wrong approach in the classroom. For example, a young man in a high-crime neighbourhood who complies with requests like ‘give me your phone’ may be seen as a soft target for future crimes. By contrast, if the same young man fails to comply with a request by his teacher to sit down in class, he may be suspended from school.

‘Becoming a Man’ doesn’t tell youths never to fight. Unlike children in affluent suburbs, teenagers growing up in high-poverty neighbourhoods may need to act tough just to stay safe. So the program’s role-play exercises encourage teenagers to choose the right response for the situation. Making eye contact could be fatal when walking past a rival gang member, but is essential in a job interview. Based on cognitive behavioural therapy, ‘Becoming a Man’ aims to get youths to slow down, judge the situation and deliberately choose whether to comply, argue or fight back.

Does it work? To find out, researchers in Chicago carried out two randomised trials, in which teenagers were randomly assigned into ‘Becoming a Man’ programs or after-school sports. ‘Becoming a Man’ cut arrests by a large amount: between one-third and one-half. Some researchers now think that reducing ‘automaticity’ – the tendency of young men to instinctively lash out – may do more to improve the lives of young men than standard academic remediation and job training programs. As one participant put it: ‘A boy has problems. A man finds solutions to his problems.’

Thanks to the randomistas, it looks like programs based around cognitive behavioural therapy are a valuable tool for communities seeking to address gang violence.

Ask the HiPPO?

Over the course of the twentieth century, randomised trials have turned health care from a profession that relied on ‘eminence-based medicine’ to one grounded in ‘evidence-based medicine’. Companies like Netflix, Coles, United Airlines, Amazon and Google have built randomised trials into their business model. Intuit founder Scott Cook aims to create a company that’s ‘buzzing with experiments’. Whatever happens, Cook tells his staff, ‘you’re doing right because you’ve created evidence, which is better than anyone’s intuition’. If you used the internet today, it’s likely you were part of a randomised trial.

Yet when it comes to social policy, the vast majority of programs designed to help the most vulnerable are grounded more in greybeard beliefs than empirical evidence. The alternative to rigorous evaluation is often to ask the HiPPO – the highest paid person’s opinion.

With inequality in many advanced countries at a post-war high, it’s time we raised the evidence bar. At a time when government budgets are under pressure, there’s no excuse for continuing to fund programs that don’t work.

Conducting more randomised evaluations isn’t an excuse to give up on the problem. We don’t abandon the search for a cure for cancer just because most cancer drugs to emerge from the laboratory don’t make it through clinical trials. Similarly, the goals of cutting crime, raising test scores, or achieving full employment should be pursued even if a specific program comes up short.

The more we ask the question ‘What’s your evidence?’, the more likely we are to find out what works – and what does not. By evaluating social policies, discarding those that don’t work, and boosting those that do, government can have a far greater impact on reducing poverty. So an experimenting society is likely to end up a more equal society.

Scepticism isn’t the enemy of optimism: it’s the channel through which our desire to solve big problems translates into real results.

Given the chance, randomistas can deliver a more equal world, one coin toss at a time.

[1] Mhurchu, C.N., Gorton, D., Turley, M., Jiang, Y., Michie, J., Maddison, R. and Hattie, J., 2013. Effects of a free school breakfast programme on children's attendance, academic achievement and short-term hunger: results from a stepped-wedge, cluster randomised controlled trial. Journal of Epidemiology & Community Health, 67(3), pp.257-264.

[2] For a recent discussion of this challenge in the context of social programs, see Arnold Foundation, ‘How to solve U.S. social problems when most rigorous program evaluations find disappointing effects (part one in a series)’, March 2018.

Power to Persuade, 17 April 2018