Risk prediction tools in child welfare contexts: the devil in the detail

This is a guest blog post by Emily Keddell, Senior Lecturer at the University of Otago in New Zealand. Emily has published and blogged about the New Zealand government’s attempts to pilot predictive risk modelling in child protection services. She is a member of the Reimagining Social Work collective. Emily is happy to respond to any questions or comments on the blog post, so please feel free to add a comment below.

A recent study in England found significant regional variations in care proceedings in the court system (Harwin, Alrouh, Bedson, & Broadhurst, 2018). A study in New Zealand found that the site office of the national child protection service was the 4th most predictive variable out of 15 for abuse substantiation, even after poverty had been controlled for (Wilson, Tumen, Ota, & Simmers, 2015). Another UK study found large variations in contact with the child protection system relating to levels of deprivation, but also that an ‘inverse intervention law’ seemed to be operating: poor children in neighbourhoods surrounded by a larger, wealthier neighbourhood were much more likely to have contact with the child protection system than equally deprived children surrounded by a highly deprived neighbourhood (Bywaters et al., 2015). Others point out that decisions to notify, substantiate and investigate child abuse can be shaped by all sorts of factors, such as the values and beliefs of the social worker, perceptions of risk and professional cultures (Keddell, 2014). Yet another study found that variations in Indigenous children’s system contact were more related to the variable provision of local prevention services than to differences in case characteristics (Fluke, Chabot, Fallon, MacLaurin, & Blackstock, 2010).

Why do these examples matter when considering algorithmic forms of decision-making in child welfare contexts? Surely an algorithm could correct some of these inconsistencies? Almost the opposite. Because algorithmic and statistical models are constructed from administrative data that records decisions – to substantiate, to report to child protection services, to seek legal orders – the resulting predictions reflect the many elements that contribute to variations in system contacts (Keddell, 2016). Importantly, in none of the examples given did system contact necessarily reflect equitably the case characteristics of the actual families involved – all were shaped by other factors. Statistical tools replicate the many ways that these variable social processes and demand/supply factors shape child protection system contact, which may be only tangentially related to the actual abuse of children. Hardly any of the decisions that become data points are objective outcomes, yet the process of algorithmic computation lends them the perception of objective reality – it reifies them (Keddell, 2015). Is a child protection office in a particular location more likely than others to substantiate cases of domestic violence as child abuse? Then a model trained on data from that office is likely to assign a higher risk score to children witnessing domestic violence from that area in the future, but not to those in the same situation in other areas. Is there class or racial bias in the populations investigated for child abuse? Then the use of, for example, parental contact with child protection systems (the 3rd most predictive variable in one New Zealand study) is just as likely to reinforce existing ethnic inequalities in the system as to reduce them (Wilson et al., 2015).
This is especially so given that, at the time today’s adults were children, a ministerial inquiry found that ethnic disproportionality in New Zealand’s child welfare system was due to “forms of cultural racism … that result in the values and lifestyle of the dominant group being regarded as superior to those of other groups, especially Maori” (Ministerial Advisory Committee, 1988, p.9). It is in these subtle ways that both bias and arbitrariness – equally destructive to notions of fairness – become ‘baked in’ to administrative data, and will consistently over-assign risk to some people while understating it for others in algorithmic computations (Kirchner, 2015).
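The point about office-level practice differences can be made concrete with a toy simulation. This is purely illustrative – the offices, thresholds and numbers are all invented, not drawn from any real service’s data. Two offices see families drawn from an identical distribution of situations, but apply different (hypothetical) substantiation thresholds; a model trained on the resulting labels would find ‘office’ predictive even though the families are statistically identical.

```python
import random

random.seed(42)

def substantiation_rate(office: str, n: int = 10_000) -> float:
    """Simulate n cases seen by an office.

    Both offices draw case 'severity' from the same distribution;
    the only difference is a hypothetical practice threshold for
    substantiating a case - a property of the office, not of the
    families it serves.
    """
    threshold = {"A": 0.3, "B": 0.6}[office]
    substantiated = sum(1 for _ in range(n) if random.random() > threshold)
    return substantiated / n

rate_a = substantiation_rate("A")  # roughly 0.7
rate_b = substantiation_rate("B")  # roughly 0.4
# A model trained on these labels will treat "office" as a useful
# predictor of substantiation, despite the underlying cases being
# identical in distribution - the variation is pure practice, not risk.
```

The gap between the two rates is exactly the kind of decision variation that, once recorded as administrative data, gets reified as ‘risk’.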

While there are many possible causes of variability in training data, as noted above, the charge of ethnic or class bias in data in the child welfare context is intimately tied up with what is often referred to as the ‘risk-bias’ debate in child protection (Cram, Gulliver, Ota, & Wilson, 2015; Detlaff, 2013; Drake et al., 2011). It goes like this: do children from poor and ethnic minority backgrounds have disproportionate contact with the child protection system because of greater exposure to known risk factors that push up the true incidence of abuse (through increased family stress), or because a biased system disproportionately surveils, investigates and criminalises the poor? (Boyd, 2014; Eubanks, 2017). This question has been the subject of many nuanced research projects, primarily in the US (see Detlaff, 2013). While there is good evidence to suggest that in the US the largest cause is the former, caseworker and exposure biases also play a significant part (Ards et al., 2012). In short, both risk and bias are likely at play, though the relative weighting of each is likely to differ across national and state contexts. Certainly there is evidence of continuing bias that affects who is investigated, and how similar parenting behaviours are viewed when the parent is poor compared to when they are not (Roberts & Sangoi, 2018; Wexler, 2018). Why does this matter for algorithms? Well, a proponent might argue that if the disproportionate identification of poorer groups (and the racialisation that occurs when poorer people are targeted in any way) reflects higher incidence, then, while unfortunate, it reflects a real heightened risk of harm to children. It is not the child protection system’s job to address the structural inequalities causing that risk, but to respond to the children here right now that need protection. So the algorithm, while it has disparate impact, is fair (Barocas & Selbst, 2016).

The problem with that argument for the purposes of algorithmic prediction is two-fold. For one, even if true risk is heightened, it is likely to be far overstated in child protection system data that is also inflated by bias. Secondly, the data used to create algorithms is drawn from multiple sources that could all contain bias (not just the child protection data), so the likelihood that the result overstates differences in true incidence, and deems poorer people very high risk relative to others, becomes even more pronounced. For example, in the Wilson et al. (2015) study above, the use of parental criminal justice history combined with time spent on a public welfare benefit as predictor variables will disproportionately push up risk scores for those in poverty, and within that group for Māori and Pacific people, in ways that outstrip true risk – because, for example, Māori receive harsher sentences than Pākehā or white offenders for the same crimes (Jones, 2016). Many US examples are similar. So when these data are used as predictors, bias again creeps into the algorithmic model, which will portray the increased true risk associated with poverty as far higher than its actual effect on incidence (see also Keddell, 2016). Of increasing interest is who is missing altogether from the data sources used: when there are groups (affluent white people) who have hardly any contact at all with administrative systems, the ledger becomes even more unbalanced. The resulting algorithm will not only over-identify poorer people, but will miss risk amongst more affluent populations.
And while at a technical level one might say ‘at least we are getting it partly right for one sector of the population’, from a justice perspective this disparity raises important issues of equity: it creates intrusion in the lives of those least able to seek redress, while wealthier populations escape scrutiny. Is it fair that some people are ascribed high-risk status while others escape scrutiny altogether? Big administrative datasets essentially ‘go easier’ on some people compared to others.

Another argument is that despite the known issues with algorithms, they are better than current child protection decisions, which can also be arbitrary or biased (Cuccaro-Alamin, Foust, Vaithianathan, & Putnam-Hornstein, 2017). Poor human decision-making is especially marked when information is vague and there isn’t much of it, there are time pressures, and the decision-maker is not exposed to the long-term outcomes of their decisions – in short, in the contexts of most child protection intake centres (Saltiel, 2016). A person calls with a vague concern about a child, and the intake social worker has to make a triage decision about whether to take on the case, not take it on, or refer it to another service. In this information-poor environment, with time pressure, there are real threats to sound human decision-making, including the use of stereotypical heuristics and cognitive biases (Kahneman, 2011). In this context, statistical risk prediction tools are portrayed as a method of enabling fast, properly weighted decisions that can take account of all the factors that might contribute to a poor outcome (such as re-reports within a specific time period or placement in care). They also use information that the intake worker already has access to because of their legal mandate. The Allegheny Family Screening Tool, for example, is based on this justification (Vaithianathan, Jiang, Maloney, Nand, & Putnam-Hornstein, 2017). Arguments such as these rely on research about the increased accuracy of statistical and actuarial tools compared to human decision-making, the claims of which are often made in sweeping terms. This research is actually fairly mixed, with Kahneman (2011) concluding that of the two hundred or so studies “about 60% have shown significantly better accuracy for the algorithms” (p.223).
Most of these studies have more concrete outcomes to predict than those in child protection contexts, and different types of decisions and decision-making contexts affect which method is more accurate. This makes real comparisons difficult (Bartelink, van Yperen, & ten Berge, 2015). What this level of mixed outcomes should prompt is a comparison of the accuracy of any proposed algorithmic model with current human decision-makers in the specific context under examination, before accuracy claims can really be made. Only one study I have seen (Rea & Erasmus, 2017) compared the accuracy of the statistical model it was proposing with existing decision-makers in the context being studied (intake workers of Child, Youth and Family, the statutory service in Aotearoa New Zealand). What they found was that the humans were accurate 60% of the time, and the model 66% of the time – a small but nevertheless interesting improvement. Compare this to an earlier NZ study, whose positive predictive accuracy (the proportion of those it identified as high risk who went on to have the predicted outcome – in this case a substantiation of child abuse over the following five years) was only 25% at the top decile of risk, and 42% at the top 1% of risk. To spell it out, the human decision-makers in the later study were far more accurate than the statistical methods used in the earlier one, though not as accurate as the methods used in the later one. One wonders if many developers don’t want to do comparative studies because they will find, given that the data they are using is essentially decision data, that there won’t be much difference.
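Positive predictive accuracy (positive predictive value, PPV) is simple to state precisely: of everyone flagged as high risk, what proportion actually went on to have the outcome? A minimal sketch, using hypothetical counts chosen only to echo the percentages quoted above:

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Proportion of cases flagged 'high risk' that actually had the
    predicted outcome (here, a substantiation within five years)."""
    flagged = true_positives + false_positives
    return true_positives / flagged

# Hypothetical counts: if 25 of every 100 children flagged in the top
# risk decile were later substantiated, PPV is 0.25 - meaning three of
# every four flagged children did not have the predicted outcome.
top_decile_ppv = positive_predictive_value(25, 75)       # 0.25
top_one_percent_ppv = positive_predictive_value(42, 58)  # 0.42
```

Framed this way, the equity stakes are clearer: a PPV of 0.25 means most families carrying a ‘high risk’ label are false positives.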

Such findings have led to increased tensions around the transparency of models – for example, the recent controversy about the Allegheny Family Screening Tool raised by Virginia Eubanks in her book ‘Automating Inequality’. She points out that the tool reinforces poverty profiling mechanisms and existing inequities in the child protection system, essentially punishing poorer people by assigning them a risk score they cannot escape. This led to a terse statement from the county refuting her points, and her reply (https://virginia-eubanks.com/2018/02/16/a-response-to-allegheny-county-dhs/). What is interesting is that the documents relating to that model have not given its positive predictive accuracy rates or its variable weightings. While an ethics review was commissioned, weighing up harms and benefits is impossible without that information, and the reviewers take as a given that decisions made in this way are de facto more accurate than human ones (Dare & Gambrill, 2016). The reviewers’ conclusion that the tool makes decision-making more ‘transparent’ is also contestable. A statistical tool may make the stated reason for a decision more obvious – a high risk score – but that is not transparency. If anything, where the true reason for a decision rests on the computations of a complex algorithm, decision reasoning becomes much less, rather than more, transparent. Some ethics bodies are calling for a basic ethical right for end users to an explanation where algorithms are used (for example, see the General Data Protection Regulation). Ensuring this in the child protection context is fraught, given the many questions about the data and how it is analysed, and the relative powerlessness of people involved as clients in the child protection system, which makes challenging decisions difficult.

Narayanan (2018) argues we need more focus on “connecting technical issues with theories of justice”. One way we can consider this, in addition to the justice issues outlined above, is in relation to the use of predictive analytics to identify individuals at risk of a particular outcome based on their population-wide probability. That is, what a high risk score is saying about any given individual is: ‘you are so similar to others in a group whose entry criteria we have defined that we are prepared to override your right to be considered on the basis of your personal actions thus far, and instead will intervene based on the statistical likelihood of your future behaviour’. But one’s legal rights are individual, not collective. How statistically similar or dissimilar I am to someone else should not affect my right to fair treatment in the legal context that is a child protection investigation. My rights are individual. In algorithmic technical terms, when this identification is wrong, the problem is one of false positives – some people identified as high risk will not go on to have that outcome. When an algorithm has high false positive and false negative rates, the technical fix is focussed on improving the type and scope of the data. For example, after the cancellation of the Eckerd tool in Chicago child protection, which tended to overestimate risk, one of the solutions proposed was to reverse the current practice of expunging unsubstantiated cases from official records (and data). From a technical, statistical perspective, removing unsubstantiated cases introduces statistical bias into the data by excluding cases that were investigated but in which abuse was not found, and which may nevertheless have shared characteristics with those that were substantiated. Their exclusion damages the learning ability of the algorithm, as it cannot make fine-grained, more accurate differentiations between different cases. The fix proposed is to reverse expungement.
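The statistical point about expungement can be shown with a stylised example – the counts here are invented for illustration only. If investigated-but-unsubstantiated cases are expunged, the training data loses precisely the ‘negative’ examples a model needs in order to distinguish substantiated from unsubstantiated cases:

```python
# Stylised records: (investigated, substantiated) pairs, invented numbers:
# 100 investigations, of which 20 were substantiated.
records = [(True, True)] * 20 + [(True, False)] * 80

# Full data: 20% of investigated cases were substantiated.
full_rate = sum(s for _, s in records) / len(records)

# After expungement, the 80 unsubstantiated investigations vanish,
# so every investigated case remaining in the data looks like
# confirmed abuse - the model has no negative examples left to
# learn the difference from.
retained = [(i, s) for i, s in records if s]
expunged_rate = sum(s for _, s in retained) / len(retained)
```

This is why, from a purely technical standpoint, reversing expungement looks like an obvious fix: it restores the negative examples. The question is what that fix costs.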
Nowhere is the question asked: why have expungement? In some states, expungement exists to protect people’s right to future due process – so that an earlier child abuse investigation does not taint future notifications. This method is used in some states to reduce racial and legal biases. The principle is not only an issue in relation to expunged cases, but also those cases never notified to child protection services, or notified but not investigated – neither of these situations means abuse hasn’t occurred, just that no one saw and reported it. It is for these reasons that the ‘moneyball’ analogy is a bad one. In baseball, all outcomes are easily observable and categorically definite: player x scored a home run, and everybody saw it. When it comes to child abuse, neither the act itself nor who sees it is easily defined. A child left in a bath slips and hits their head getting out. The parent was distracted by another child. Is this neglect? And what if no one saw it and reported it? Did it happen? Not in the data it didn’t. And what if this situation occurred in a poor neighbourhood as opposed to a wealthy, whiter one? Will it be viewed differently, and hence get into the data differently? In these ways, the data used to inform such models are incorrigibly suspect. Attempts to improve them lead to increasingly intrusive data use and challenges to legal equity. When such tools over-identify those least able to refute their ‘high risk’ label, we should all be concerned.

 Image credit: Ars Electronica


Ards, S. D., Myers Jr, S. L., Ray, P., Kim, H.-E., Monroe, K., & Arteaga, I. (2012). Racialized perceptions and child neglect. Children and Youth Services Review, 34(8), 1480-1491. doi: http://dx.doi.org/10.1016/j.childyouth.2012.03.018 

Barocas, S., & Selbst, A. D. (2016). Big data’s disparate impact. California Law Review, 104, 671-732.

Bartelink, C., van Yperen, T. A., & ten Berge, I. J. (2015). Deciding on child maltreatment: A literature review on methods that improve decision-making. Child Abuse & Neglect, 49(Supplement C), 142-153. doi: https://doi.org/10.1016/j.chiabu.2015.07.002 

Boyd, R. (2014). African American disproportionality and disparity in child welfare: Toward a comprehensive conceptual framework. Children and Youth Services Review, 37(0), 15-27. doi: http://dx.doi.org/10.1016/j.childyouth.2013.11.013

Bywaters, P., Brady, G., Sparks, T., Bos, E., Bunting, L., Daniel, B., . . . Scourfield, J. (2015). Exploring inequities in child welfare and child protection services: Explaining the ‘inverse intervention law’. Children and Youth Services Review, 57, 98-105. doi: http://dx.doi.org/10.1016/j.childyouth.2015.07.017 

Cram, F., Gulliver, P., Ota, R., & Wilson, M. (2015). Understanding overrepresentation of indigenous children in child welfare data: An application of the Drake risk and bias models. Child Maltreatment, 20(3), 170-182. doi: 10.1177/1077559515580392

Cuccaro-Alamin, S., Foust, R., Vaithianathan, R., & Putnam-Hornstein, E. (2017). Risk assessment and decision making in child protective services: Predictive risk modeling in context. Children and Youth Services Review, 79, 291-298. doi: http://dx.doi.org/10.1016/j.childyouth.2017.06.027 

Dare, T., & Gambrill, E. (2016). Ethical analysis: Predictive risk models at call screening for Allegheny County. US: Allegheny County Human Services.

Detlaff, A. (2013). The evolving understanding of disproportionality in child welfare. In J. Korbin & R. Krugman (Eds.), The handbook of child maltreatment (pp. 149-170). Netherlands: Springer.

Drake, B., Jolley, J. M., Lanier, P., Fluke, J., Barth, R. P., & Jonson-Reid, M. (2011). Racial bias in child protection? A comparison of competing explanations using national data. Pediatrics, 127, 471-478.

Eubanks, V. (2017). Automating inequality: How high-tech tools profile, police and punish the poor. New York: St. Martin’s Press.

Fluke, J. D., Chabot, M., Fallon, B., MacLaurin, B., & Blackstock, C. (2010). Placement decisions and disparities among aboriginal groups: An application of the decision making ecology through multi-level analysis. Child Abuse & Neglect, 34(1), 57-69. doi: http://dx.doi.org/10.1016/j.chiabu.2009.08.009 

Harwin, J., Alrouh, B., Bedson, S., & Broadhurst, K. (2018). Care demand and regional variability in England: 2010/11 to 2016/17. Lancaster: Centre for Child and Family Justice Research, Lancaster University.

Jones, B. (2016). Offending outcomes for Māori and non-Māori: An investigation of ethnic bias in the criminal justice system. Evidence from a New Zealand birth cohort. Unpublished master’s thesis, University of Canterbury. Retrieved from: https://ir.canterbury.ac.nz/bitstream/handle/10092/12607/Jones_MSc_2016.pdf?sequence=1

Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus and Giroux.

Keddell, E. (2014). Current debates on variability in child welfare decision-making: A selected literature review. Social Sciences, 3(4), 916-940. doi: 10.3390/socsci3040916

Keddell, E. (2015). The ethics of predictive risk modelling in the Aotearoa/New Zealand child welfare context: child abuse prevention or neo-liberal tool? Critical Social Policy 35(1), 69 – 88.

Keddell, E. (2016). Substantiation, decision-making and risk prediction in child protection systems. Policy Quarterly, 12(2), 46.

Kirchner, L. (2015, September 6). When discrimination is baked into algorithms. The Atlantic. Retrieved from: https://www.theatlantic.com/business/archive/2015/09/discrimination-algorithms-disparate-impact/403969/

Ministerial Advisory Committee. (1988). Puao-te-Ata-tu (day break): The report of the Ministerial Advisory Committee on a Maori perspective for the Department of Social Welfare. Wellington, New Zealand: Department of Social Welfare.

Narayanan, A. (2018). 21 fairness definitions and their politics [Video]. YouTube. Retrieved from: https://www.youtube.com/watch?v=jIXIuYdnyyk.

Rea, D., & Erasmus, R. (2017). Report of the enhancing decision-making project. Wellington, NZ: Ministry of Social Development. Available at: https://mvcot.govt.nz/assets/uploads/oia-responses/report-of-the-enhancing-intake-decision-making-project.Pdf.

Roberts, D., & Sangoi, L. (2018, March 26). Black families matter: How the child welfare system punishes poor families of color, Injustice Today. Retrieved from https://injusticetoday.com/black-families-matter-how-the-child-welfare-system-punishes-poor-families-of-color-33ad20e2882e 

Saltiel, D. (2016). Observing front line decision making in child protection. British Journal of Social Work, 46, 2104-2119.

Vaithianathan, R., Jiang, N., Maloney, T., Nand, P., & Putnam-Hornstein, E. (2017). Developing predictive risk models to support child maltreatment hotline screening decisions: Allegheny county methodology and implementation. 

Wexler, R. (2018, March 16). Poor kids end up in foster care because parents don’t get margin of error rich do, Opinion, Youth Today. Retrieved from https://youthtoday.org/2018/03/poor-kids-end-up-in-foster-care-because-parents-dont-get-margin-of-error-the-rich-do/ 

Wilson, M. L., Tumen, S., Ota, R., & Simmers, A. G. (2015). Predictive modeling: Potential application in prevention services. American Journal of Preventive Medicine, 48(5), 509-519. doi: http://dx.doi.org/10.1016/j.amepre.2014.12.003