
Depends upon what false negative rate you're willing to tolerate. ;) And I don't know how good of a signal there is. This is pure handwaving.

But this type of thing seems like the exact kind of spooky correlation that ML is good at spotting compared to humans.



Machine learning techniques are going to be absolutely awful at detecting something like this, because it's exceedingly rare (at least I'm guessing it is; child sexual abuse by one's own parents sure sounds extremely unlikely, but even child abuse in general is probably rare [1]). Machine learning systems are awful at identifying rare events. As the OP seems to suggest, the false positive rate would most likely be very high.
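A minimal sketch of the base-rate problem, with entirely hypothetical numbers: even a classifier with decent sensitivity and specificity has terrible precision when the event it flags is rare, and the same classifier looks fine on a common event.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(actually positive | flagged positive), via Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Hypothetical: 1-in-1000 prevalence, 90% sensitivity, 95% specificity.
rare = positive_predictive_value(0.001, 0.90, 0.95)
# The same classifier applied to a 1-in-3 prevalence event.
common = positive_predictive_value(1 / 3, 0.90, 0.95)

print(f"rare event:   {rare:.1%} of flags are real")    # ~1.8%
print(f"common event: {common:.1%} of flags are real")  # 90.0%
```

With these assumed numbers, at 1-in-1000 prevalence roughly 98% of flags are false positives; at 1-in-3 prevalence, only 10% are.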

"Spooky" machine learning results happen when a correlation is abundant in a dataset [2]. Otherwise, machine learning techniques will probably miss it altogether.

______________

[1] Quick online search: https://www.inquirer.com/philly/blogs/healthy_kids/What-is-t...

[2] The archetypal spooky machine learning story is surely the one about Target sending baby item coupons to a high school girl before her father knew she was pregnant:

https://www.forbes.com/sites/kashmirhill/2012/02/16/how-targ...


> if we're talking about child sexual abuse by one's own parents, it sure sounds extremely unlikely-

Child sexual abuse isn't extremely rare and familial abuse is a very large minority of child sex abuse.


Humans are awful at rare events and vigilance tasks, too. That's part of why we're seeing machine vision and machine learning starting to outperform humans in e.g. grading radiology screening scans.

The total incidence of child abuse of all types from infancy to adulthood is on the order of 1 in 3. That is not terrifically rare-- it's a higher prevalence than pregnancy, and higher than the rate of positive results in many screening programs.

A much bigger concern is non-causative correlations. It'd be pretty easy to train ML to be racist or look for e.g. indicators of class, which are correlates of abuse.

As to false positive rates-- you can set the false positive rate to whatever you want it to be by adjusting the threshold for a positive result. I'm not sure false positives are that great a concern if the output of the system is just a notification to school administrators that they may want to keep an eye out for this student.
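The threshold point can be sketched with synthetic scores (assumed Gaussian score distributions, hypothetical means): the false positive rate isn't a fixed property of the model, it's chosen by where you place the cutoff, at the cost of catching fewer true positives.

```python
import random

random.seed(0)
# Hypothetical classifier scores: negatives centered low, positives high.
negatives = [random.gauss(0.3, 0.15) for _ in range(10000)]
positives = [random.gauss(0.7, 0.15) for _ in range(10000)]

def rates(threshold):
    """False positive rate and true positive rate at a given cutoff."""
    fpr = sum(s >= threshold for s in negatives) / len(negatives)
    tpr = sum(s >= threshold for s in positives) / len(positives)
    return fpr, tpr

for t in (0.4, 0.6, 0.8):
    fpr, tpr = rates(t)
    print(f"threshold {t}: FPR {fpr:.1%}, TPR {tpr:.1%}")
```

Raising the threshold drives the false positive rate toward zero, but the true positive rate falls with it; that trade-off, not the model alone, determines how many spurious flags administrators see.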


> But this type of thing seems like the exact kind of spooky correlation that ML is good at spotting compared to humans.

How? Particularly, where do you get training data at the required scale?


You take samples of hundreds or thousands of past students' schoolwork, e.g. submissions of essays for standardized tests.

You survey those kids in adulthood about whether and how they were subjected to abuse and other types of relevant adversity.

You attempt to control the data so that you don't just latch onto other correlates of abuse (e.g. social class).





