New open source CSAM detection engine open for trial

I’ve only glanced at the predictions posted on this forum, but from what I see, the researcher is probably using a “softmax function”. If that is the case, then all the individual predictions will sum up to 100%.

So, for example, the prediction could be 19% Child, 40% Adult, 41% Adult-naked. We then take whichever probability is highest, so in this case the image would be tagged as “Adult-naked”.
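
For anyone curious what that looks like in practice, here’s a minimal sketch, assuming the model outputs a raw score (“logit”) per class. The labels and numbers are hypothetical, just chosen to reproduce the example above; I don’t know what the researcher’s model actually outputs.

```python
import numpy as np

# Hypothetical class labels and raw scores ("logits") for one image.
labels = ["Child", "Adult", "Adult-naked"]
logits = np.array([2.94, 3.69, 3.71])

# Softmax turns the raw scores into probabilities that sum to 1 (i.e. 100%).
probs = np.exp(logits) / np.sum(np.exp(logits))

# The image is tagged with whichever class has the highest probability.
predicted = labels[int(np.argmax(probs))]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.0%}")        # Child: 19%, Adult: 40%, Adult-naked: 41%
print("Tagged as:", predicted)        # Tagged as: Adult-naked
```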

My position is that such regimes simply have no place in this line of work.

Even if it can be tuned to differentiate a photograph of a child from a photograph of an adult with a hormone deficiency, or from a hyperrealistic but fictional CG image, the room for false positives and the effect such scanning regimes would have on free expression would outweigh any valid applications, and would only divert focus toward environments where such regimes are ineffective or nonexistent.

This is beginning to form the basis of a cat-and-mouse game that will ultimately undermine the very interests that motivated it in the first place.
Those interests are already being undermined by groups such as ECPAT, who make the unpopular, universally unsupported claim that CSAM could be construed to include materials that do not involve real children or abuse.

We’re playing with fire here, and placing an inordinate degree of faith in technology that is simply not valid for this type of undertaking.

That’s why I said there needs to be a human to validate any result that says CP was detected.

No decent ML practitioner would ever advocate for this to be used without human validation, because they know it can never be made 100% accurate.

I have no comment on whether using this technology for CSAM detection is overall good or bad for society and free speech. I’m not qualified to speak on that.

What I can say though, is that whether we like it or not, companies and governments will use this tech in the future.

I believe Google and Facebook are already doing it. It’s only a matter of time until others follow suit.

Edit:
I think @terminus says it better:

My argument is that even with human validation, the use of AI in this realm still carries many of the same risks and cannot be trusted. My position is that, even with entire teams of analysts working around the clock to review and verify reports, the use of AI/ML detection schemes will still cause more harm than good.

At present, even with image hashes for verified CSAM, where theoretically no human validation is needed, the workload of analysts and investigators who review and triage materials is still extremely high.
Adding AI detection and screening on top of all that is just asking for trouble.
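
For contrast, hash matching against verified material is essentially a set lookup rather than a probabilistic guess. A rough sketch, with a placeholder hash list (deployed systems use perceptual hashes such as PhotoDNA rather than plain SHA-256, but the principle is the same):

```python
import hashlib

# Placeholder fingerprints of already-verified images (not a real list).
known_hashes = {"fingerprint_1", "fingerprint_2"}

def matches_known(image_bytes: bytes) -> bool:
    # Plain SHA-256 shown for simplicity; perceptual hashes additionally
    # survive resizing/re-encoding. Either way, a hit means the image matches
    # known, verified material; there is no probability score for a reviewer
    # to second-guess.
    return hashlib.sha256(image_bytes).hexdigest() in known_hashes
```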

I think it’s quite clear that the end goal is to advance AI to the point where zero, or extremely limited, human validation is needed to tackle the issue of CSAM on the web. I cannot, in good conscience, support actions undertaken to advance this unfeasible goal.

Society has misplaced an inordinate degree of trust and reliance in AI as a solution to a variety of issues, and while many of those applications may be valid, the eradication of CSAM is simply not one of them.

I doubt it, but anything is possible, so long as there is an issue that society has a valid concern over and there are skilled people who claim they can help resolve it.
My biggest concern is that technology companies will jump at the opportunity to help develop these ineffective regimes because governments may be willing to throw money at them.

AI/ML image recognition and identification schemes are not new and have been incubating for much of the 2010s. If such technologies had any meaningful validity in detecting CSAM at a reasonable pace while mitigating the risk of false positives and the overall trivialization of the issue of CSAM distribution, then I feel as though we’d see more adopters than just Google and Facebook.
Facebook has been using such technologies to monitor more than just CSAM, while Google’s implementation merits further review.
Google seems only to omit links/images to sites with, or of, known CSAM imagery from its search results, and from what I’ve been able to read, the detection algorithms used in its indexing systems are the same ones used for monitoring/scanning Google Drive and Gmail.

I understand you may not wish to argue over this, but none of my points are driven by passion or idealism. They’re reasonable, fact-based interpretations and assumptions that have long aligned with the consensus of those familiar with the situation.

While I can’t give an educated answer on the net good/bad of adopting ML for CSAM detection, I can at least speak about my personal viewpoint.

I am staunchly opposed to fully automating anything that has the potential to infringe on any person’s rights or freedoms. I think on that point, we both agree.

It’s possible that more companies are using it; they’re just not saying so out in the open.

You may also be right that adoption is low at the moment.

But like I said, I think it’s just a matter of time. ML technology from three years ago is already considered outdated, let alone 2010-era tech. They’re even building ML that replaces ML engineers…

@Chie is right. Even with humans scanning and eliminating the false positives, who will be looking for false negatives? That means real CSAM can easily sneak past. And when the vulnerabilities are known, criminals will find new and creative ways to avoid the ML scans.
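
To make that concrete: a review pipeline built on a classifier only ever surfaces what the model flags. A toy sketch, with an invented threshold and invented scores:

```python
# Toy illustration: human reviewers only ever see what the model flags.
THRESHOLD = 0.9
scores = {"img_a": 0.97, "img_b": 0.40, "img_c": 0.88}

review_queue = [name for name, score in scores.items() if score >= THRESHOLD]
print(review_queue)  # ['img_a']
# Reviewers can weed the false positives out of the queue, but img_b and
# img_c never get a second look; if either is a false negative, it simply
# sneaks past.
```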

This reminds me of the 10 billion false DMCA notices that Google receives every year. My theory is they designed the whole thing around pictures of Little Lupe and went from there, because they were all detected as regular porn and there are a lot of people out there who still think she was underage.

I wonder how it would handle the stock photos of children used for clothing products sold on Amazon, AliExpress, Alibaba, eBay, etc.

Children being forced into modeling is a thing that happens, unfortunately, but it is not easy to know. What is done about such cases? And what is done about the typical parent (or doll enthusiast) who, while browsing and shopping for clothes, occasionally comes across scantily clothed child models, such as those in mermaid costumes, anime/cartoon costumes, swimwear, and sleepwear? I wonder.

I think this is something that should be handled on a case-by-case basis by several educated human observers, rather than by automation.

I’m pretty sure almost all of my dolls would fall under Child, even the bustier ones with youthful faces, and even though none of the dolls is actually a child. The system is already flawed by that alone: it cannot distinguish a sculpture from a human (nor do I expect it to). I won’t bother using it.