Andy Greenberg reported that comp-sci researchers have figured out how to crack the code (pun very intended) of machine learning algorithms. I don’t usually get excited about tech on its own, but this is very cool:

“In a paper they released earlier this month titled ‘Stealing Machine Learning Models via Prediction APIs,’ a team of computer scientists at Cornell Tech, the Swiss institute EPFL in Lausanne, and the University of North Carolina detail how they were able to reverse engineer machine learning-trained AIs based only on sending them queries and analyzing the responses. By training their own AI with the target AI’s output, they found they could produce software that was able to predict with near-100% accuracy the responses of the AI they’d cloned, sometimes after a few thousand or even just hundreds of queries.”

One caveat: more complex algorithms, and ones that give more opaque results (say, a bare answer with no confidence score attached), would be harder to duplicate with this technique.
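To make the query-and-retrain loop from the quoted passage concrete, here is a toy sketch in Python. It is not the researchers' code. The "target" is a made-up two-feature linear classifier standing in for a remote prediction API, and every name in it (`target_api`, `train_clone`, the secret weights, the query counts) is an assumption invented for illustration. The clone only ever sees the target's yes/no answers, and it is trained with plain logistic regression.

```python
import math
import random

# Hypothetical stand-in for a remote prediction API. The caller never sees
# these weights, only the 0/1 answer returned for each query.
_SECRET_W = (2.0, -3.0)
_SECRET_B = 0.5


def target_api(x):
    """Return the target model's label (0 or 1) for a two-feature input."""
    score = _SECRET_W[0] * x[0] + _SECRET_W[1] * x[1] + _SECRET_B
    return 1 if score > 0 else 0


def train_clone(queries, answers, epochs=50, lr=0.5):
    """Fit a logistic-regression clone to the target's answers with SGD."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(queries, answers):
            z = w[0] * x[0] + w[1] * x[1] + b
            z = max(-30.0, min(30.0, z))  # keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            err = y - p  # gradient of the log-loss with respect to z
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b


def clone_predict(model, x):
    """Return the clone's label (0 or 1) for a two-feature input."""
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def random_query(rng):
    """Draw a random input from the square [-1, 1] x [-1, 1]."""
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))


rng = random.Random(0)

# Step 1: send queries to the black box and record what comes back.
queries = [random_query(rng) for _ in range(1000)]
answers = [target_api(x) for x in queries]

# Step 2: train our own model on the target's output.
clone = train_clone(queries, answers)

# Step 3: check how often the clone agrees with the target on new inputs.
fresh = [random_query(rng) for _ in range(1000)]
agreement = sum(clone_predict(clone, x) == target_api(x) for x in fresh) / len(fresh)
print(f"clone agrees with target on {agreement:.1%} of fresh queries")
```

A model this simple is about the easiest possible case. The point is the shape of the loop (query, record, retrain, compare), not the numbers it prints.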

The approach is genius. Maciej Ceglowski pithily summarized machine learning like this in a recent talk: “You train a computer on lots of data, and it learns to recognize structure.” Algorithms can be really damn good at pattern-matching. This reverse-engineering process just points that ability back at the algorithms themselves: the clone learns the target’s structure from the answers it gives.

I’m excited to see this play out in the news over the next few years, as the reverse-engineering capabilities get more sophisticated. Will there be lawsuits? (I hope there are lawsuits.) Will there be mudslinging on Twitter? (Always.)

There are also journalistic possibilities for exposing the inner workings of the algorithms that increasingly determine the shape of our lives. Should be fun!


Header photo by Erik Charlton.