A major AI training data set contains millions of examples of personal data

July 18, 2025
Millions of images of passports, credit cards, birth certificates, and other documents containing personally identifiable information are likely included in one of the biggest open-source AI training sets, new research has found. Thousands of images—including identifiable faces—were found in a small subset of DataComp CommonPool, a major AI training set for image generation scraped from…

The Download: how to run an LLM, and a history of “three-parent babies”

July 18, 2025
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

How to run an LLM on your laptop

In the early days of large language models, there was a high barrier to entry: it used to be impossible to run anything useful on…