Technology · July 21, 2025

The Download: how your data is being used to train AI, and why chatbots aren’t doctors

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

A major AI training data set contains millions of examples of personal data

Millions of images of passports, credit cards, birth certificates, and other documents containing personally identifiable information are likely included in one of the biggest open-source AI training sets, new research has found.

Thousands of images—including identifiable faces—were found in a small subset of DataComp CommonPool, a major AI training set for image generation scraped from the web. Because the researchers audited just 0.1% of CommonPool’s data, they estimate that the real number of images containing personally identifiable information, including faces and identity documents, is in the hundreds of millions. 

The bottom line? Anything you put online can be and probably has been scraped. Read the full story.

—Eileen Guo

AI companies have stopped warning you that their chatbots aren’t doctors

AI companies have now mostly abandoned the once-standard practice of including medical disclaimers and warnings in response to health questions, new research has found. In fact, many leading AI models will now not only answer health questions but even ask follow-ups and attempt a diagnosis.

Such disclaimers serve as an important reminder to people asking AI about everything from eating disorders to cancer diagnoses, the authors say, and their absence means that users of AI are more likely to trust unsafe medical advice. Read the full story.

—James O’Donnell

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Hackers exploited a flaw in Microsoft’s software to attack government agencies
Engineers across the world are racing to mitigate the risk it poses. (Bloomberg $)
+ The attack homes in on servers housed within an organization, not the cloud. (WP $)

2 The French government has launched a criminal probe into X
It’s investigating the company’s recommendation algorithm—but X isn’t cooperating. (FT $)
+ X says French lawmaker Eric Bothorel has accused it of manipulating its algorithm for foreign interference purposes. (Reuters) 

3 Trump aides explored ending contracts with SpaceX
But they quickly found most of them are vital to the Defense Department and NASA. (WSJ $)
+ But that doesn’t mean it’s smooth sailing for SpaceX right now. (NY Mag $)
+ Rivals are rising to challenge the dominance of SpaceX. (MIT Technology Review)

4 Meta has refused to sign the EU’s AI code of practice
Its new global affairs chief claims the rules will throttle growth. (CNBC)
+ The code is voluntary—but declining to sign it sends a clear message. (Bloomberg $)

5 A Polish programmer beat an OpenAI model in a coding competition
But only narrowly. (Ars Technica)
+ The second wave of AI coding is here. (MIT Technology Review)

6 Nigeria has dreams of becoming a major digital worker hub
The rise of AI means there’s less outsourcing work to go round. (Rest of World)
+ What Africa needs to do to become a major AI player. (MIT Technology Review)

7 Microsoft is building a digital twin of the Notre-Dame Cathedral
The replica can help support its ongoing maintenance, apparently. (Reuters)

8 How funny is AI, really?
Not all senses of humor are made equal. (Undark)
+ What happened when 20 comedians got AI to write their routines. (MIT Technology Review)

9 What it’s like to forge a friendship with an AI
Student MJ Cocking found the experience incredibly helpful. (NYT $)
+ But chatbots can also fuel vulnerable people’s dangerous delusions. (WSJ $)
+ The AI relationship revolution is already here. (MIT Technology Review)

10 Work has begun on the first space-based gravitational wave detector
The waves are triggered when massive objects like black holes collide. (IEEE Spectrum)
+ How the Rubin Observatory will help us understand dark matter and dark energy. (MIT Technology Review)

Quote of the day

“There was just no way I was going to make it through four years of this.”

—Egan Reich, a former worker in the US Department of Labor, explains why he accepted the agency’s second deferred resignation offer in April after DOGE’s rollout, Insider reports.

One more thing

The world is moving closer to a new cold war fought with authoritarian tech

A cold war is brewing between the world’s autocracies and democracies—and technology is fueling it.

Authoritarian states are following China’s lead and are trending toward more digital rights abuses by increasing the mass digital surveillance of citizens, censorship, and controls on individual expression.

And while democracies also use massive amounts of surveillance technology, it’s the tech trade relationships between authoritarian countries that are enabling the rise of digitally enabled social control. Read the full story.

—Tate Ryan-Mosley

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ I need to sign up for Minneapolis’ annual cat tour immediately.
+ What are the odds? This mother has had four babies, all born on July 7 in different years.
+ Not content with being a rap legend, Snoop Dogg has become a co-owner of a Welsh soccer club.
+ Appetite for Destruction, Guns N’ Roses’ outrageous debut album, was released on this day 38 years ago.
