When you sign up for a newsletter, make a hotel reservation, or check out online, you probably take for granted that if you mistype your email address three times, or change your mind and close the page, it doesn’t matter. Nothing really happens until you hit the Submit button, right? Well, maybe not. As with so many assumptions about the web, that’s not always the case, according to new research: A surprising number of websites are collecting some or all of your data as you type it into a digital form.
Researchers from KU Leuven, Radboud University, and the University of Lausanne crawled and analyzed the top 100,000 websites, looking at two scenarios: a user visiting a site from the European Union and a user visiting a site from the United States. They found that 1,844 websites collected an EU user’s email address without their consent, and a staggering 2,950 logged a US user’s email in some form. Many of the sites seemingly do not intend to conduct the data-logging themselves but incorporate third-party marketing and analytics services that cause the behavior.
After crawling sites specifically for password leaks in May 2021, the researchers also found 52 websites on which third parties, including the Russian tech giant Yandex, were incidentally collecting password data before submission. The group disclosed its findings to these sites, and all 52 cases have since been resolved.
“If there’s a Submit button on a form, the reasonable expectation is that it does something, that it will submit your data when you click it,” says Güneş Acar, a professor and researcher in Radboud University’s Digital Security Group and one of the leaders of the study. “We were very surprised by these results. We thought we were probably going to find a few hundred websites where your email is collected before you submit, but this far exceeded our expectations.”
The researchers, who will present their findings at the Usenix security conference in August, say they were inspired to investigate what they call “leaky forms” by media reports, particularly from Gizmodo, about third parties collecting form data regardless of submission status. They explain that, at its core, the behavior is similar to so-called keyloggers, which are usually malicious programs that log everything a target types. But on a mainstream top-1,000 site, users probably wouldn’t expect their information to be keylogged. And in practice, the researchers observed a few variations of the behavior. Some sites logged data keystroke by keystroke, but many captured the complete contents of one field when the user clicked on the next.
“In some cases, when you click on the next field, they collect the previous one, like you click on the password field and they collect the email, or you just click anywhere and they immediately collect all the information,” says Asuman Senol, a privacy and identity researcher at KU Leuven and one of the study’s co-authors. “We didn’t expect to find thousands of websites; and in the US, the numbers are actually much higher, which is interesting.”
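The two collection patterns the researchers describe can be illustrated with a toy model. The sketch below is not taken from the study and uses no real browser or tracker API; the `FormSession` class, the `replay` function, and the strategy names `per_keystroke` and `on_blur` are hypothetical names invented for illustration. It replays a short sequence of user actions that never includes a submit and records what a script following each pattern would have sent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FormSession:
    """Toy model of a form page (hypothetical, not a real browser API)."""
    values: dict = field(default_factory=dict)
    focused: Optional[str] = None


def replay(events, strategy):
    """Replay user events and return the (field, value) pairs a script would send.

    strategy "per_keystroke": send the field's current value after every key.
    strategy "on_blur": send a field's full value once focus moves elsewhere.
    """
    session = FormSession()
    sent = []
    for kind, name, char in events:
        if kind == "focus":
            previous = session.focused
            session.focused = name
            leaving_filled_field = (
                previous is not None
                and previous != name
                and session.values.get(previous)
            )
            if strategy == "on_blur" and leaving_filled_field:
                sent.append((previous, session.values[previous]))
        elif kind == "key":
            session.values[name] = session.values.get(name, "") + char
            if strategy == "per_keystroke":
                sent.append((name, session.values[name]))
    return sent


# The user types an email, then clicks the password field. No submit occurs.
events = (
    [("focus", "email", "")]
    + [("key", "email", c) for c in "a@b.c"]
    + [("focus", "password", "")]
)

per_key = replay(events, "per_keystroke")
on_blur = replay(events, "on_blur")
print(len(per_key), on_blur)
```

In both cases the email leaves the page even though the event list contains no submit action, which is the behavior the study measured.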
The researchers say the regional differences may be related to companies being more cautious about user tracking, and potentially even integrating with fewer third parties, because of the EU’s General Data Protection Regulation. But they stress that this is just one possibility, and the study did not examine explanations for the disparity.
Through a substantial effort to notify the websites and third parties that collect data in this way, the researchers found that one explanation for some of the unexpected data collection may be related to the challenge of distinguishing a “submit” action from other user actions on certain web pages. But the researchers emphasize that from a privacy standpoint, this is not a sufficient justification.
Since completing the paper, the group has also made a discovery about Meta Pixel and TikTok Pixel, invisible marketing trackers that services embed on their websites to track users across the web and show them ads. Both claimed in their documentation that customers can turn on “automatic advanced matching,” which triggers data collection when a user submits a form. In practice, however, the researchers found that these tracking pixels were capturing hashed email addresses, an obscured version of email addresses used to identify web users across platforms, prior to submission. For US users, 8,438 sites may have been leaking data to Meta, Facebook’s parent company, via the pixel, and 7,379 sites may be affected for EU users. For TikTok Pixel, the group found 154 sites for US users and 147 for EU users.
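To see why a hashed email address still works as an identifier, consider a minimal sketch. It assumes SHA-256 and a trim-and-lowercase normalization step; the article does not say which hash function or normalization the pixels use, and the `hashed_email` function name is hypothetical.

```python
import hashlib


def hashed_email(address: str) -> str:
    """Return a hex digest of a normalized email address.

    Assumptions (not stated in the article): normalization is
    trim + lowercase, and the hash function is SHA-256.
    """
    normalized = address.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The same address typed on two different sites, with different
# capitalization and stray whitespace, produces the same digest.
site_a = hashed_email("Jane.Doe@example.com")
site_b = hashed_email("  jane.doe@example.com ")
print(site_a == site_b)
```

The digest hides the readable address, but because it is deterministic, any party that receives it from several sites can still link those visits to the same person.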
The researchers filed a bug report with Meta on March 25, and the company quickly assigned an engineer to the case, but the group hasn’t heard an update since. The researchers notified TikTok on April 21 (they discovered the TikTok behavior more recently) and haven’t heard back. Meta and TikTok did not immediately return Wired’s request for comment about the findings.
“The privacy risk for users is that they will be tracked even more efficiently; they can be tracked across different websites, across different sessions, across mobile and desktop,” says Acar. “An email address is such a useful identifier for tracking because it is global, it is unique, it is constant. You can’t clear it like you clear your cookies. It’s a very powerful identifier.”
Acar also points out that, as tech companies look to phase out cookie-based tracking in response to privacy concerns, marketers and other analysts will rely more and more on static identifiers like phone numbers and email addresses.
Since the findings suggest that deleting data in a form before submitting it may not be enough to protect yourself from all collection, the researchers created a Firefox extension called LeakInspector to detect rogue form collection. And they say they hope their findings will raise awareness of the issue not only among regular web users, but also among website developers and administrators, who can proactively check whether their own systems, or the third parties they use, are collecting data from forms without consent.
Leaky forms are just one more type of data collection to be wary of in an already overcrowded online arena.
This story originally appeared on wired.com.