Web scraping developments and risks

Some data owners have taken a very protective view

Mar 25, 2024

Web scraping is one of the most popular tools that data sellers use as an input to their product offerings, whether that means scraping news sites, social media platforms or individual company websites.

This year, as AI tools have accelerated, some owners of those underlying sources have taken a very protective view of what can and can’t be done with their content.

For example, some of the big media outlets, such as the New York Times, have banned AI firms’ web crawlers from their sites, and X and Reddit have taken action to limit what web scrapers can actually collect today compared with just a few months ago.

There’s a significant knock-on risk for data sellers that rely on web scraping and natural language processing (NLP), including a lot of sentiment and human capital vendors. And that risk doesn’t appear to be going away; if anything, it appears to be growing over time.

There are two ways we think about how our world responds to this:

- On the data seller side, we’ll see most vendors do everything they can to make sure they’re not stepping over the clear red lines that have now been drawn. But there’ll be some bad actors; that’s always going to be the case.

- On the buyer side, purchasers will need to carefully consider the ‘grey area’ between staying clear of those red lines and being a bad actor. That will involve getting familiar with how vendors interact with, bypass or overcome the newer anti-scraping measures. That’s where we’re likely to find more layered and complicated compliance issues.
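
One of the simpler anti-scraping signals to check is a site’s robots.txt file, which is commonly how site owners tell specific crawlers to stay out. As a minimal sketch of the kind of check a buyer could run during due diligence, the Python below uses the standard library’s urllib.robotparser to test whether a given crawler user agent may fetch a given URL. The robots.txt content, the ExampleAIBot and SomeOtherBot names, the allowed_paths helper and the example.com URLs are all hypothetical, made up for illustration; they don’t describe any real site or vendor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, written for illustration only.
# It blocks one named crawler entirely and restricts everyone else
# from a single directory.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed_paths(robots_txt, user_agent, urls):
    """Return {url: True/False} for whether user_agent may fetch each URL.

    This only reflects what the robots.txt rules say; it does not cover
    terms of service, rate limits or any other restriction.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    urls = [
        "https://example.com/news/article",
        "https://example.com/private/report",
    ]
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        print(agent, allowed_paths(SAMPLE_ROBOTS_TXT, agent, urls))
```

A robots.txt check covers only one signal. It says nothing about terms of service, rate limits or login walls, so it’s a starting point for questions to a vendor rather than a compliance verdict.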

For more details on the legal issues around web scraping, see:

Data scraping in 2024

The nascent regulatory landscape of web-scraping

If you'd like to know more about how Neudata can help, you can find more information on our website.
