Robot Etiquette: web scraping and the law

Whether you call it web scraping, web crawling or spidering, programmatic data access and extraction is one of those technologies that has outpaced legislation. In the following piece, we provide an introduction to the legal theories and concepts that have proven relevant to web scraping activities, as well as some widely accepted guidelines for good bot behaviour.

Web scraping is defined as the extraction and copying of data from a website into a structured format using a computer program. These programs are interchangeably referred to as web scrapers, web crawlers, or bots.
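To make the definition concrete, here is a minimal sketch of the "into a structured format" step, using only Python's standard library. The HTML fragment, the `TableScraper` class and the `scrape_table` function are all hypothetical names invented for illustration; a real scraper would fetch the page over HTTP and would have to cope with far less tidy markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
```

Here `records` ends up as a list of dictionaries, one per table row, which is the kind of structured output that can then be loaded into a spreadsheet or database.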

There are good bots and bad bots, as well as – we assume – bots of somewhat ambiguous morality. According to one estimate [1], bots make up around half of all internet traffic, and most of them are malicious in nature. Bad bots steal competitor content, overload web servers, spam forums, and create phantom baskets on e-commerce websites.

Web scraping should only be used to access data that is publicly available; in other words, information that does not sit behind a paywall, a firewall, or any other type of code-based restriction. The benefit of automated access is that data collection can take place on a scale – and at a speed – that is not achievable through manual methods.

However, for all its convenience, web scraping usually entails more than just extracting raw data. Information on the web is messy and unstructured; it needs to be deduplicated, filtered, and integrated with one’s system of choice before it can become a subject for analysis.
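The clean-up described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a recommended pipeline: the `clean_records` function, the field names and the sample rows are hypothetical, and the rule used here for spotting duplicates (a case-insensitive match on the required fields) is only one of many possible choices.

```python
def clean_records(raw_records, required_fields):
    """Drop incomplete rows, tidy whitespace, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in raw_records:
        # Tidy: collapse runs of whitespace in every string value.
        normalised = {
            key: " ".join(value.split()) if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Filter: skip rows where a required field is missing or empty.
        if any(not normalised.get(field) for field in required_fields):
            continue
        # Deduplicate: compare rows case-insensitively on the required fields.
        fingerprint = tuple(
            str(normalised[field]).casefold() for field in required_fields
        )
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalised)
    return cleaned


# Hypothetical scraped rows: one duplicate and one incomplete entry.
raw = [
    {"product": "Widget", "price": "9.99"},
    {"product": "  widget ", "price": "9.99"},
    {"product": "Gadget", "price": ""},
    {"product": "Gizmo", "price": "12.00"},
]

cleaned = clean_records(raw, required_fields=("product", "price"))
```

With the sample rows above, the second entry is dropped as a duplicate of the first and the third is dropped for its empty price, leaving two records ready for integration into a downstream system.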

Web scraping for Alternative Data
