How much does alternative data cost? In our Getting the Price Right series, we leverage our unique position in the industry and our proprietary data on pricing to shed some light on this topic. This 3Q22 update of our 2Q20 and 2Q19 analysis provides fresh insight into the state of alternative data pricing.
The cost of using alternative data remains a frequently addressed topic amongst both data buyers and sellers.
By analysing some of the metadata we have compiled on datasets, we can provide insights on the following frequently asked questions:
- How much do datasets cost?
- Which factors influence the price of a dataset?
- Are funds willing to pay the price?
In this 3Q22 update, we:
- Refresh our data to find out whether the answers to the above questions have changed since 2Q20
- Provide additional analysis on the changes we’ve seen in pricing trends
HOW MUCH DO DATASETS COST?
A constant trend: lower-cost datasets dominate
As mentioned in our original analysis of alternative data pricing, a common misconception is that useful alternative data is overwhelmingly expensive. In reality, datasets are more likely to fall within lower price brackets ($25k up to $150k per year) than the most expensive price brackets (more than $150k per year).
The chart below shows the number of Neudata-covered datasets available within each annual subscription price bracket. This demonstrates the dominance of lower-cost datasets.
Number of datasets by annual subscription price (USD) as of September 2022
So, what does the data tell us about trends in pricing?
The chart below demonstrates the changes we’ve seen in pricing based on the percentage of datasets within each price bracket over time. The dominance of datasets priced below $150k per year is not a new trend, but simply a continuation of the state of pricing in 2018.
% of datasets by annual subscription price (USD) – a comparison over time
General trends over time have, however, shown an increase in datasets that are free or less than $25k per year. Additionally, we have seen a continued reduction in datasets priced above $500k per year.
Notably, compared with May 2020, we have seen an increase in datasets priced between $51k and $150k per year. Previously, we had seen a downward trend for datasets in this bracket. The increase here could be explained by the significant reduction in the number of datasets at higher price points. Moreover, providers have likely held their pricing at this level, which could indicate what clients are willing to pay for certain data types. Similarly, the proportion of datasets within the $26k-50k price bracket has remained largely unchanged.
Percentage point change in datasets per annual subscription price (USD) bracket from May 2020 to September 2022
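The bracket shares and percentage point changes discussed above can be reproduced on any list of annual subscription prices. The following is a minimal Python sketch under stated assumptions: the bracket edges, the function names and the sample prices are all hypothetical and chosen for illustration only. They are not Neudata's data or methodology.

```python
from collections import Counter

# Assumed bracket edges (USD per year). The article's charts may use
# different cut-offs; these are illustrative only.
BRACKETS = [
    ("Free", 0),
    ("<=25k", 25_000),
    ("26k-50k", 50_000),
    ("51k-150k", 150_000),
    ("151k-500k", 500_000),
]
TOP_BRACKET = ">500k"
LABELS = [label for label, _ in BRACKETS] + [TOP_BRACKET]


def bracket_for(price):
    """Return the label of the first bracket whose upper edge covers the price."""
    for label, upper in BRACKETS:
        if price <= upper:
            return label
    return TOP_BRACKET


def bracket_shares(prices):
    """Percentage of datasets falling in each bracket."""
    counts = Counter(bracket_for(p) for p in prices)
    total = len(prices)
    return {label: 100.0 * counts[label] / total for label in LABELS}


def percentage_point_change(old_shares, new_shares):
    """Difference in share (new minus old) for each bracket."""
    return {label: new_shares[label] - old_shares[label] for label in LABELS}


# Made-up prices for two snapshots (NOT real market data).
prices_2020 = [0, 10_000, 20_000, 30_000, 40_000,
               100_000, 120_000, 200_000, 600_000, 600_000]
prices_2022 = [0, 0, 10_000, 15_000, 40_000,
               100_000, 120_000, 140_000, 200_000, 600_000]

shares_2020 = bracket_shares(prices_2020)
shares_2022 = bracket_shares(prices_2022)
changes = percentage_point_change(shares_2020, shares_2022)

for label in LABELS:
    print(f"{label:>10}: {changes[label]:+.1f} pp")
```

Because shares in each snapshot sum to 100%, the percentage point changes across all brackets sum to zero: a gain in one bracket is always offset by losses elsewhere, which is why a fall in higher-priced datasets can mechanically lift the share of a mid-priced bracket.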
The downtrend in prices is supported by what we are seeing in the market more broadly: due to the demand for less expensive datasets, we believe there is a trend towards vendors offering data at lower price points. This is facilitated by the development of different approaches to packaging the data.
These pricing trends could still change as different types of investors (with different data budgets) enter the alternative data market. However, the changes we’ve seen since 2020 could demonstrate the effectiveness of providing data at a lower price point in a different form of delivery (a prediction we made in our 2020 analysis) – especially with the entry of more fundamental clients with lower budgets than quants. Whilst pricing has decreased overall, it appears a balance is starting to be struck between affordability and demand.
Lastly, as an aside, readers should be aware it is possible the growth in the proportion of free datasets over time is a consequence of our research team’s efforts to grow our coverage of free datasets on the platform (in response to client interest in free data).
To summarise our outlook, below we have rounded up some of the key factors we view as having the potential to influence alternative data pricing in the future.
Key factors with the potential to influence alternative data pricing
Tiered pricing models
We believe tiered pricing models are worthy of further discussion given they have become an established approach to pricing datasets.
Approaches to tiered pricing
A tiered pricing model gives a provider the opportunity to continue to sell its ‘full fat’ product at the highest price point. The provider can then offer cuts or restricted versions of its data at reduced rates, which also benefits clients with restricted needs, e.g. the need for coverage on a limited number of companies. By doing so, providers can cater to the needs of investment clients with a more diverse range of budgets.
Another benefit for data providers taking this tiered approach is the ability to sell products at lower price points in a way that avoids explicit price discrimination between different types of investors – unlike a fundamental vs quant or AUM-based approach.
WHICH FACTORS INFLUENCE THE PRICE OF A DATASET?
Having looked at the overall state of pricing in the alternative data market, we will consider some of the characteristics that may influence the price of a dataset.
By breaking our data down by feed type, we can see how annual subscription prices vary between the different categories.
% of datasets by annual subscription price (USD) and feed type as of September 2022
The data shows that report-based offerings tend to cost less than other data feed types. This is consistent with our 2020 analysis and makes sense given the limited applications of reports containing data vs data feeds. Unstructured (raw) data has the highest proportion of $500k+ per year pricing. Again, this is logical given that a provider’s ‘full fat’ product is often its raw data, which appeals to quant funds that often have higher budgets.
By breaking our data down by dataset type, we can see how annual subscription prices vary between the different types.
The data shows that a large proportion of higher-cost datasets (over $150k per year) fall within the transactional, location, and web and app tracking categories.
To us, the presence of these data categories makes sense given data providers operating in these three categories typically collect data on vast panels (often comprising millions of users). This requires a significant amount of work from the provider in terms of cleaning the data and preparing it for use by the investment world. Many of these datasets also rely on exclusive partnerships, which may further impact the pricing.
% of datasets by annual subscription price (USD) and dataset type as of September 2022
Transactional data’s appearance is also unsurprising, given credit and debit card data vendors have, overall, been incredibly successful over the past decade. As such, investors clearly find value in this type of data.
That said, we have continued to see some reduction in the pricing of transactional datasets as competition grows in the space. This may also account for the number of transactional providers that have produced lower-cost products, such as aggregated data feeds, for clients that can’t afford, or don’t require, transactional offerings in their entirety.
Of these dataset types, ESG and satellite and aerial data have the highest proportion of datasets priced below $25k per year. This may be because many of the datasets within these categories can use data from free, publicly available sources. Data providers in these categories are also likely to sell data to clients beyond the investment vertical and may price data to make it accessible to government and academic clients.
Asset class, coverage, history, frequency and API availability have minimal impact
We would expect that, on average, datasets with a longer data history, higher frequency and wider coverage would cost more than datasets inferior in these respects. However, our data over the years has consistently suggested otherwise.
Relationship between dataset subscription price score and data history, frequency and coverage as of September 2022
The above chart looks at the average data history, frequency and universe coverage of datasets for each price bracket. The data is based on Neudata’s dataset scoring system, which can be found at the bottom of all Scout reports. We can see from the data how datasets with a subscription price score of 10 (i.e. free datasets) compare, on average, to datasets with a subscription price score of 1 (datasets that cost over $1m per year), across all three metrics.
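The averaging behind a chart of this kind can be sketched as follows. This is an illustrative Python example only: the record layout, the field names (`price_score`, `history`, `frequency`, `coverage`) and the sample scores are assumptions made for the sketch, not Neudata's actual schema or figures. The only detail taken from the text is that a price score of 10 means free and a score of 1 means over $1m per year.

```python
from collections import defaultdict
from statistics import mean


def average_by_price_score(records, metrics=("history", "frequency", "coverage")):
    """Group records by price score and average each metric within the group."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["price_score"]].append(rec)
    return {
        score: {m: mean(r[m] for r in rows) for m in metrics}
        for score, rows in sorted(grouped.items())
    }


# Hypothetical scored datasets (NOT real data).
records = [
    {"price_score": 10, "history": 6, "frequency": 5, "coverage": 7},
    {"price_score": 10, "history": 8, "frequency": 7, "coverage": 5},
    {"price_score": 1, "history": 7, "frequency": 6, "coverage": 4},
]

averages = average_by_price_score(records)
for score, row in averages.items():
    print(score, row)
```

If price were strongly driven by these attributes, the averages would rise steadily as the price score falls towards 1. Broadly flat averages across price scores would be consistent with the minimal impact described here.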
Knowing how to price a product is a challenge, as is knowing how to improve an offering to make it more valuable. This is something we at Neudata often see data providers seeking advice on. Providers may ask if it is worth offering more history, increasing delivery frequency or expanding the coverage universe. Whilst most data users would agree that providers improving on any of these attributes is a positive move, our data suggests that improvements in these areas do not, on their own, justify a change in price.
Instead, this data has consistently shown over time that pricing is more a product of a dataset’s overall value to an investor. This is, naturally, something that can be difficult to measure.
We may find some answers with the data at hand. Notably, from a mid-price point we start to see a downward trend for universe coverage: as pricing increases, universe coverage decreases slightly. This may demonstrate that sector- or universe-specific datasets can sometimes provide more value to investors.
ARE FUNDS WILLING TO PAY THE PRICE?
Million-dollar datasets: transactional (still) comes out on top
Annual budgets for alternative data can often exceed the million-dollar mark. With this in mind, we consider the data categories where investors are most likely to spend $1m+ per year on a single dataset.
The chart below looks at datasets that can be priced in some form at $1m+ per year. Breaking down these datasets by price indicates that data providers within certain categories are more likely to be able to justify charging very high prices for their data. Across our years of pricing analysis, transactional data has consistently been the dominant dataset type in the $1m+ price range.
Dataset type breakdown of $1m+/year datasets as of September 2022
Whilst dataset type can be a factor in the value of higher-priced datasets, the regions a dataset covers may also be of significance. The chart below shows a breakdown of the $1m+ datasets by region.
Dataset region breakdown of $1m+/year datasets as of September 2022
The majority of these more expensive datasets cover equities in North America. This is likely due to the significant adoption of alternative data by US investors. This also demonstrates that in regions where data is harder to come by, scarcity is not always indicative of high price points if investors cannot find enough value in the data.
Based on our analysis, we understand that:
1. Overall value is more important than any single attribute in pricing a dataset
The overall value of a dataset to an investor is key. Based on our empirical analysis, individual dataset attributes, such as asset classes covered, history, frequency or API availability, have little impact on price.
2. Lower-cost datasets are more common than you may think and we expect a continuing downward trend in pricing
Datasets at lower price points outnumber the more expensive datasets and we expect this to remain the case. We continue to see a reduction in datasets priced above $150k per year.
3. Investors are more willing to pay very high prices for data within certain categories
Investors are willing to pay $1m+ per year for certain types of data, particularly transactional data. Additionally, the most expensive alternative datasets often cover equities in North America.