Introduction

I am often asked by clients where they can find public datasets for analysis and for inclusion in their own data analytics. This brief list of eleven key sources shows the range of data available from some key national and international organisations. Most of this data is freely available and is updated regularly.

Some of these datasets are very large: one source alone provides some 300 TB.

11 Sources of Data

https://data.gov.uk/

Data provided by the UK government's open data programme. Includes government statistics on economics, finance, health and agriculture.

https://www.data.gov/

The US Government's open data portal, divided into sections covering topics such as agriculture, finance and business. Contains over 180,000 datasets for public access.
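
Government catalogues of this kind can usually be searched programmatically as well as through the website. The sketch below assumes the US catalogue exposes a CKAN-style "action" API at catalog.data.gov; the base URL, the helper names and the sample query are illustrative assumptions rather than anything documented in this article, so check the portal's own API notes before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the catalogue exposes a CKAN-style "action" API at this base URL.
CKAN_BASE = "https://catalog.data.gov/api/3/action"


def search_url(query, rows=5, base=CKAN_BASE):
    """Build a CKAN package_search URL for a free-text query."""
    return f"{base}/package_search?" + urlencode({"q": query, "rows": rows})


def summarise(payload):
    """Return (title, [resource formats]) pairs from a package_search reply."""
    if not payload.get("success"):
        raise ValueError("catalogue reported an unsuccessful search")
    return [
        (pkg.get("title", ""),
         [res.get("format", "") for res in pkg.get("resources", [])])
        for pkg in payload["result"]["results"]
    ]


def search(query, rows=5):
    """Fetch and summarise matching datasets (needs network access)."""
    with urlopen(search_url(query, rows), timeout=30) as resp:
        return summarise(json.load(resp))
```

Calling search("agriculture") would then return a short list of dataset titles with the file formats each one offers, which is often enough to decide which dataset to download.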


http://www.census.gov/data.html

US census data on population, detailed down to block level. The site also provides tools for analysis or, alternatively, bulk data download.

http://ec.europa.eu/eurostat/data/database

Eurostat, the statistical office of the European Union, provides the EU's open data repository. It covers a huge range of topics including census data, business data, migration data, the economy, agriculture and health.

Amazon Datasets

http://aws.amazon.com/datasets/

Amazon hosts a range of open datasets on its S3 storage service. The data itself is free to access, though charges apply for any processing carried out on the AWS platform.

Data available includes Landsat satellite imagery (updated daily), climate data, the Million Song collection of 28 music datasets, social media data and genome data from the Human Genome Project.
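
Because these datasets sit in S3 buckets, the files in a publicly listable bucket can be enumerated over plain HTTPS without an AWS account. The following is a minimal sketch using only the Python standard library; the bucket name in the usage note is a made-up placeholder, the helper names are my own, and not every public bucket permits anonymous listing.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespace used by S3 bucket listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(bucket, prefix="", max_keys=100):
    """Build an S3 ListObjectsV2 URL for a publicly listable bucket."""
    query = urlencode({"list-type": 2, "prefix": prefix, "max-keys": max_keys})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Extract (key, size_in_bytes) pairs from a ListBucketResult document."""
    root = ET.fromstring(xml_data)
    return [
        (item.findtext("s3:Key", namespaces=S3_NS),
         int(item.findtext("s3:Size", namespaces=S3_NS)))
        for item in root.findall("s3:Contents", S3_NS)
    ]


def list_objects(bucket, prefix=""):
    """Fetch one page of a bucket listing (needs network access)."""
    with urlopen(list_url(bucket, prefix), timeout=30) as resp:
        return parse_listing(resp.read())
```

For instance, list_objects("example-open-data-bucket", "2016/") would return the first page of object keys and sizes under that prefix, assuming such a bucket existed and allowed anonymous listing. Checking sizes first is a cheap way to avoid pulling down terabytes by accident.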

http://opendata.cern.ch/?ln=en

CERN, the European Organization for Nuclear Research, provides open data from a number of its particle physics experiments. For example, the Large Hadron Collider has provided some 300 TB of data, some of it processed to make it suitable for schools and colleges.

http://data.worldbank.org/

The World Bank provides a huge range of development and economic data via its open data programme. This often comes with easy-to-use software interfaces in addition to direct data download. One example is the World Development Indicators database, from which data can be extracted easily using the World Bank's own tools. I wrote a program to illustrate access to this dataset here: https://tendron.shinyapps.io/WorldBank1/
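
As a rough illustration of programmatic access (separate from the program linked above), the sketch below builds a request for the World Bank's public JSON API and converts the reply into a simple year-to-value mapping. The helper names are my own, the indicator code SP.POP.TOTL (total population) is just an example, and the code assumes an annual indicator whose dates are plain years.

```python
import json
from urllib.request import urlopen

API_BASE = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, start, end):
    """Build the request URL for one country and one indicator over a year range."""
    return (f"{API_BASE}/country/{country}/indicator/{indicator}"
            f"?format=json&date={start}:{end}&per_page=1000")


def to_series(payload):
    """Turn a [metadata, records] reply into {year: value}, skipping missing values."""
    if len(payload) < 2 or payload[1] is None:
        return {}  # error message or no data for this request
    return {int(rec["date"]): rec["value"]
            for rec in payload[1] if rec["value"] is not None}


def fetch_series(country, indicator, start, end):
    """Download and parse one indicator series (needs network access)."""
    with urlopen(indicator_url(country, indicator, start, end), timeout=30) as resp:
        return to_series(json.load(resp))
```

A call such as fetch_series("GBR", "SP.POP.TOTL", 2000, 2010) would return the UK population by year, ready to plot or to merge with your own data.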

https://www.economicsnetwork.ac.uk/data_sets

An extensive range of economics datasets is listed on this website, including stock market data, government bond data and GDP data, among a wide range of other economic and financial time series.

https://datamarket.azure.com/browse/data

A list of datasets available via Microsoft Azure. Many of them are free, though not all.

Bankstats

The Bank of England provides a large range of banking, monetary and financial statistics in the Statistical Interactive Database. Other datasets include forecasts for the UK economy and statistics on public finance and spending.

Economic and Financial Affairs (ECFIN)

European economy data can be downloaded from the Europa portal. The datasets are held on the site of the Directorate-General for Economic and Financial Affairs (DG ECFIN). The home page of the directorate is: http://ec.europa.eu/economy_finance/index_en.htm

http://www.oecd.org/statistics/

On this site you will find a whole range of statistics for each of the OECD member countries, the euro area and the OECD as a whole. The statistics are arranged by topic group, including National Accounts, Finance, Agriculture, Development, International Trade, Labour, Prices, Public Management and Short-term Economic Statistics.

http://www.gdeltproject.org

Supported by Google Jigsaw, the GDELT Project monitors the world’s broadcast, print, and web news from nearly every corner of every country in over 100 languages. It identifies the people, locations, organisations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world.

Summary

The free datasets above provide enormous quantities of data suitable for professional analysis. CERN, for example, provides 300 TB from the Large Hadron Collider alone. It also provides processed datasets suitable for school and college projects.

Alan Brown, Data Architect,