This article gives an overview of sites that together host tens of thousands of datasets in the public domain. The datasets cover areas such as healthcare, geography, sociology, security, transport, and many others.
Google Cloud Public Datasets
Google Cloud offers more than a hundred datasets hosted in BigQuery and Cloud Storage. The datasets come from various sources, such as GitHub, the US Census Bureau, NASA, Bitcoin, and many others.
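Because these datasets are hosted in BigQuery, they can be queried with SQL. The following is a minimal sketch, not an official recipe: it assumes the `google-cloud-bigquery` package is installed and Google Cloud credentials are configured, and the helper names `build_preview_query` and `preview_table` are hypothetical.

```python
def build_preview_query(table: str, columns: list, limit: int = 10) -> str:
    """Build a SQL statement that previews a few columns of a BigQuery table."""
    if not columns:
        raise ValueError("at least one column is required")
    if limit <= 0:
        raise ValueError("limit must be positive")
    column_list = ", ".join(columns)
    # Backticks are needed because the project ID may contain hyphens.
    return f"SELECT {column_list} FROM `{table}` LIMIT {limit}"


def preview_table(table: str, columns: list, limit: int = 10) -> list:
    """Run the preview query and return the rows as dictionaries.

    Assumes google-cloud-bigquery is installed and credentials are set up.
    """
    # Imported lazily so the query builder above works without the package.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(build_preview_query(table, columns, limit)).result()
    return [dict(row.items()) for row in rows]
```

As an illustration, `preview_table("bigquery-public-data.usa_names.usa_1910_2013", ["name", "number"], limit=5)` would request five rows; the table and column names here are assumed examples and should be checked in the BigQuery console before use.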
Amazon Web Services Open Data Registry
Amazon Web Services allows you to download datasets or explore them in the Elastic Compute Cloud. Open Data Registry is part of the AWS Public Dataset program aimed at democratizing data access.
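Many datasets in the registry are stored in Amazon S3 buckets, so downloading usually starts with listing a bucket. The sketch below uses only the Python standard library and the S3 `ListObjectsV2` REST call. It assumes the bucket allows anonymous listing and can be reached through the generic `s3.amazonaws.com` endpoint; the function names are hypothetical, and the bucket name is a placeholder to be replaced with one taken from a registry entry.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(bucket: str, prefix: str = "", max_keys: int = 20) -> str:
    """Build a ListObjectsV2 URL for a bucket that permits anonymous access."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_text: str) -> list:
    """Extract (key, size in bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


def list_public_bucket(bucket: str, prefix: str = "", max_keys: int = 20) -> list:
    """Fetch and parse one page of a public bucket listing (needs network access)."""
    url = build_list_url(bucket, prefix, max_keys)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_listing(response.read().decode("utf-8"))
```

A call such as `list_public_bucket("example-bucket", prefix="data/")` would return the first page of object keys and sizes. Buckets that do not allow anonymous listing will return an error instead.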
Data.gov
The main repository of open datasets of the US government. Most datasets are publicly available, while the rest require permission to download. The data on the site cover topics such as climate, agriculture, and energy.
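The Data.gov catalog can also be searched programmatically. The sketch below assumes that the catalog exposes a CKAN-style `package_search` endpoint at the URL shown and that responses follow the usual CKAN layout (`success`, `result.results`, `resources`); both are assumptions to verify against the current Data.gov documentation, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed CKAN-style search endpoint of the Data.gov catalog.
CATALOG_API = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query: str, rows: int = 5) -> str:
    """Build a search URL for a free-text query, limited to `rows` results."""
    return CATALOG_API + "?" + urllib.parse.urlencode({"q": query, "rows": rows})


def summarize_results(payload: dict) -> list:
    """Return (title, sorted file formats) for each dataset in a search response."""
    if not payload.get("success"):
        raise ValueError("catalog search did not succeed")
    summaries = []
    for dataset in payload["result"]["results"]:
        formats = sorted(
            {res["format"] for res in dataset.get("resources", []) if res.get("format")}
        )
        summaries.append((dataset.get("title", ""), formats))
    return summaries


def search_catalog(query: str, rows: int = 5) -> list:
    """Query the catalog and summarize the matches (needs network access)."""
    with urllib.request.urlopen(build_search_url(query, rows), timeout=30) as response:
        return summarize_results(json.load(response))
```

For example, `search_catalog("climate", rows=3)` would return up to three dataset titles together with the file formats each one offers.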
Kaggle
Kaggle hosts more than 23,000 datasets on a wide range of topics, from healthcare to cartoons. Kaggle datasets used in competitions are often more detailed than publicly available datasets.
UCI Machine Learning Repository
The oldest of these data sources, operating since 1987. UCI datasets are well suited to machine learning because their download parameters can be configured.
Global Health Observatory
A WHO data repository containing information on various infectious and non-communicable diseases, mental disorders and medicines.
Earthdata
NASA datasets containing information about the Earth’s atmosphere, oceans, cryosphere, and solar flares. Earthdata has tools for processing, categorizing, searching, and visualizing data.