2022-06-09

Discovering Hidden Gems - Popular and Lesser-Known Dataset Sharing Platforms

"Looking for the key to unlocking valuable datasets? Dive into the world of Kaggle, UCI, and more as we unveil the best platforms for data enthusiasts."

Researchers, data scientists, and machine learning practitioners can use several popular platforms to find and share datasets. Here are some of the best known:

Kaggle

Kaggle is a well-known platform for data science competitions, but it also provides a dataset repository where users can discover and share datasets. It offers a wide range of datasets in various domains, along with tools for data exploration and collaboration.

UCI Machine Learning Repository

The University of California, Irvine (UCI) hosts a repository of datasets specifically designed for machine learning research. It provides a diverse collection of datasets, including text, image, and time series data, covering a wide range of domains.

Google Dataset Search

Google Dataset Search is a search engine dedicated to indexing datasets. It aggregates datasets from various sources on the web, making publicly available data easier to find. For each dataset it shows information such as the description, author, and availability.

Data.gov

Data.gov is a U.S. government initiative that provides access to a wide range of datasets from different federal agencies. It offers datasets covering various domains such as health, climate, finance, transportation, and more. The platform aims to promote transparency and facilitate public access to government data.
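As a rough illustration of searching the catalog from a script, the sketch below builds a keyword query and extracts dataset titles from the response. It assumes the Data.gov catalog exposes the standard CKAN `package_search` action at `https://catalog.data.gov/api/3/action/package_search`. That endpoint, and the helper names `build_search_url`, `extract_titles`, and `search_titles`, are assumptions for illustration and are not taken from this article.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Data.gov catalog serves the standard CKAN action API here.
CKAN_SEARCH_ENDPOINT = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5):
    """Build a CKAN package_search URL for a free-text query."""
    return f"{CKAN_SEARCH_ENDPOINT}?{urlencode({'q': query, 'rows': rows})}"


def extract_titles(payload):
    """Return dataset titles from a decoded CKAN package_search response."""
    if not payload.get("success"):
        return []
    return [package.get("title", "") for package in payload["result"]["results"]]


def search_titles(query, rows=5):
    """Run the search over the network and return the matching dataset titles."""
    with urlopen(build_search_url(query, rows), timeout=30) as response:
        return extract_titles(json.load(response))


# Only the URL is built here; call search_titles("climate") to query the catalog.
print(build_search_url("climate"))
```

Keeping the URL construction and the response parsing separate from the network call makes each piece easy to check offline.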

OpenML

OpenML is an open-source platform that allows users to share, discover, and analyze datasets and machine learning experiments. It gives researchers and practitioners a shared environment in which to collaborate on and contribute to the development of machine learning algorithms.

GitHub

Although GitHub is primarily a code hosting platform, it also serves as a repository for datasets. Many researchers and organizations share datasets on GitHub, making it a valuable resource for finding datasets across various domains. You can search for datasets using specific keywords or explore repositories dedicated to datasets.
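A keyword search like the one described above can also be scripted. The following is a minimal sketch that builds a query for GitHub's public repository search API, appending the word "dataset" to the keywords and sorting by stars. The helper names `build_dataset_search_url` and `fetch_repository_names` are hypothetical, and appending "dataset" is only one possible heuristic for narrowing the results.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GITHUB_SEARCH_API = "https://api.github.com/search/repositories"


def build_dataset_search_url(keywords, per_page=10):
    """Build a repository-search URL for the keywords plus the word 'dataset'."""
    query = " ".join(list(keywords) + ["dataset"])
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_API}?{urlencode(params)}"


def fetch_repository_names(url):
    """Fetch the search results and return each repository's 'owner/name'."""
    with urlopen(url, timeout=30) as response:
        return [item["full_name"] for item in json.load(response)["items"]]


# Only the URL is built here; pass it to fetch_repository_names() to run the search.
print(build_dataset_search_url(["air", "quality"]))
```

Unauthenticated requests to this API are rate-limited, so a script that runs many searches would likely need an access token.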

Other platforms

Here are 28 lesser-known dataset-sharing platforms that you can explore:

  1. DataHub: https://datahub.io/
  2. Figshare: https://figshare.com/
  3. Quandl: https://www.quandl.com/
  4. Zillow Prize: https://www.kaggle.com/c/zillow-prize-1
  5. Data.world: https://data.world/
  6. OpenSNP: https://opensnp.org/
  7. Dataverse: https://dataverse.org/
  8. DataCite: https://www.datacite.org/
  9. Open Data Network: https://www.opendatanetwork.com/
  10. Humanitarian Data Exchange (HDX): https://data.humdata.org/
  11. Registry of Open Data on AWS: https://registry.opendata.aws/
  12. Social Science Data Repository (SSDR): https://data.nber.org/
  13. Open Energy Data: https://open-power-system-data.org/
  14. OpenNeuro: https://openneuro.org/
  15. GeoNetwork: https://geonetwork-opensource.org/
  16. Zenodo: https://zenodo.org/
  17. Awesome Public Datasets: https://github.com/awesomedata/awesome-public-datasets
  18. Open Images: https://storage.googleapis.com/openimages/web/index.html
  19. PubMed: https://pubmed.ncbi.nlm.nih.gov/
  20. Earthdata: https://earthdata.nasa.gov/
  21. European Data Portal: https://www.europeandataportal.eu/
  22. Global Database of Events, Language, and Tone (GDELT): https://www.gdeltproject.org/
  23. OpenMLCC: https://openml.github.io/openmlcc/
  24. Data.gov.uk: https://data.gov.uk/
  25. National Centers for Environmental Information (NCEI): https://www.ncei.noaa.gov/
  26. DataONE: https://www.dataone.org/
  27. International Monetary Fund (IMF) Data: https://www.imf.org/en/data
  28. Opendatasoft: https://www.opendatasoft.com/

Any comments or suggestions? Let me know.

To cite this article:

@article{Saf2022Discovering,
    author  = {Krystian Safjan},
    title   = {Discovering Hidden Gems - Popular and Lesser-Known Dataset Sharing Platforms},
    journal = {Krystian Safjan's Blog},
    year    = {2022},
}