

Continuously updated. Machine Intelligence Research (MIR), 2022-12-11

Complex Networks


1) AMiner Citation Network Dataset

2) CrossRef DOI URLs

3) DIMACS Road Networks Collection

4) NBER Patent Citations

5) NIST complex networks data collection

6) Network Repository with Interactive Exploratory Analysis Tools

7) Protein-protein interaction network

8) PyPI and Maven Dependency Network

9) Scopus Citation Database

10) Small Network Data

11) Stanford GraphBase

12) Stanford Large Network Dataset Collection

13) The Laboratory for Web Algorithmics (UNIMI)

14) UCI Network Data Repository

15) UFL sparse matrix collection

Computer Networks


1) 3.5B Web Pages from CommonCrawl 2012

2) 53.5B Web clicks of 100K users in Indiana Univ.

3) CAIDA Internet Datasets

4) CRAWDAD Wireless datasets from Dartmouth Univ.

5) ClueWeb09 - 1B web pages

6) ClueWeb12 - 733M web pages

7) CommonCrawl Web Data over 7 years

8) Criteo click-through data

9) Internet-Wide Scan Data Repository

10) OONI: Open Observatory of Network Interference - Internet censorship data

11) Open Mobile Data by MobiPerf

12) The Peer-to-Peer Trace Archive - Real-world measurements play a key role [...]

13) Rapid7 Sonar Internet Scans

14) UCSD Network Telescope, IPv4 /8 net

Data Challenges


1) Bruteforce Database

2) Challenges in Machine Learning

3) CrowdANALYTIX dataX

4) DrivenData Competitions for Social Good

5) ICWSM Data Challenge (since 2009)

6) KDD Cup by Tencent 2012

7) Kaggle Competition Data

8) Localytics Data Visualization Challenge

9) Netflix Prize

10) Space Apps Challenge

11) Telecom Italia Big Data Challenge

12) TravisTorrent Dataset - MSR'2017 Mining Challenge

13) TunedIT - Data mining & machine learning data sets, algorithms, challenges

14) Yelp Dataset Challenge

Image Processing


1) 10k US Adult Faces Database

2) 2GB of Photos of Cats

3) Adience Unfiltered faces for gender and age classification

4) Affective Image Classification

5) Animals with Attributes

6) CADDY Underwater Stereo-Vision Dataset of divers' hand gestures - [...]

7) Caltech Pedestrian Detection Benchmark

8) Chars74K dataset - Character Recognition in Natural Images (both English [...]

9) Danbooru Tagged Anime Illustration Dataset - A large-scale anime image [...]

10) Face Recognition Benchmark

11) Flickr: 32 Class Brand Logos

12) GDXray - X-ray images for X-ray testing and Computer Vision

13) HumanEva Dataset - The HumanEva-I dataset contains 7 calibrated video [...]

14) ImageNet (in WordNet hierarchy)

15) Indoor Scene Recognition

16) International Affective Picture System, UFL

17) KITTI Vision Benchmark Suite

18) Labeled Information Library of Alexandria - Biology and Conservation - [...]

19) MNIST database of handwritten digits, 70,000 examples

20) Massive Visual Memory Stimuli, MIT

21) Open Images From Google - Pictures with segmentation masks for 2.8 [...]

22) Stanford Dogs Dataset

23) The Action Similarity Labeling (ASLAN) Challenge

24) The Oxford-IIIT Pet Dataset

25) Violent-Flows - Crowd Violence / Non-violence Database and benchmark

26) Visual Genome

27) YouTube Faces Database

Machine Learning


1) All-Age-Faces Dataset - Contains 13,322 Asian face images distributed [...]

2) Context-aware data sets from five domains

3) Delve Datasets for classification and regression

4) Discogs Monthly Data

5) IMDb Database

6) Keel Repository for classification, regression and time series

7) Labeled Faces in the Wild (LFW)

8) Lending Club Loan Data

9) Million Song Dataset

10) More Song Datasets

11) MovieLens Data Sets

12) New Yorker caption contest ratings

13) RDataMining - "R and Data Mining" ebook data

14) Registered Meteorites on Earth

15) Restaurants Health Score Data in San Francisco

16) UCI Machine Learning Repository

17) Yahoo! Ratings and Classification Data

18) YouTube-BoundingBoxes

19) YouTube-8M

20) eBay Online Auctions (2012)

Natural Language


1) Automatic Keyphrase Extraction

2) Blizzard Challenge Speech - The speech + text data comes from [...]

3) Blogger Corpus

4) CLiPS Stylometry Investigation Corpus

5) ClueWeb09 FACC

6) ClueWeb12 FACC

7) DBpedia - 4.58M things with 583M facts

8) Flickr Personal Taxonomies

9) Freebase of people, places, and things

10) German Political Speeches Corpus - Collection of political speeches from [...]

11) Google Books Ngrams (2.2TB)

12) Google MC-AFP - Generated based on the publicly available Gigaword dataset [...]

13) Google Web 5gram (1TB, 2006)

14) Gutenberg eBooks List

15) Hansards text chunks of Canadian Parliament

16) LJ Speech - Speech dataset consisting of 13,100 short audio clips of a [...]

17) Microsoft MAchine Reading COmprehension Dataset (or MS MARCO)

18) Machine Comprehension Test (MCTest) of text from Microsoft Research

19) Machine Translation of European languages

20) Making Sense of Microposts 2016 - Named Entity rEcognition and Linking

21) Multi-Domain Sentiment Dataset (version 2.0)

22) Noisy speech database for training speech enhancement algorithms and TTS [...]

23) Open Multilingual Wordnet

24) POS/NER/Chunk annotated data

25) Personae Corpus

26) SMS Spam Collection in English

27) SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles)

28) Stanford Question Answering Dataset (SQuAD)

29) USENET postings corpus of 2005-2011

30) Universal Dependencies

31) Webhose - News/Blogs in multiple languages

32) Wikidata - Wikipedia databases

33) Wikipedia Links data - 40 Million Entities in Context

34) WordNet databases and tools

35) WorldTree Corpus of Explanation Graphs for Elementary Science Questions - [...]

Neuroscience

1) Brain Catalogue

2) Brainomics

3) Collaborative Research in Computational Neuroscience (CRCNS)


5) Human Connectome Project


7) NIMH Data Archive

8) NeuroData

9) NeuroMorpho - NeuroMorpho.Org is a centrally curated inventory of [...]

10) Neuroelectro


12) OpenNEURO

13) OpenfMRI

14) Study Forrest

Social Networks


1) 72 hours #gamergate Twitter Scrape

2) Ancestry.com Forum Dataset over 10 years

3) CMU Enron Email of 150 users

4) Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape

5) EDRM Enron Email of 151 users, hosted on S3

6) Facebook Data Scrape (2005)

7) Facebook Social Networks from LAW (since 2007)

8) Foursquare from UMN/Sarwat (2013)

9) GitHub Collaboration Archive

10) Google Scholar citation relations

11) High-Resolution Contact Networks from Wearable Sensors

12) Indie Map: social graph and crawl of top IndieWeb sites

13) Network Twitter Data

14) Reddit Comments

15) Skytrax' Air Travel Reviews Dataset

16) Social Twitter Data

17) SourceForge.net Research Data

18) Twitter Data for Online Reputation Management

19) Twitter Data for Sentiment Analysis

20) Twitter Graph of entire Twitter site

21) UNIMI/LAW Social Network Datasets

22) United States Congress Twitter Data - Daily datasets with tweets of 1100+ [...]

23) Yahoo! Graph and Social Data

24) YouTube Video Social Graph in 2007, 2008

Transportation

1) Airlines OD Data 1987-2008

2) Ford GoBike Data (formerly Bay Area Bike Share Data)

3) Bike Share Systems (BSS) collection

4) Dutch Traffic Information

5) GeoLife GPS Trajectory from Microsoft Research

6) German train system by Deutsche Bahn

7) Hubway Million Rides in MA

8) Montreal BIXI Bike Share

9) NYC Taxi Trip Data 2009-

10) NYC Taxi Trip Data 2013 (FOIA/FOILed)

11) NYC Uber trip data April 2014 to September 2014

12) Open Traffic collection

13) OpenFlights - airport, airline and route data

14) Philadelphia Bike Share Stations (JSON)

15) Plane Crash Database, since 1920

16) RITA Airline On-Time Performance data

17) RITA/BTS transport data collection (TranStat)

18) Toronto Bike Share Stations (JSON and GBFS files)

19) Transport for London (TFL)

20) Travel Tracker Survey (TTS) for Chicago

21) U.S. Bureau of Transportation Statistics (BTS)

22) U.S. Domestic Flights 1990 to 2009

23) U.S. Freight Analysis Framework since 2007

