Below is a list of the most interesting data sources I’ve come across:
Big Data: 33 Brilliant And Free Data Sources Anyone Can Use
Awesome Public Datasets – GitHub
Publicly Accessible APIs:
- Tumblr
- Wikipedia
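
As a rough illustration of what pulling data from one of these public APIs can look like, here is a minimal sketch against Wikipedia's MediaWiki endpoint. It assumes the English Wikipedia endpoint and the `extracts` property for plain-text article intros; the function names and the `User-Agent` string are my own placeholders, not anything prescribed by the API.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the English Wikipedia MediaWiki API.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_summary_url(title):
    """Build a query URL asking for the plain-text intro of one article."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,       # only the section before the first heading
        "explaintext": 1,   # plain text instead of HTML
        "titles": title,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_summary(title):
    """Fetch the intro text for a title (makes a network request)."""
    request = urllib.request.Request(
        build_summary_url(title),
        # Placeholder User-Agent; replace with something identifying your project.
        headers={"User-Agent": "data-sources-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        data = json.load(response)
    # Results are keyed by page ID, so take the first (and only) page.
    pages = data["query"]["pages"]
    return next(iter(pages.values())).get("extract", "")
```

Calling `fetch_summary("Open data")` should return the opening paragraph of that article as a string, assuming the page exists and the network is reachable.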
NCES, UCI Machine Learning Repository
University websites (Berkeley has lots of data)
Sources that are difficult to get data from, mostly because of restrictive APIs or anti-scraping policies:
- Yelp
- Foursquare
- Craigslist