Dome, L. (last edited: 2019-11-03)

Whatever you are trying to do (improving your code, implementing a model, learning and practising statistics), sooner or later you will need some real-life data sets, not just randomly generated dummy data to debug your code. I use R, which already ships with a lot of publicly available data (some of it is in the list). Chances are you are already using data shared through some platform: perhaps you looked for those data sets for hours, someone mentioned them in passing, you stumbled across them while reading abstracts on Google Scholar, or you ran an experiment to collect them. I thought the best way forward would be a non-exhaustive awesome list in the spirit of #openscience, so that when the need arises I can save time instead of hunting for these things scattered across the web. Most of it, if not all, is psychology- and behaviour-related. The list can help with computational modelling, your curriculum, coursework, statistics, and so on. Hopefully it will grow in time and include even more real-life data sets, because you don't want to waste time on data that isn't real.
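As a quick illustration of the data that already comes with R, here is a minimal sketch that lists the data sets in the base `datasets` package and loads one of them. The choice of `sleep` is only an example; none of this is specific to the resources listed below.

```r
# List the data sets bundled with the base "datasets" package.
# data() returns an object whose $results element is a character matrix
# with the columns "Package", "LibPath", "Item" and "Title".
bundled <- data(package = "datasets")$results

# Keep just the name and the short description of each data set.
catalogue <- as.data.frame(bundled[, c("Item", "Title")],
                           stringsAsFactors = FALSE)
head(catalogue)

# Load one data set by name and inspect its structure.
# `sleep` is used purely as an example here.
data("sleep", package = "datasets")
str(sleep)
```

The same `data(package = "...")` call should work for any installed package that ships data, which is a handy way to see what an R package from this list actually contains.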

  • Open Source or Open Access, but please check the licensing

  • R package


An R package with formal psychological models of categorization and learning, independently replicated, real-life data sets against which to test them, and simulation archives.

Wills, A. J., O'Connell, G., Edmunds, C. E. R., & Inkster, A. B. (2016). Progress in modeling through distributed collaboration: Concepts, tools, and category-learning examples. Psychology of Learning and Motivation, 1–23.

Tools associated with General Recognition Theory (Townsend & Ashby, 1986), including Gaussian model fitting of 2x2 and more general designs, associated plotting and model comparison tools, and tests of marginal response invariance and report independence.

'The database contains data from articles published in MEDLINE-listed journals from experimentally induced altered states that were assessed with a specified set of standardized questionnaires' (Schmidt & Berkemeyer, 2018).

Schmidt, T. T., & Berkemeyer, H. (2018). The Altered States Database: Psychometric Data of Altered States of Consciousness. Frontiers in Psychology, 9.

JOPD (the Journal of Open Psychology Data) is a journal whose articles do not report research findings but describe data sets and where to find them.

Awesome lists on GitHub are incredibly useful when you are looking for alternatives for anything. AwesomeData created a topic-centric list of high-quality open data sets in public domains. They list several neuroscience data sets (including openfMRI, the Human Connectome Project, etc.) and also have a Psychology+Cognition section, but the link was dead as of 2018-06-23.

The Open Source Psychometric Project is an amazing thing. The website offers a selection of psychological tests, though a large portion of them are personality tests. The project also shares the (anonymous) data collected through the website; to be clear, you also get your own results at the end of a test. The shared data come in both .txt and .csv formats, so no programming environment should have any problem importing them.
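Importing such files into R takes one line each. The sketch below writes a tiny made-up table to temporary .csv and .txt files and reads them back. The column names (`Q1`, `Q2`, `age`) and the tab delimiter for the .txt file are assumptions for illustration, so check the codebook that comes with the real download.

```r
# Hypothetical stand-in for a downloaded file: three respondents, two items.
mock <- data.frame(Q1 = c(4, 2, 5), Q2 = c(1, 3, 3), age = c(23, 31, 19))

# Write the same table in both formats to temporary files.
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
write.csv(mock, csv_path, row.names = FALSE)            # comma-separated
write.table(mock, txt_path, sep = "\t",
            row.names = FALSE, quote = FALSE)           # tab-separated

# read.csv expects commas; read.delim expects tabs. Both expect a header row.
from_csv <- read.csv(csv_path)
from_txt <- read.delim(txt_path)

# Both routes should give the same table.
all.equal(from_csv, from_txt)
summary(from_csv)
```

If a real file uses a different delimiter, `read.table()` with an explicit `sep` argument is the more general tool.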

This R package is mainly aimed at sharing the data sets discussed and analysed in Analyzing Linguistic Data: A practical introduction to statistics using R, Cambridge University Press, 2008. These are real-world data sets relevant to psycholinguistics.

SAMHDA made the list in collaboration with ICPSR. It is itself a short list, with links to the ICPSR pages. The data sets are interesting, but the problem is that they are restricted-use data. There is more information on the site.

This website is a collection of databases that are available to use without a subscription or institutional affiliation. There is a Psychology category in their list without any links to the data itself, but they have databases relevant to Addictions and Rehabilitation research, which is nice.

This is a data archive from the Race Implicit Association Test (IAT), with more than two million participants. This is amazing, and even though there are doubts about whether the IAT captures a real effect, the data set is still worth exploring. My two biggest problems are that it is not the tidiest data set and that it is not the raw data. Because it is not raw data, the number of things you can do with it is limited, and you are forced to make a range of assumptions that cannot be verified by appropriate tests. Also, it does not use open-source software and file formats, which is a problem for cross-compatibility. Nonetheless, the sheer volume of the data set is amazing!

If you are trying to import it into R, make sure you have enough memory, probably approaching infinity.
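One way to ease the memory problem is to read only the columns you need. The sketch below uses base R on a small mock file and assumes the archive has first been exported to a comma-separated file, which is an assumption, since it is not distributed in an open format. The column names (`session_id`, `D_score`, and the fillers) are hypothetical.

```r
# Build a small mock file standing in for a large export (hypothetical columns).
set.seed(1)
n <- 1000
mock <- data.frame(
  session_id = seq_len(n),
  filler_a   = rnorm(n),
  D_score    = rnorm(n, mean = 0.3, sd = 0.4),
  filler_b   = sample(letters, n, replace = TRUE)
)
path <- tempfile(fileext = ".csv")
write.csv(mock, path, row.names = FALSE)

# Step 1: peek at the first few rows to learn the column names cheaply.
peek <- read.csv(path, nrows = 5)
names(peek)

# Step 2: mark unwanted columns as "NULL" so read.csv skips them entirely;
# NA means "keep this column and guess its type".
keep <- c("session_id", "D_score")
classes <- rep("NULL", ncol(peek))
classes[names(peek) %in% keep] <- NA

small <- read.csv(path, colClasses = classes)

# For comparison only: the full import takes more memory.
full <- read.csv(path)
object.size(small) < object.size(full)
```

Skipped columns are never stored in the resulting data frame, so the saving grows with the number of variables you leave out.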