Elisabeth Günther / Damian Trilling

But How Do We Store It? (Big) Data Architecture in the Social-Scientific Research Process

The social-scientific research process is usually considered to consist of reviewing literature and theory, followed by the generation of research questions or hypotheses, the collection of data, their analysis, and the writing up of findings. In this chapter, we argue that in the age of Big Data, social scientists increasingly have to consider a step located between the collection and the analysis of data: their storage. Based on the notion of data architecture, we discuss how the choices made at this stage affect the ways the data can be used and the research questions that can be answered. In particular, we compare file dumps, relational databases, document stores, and graph databases. We develop a scheme for choosing among these approaches based on five criteria: the need for preprocessing, the properties of the data, the research design, the available infrastructure, and the available expertise. We conclude by summarizing the strengths and weaknesses of the four approaches along two dimensions: ease-of-storage versus reliability-of-retrieval and ease-of-use versus power-to-explore.