Collecting data creates entities, but digital complexity means there is no set way of doing this and no “core” dataset. Instead, we need an analytical tool built on overlap. To that end, I’ll explain how Ludwig Wittgenstein’s notion of family resemblance can be applied to data collection and then argue that this creates entities for case study researchers. Wittgenstein (1986) develops family resemblance as a set of analogies for describing the fuzzy borders around categories. He observes that language games are often unaligned and subject to change. He uses the notion of a game to show that such activities have rules and goals, but that these differ from one game to the next and may share only vaguely related elements. In Wittgenstein’s original formulation, these shared similarities can be understood as family resemblances.
I can think of no better expression to characterize these similarities than “family resemblances”; for the various resemblances between members of a family: build, features, colour of eyes, gait, temperament, etc. etc. overlap and criss-cross in the same way.— And I shall say: ‘games’ form a family. And for instance the kinds of number form a family in the same way. Why do we call something a “number”? Well, perhaps because it has a—direct—relationship with several things that have hitherto been called number; and this can be said to give it an indirect relationship to other things we call the same name. And we extend our concept of number as in spinning a thread we twist fibre on fibre. And the strength of the thread does not reside in the fact that some one fibre runs through its whole length, but in the overlapping of many fibres. (Wittgenstein, 1986, p. 32 [PI 67])
For Wittgenstein, a thing or concept may have no unified core or Platonic form. Rather, things and concepts possess shared fibers. Bain et al. (2022) clarify Wittgenstein’s meaning:
[Family resemblance] is a tool of analogies for describing the fuzzy borders around categories. Wittgenstein observes that language games are often unaligned and subject to change. When comparing games, certain elements drop out, and new ones arise arbitrarily. However, there is seemingly no underpinning attribute across them all. Words and games are too unrelated and multifaceted to support a logical or clear-cut definition. Instead, games’ and words’ meanings have likeness, which Wittgenstein refers to as family resemblance. In other words, there are degrees of belonging to a category where elements in each set share some common attributes.
As an analytical tool, family resemblance is useful for thinking about the data collection process when creating entities for case studies. No entity has a core; instead, a set of fibers allows researchers to build and develop (“weave”) a case study from the available data. Because cases are selections of reality with boundaries, as well as attempts to represent multiple realities, the analogy of family resemblances and the metaphor of fibers aptly describe what researchers are doing. Understood this way, the entity as a series of relations has no firm core but is shot through with various assemblages.
Family resemblance equips case study researchers with three analytical techniques for data collection. First, it dispenses with templates for case studies. There is no set way to conduct a case study but rather a set of practices with overlapping similarities. Data is not collected to fit a specific template but is instead gathered and then analyzed from a stance of epistemological openness. This inductive approach is more flexible and can allow for greater creativity in the research process. Second, family resemblance reframes data collection from a planned event into a recursive and iterative set of processes. Third, family resemblance offers the idea of entanglement. Different elements within the case study are connected and cannot be understood independently of one another. This is different from reductionism, which would seek to understand the case study by breaking it down into its component parts. Instead, understanding the case study as an entangled whole can give researchers a more holistic understanding of the phenomenon under study. Data collection is thus less about atomistic elements and more about collecting data in multiple, iterative ways that aim at holistic description and understanding. (In terms of analysis, family resemblance can be used to understand how different case studies relate to one another. By understanding the shared practices and similarities between different case studies, researchers can develop a more comprehensive understanding of the phenomenon under investigation. I will discuss this point in the next chapter on data analysis.)
Data collection relies on constructs of what the data, as evidence, is. Researchers connect their data through a sequence of evidence, thereby creating and constructing their cases. This sequence is a representation of evidence legitimated by the data collected and through its synthesis on the part of the researcher(s). Representation and legitimation undergird data collection’s ideology. By way of Denzin (1994), Duff (2008) writes, “Representation refers to how we represent or position our participants, data, and interpretations, and also, perhaps indirectly, how we position ourselves as researchers in relation to those studied. Legitimation is the basis for the warrants or claims we make about our data and the authority of our reports” (p. 109).
Family resemblance, as an analytical tool, provides a way for representation and legitimation to avoid essentialism and determinism. Researchers do not need to represent a core or some universal trait in their case studies because this would rely on some type of neo-Platonic form. As an example, forms of automation, such as algorithms, may reify data, making it appear as though it has a Platonic core or template. An entity, as a term, avoids this theoretical leaning by emphasizing relationality. Family resemblance takes a similar approach with respect to data collection. There need be no single piece of evidence, or even type of evidence, that a case study researcher must collect to describe the case study. Rather, family resemblances can occur between the various data collected to produce the case study. There is no essential component or determining piece of evidence that “finally” produces the case study.
Family resemblance, when perceived in this way, has three important implications for digital research and data collection. First, digital research often encounters data that is rapidly changing, both in terms of technicalities (the data type, for example) and appearance (the formatting of a website, for example). Case study researchers don’t need to collect one type of data, or even a consistent type or format of data. This point of view is, of course, very practical, but it also recognizes the constructivist nature of all research, and of digital research especially. Data collected is a product of digital research and is not natural, ahistorical, or permanent. This understanding of digital phenomena can help to avoid reification, or the treating of phenomena as if they are static and unchanging. Second, due to the sheer scale and infinite distribution and circulation of digital phenomena, it is possible to build a case study from any number of data points. No evidence is indisputable or incontrovertible. Family resemblance provides for a more dynamic and relational understanding of digital phenomena. Things are not represented as isolated or autonomous, but as part of a larger web of relations. Third, because it questions any essentialist attitude toward datasets, family resemblance may help researchers challenge essentialist assumptions about data, including racist, gendered, ableist, and homophobic assumptions.