Saturday, November 30, 2013

What is data? A few reflections

I recently finished reading “The Immortal Life of Henrietta Lacks” by Rebecca Skloot.  It tells the story of the woman behind the HeLa cell line.  The cells taken from Lacks’ cervical cancer take on a life of their own after her death.  Henrietta, the woman, dies, and HeLa, the cell line, is born.

As Skloot tracks down Henrietta’s family and tries to uncover the personal stories of the past and present, one feels that the human side of Henrietta is given some life again.  A poignant scene occurs when Skloot accompanies Henrietta’s daughter Deborah and son Zakariyya to a cell biology laboratory at Johns Hopkins, where they are given a glimpse of their mother’s still-living cells (pp. 265-266):

Deborah and Zakariyya stared at the screen like they’d gone into a trance, mouths open, cheeks sagging.  It was the closest they’d come to seeing their mother alive since they were babies.

After a long silence, Zakariyya spoke.

"If those our mother’s cells," he said, "how come they ain’t black even though she was black?" "Under a microscope, cells don’t have a color," Christoph told him. "They all look the same—they’re just clear until we put color on them with a dye.  You can’t tell what color a person is from their cells."  He motioned for Zakariyya to come closer.  "Would you like to look at them through the microscope?  They look better there."

Christoph taught Deborah and Zakariyya how to use the microscope, saying, “Look through like this…take your glasses off…now turn this knob to focus.”  Finally the cells popped into view for Deborah.  And through that microscope, for that moment, all she could see was an ocean of her mother’s cells, stained an ethereal fluorescent green.

“They’re beautiful,” she whispered, then went back to staring at the slide in silence.  Eventually, without looking away from the cells, she said, “God, I never thought I’d see my mother under a microscope— I never dreamed this day would ever come.”

This book got me thinking about the relationship between science and people.  What does it mean to look at something in a scientific way?  Is it necessarily dehumanizing?  I think an important part of answering this question concerns the notion of “data”.  What is data?

I’m used to thinking of data in the context of my life as a scientist.  Data are the results of measurements.  They are attached to well-defined scientific constructs such as temperature, pressure, length, time, density, etc.  In the narrow context of my own field of electron beam dynamics, a measurement typically involves the beam current, its time dependence, the size and shape of the electron bunch distribution, the spectrum of x-rays emitted, etc.  Using measuring devices, we determine the values of these different quantities.  The results are considered data, sometimes good, sometimes not so good.

I find the question of what data is to be relatively straightforward and uncontroversial in this context.  But when I consider data in a broader context, its meaning seems both less clear and more important to clarify.  Consider experiments done on the HeLa cells, on Henrietta’s cells.  One may think about these cells in a concrete scientific context.  How big are they?  What is the genomic structure?  How frequently do they replicate?  These questions may be thought to produce data.  But what sets the answers to these questions apart from the many other facts about the living or dead body of this woman, Henrietta Lacks?  Is it being couched in scientific terms that turns a fact into data?

I had a realization about this question yesterday as I was reading through some letters I have, written in 1929 between my grandparents on my father’s side.  My grandfather, Acatius, or Akos, was born in Poklostelek, Hungary in 1908, and in 1929 was attending medical school in Tours, France.  My grandmother, Dorothy, was born in New York in 1907, and in 1929 was also in Tours, France with her sister Violet, studying French.  These letters give a window into the beginning of their relationship at this time.

I realized that, because I know so little about the facts of my grandparents’ lives, I was thinking about these letters in terms of data.  Each letter could provide some clue that would allow me to test hypotheses about who my grandparents were.  I was building an inner model, and I could check it for consistency as I read more letters.  Was my grandfather a thoughtful person?  Was he kind?  How did he see the world?  I realized that it is in the framework of asking these questions that I can view the letters my grandparents wrote to each other as data.  Without the associated imaginative task, the letters are not data.  I turned them into data by asking specific questions about them.

This reminds me of a fascinating post on The Frailest Thing entitled “From Memory Scarcity to Memory Abundance” in which Sacasas reflects on the meaning of the increased use of recording technologies and on Barthes’ veneration of a single photograph of his mother.  From the perspective of data, we can mainly look on the lack of more photographs as a tragedy.  There’s just not enough data for Barthes to build a proper model of his mother.  But once this is stated, we realize its potential absurdity.  Barthes knew his mother.  He was not trying to uncover something unknown in the process of viewing this photograph.  The relationship was not one of a scientist to data, but one of a son to his mother.

Reading Skloot’s narrative of Henrietta Lacks, and about how much was learned about cell biology, we see that Lacks herself and her surviving family gained very little directly from this work.  We might say that Lacks’ cells became data.  And they became data precisely because someone was asking certain kinds of questions about them.  In particular, people were trying to understand cell biology, and the HeLa cells provided much data for the associated questions and modeling.  We might compare this to today’s so-called “big-data explosion”.  There is a sense in which new data is being created by people’s increased use of digital communications, which can perhaps be quantified more easily than analog communications.  But I don’t think it’s the quantification that makes it data.  Yes, we talk in general about data on a hard drive.  But we might also talk about files on a hard drive, or images on a hard drive, or, to go in the other direction, we could talk about magnetic domains and regions of varying polarity.  It is only if someone asks questions and tries to build a model that we might consider the scattered computer files, the records of typed comments on social media, and the voice recordings made through Skype and cell phones to be data.

This perspective then allows some push-back in the privacy debate when we are told that we create “data traces” whatever we do.  We can respond by asking in what sense the traces are really data.  What are the questions being asked?  What are the interpretive models being built?  And from the history of scientific experimentation, we can understand that there ought to be some limits and a framework in place regarding the transformation of elements of our lives into data.