The importance of Big Data….

Michael White’s excellent piece is here. His view is that Big Data science differs from the Popperian hypothesis-based science we are all used to, and I tend to agree. I also worry that the term ‘Big Data’ is in serious danger of being oversold. That happened once upon a time to another hot new discipline: Artificial Intelligence… and the results were not pretty.

NYT on Big Data

In today’s Review section, here.

Shout out to Mason’s own Rebecca Goldin in the piece:

Big Data has its perils, to be sure. With huge data sets and fine-grained measurement, statisticians and computer scientists note, there is increased risk of “false discoveries.” The trouble with seeking a meaningful needle in massive haystacks of data, says Trevor Hastie, a statistics professor at Stanford, is that “many bits of straw look like needles.”
Big Data also supplies more raw material for statistical shenanigans and biased fact-finding excursions. It offers a high-tech twist on an old trick: I know the facts, now let’s find ’em. That is, says Rebecca Goldin, a mathematician at George Mason University, “one of the most pernicious uses of data.”

Encouraging collaboration…

At our Institute, a major “price of admission” for new faculty is a willingness to collaborate across disciplinary boundaries, the notion being that the loci for many major advances lie at the boundaries of disparate fields. This in itself is challenging because different disciplines operate with different technical languages, commonly called “jargon”. Finding a lingua franca between different disciplines takes time and energy, and the payoff, while potentially large, is always uncertain (true scientific research is always risky).

Hence, here at Krasnow, the challenge is to encourage such collaboration across disciplinary boundaries, but the even deeper challenge is to encourage collaborations in general. Why?

A major reason is that our current training in science, especially at the doctoral level, emphasizes a solitary rather than a team approach. The PhD thesis is, after all, a singularly individual intellectual product: the doctoral advisor’s name doesn’t go on the title page as an author for a reason. While the acquisition of data used in a dissertation may in some cases involve a team approach (think big data physics), the data analysis for the thesis is generally the work of the graduate student alone.

Another reason for the challenge in getting scientists to collaborate is the inherent difficulty of sharing data under current systems. Until norms for data sharing, curation and provenance are universal, the “safe” approach is to keep one’s own experimental data under wraps. While large-scale data sharing is a desirable endpoint, we aren’t there yet.

Finally, my own sense is that a key ingredient of scientific success is the ability to think intensely, without distraction, about a problem, and most individuals find it easiest to do this alone. Even if this isn’t the case, the conventional wisdom is that the “aha” moment follows such a period of introspective pondering.

So those are some reasons… How might one still encourage collaborations?