Automated hypothesis generation: an AI role in science

When I was getting my PhD in Ann Arbor during the 1980s, just staying up to date with the literature relevant to my own thesis project was a constant challenge. There was a paper magazine back then called Current Contents (CC). CC contained just that: the tables of contents for all of the relevant journals (in Life Sciences). It was a critical resource because, even then, there was no other practical way to keep tabs on the collective scientific output.

Keeping tabs was not just about general knowledge of the field, or even about properly giving credit to others. Rather, it was critical to hypothesis creation. Asking the right question (at the right time) is what determines scientific success in many cases. But you can’t ask the right question without knowing whether it has already been asked. And really, you can’t ask the right question without a full understanding of the current state of scientific knowledge.

At the time, it was the habit in many high-impact papers to make the last figure a cartoon schematic representing the author’s view of where the field stood at the moment the paper was accepted by the journal. In my field of molecular neuroscience, this was often a series of shapes and arrows representing key biomolecules and pathways. It was often amusing to go from one paper to the very next one from a particular group and see that some of the arrows had mysteriously reversed direction since the previous cartoon. Presumably the paper’s results, along with other results, had changed the author’s thinking.

In any case, that cartoon figure was always a clue to the next hypothesis a particular research group would test. So in a sense, you could predict the trajectory of scientific inquiry from that cartoon figure at the end of a paper.

That was the 1980s. Our scientific knowledge base has expanded exponentially since then. One of the current versions of Current Contents is called Faculty of 1000 (F1000). It’s online, of course. The idea is that leaders in the field curate the papers that you should read based on your profile. It’s a great idea, I guess, although science being as competitive as it is, I doubt that the elect would hand over some brilliant and undiscovered insight from a paper to the unwashed if it really might supercharge some scientific inquiry. However, as a scientist, you have many other choices. Google Scholar comes to mind: it’s comprehensive, and I’m pretty sure it uses AI extensively to tailor its results. So it is machine-driven instead of human-driven (as in the case of F1000).

However, the cartoon figure at the end of papers has become pretty much obsolete (although it does still make appearances). That’s because pretty much all of science, and certainly the life sciences, has become incredibly complex. In my field, you can’t make a cartoon big enough to represent all the relevant biomolecules and pathways, and the arrows have become incredibly intertwined because of the multiplicity of feedback loops and cross-talk links.

So not only is it difficult for the clever reader to glean the next hypothesis (even when there is a cartoon), it is impossible for the author to do the same.

This has pushed much of science from the Popperian paradigm of testing falsifiable hypotheses toward exploratory research. In such science, I might read the data stream from some set of sensors, correlate that data with some other external variable (like seasonality) and publish a correlation that is intriguing. Correlation, of course, is not causation. We all know that.
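To make that concrete, here is a minimal sketch of what such an exploratory analysis might look like. Everything in it is an assumption for illustration: the sensor readings are made up, and the names `pearson`, `sensor`, and `seasonality` are hypothetical rather than taken from any real study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one reading per month for two years.
months = range(24)

# A simple stand-in for "seasonality": a 12-month cosine cycle.
seasonality = [math.cos(2 * math.pi * m / 12) for m in months]

# Made-up sensor readings that happen to track the seasonal cycle,
# plus a small deterministic wobble so the fit is not perfect.
sensor = [10.0 + 3.0 * s + 0.5 * math.sin(m) for m, s in zip(months, seasonality)]

r = pearson(sensor, seasonality)
print(f"correlation between sensor and season: r = {r:.2f}")
```

The number that comes out can be striking, but nothing in the calculation says why the two series move together. That is exactly the gap a hypothesis is supposed to fill.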

And yet, science has the tools to do excellent hypothesis-based research. In neuroscience, optogenetic methods allow us to turn neural circuits on and off to understand their effects on behavior. In molecular biology, CRISPR does the same for genetic circuits and networks.

The problem is not executing the research. It’s the ability to ask the right question. For biology, generating a hypothesis that is consistent with all of the current knowledge in a scientific discipline is challenging for human scientific superstars and downright impossible for your typical graduate student coming up with a thesis project. I believe that the same is true for any area of science where the volume of knowledge and relevant data has expanded exponentially.

But all is not lost. I think this is a perfect domain for AI as it exists today. Keeping tabs on many disparate but relevant data points and then coming up with a next move? That’s how AIs beat humans at chess right now. So… AI working with human scientists might be a very fruitful collaboration going forward. And it may yet save hypothesis-based research.