Patrick Hsu got noticed early as an academic superstar. He was taking classes and doing lab work at Stanford University when he was 14. He had his PhD from Harvard in biochemistry and biological engineering by the time he was 21.
It was cutting-edge schooling, too. Most of his college, master's and PhD work from 2010 to 2014 was in Feng Zhang's lab at the Broad Institute of MIT and Harvard. Zhang, one of the pioneers in the development of CRISPR, was Hsu's thesis advisor. Hsu's LinkedIn page lists him as an author on ten papers during that period, three as first author (the one who did most of the work) and two as second author.
In a recent interview with author, cardiologist and genetics professor Eric Topol, he said he learned to code growing up the way others learn foreign languages. “I used to work through problem sets before dinner growing up. And so it was just something I learned natively like learning French or Mandarin.”
Now at only 31, he is an old hand in the exploding world of digital biology. And as the co-founder of the Arc Institute in Palo Alto, he has become one of its leading decision makers. Venture capitalists are pouring money into digital biology today. A year ago Xaira Therapeutics launched with $1 billion. Rounds in excess of $100 million for companies like Inceptive, which we wrote about in January, are no longer uncommon.
Arc has a simple mission: It wants to rethink and speed up how ideas go from concept to implementation in biology using computers, AI and other digital technologies. Since launching Arc in 2021, Hsu and his co-founder Silvana Konermann have raised more than $650 million from a who’s who of Silicon Valley philanthropists, including Patrick Collison, co-founder and CEO of Stripe.
Arc has gotten particular notice for its work developing Evo, an AI foundation model for biology, in partnership with NVIDIA, Stanford University, UC Berkeley and UCSF. It announced the second version of the model in February. Hsu says it’s the largest AI model ever trained in the domain of biology. He said it’s too soon to use it as a foolproof drug development or diagnostic tool. But he says they are getting close.
“It’s able to, for example, predict the pathogenic effects of breast cancer mutations. So we took the mutations from BRCA1, the famous gene and we showed that it's state of the art in classifying coding or non coding mutations in the BRCA1 or BRCA2 genes. I'm not saying folks should feed in their (genetic) variants into the model and then make a decision on their health. But I am hopeful that future versions of these models could have that level of utility.”
The Trump Administration’s assault on government- and university-funded academic research has made Arc’s mission more relevant than Hsu or anyone in his field could have imagined. I wanted to find out what Hsu thought of this, as well as to learn more about Arc and the future of digital biology.
What follows is an edited Q&A:
Fred Vogelstein: What made you want to start the Arc Institute?
Patrick Hsu: I think a lot of what we wanted to do at Arc is to build a place where our investigators are actually investigating - a place where they can work on problems, and where we at Arc can give them the long-term focus, capital and infrastructure to attack those ambitious goals. We want them to have the freedom to do unapologetically ambitious, high-risk, high-reward science.
The problem today is that our principal investigators - what we call lab heads in biology at research universities - are doing precious little investigating. When you have one of those jobs, you spend as much of your time, if not more, fundraising, writing grants, giving talks, traveling, managing your lab, recruiting people, sitting on academic committees, teaching classes and handling service obligations to your department - and maybe starting companies - as you do on the core part of the job: thinking deeply about breakthrough problems and making discoveries.
None of these are new ideas. There’s a very long and illustrious 120-year history of American biomedical research institutes, right? From The Rockefeller University in 1901 to The Salk Institute and Scripps Research, to The Whitehead Institute, The Broad Institute, Cold Spring Harbor Laboratory and so on.
All we’ve done is mix and match different aspects of these models, but with a view toward the unique opportunities of the 2020s and of the Bay Area today. How can we build at the interface of biology and machine learning, and across universities like Stanford, Berkeley and UCSF? How can we build between basic science and the biotech industry? Or between biology in an academic setting and the technology industry?
Hopefully we can be a convening center across all of these different elements - a place where you can do really interesting work at the interface of fundamental discovery and technology development.
One of the things that we think a lot about is this paradox: We undeniably have insane capabilities today in terms of biotechnology. We have things like CRISPR and the most powerful microscopes we’ve ever had to look inside living cells. And yet the top five killers in America 30 years ago are the same top five today - except we’ve arguably added one called Covid.
And so the question we ask is: how do we make sure we’re able to take this big-data era of biology and these fundamental insights and turn them into real-world products and medicines that actually touch people’s lives, rather than stopping at science papers?
FV: So you’re trying to change the incentive system of biomedical research?
PH: We’re trying to show that, by understanding all of these different aspects of the research industry and by experimenting at the organizational level with different variables - their weights and their connectivity - we can build a more effective way of doing science. It’s very early on, to be sure. But we think we have some early evidence that this model can be replicated.
FV: How were you able to evaluate and measure how slow research universities are and then demonstrate there is a better, faster way to do research?
PH: I'll give you a lateral example.
My co-founders and I, along with (economist) Tyler Cowen, stood up a rapid grant-funding effort in the early days of the pandemic called Fast Grants to fund Covid response research. We funded 250 or so projects with about $50 million. We had a Google Forms application that took 30 minutes to fill out. And we would give you a funding decision within 48 hours.
We learned two interesting things from that. The first was that we were able to fund a lot of important science. That included things like the first FDA-authorized saliva test for Covid, developed at Yale University, and the clinical trials for fluvoxamine (Luvox), an antidepressant that can decrease Covid inflammation and was broadly prescribed in places where Paxlovid was unavailable.
So it was not just basic science from professors who were twiddling their thumbs at home due to shelter in place, but also medical science that was really critical at a time of need. The second thing we learned was that the speed at which we were able to do this didn’t decrease the quality of the granting. If anything, it made the quality go up.
And we were able to show this at a critical mass of capital, which caught the attention of many funding bodies.
FV: Is that what you used to get Arc off the ground in 2021?
PH: We were working on Arc before Fast Grants. But I think one of the things that really inspired us was the degree to which government funding bodies - ARIA in the UK, the NSF, the NIH, DARPA, ARPA-H, OSTP and other organizations - paid attention to this.
Of course, at Arc, even if we spend a billion dollars, that’s a drop in the bucket relative to the NIH budget of $45 billion to $60 billion, however you count it.
FV: Based on what’s happening in Washington, D.C. right now, the NIH budget might end up being a fraction of that.
PH: We'll see how that all shakes out. It hopefully goes without saying that I'm a product of government funded labs and NIH funded research. I am a deep supporter of basic science and a believer that it needs to happen. We need more of it. But we also need to update and optimize our structures by which we do this work.
For example, there’s an entire middle space of projects (that don’t get funded) that require deep infrastructure, interdisciplinary work, professional staff and/or capital equipment across areas that a single lab can’t bring together. At Arc we’re building in neuroscience, in immunology, in machine learning, in genomics and in chemical biology.
An individual lab might be really good at how many of those things? Maybe two. But very rarely can you bring all five. In short, it’s a question of how you bring together the capital, the talent, the resources and the infrastructure you need to work across not one, not two, but three or four or five different fields.
FV: Can you talk more about what is going on in Washington? What are the pros and cons - if there are any pros? Are you seeing any downstream impacts yet on universities and research in general?
PH: The uncertainty is certainly very stressful. The systems we have have evolved and grown and sprawled over time. Researchers’ salaries should, in principle, be budgeted as direct costs, so we shouldn’t have to lay people off if indirect costs come down.
But of course, it is not so simple. Because indirect costs pay for the actual labs, the gas, the electricity and the heating. As someone who runs an independent lab, these things are pretty damn expensive.
Yet I also think there has been a lot of discussion (for a long time) around the opacity of these indirect costs - about the challenges of auditing them and understanding them. And so it ends up being something like that old line about democracy: “It’s the worst form of government, except for all the others.”
The silver lining in all of these challenges is that they force us to ask whether we can use this pressure to design a system better geared for modernity. That’s the hope. That’s my fervent and genuine hope for our industry and community. But it’s clear that we will continue to have near-term challenges and chaos.
FV: How does all this affect Arc, since you aren’t funded with public funds?
PH: Well, we're partnered with three flagship research universities in the Bay Area. And so we feel all of these reverberations. We fund our partner universities with our Innovation Investigator grants, our Ignite Awards, and multiple other programs. So we are doing our best within the capacity that we have.
FV: As an organization that has the capacity to raise maybe billions in non-government funds, does all this put you in a position to play a bigger role and fill some of the hole that is being created by all the changes at the federal level?
PH: Here's what I'll say: We have very specific and explicit research goals at Arc. We have these two flagship research initiatives. One is focused on Alzheimer's disease. The other is focused on simulating human biology with these AI foundation models (Evo), which we call our virtual cell initiative. We’ve gotten really far in biology via guess and check. A virtual cell model that can understand and model dynamic cell states could be used to accelerate biological experiments.
Right now those experiments have to happen in the real world with real cells and tissues and animals. That can take months to years compared to something that can happen much faster in the neural network.
And if that works it could significantly increase the efficiency of how we do molecular and genetically grounded science.
Our hope is that we can improve the efficiency of how we do research and also improve the leverage of individual researchers who today have to do lots of extremely manual and painstaking work.
The thing that gives me optimism is that these capabilities are becoming so much more powerful that we can hopefully ride a technology curve to improve the efficiency of the system.
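To make the virtual cell idea concrete, here is a minimal sketch of what such an in-silico screening loop might look like. The model and its predict_expression method are hypothetical stand-ins for whatever interface a real virtual cell model would expose; this is not Arc's actual code:

```python
# Hypothetical sketch: triage genetic perturbations with a virtual cell
# model before committing to slow, expensive wet-lab experiments.
# `model` and its `predict_expression` method are illustrative stand-ins,
# not a real Arc or Evo API.

def rank_perturbations(model, baseline, candidate_genes, target_gene, top_k=10):
    """Score each candidate gene knockout by its predicted effect on a target gene."""
    scores = {}
    for gene in candidate_genes:
        # Simulate the knockout in silico: seconds per candidate,
        # versus weeks or months per wet-lab experiment.
        predicted = model.predict_expression(baseline, knockout=gene)
        scores[gene] = abs(predicted[target_gene] - baseline[target_gene])
    # Only the top-scoring candidates go on to real-world validation.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

The efficiency gain Hsu describes lives in that last line: the wet lab validates a handful of ranked candidates instead of exhaustively testing thousands.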
FV: It feels like you are saying that biology, biotech, AI and Silicon Valley tech have been intersecting in all these interesting ways over the past decade, but that nobody has actually sat down and put together a center designed to take advantage of those intersections in a systematic way. Am I overstating that?
PH: Well, everyone is trying to do interdisciplinary work, to have lots of infrastructure and to be able to do ambitious science. What we're trying to do in our model is to actually execute on big scientific projects that we think couldn't be done just at a university or just inside of a biotech startup or a pharma company.
FV: Can you give me some examples of how you’ve made this work?
PH: One thing is the biological foundation model work that we've been doing at Arc.
We've reported two versions of these DNA foundation models known as Evo.
We trained Evo 1, which was published last year on the cover of Science. Now we have a preprint of the follow-up, Evo 2, which is the largest AI model ever trained in the domain of biology. By feeding long-context genomes into these models, they can learn about the DNA, RNA and proteins that are all embedded within genomes. That’s the fundamental information layer of all life, right?
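For readers wondering what feeding genomes into a model actually means: at bottom, a DNA foundation model is trained on next-base prediction, much as text models are trained on next-word prediction. Here is a toy PyTorch illustration of that training objective. Evo’s actual architecture is a far larger convolutional hybrid (StripedHyena), so treat this strictly as a sketch of the training signal, not of the model:

```python
import torch
import torch.nn as nn

# Toy next-nucleotide language model. The objective is the same one large
# DNA foundation models use: predict the next base given all previous bases.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}

class TinyDNALM(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, len(VOCAB))

    def forward(self, tokens):  # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)  # logits over {A, C, G, T} at each position

seq = torch.tensor([[VOCAB[b] for b in "ACGTGGTACCA"]])
logits = TinyDNALM()(seq[:, :-1])  # predict base t+1 from bases 1..t
loss = nn.functional.cross_entropy(
    logits.reshape(-1, len(VOCAB)), seq[:, 1:].reshape(-1)
)
```

Scale that objective up to millions of genomes and very long context windows and you get, in spirit, what Hsu is describing.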
FV: Where is Evo in terms of something that a drug company, for example, could use?
PH: It depends on what you want to do with it. But you know, we have released it as an open-source model. It’s able to, for example, predict the pathogenic effects of breast cancer mutations. So we took the mutations from BRCA1, the famous gene, and we showed that it’s state of the art in classifying coding or non-coding mutations in the BRCA1 or BRCA2 genes.
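A note on how a language model of DNA can classify mutations at all: a common zero-shot recipe - and roughly the approach described in the Evo 2 preprint - is to compare how probable the model finds the healthy reference sequence versus the mutated one. The sketch below assumes a placeholder score_log_likelihood method; the actual Evo interface may differ:

```python
# Zero-shot variant effect prediction by likelihood comparison.
# `model.score_log_likelihood(seq)` is a hypothetical placeholder for
# whatever sequence-scoring API a given DNA language model exposes.

def variant_effect_score(model, reference_seq, position, alt_base):
    """Higher scores suggest a more disruptive, likely more pathogenic, variant."""
    mutated_seq = reference_seq[:position] + alt_base + reference_seq[position + 1:]
    ref_ll = model.score_log_likelihood(reference_seq)
    alt_ll = model.score_log_likelihood(mutated_seq)
    # A mutation that makes the sequence look much less "natural" to a model
    # trained on vast amounts of real DNA produces a large likelihood drop.
    return ref_ll - alt_ll
```

No labeled BRCA1 training data is needed for this; the model’s learned sense of what plausible DNA looks like does the work.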
FV: Are you saying that because cancer is a genetic mutation, the model can predict breast cancer?
PH: I'm not saying folks should feed in their (genetic) variants into the model and then make a decision on their health. But I am hopeful that future versions of these models could have that level of utility.
FV: Let’s talk about the future of AI generally. What’s the future of large language models in biology? Do all of them have to be large or can some of them be small, more focused models?
PH: I think we’re just in the very early days of understanding all of this across the diversity of different architectures - of figuring out which architectures will be more task- or domain-specific. There’s lots of work not just on the autoregressive transformer, but also on diffusion models, discrete diffusion models, convolutional multi-hybrid architectures and probabilistic programming architectures that could be more open-box and provide more statistical guarantees.
Maybe a broader way to put it is this: Every decade or so in machine learning, something fundamentally new comes along. The attention transformer paper from Google came out in 2017. It’s been eight years. So maybe we’re due for another architecture soon.
FV: Thank you for spending so much time.
PH: It was a joy. Thank you.