Every new drug that goes into development is a multiyear, multibillion-dollar gamble. Those that make it to approval - through the five- to ten-year gauntlet of clinical trials and regulatory scrutiny - grant their creators a license to print money for nearly a decade. The problem is that only about one in ten make it that far. These sky-high costs are partly why drugs are so expensive.
Dave Latshaw, a computational biochemistry PhD out of North Carolina State University, thinks he can change all that using AI. He thinks it can squeeze a lot of the risk and inefficiency out of the system, making drugs cheaper and less of a gamble to develop.
He saw big pharma’s convoluted development process up close during half a decade at Johnson & Johnson. He was tasked with creating machine learning solutions for the company. That gave him broad exposure to the intricacies of J&J’s process. One success he had was to help J&J develop a better way to monitor its drug manufacturing and supply chain. That approach now saves J&J roughly $100 million a year, he said.
But he also knew that there was no way J&J could let him tinker with their process as much as he wanted. At J&J and throughout big pharma, the drug approval process remains hugely labor intensive, involving thousands of employees across many divisions. It’s extra resistant to change because it’s so heavily regulated.
So as the AI revolution gathered steam during the pandemic, he started BioPhy, a tiny Philadelphia-based AI software company that believes it can help big pharma streamline their drug evaluation processes and get approvals faster and cheaper.
Latshaw co-founded it with Steven Truong, a finance whiz he met getting his MBA at Wharton, and Daniel Sciubba, professor and chairman of neurosurgery at Northwell Health/Hofstra University. It’s only raised about $4.5 million. But business from about 10 giant pharma companies is good enough so far to support about a dozen employees.
Their goal is to get pharma companies to become less reliant on hunches and educated guesses at important inflection points of their approval processes, and more reliant on the data that already exists inside their operations. This data has historically been too time-consuming to gather, process and analyze. But it’s trivial work for well-designed AI models.
Armed with these new insights, BioPhy ultimately wants drug companies to start thinking about their portfolio of compounds in development the same way that investors think about a portfolio of stocks. What’s the risk/reward ratio of continuing to pursue approval for each compound at every step of the multiyear process?
It’s an undeniably important goal. But at first glance it also looks unattainable. How can anyone predict patient responses to new compounds?
The answer, Latshaw says, is that you don’t have to touch that part of the process to help any of these companies better predict their odds of success. That’s because “the process has two parts. It’s biochemical, and it's operational,” he says. “There's some things that can’t be changed (like actual patient responses to a medicine in clinical trials). But there’s many things that can be. So we look at the things that can be changed to influence outcomes … like patient selection (for clinical trials), endpoint selection, (clinical) trial structure, where you're running the trials, who's involved, all those sorts of things.”
Latshaw said that once you’ve identified and weighted most of the variables impacting drug development and approval, you can rank a portfolio of compounds from most likely to succeed to least likely to succeed. From there, companies can make a more informed decision about how to allocate time and resources.
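As a rough illustration of that ranking step, here is a minimal Python sketch. It is not BioPhy's model: the factor names, weights, bias and scores below are all invented for the example, and a simple logistic function stands in for whatever the company actually uses to turn weighted variables into odds of success.

```python
import math

# Hypothetical operational factors and weights (assumptions for illustration only).
WEIGHTS = {
    "patient_selection": 1.2,
    "endpoint_selection": 0.9,
    "trial_structure": 0.7,
    "site_quality": 0.5,
}
BIAS = -1.5  # hypothetical baseline, reflecting that most compounds fail


def success_probability(factors):
    """Map factor scores in [0, 1] to a probability using a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def rank_portfolio(portfolio):
    """Return (compound, probability) pairs, most likely to succeed first."""
    scored = [(name, success_probability(f)) for name, f in portfolio.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up compounds with made-up factor scores.
portfolio = {
    "compound_c": {"patient_selection": 0.2, "endpoint_selection": 0.2,
                   "trial_structure": 0.2, "site_quality": 0.2},
    "compound_a": {"patient_selection": 0.9, "endpoint_selection": 0.9,
                   "trial_structure": 0.9, "site_quality": 0.9},
    "compound_b": {"patient_selection": 0.8, "endpoint_selection": 0.4,
                   "trial_structure": 0.6, "site_quality": 0.5},
}

ranked = rank_portfolio(portfolio)
for name, probability in ranked:
    print(f"{name}: {probability:.2f}")
```

In a real setting the weights would presumably be learned from historical trial data rather than typed in by hand; the point here is only that a weighted score makes a portfolio sortable.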
“If you look at the stock market, you'll see a few examples of (the problems we are trying to solve) like Cassava Sciences (which was working on Alzheimer's drug simufilam until it failed Phase III clinical trials in November),” Latshaw said. “There was so much wasted capital put into that asset (drug) which had clearly enough evidence to prove it was ineffective. Yet management pursued it. Sure there was fraud too. But even without the fraud, this kind of thing happens within large (established) companies as well - where they pursue (drug) programs (that are unlikely to succeed) and should instead reallocate that capital to programs that are more viable.”
The following interview has been edited for readability.
Fred Vogelstein: Start at the beginning. I know you were at Johnson & Johnson for six years until 2020 when you started BioPhy. Did BioPhy start at J&J? Or was this something that you did after you left?
Dave Latshaw: Everything that we're doing at BioPhy is separate from what I did at Johnson & Johnson. But the thesis, you could say, was derived from my time there.
I worked in a group that predated the Johnson data science organization. I was one of the people they brought in to figure out where to create machine learning solutions for the company. So my mandate was really, really broad. And I had the opportunity to learn the process of drug development firsthand, develop solutions and understand a lot of different pieces of the puzzle.
But I also saw a lot of things that could be done that I never really got a chance to address. So when I left to start BioPhy, it was really about trying to address some of those opportunities that I saw but hadn't gotten a chance to work on directly.
I had some wins and losses like everybody else (at J&J). But one of the things that I built is still an ongoing thing at J&J today (as part of their manufacturing and supply chain operation). And I think it saves them $100 million a year.
It took me two or three years to get off the ground. When you're doing manufacturing (at a big drug company) it's a very traditional operation. People typically look at things in a very one dimensional way. I saw that as very antiquated.
I started looking at it from a more modern perspective … figuring out how to do more sophisticated process control and how to optimize those processes. I think the first win we had was that there was a batch of Remicade (an immunosuppressant). And we were able to save that from being completely discarded, which I think was a $1.5 million savings.
Somebody took notice. We spun that into more funding. We expanded it out to Europe. And then we just kept growing it. The reason it ended up getting so big was because it not only helped internal manufacturing and supply chain, but external (drug manufacturing subcontractors).
FV: It sounds like you’re saying that you created advanced software to reduce the cost of materials associated with manufacturing drugs. Correct?
DL: That’s a good way to think about it. The metrics we tracked success against were increased yield, lower raw material usage, less batch failure and those sorts of things.
FV: Were they operating with something like a 1990s software infrastructure, and you modernized it? Or was it more complicated than that?
DL: Yeah, it was progressive, actually. The one thing you can't do with big pharma is jump right to today’s technology if they're 20 years behind. So what we did was we took a staged approach. You have to walk before you can run with those organizations.
FV: Also because they're heavily regulated.
DL: Exactly. You can't go make sweeping changes and expect the FDA to be okay with something. You have to be very intelligent about how the agency is in the loop.
FV: Did all that lead you to wonder about starting BioPhy?
DL: In the role that I was in, I got to see all of the other pieces that touched it. There's the regulatory piece. How are you going to file these processes (with regulators)? How does that impact the filing for a new drug? How are you going to introduce (the agency) to new technology? How does this new material impact clinical trials?
I got to see all of these touch points that to me looked very underserved from a technology perspective. Any company that's been around for a long time like J&J has antiquated processes for some things. And I said, “Well, maybe these are the opportunity areas we should be looking at.”
Part of the thesis for some of the initial product (what became BioLogicAI) that we built around clinical trials was in collaboration with the people that I started the company with. So Steve Truong came from a finance background. And his perspective - outside in - was more about capital allocation. How do you choose the right opportunities to pursue?
I had the inside view of a pharmaceutical company. So together we were trying to think about how you could reduce the friction in the process.
So the logic there was if we could make better decisions earlier in the process based on whether these things were actually going to succeed in the future or not, then everyone would be in a better position because you'd get the right medicines to patients faster, and you'd be way more capital efficient within the company so you could pursue the right ideas.
FV: How does BioPhy’s technology help with all this?
DL: First it would be looking at the (customer’s) internal (drug) portfolio and trying to understand if the (drug development) programs being pursued were the right ones. If they are, it would ask, “How should the company be allocating resources to those programs on a risk adjusted basis just like you would with your own investment portfolio?” Once we’ve done that, we look at the individual clinical trials to maximize the chances that they're going to actually have the outcomes our customers are expecting.
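The investment-portfolio analogy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BioPhy's method: the programs, probabilities, payoffs and costs are made up, and the greedy "best expected value per dollar first" rule is a simple heuristic chosen for clarity.

```python
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    p_success: float       # estimated probability of approval (hypothetical)
    payoff: float          # value if approved, in $ millions (hypothetical)
    remaining_cost: float  # cost to finish development, in $ millions (hypothetical)


def risk_adjusted_value(program):
    """Expected payoff minus the money still to be spent."""
    return program.p_success * program.payoff - program.remaining_cost


def allocate(programs, budget):
    """Greedy heuristic: fund the best expected value per dollar until the budget runs out."""
    candidates = [p for p in programs if risk_adjusted_value(p) > 0]
    candidates.sort(key=lambda p: risk_adjusted_value(p) / p.remaining_cost,
                    reverse=True)
    funded = []
    for program in candidates:
        if program.remaining_cost <= budget:
            funded.append(program.name)
            budget -= program.remaining_cost
    return funded


programs = [
    Program("A", p_success=0.6, payoff=1000, remaining_cost=200),
    Program("B", p_success=0.1, payoff=2000, remaining_cost=300),
    Program("C", p_success=0.3, payoff=1200, remaining_cost=150),
    Program("D", p_success=0.5, payoff=400, remaining_cost=100),
]

funded = allocate(programs, budget=400)
print(funded)
```

Program B has the biggest potential payoff but a negative risk-adjusted value, so it is dropped before any money is allocated. That is the kind of reallocation Latshaw describes.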
FV: So it's clinical trial structure software, which historically hasn’t been done using a lot of computer technology?
DL: People are trying to optimize different pieces of that problem. I think there's maybe only a few players like us that are looking at it in a comprehensive way - the biological, the chemical, operational, personnel, all that.
FV: At what point of the drug development process does your technology kick in? There's the whole process early on of just identifying drug targets, where most of the AI companies are playing. Are you there too or just later in the process?
DL: Ours is after where most people are playing right now. We come in post lead optimization, where you already sort of have an idea of the direction that you're going to go and go up through commercial approval. So our focus area is a little bit different than the vast majority of companies.
FV: What does the tech do, and why was it not possible earlier than now?
DL: What we're doing is predicting clinical outcomes. That can be useful for internal (drug) portfolio management and competitive intelligence. It can also help with (clinical) trial optimization. If you can predict those endpoints with a reasonable level of accuracy, then you can also say, “How should I structure this (for example, a clinical trial) in a way that would increase my probability of success?”
We’ve also developed some additional technology (BioPhyRx) which frankly is probably one of the most popular things that we do right now, which is really around how do you effectively automate regulatory compliance.
FV: What does “automate regulatory compliance” mean?
DL: We need a better term for it because if the FDA thinks you're truly automating anything like that, then there's going to be a conversation.
Think about it like this: A large pharmaceutical company probably has 50,000 to 100,000 standard operating procedures they use to do their daily work. Not only are those (procedures) changing constantly, but the external regulatory environment is also constantly changing - old regulations being updated, new regulations coming out. Things are shifting all the time.
So how do you actually make sure that everything that you're doing is constantly lined up with external (regulatory) expectations so that in the event of any negative events, like audits, you're in constant compliance? It keeps you from having to worry about fines, or getting shut down or missing out on revenue because of those things. (One of the reasons that weight loss drug Ozempic/Wegovy was in such short supply a year ago was because FDA citations for cleanliness violations slowed production at one of Novo Nordisk’s subcontract manufacturers.)
FV: It sounds like you're saying that drug companies have entire divisions of mostly lawyers and specialists that do nothing but monitor and update these internal rules and external regulations and communicate those changes throughout the company. And what you do is automate that process. Correct?
DL: Yeah, that's exactly right.
FV: Is the appeal that this technology allows the entire regulatory compliance operation within a pharmaceutical company to be shrunk?
DL: Right now we've positioned it as an accelerant to the people who have to do that work - to take care of the day to day workflows of anybody who manages that space. The one thing I don't want to do is fearmonger about job replacement, or anything like that.
FV: So what makes all this possible now compared to five years ago?
DL: Quite a few things. First, scale and accessibility of compute. Second, the advent of more sophisticated natural language processing techniques. And third, the willingness of organizations to even entertain this. If you tried to have a conversation about doing this (with pharma companies or regulators) five years ago, you would have been laughed out of the room.
FV: What do you attribute that to? Is it just the popularity of ChatGPT starting two years ago?
DL: Yes. It absolutely is, as silly as that sounds.
The technology behind ChatGPT and transformers has existed for years. But ChatGPT popularized it. It forced everyone to understand AI because they knew if they didn’t then a competitor was going to. That fear of missing out or fear of being identified as somebody who overlooked an opportunity has been powerful.
FV: Do you have your own large language model that you've built yourself? Do you use ChatGPT’s? Someone else’s?
DL: We've actually created our own pharma specific language model. There's different schools of thought here on how to utilize these models effectively. Some people are only in favor of raising hundreds of millions of dollars and creating their own large foundation models. Some people want to do the wrapper thing (put their brand on a white labeled LLM). What we have found to be very effective is to create smaller domain and task specific models that are very good at specific jobs.
But it's not just language models that we use. For one engagement with a large pharmaceutical company, for example, we probably have to do about 18 billion comparisons in order to get the job done. If you try to use OpenAI for that, or you try to use some even moderately sized open source models, you're either not going to have the infrastructure to run it in a reasonable amount of time or you're going to be paying tens and tens of millions of dollars to do it. Neither of those are tenable for us. So we solved a lot of scalability issues using smaller, more job specific models that are not just language models, but hybrids.
FV: Did you build it from scratch? Or did you take one of the open source models out there and modify it?
DL: We started with some open source models.
FV: So like Meta’s Llama 3 or something like that?
DL: Yep. There's practically a new one out every week. But we don't change the base models often. We've got a unique set of methods for creating the training data associated with doing these jobs. And that's one of the things that enables us to be successful. But then there's another piece of the process that we use that gets rid of a lot of the scalability issues.
FV: For the past 20 years we've lived in a world where bandwidth, processing power and storage have effectively been free with unlimited availability. Now we're in a world where, unless Nvidia is one of your partners, getting the compute time has become a huge gating factor to who succeeds and who doesn't succeed. It sounds like you guys saw that and took steps to work around it. What specifically did you do?
DL: I think we saw some of those issues coming. And we didn't want to go raise a crazy amount of money just to sit on a bunch of compute. We don't rely just on language models. We do a lot of traditional machine learning with the language models side by side. So we have sort of a hybrid architecture that takes care of it.
I can't go into it in too much detail in case somebody wants to go and try to compete with us. What I can say is that people usually think about doing everything in AI on a text basis, but that's actually pretty inefficient from a compute perspective. You have to think more about it numerically. What is the actual problem that you're trying to solve? How would you translate that problem into numbers as opposed to basing it purely on text?
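Latshaw does not describe BioPhy's approach, so the following Python sketch shows only the general idea of turning a text problem into a numeric one. It uses two standard techniques that are assumptions here, not anything attributed to the company: feature hashing to convert each document into a fixed-length vector once, and cosine similarity to compare vectors with cheap arithmetic. The vector length and the sample sentences are invented for the example.

```python
import hashlib
import math

DIM = 64  # vector length; an arbitrary choice for this sketch


def vectorize(text):
    """Turn text into a fixed-length word-count vector via feature hashing."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_matches(procedures, regulations):
    """For each procedure, return the index of the most similar regulation."""
    reg_vecs = [vectorize(r) for r in regulations]  # each regulation vectorized once
    matches = []
    for procedure in procedures:
        proc_vec = vectorize(procedure)
        scores = [cosine(proc_vec, reg_vec) for reg_vec in reg_vecs]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches


# Invented sample text standing in for internal procedures and external rules.
regulations = [
    "clean the filling line daily",
    "report adverse events within 15 days",
]
procedures = ["clean the filling line daily"]

matches = best_matches(procedures, regulations)
print(matches)
```

Once every document is a vector, each comparison is a handful of multiplications rather than a call to a large language model, which is what makes billions of comparisons plausible on modest hardware.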
FV: I thought that one of the things LLMs did incredibly well was work with numbers.
DL: You're correct. It's just that they do it in a very general way, which is what makes them very inefficient. The basis for how they work is by converting text to numbers. But if you think about any of them, they’re all trained to be generalist models. So they're very inefficient because they're supposed to solve many, many tasks. So how do you trim out the things that you don't care about so that the models become much more compute and scale efficient? That's the idea we're after.
FV: So you've modified the existing technology to be smaller and more specialized because you don't need it to have answers to a zillion questions. You need it to have answers to very specific things and not any other things.
DL: Yes, that's the best way to think about it. We're working with pharma. We don't need it to know about airlines. I think llama3b is probably one of the more compact text based language models that's out there. But even that's still a generalist model. So there are still way more parameters than you probably need to solve many, many problems.
FV: And the beauty of it being open source, though, is that you can take it and modify it?
DL: That's right. We could put in the time and effort to train it from scratch. And in some ways you could argue that's almost what we end up doing because we rip so much out. But it’s nice to have a place to start from.
FV: The consulting industry in Washington must not like you guys at all. People who work at the FDA typically leave to start lucrative consulting businesses telling drug companies how to navigate the agency.
DL: Those contacts are still important. Because there's relationships that still help enable stuff to get approved. But that's only relevant after you get to the point where you're in compliance with all the rules.
FV: What's the long term plan here?
DL: The larger opportunity, I think, is to create a single platform that manages all drug development post discovery up through commercial approval. Think of it as an operating system for drug development. There's obviously linking this technology in with drug approval submissions. We're already involved in the clinical trial process. So putting that as a part of the overlay and then also integrating operations and supply chain along with the entire process.
And the idea here is to be able to link all of that information together. To get anything done in a pharmaceutical company it's always a multi person conversation with people from many different departments to get the job done. And there's no great way to facilitate that information sharing with a unified platform. So that's what we're after.