Vital Statistics — The Texas Scientist

By Marc Airhart. Illustrations by Jenna Luecke.

From baseball to financial investing, from elections to oil drilling, analyzing data quickly to predict future outcomes is transforming industries and activities around the world. Take, for example, car racing. Each time a Formula 1 driver lunges into first place, part of the credit goes to a crew not of mechanics but of mathematicians. They continuously monitor and crunch data on everything from air pressure across the hood to brake temperatures, feeding that data into sophisticated models that forecast how things will unfold and inform a driver’s moment-by-moment strategy.

Now imagine applying similar data-monitoring and forecasting to our physiology and health instead of to cars. Experts say joining the mathematical and computational revolution has the potential to transform healthcare, one of our nation’s largest and most critical industries.

“This idea of real-time monitoring and making decisions on the basis of models, it’s really changed the way engineering works and large sectors of the economy,” said James Scott, an associate professor in the Department of Statistics and Data Sciences (SDS), “but it hasn’t changed healthcare yet.”

Scott and other researchers at UT Austin are leading the way in applying mathematical tools to make the next big breakthroughs in medicine and healthcare, with data-enriched imaging techniques, clinical drug trials built on sophisticated formulas and new statistical tools that help doctors make better decisions.

Targeted Drug Trials

In one ambitious project, researchers are applying data science to predict trajectories for cancer patients. Right now, so-called targeted therapies – those designed to go after specific gene mutations or proteins found only in a disease like cancer – are largely failing to live up to their promise as a more effective, less toxic alternative to existing therapies. One reason is that cancers are so complex that, for many patients, these therapies either work only temporarily or not at all.

But what if medical math could forecast how a tumor would grow in a particular patient’s body?

“We’re moving away from the paradigm of treating all patients the same,” said Peter Mueller, a professor in the Departments of Mathematics and Statistics and Data Sciences. “Each cancer is different.”

One reason targeted therapies fail is that tumors evolve. New genetic mutations can arise in some cells but not in others; over time, the cancer becomes a soup of many different subpopulations, called subclones. Mueller is developing mathematical models to help explain how tumors evolve into this heterogeneous mix. Using genome-sequencing data from tumor biopsies, he applies methods from Bayesian statistics to cluster and analyze these subclones: identifying groupings, characteristics, and particular genetic mutations. It’s the ultimate in “know thy enemy.”

One of the ways Mueller is most excited to see the approach applied is in a clinical drug trial he’s helping to design in which analyses of subclones in a patient’s tumor will be used to aid in selecting the most effective cancer treatment for that individual. The researchers will then reanalyze and adjust the treatment as the mix within the tumor changes.

“We’ll combine more and more information and determine the optimal treatment for each patient,” Mueller said.

With enough clinical drug trials like this, cancer treatment could eventually become what’s known as precision medicine. In place of today’s trial-and-error methods, doctors would plug a patient’s demographics, health data, personal genetics and specific tumor genetics into a mathematical model and predict the treatment options with the best chance of working for any particular patient.

Making Tough Choices

Developing models, based on large data sets, to help doctors arrive at the best decisions for their specific patients isn’t meant to replace doctors, Scott explained.

“It’s about empowering them,” he said, “with the information they need to make a decision wisely.”

For example, obstetricians have a hard time estimating the risk of two significant dangers for developing babies: stillbirth – meaning death in utero after the 20th week of pregnancy – and neonatal death, meaning death in the first month after birth. Each year in the U.S., about 40,000 pregnancies end with one of these two devastating outcomes.

A dizzying range of factors in a mother’s health history help determine which risk is higher. What is a doctor to do when inducing delivery can prevent a stillbirth in one case and lead to a baby being born dangerously prematurely in another? Scott envisions an app that would plot, for each patient and her risk factors, one risk curve for stillbirth as a function of gestational age and another curve for neonatal death as a function of time since delivery.

“If the stillbirth curve goes above the neonatal death risk curve, in principle there’s a higher risk of leaving the baby in utero than of early delivery,” Scott said. “Based on that, a doctor might decide to induce delivery at 38 or 39 weeks.”
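The comparison Scott describes can be sketched in a few lines of Python. This is only a toy illustration, not his team's model: the function name and every number below are invented for the example, and it assumes both risks have already been estimated for the same set of gestational weeks, with the neonatal figure read as the risk if the baby were delivered in that week.

```python
from typing import Optional, Sequence


def first_crossover_week(
    weeks: Sequence[int],
    stillbirth_risk: Sequence[float],
    neonatal_risk: Sequence[float],
) -> Optional[int]:
    """Return the first week at which the stillbirth curve rises above
    the neonatal-death curve, or None if it never does."""
    if not (len(weeks) == len(stillbirth_risk) == len(neonatal_risk)):
        raise ValueError("all three sequences must have the same length")
    for week, stay, deliver in zip(weeks, stillbirth_risk, neonatal_risk):
        if stay > deliver:
            return week
    return None


# Invented, purely illustrative values (risk per 1,000). NOT real data.
weeks = [36, 37, 38, 39, 40, 41]
stillbirth = [0.2, 0.3, 0.5, 0.8, 1.2, 1.8]  # risk of staying in utero
neonatal = [1.5, 0.9, 0.6, 0.5, 0.5, 0.6]    # risk if delivered that week

print(first_crossover_week(weeks, stillbirth, neonatal))
```

With these made-up curves the crossover falls at week 39. The hard part of the real work is not this comparison but estimating the two curves accurately for each patient.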

He’s already working with a team including collaborators at UT’s Dell Medical School to begin to draw these sophisticated curves as accurately as possible, based on millions of patient records from national vital statistics datasets.

David Paydarfar, chair of neurology at Dell Medical School, is one doctor working on a similar decision-making tool for physicians who treat preterm babies. It’s an innovation he said has the potential to complement a physician’s own experience and help correct possible biases: “It allows you to take the wisdom gained from thousands of infants, instead of the 10 or 20 you’ve personally seen that seem similar, and choose how best to treat an individual.”

Hidden in Plain Sight

Just as experts see the potential for mathematical precision to help correct for human biases and blind spots, similar methods may be able to address performance gaps in our current medical technologies. For example, Chandrajit Bajaj, a professor of computer science, is developing a new way to use mathematics and light to analyze biological tissue more effectively than today’s pathology tests allow.

For some types of cancer, cells are misidentified – as either malignant or benign – after a biopsy in up to 20 percent of cases. Bajaj’s method would remove much of the guesswork and enable earlier cancer detection by identifying the chemical make-up of individual cells, so doctors would have a much more sensitive readout for differentiating cancer cells from healthy cells in a biopsy.

Bajaj’s method, called chemical imaging, analyzes the colors of light reflected by human tissue under a microscope to create a chemical profile for each pixel in the image. Unlike human eyes, which see combinations of three primary colors, the method’s sensor can detect thousands of colors, or frequencies, of light, including infrared frequencies that we can’t see. Satellites use something like chemical imaging to distinguish details like grass, rocks and concrete on Earth from vantage points in space or to identify different chemical elements produced by stars.

Bajaj built a mathematical model to simulate how a chemical imaging system would work, allowing him to tune the design virtually until it gathers the clearest, most accurate images – information that’s now allowing colleagues to build the ideal chemical imaging instrument itself.

Another mathematical model Bajaj is constructing translates the varying intensities of thousands of colors of light that the instrument collects into chemical components. This step, the equivalent of distinguishing trees from rocks in satellite data, is known in mathematics as inverse un-mixing analysis. It will allow doctors to distinguish a wider range of cell types in a biopsy, which can mean earlier cancer detection and better characterization of cancer subtypes. It might even shed light on changes in the microenvironment around cells, which could potentially signal a shift towards dangerous metastasis.
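To give a feel for what un-mixing means, here is a minimal Python sketch. It is not Bajaj's model: it assumes the simplest possible case, in which each pixel's spectrum is a weighted sum of a few known reference spectra, and it recovers the weights by ordinary least squares. The function name, the two reference spectra and the 70/30 mixture are all invented for illustration; practical methods typically add constraints, such as requiring the weights to be non-negative.

```python
from typing import List


def unmix(pixel: List[float], references: List[List[float]]) -> List[float]:
    """Least-squares estimate of how much of each reference spectrum
    is present in one pixel's measured spectrum (linear mixing model)."""
    k = len(references)
    # Normal equations: (R R^T) a = R p, where R holds one reference per row.
    gram = [[sum(x * y for x, y in zip(references[i], references[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(x * y for x, y in zip(references[i], pixel)) for i in range(k)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [b] for row, b in zip(gram, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("reference spectra are not linearly independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution to read off the weights.
    abundances = [0.0] * k
    for i in reversed(range(k)):
        known = sum(aug[i][j] * abundances[j] for j in range(i + 1, k))
        abundances[i] = (aug[i][k] - known) / aug[i][i]
    return abundances


# Hypothetical reference spectra sampled at four wavelengths (invented values).
component_a = [1.0, 0.8, 0.2, 0.1]
component_b = [0.1, 0.3, 0.9, 1.0]

# A synthetic pixel made of 70% component A and 30% component B.
pixel = [0.7 * a + 0.3 * b for a, b in zip(component_a, component_b)]

weights = unmix(pixel, [component_a, component_b])
print([round(w, 3) for w in weights])
```

On this noise-free synthetic pixel the sketch recovers weights of about 0.7 and 0.3. Real measurements involve thousands of wavelengths, noise and many more chemical components, which is where the fast optimization methods Bajaj mentions come in.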

“Just in the last few years, computer scientists and mathematicians have developed methods for analyzing large data sets and solving optimization problems quickly,” said Bajaj, “and this lets us do our high-level data analysis in real time.”