Survival Analysis

IT Reliability

Failure Modelling

Infrastructure Resilience

What Studying Dementia Can Teach Us About IT Reliability

Alan Holt

November 26 • 3 Min Read

How survival analysis links biology and technology.

In biology and in IT, systems rarely fail instantly; they fail gradually, and often in ways we can predict long before the final breakdown. This connection became especially clear while working on a research paper I co-authored, The Effect of Mitochondrial DNA Half-Life on Deletion Mutation Proliferation in Long-Lived Cells. The study examined how post-mitotic neurons, cells that must survive for the entire human lifespan, succumb to mitochondrial dysfunction as damaged mitochondrial DNA accumulates over time.

What struck me then, and continues to resonate now, is how closely this resembles reliability challenges in complex IT systems.

A Brief History of Survival Analysis

Survival analysis has its roots in a surprisingly old problem: understanding how long people live. Its earliest foundations trace back to the 17th century, with the use of “life tables” to estimate mortality rates. These tables were the original survival curves, tools first used for public health and life insurance, not medicine or engineering.

In the 20th century, as clinical trials became more rigorous, statisticians realised that many studies involved time-to-event data in which some subjects hadn’t yet experienced the event of interest (e.g., death or relapse). This led to two key developments:

  • Kaplan–Meier estimator – a method for estimating survival curves even when some subjects are censored, i.e. have not yet experienced the event (a minimal sketch follows below).

  • Cox proportional hazards model – a method for identifying how different factors change the hazard rate without assuming a specific underlying distribution.

These innovations turned survival analysis into a general mathematical framework for any process where failure unfolds over time.
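
To make the Kaplan–Meier idea concrete, here is a minimal sketch in Python; the durations and censoring flags are invented for illustration. At each observed event time, the estimator multiplies the running survival probability by the fraction of at-risk subjects who survived that time, which is how censored subjects are handled without biasing the curve.

```python
# Minimal Kaplan–Meier estimator. S(t) is the product, over every event
# time t_i <= t, of (1 - d_i / n_i), where d_i is the number of events
# at t_i and n_i is the number of subjects still at risk just before
# t_i. Censored subjects shrink the risk set without counting as events.

def kaplan_meier(durations, observed):
    """durations: time to event or censoring; observed: True if the
    event occurred, False if the subject was censored at that time."""
    samples = sorted(zip(durations, observed))
    n, at_risk, s = len(samples), len(samples), 1.0
    curve = [(0.0, 1.0)]
    i = 0
    while i < n:
        t = samples[i][0]
        deaths = removed = 0
        while i < n and samples[i][0] == t:   # group ties at time t
            deaths += samples[i][1]
            removed += 1
            i += 1
        if deaths:
            s *= 1.0 - deaths / at_risk
            curve.append((t, s))
        at_risk -= removed
    return curve

# Hypothetical data: months until failure; False = still running (censored)
durations = [3, 5, 5, 8, 12, 12, 15, 20]
observed = [True, True, False, True, True, False, True, False]
for t, s in kaplan_meier(durations, observed):
    print(f"t = {t:>4}: S(t) = {s:.3f}")
```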

Survival Analysis in Biology: Neuron Loss Leading to Dementia in the Elderly

As we age, a mild slowdown in cognitive ability is expected and generally considered a normal part of getting older. Importantly, research shows that in healthy ageing the actual loss of neurons is surprisingly small. This stands in sharp contrast to dementia and other neurodegenerative conditions, where significant neuron loss becomes a defining feature.

Age is the strongest known risk factor for these diseases, with the likelihood of developing dementia rising sharply after about age 65. While many influences shape when and how neurodegeneration begins, one major suspect is mitochondrial dysfunction. Over time, mitochondrial DNA can accumulate deletion mutations, and as these faulty genomes build up, they can impair energy production within neurons. Unlike most cells, neurons don’t divide, which means they can’t dilute age-related molecular damage. Faulty mitochondrial genomes can slowly accumulate until they cross a threshold where the cell can no longer meet its energy needs.

This gradual decline in mitochondrial function is believed to play a key role in triggering various neurodegenerative disorders. This is a textbook case for survival analysis:

  • The “event” is neuronal failure or dysfunction.

  • Accumulating mtDNA mutational burden acts as a time-dependent risk factor that drives the hazard.

  • Even small molecular defects early in life can push the hazard curve upward decades later.
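
A toy simulation makes the last point concrete. In the sketch below, every parameter (drift rate, noise, failure threshold) is invented rather than calibrated to real mitochondrial data; the point is only the shape of the result: the hazard stays near zero for decades, then climbs steeply as trajectories cross the threshold.

```python
import random

# Toy model: each simulated neuron drifts toward a higher mutant-mtDNA
# fraction and "fails" once it crosses an energy threshold. All numbers
# are invented for illustration, not biological estimates.
random.seed(42)

YEARS = 100            # simulated lifespan in years
THRESHOLD = 0.7        # mutant fraction above which the cell fails
N_NEURONS = 10_000

def failure_age(drift=0.01, noise=0.02):
    """Age at which the mutant fraction crosses THRESHOLD, or None if
    the neuron survives the full lifespan."""
    fraction = 0.0
    for age in range(1, YEARS + 1):
        fraction = min(max(fraction + random.gauss(drift, noise), 0.0), 1.0)
        if fraction >= THRESHOLD:
            return age
    return None

ages = [failure_age() for _ in range(N_NEURONS)]

# Empirical hazard per decade: failures divided by neurons entering it alive
alive = N_NEURONS
for decade in range(0, YEARS, 10):
    if alive == 0:
        break
    failed = sum(1 for a in ages if a is not None and decade < a <= decade + 10)
    print(f"ages {decade:>3}-{decade + 10:>3}: hazard = {failed / alive:.4f}")
    alive -= failed
```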

A key challenge highlighted by our work on mitochondrial uncoupling (The long term effects of uncoupling interventions as a therapy for dementia in humans) is that these therapies are only effective if applied long before clinical symptoms of dementia appear. By the time neurons begin to fail, the underlying mitochondrial damage has already crossed a critical threshold. This is precisely where survival analysis becomes invaluable: it helps identify how the hazard of neuronal failure evolves over decades and pinpoints the window in which intervention has the greatest impact. Instead of focusing on visible symptoms, survival modelling shifts attention to the silent, cumulative risks that precede them. In doing so, it reinforces the idea that uncoupling therapies must be administered proactively, during the long, pre-symptomatic phase when altering mitochondrial dynamics can still change the trajectory of neuronal survival.
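
Extending the toy model above (still with entirely invented numbers), we can compare early and late intervention directly: halving the accumulation rate at age 40 leaves many simulated trajectories below the threshold at 100, while the same change at age 75 barely alters the outcome, because most trajectories have already crossed it.

```python
import random

# Crude stand-in for an uncoupling therapy: halve the mutation-
# accumulation rate from a chosen intervention age onward. All numbers
# remain invented toy parameters.
random.seed(1)

YEARS, THRESHOLD = 100, 0.7

def survives(intervention_age, drift=0.01, noise=0.02):
    """True if the mutant fraction never reaches THRESHOLD by age 100;
    drift is halved from intervention_age onward."""
    fraction = 0.0
    for age in range(1, YEARS + 1):
        rate = drift / 2 if age >= intervention_age else drift
        fraction = min(max(fraction + random.gauss(rate, noise), 0.0), 1.0)
        if fraction >= THRESHOLD:
            return False
    return True

for when in (40, 75, 999):          # 999 means "no intervention"
    alive = sum(survives(when) for _ in range(5_000))
    label = f"age {when}" if when != 999 else "never"
    print(f"intervene at {label:>7}: {alive / 5_000:.1%} survive to 100")
```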

In short, survival analysis lets us quantify something that isn’t obvious day-to-day: the hidden, cumulative risks that drive long-term failure.

From Neurons to Networks: The Same Maths, Different Domain

Once you recognise the underlying pattern of long-term deterioration, it becomes clear that the same mathematical principles used to model neuronal survival also apply across modern IT systems.

In technology environments, many processes follow a similar trajectory: systems operate continuously, experience gradual wear or degradation, and eventually reach a point where reliability drops sharply. Survival analysis captures these dynamics by modelling how risk evolves over time, rather than treating failure as a simple binary event.
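
One standard way to express "risk evolving over time" is the Weibull hazard function from reliability engineering, where a single shape parameter determines whether risk falls (early-life defects), stays flat (purely random faults), or accelerates (wear-out). The shape and scale values below are arbitrary illustrations:

```python
# Weibull hazard: h(t) = (k / lam) * (t / lam)**(k - 1)
# k < 1 -> early "infant mortality" failures; k = 1 -> constant risk
# (memoryless, like purely random faults); k > 1 -> wear-out, risk
# grows with age. Parameter values are arbitrary illustrations.

def weibull_hazard(t, shape, scale):
    return (shape / scale) * (t / scale) ** (shape - 1)

for years in (1, 5, 10, 15):
    h_early = weibull_hazard(years, shape=0.7, scale=10)   # infant mortality
    h_flat  = weibull_hazard(years, shape=1.0, scale=10)   # random faults
    h_wear  = weibull_hazard(years, shape=3.0, scale=10)   # wear-out
    print(f"year {years:>2}: infant={h_early:.3f}  "
          f"random={h_flat:.3f}  wear-out={h_wear:.3f}")
```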

These dynamics apply not only to physical hardware but also to digital processes and user behaviour. Time-dependent patterns can be seen in how long components remain operational, how long software services stay healthy before requiring restarts or updates, and even how long users remain engaged with a platform before becoming inactive. In each case, the “event” might be failure, churn, deactivation, or transition to a different operating state.

Across these domains, the same tools—survival curves, hazard functions, and proportional-hazard models—provide insights into:

  • When failures or transitions are most likely to occur

  • How risk accelerates or decelerates over a lifespan

  • Which factors meaningfully influence longevity or stability

  • How early indicators of degradation can predict later breakdowns
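
As a sketch of how the proportional-hazards idea might look on infrastructure data, the snippet below fits a Cox model to a small, entirely invented fleet of servers using the open-source lifelines library; the covariate names and values are hypothetical, and a real analysis would need far more data.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Invented fleet data: duration = days until failure or until the
# observation window closed (event = 0 means the server was still
# healthy, i.e. censored). Covariates are illustrative only.
df = pd.DataFrame({
    "duration_days": [120, 340, 90, 400, 250, 60, 310, 180, 75, 365],
    "event":         [1,   0,   1,  0,   1,   1,  0,   1,   1,  0],
    "avg_temp_c":    [55,  48,  61, 40,  44,  63, 45,  52,  60, 58],
    "disk_age_yr":   [4,   2,   5,  1,   3,   5,  2,   1,   4,  3],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration_days", event_col="event")

# Hazard ratios: exp(coef) > 1 means the covariate raises failure risk
print(cph.summary[["coef", "exp(coef)", "p"]])
```

Reading the output, an exp(coef) above 1 for avg_temp_c would suggest hotter servers fail sooner with disk age held fixed; whether that holds in practice depends entirely on real telemetry.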

Viewed through the lens of survival analysis, technical systems become easier to reason about: organisations can better anticipate resource needs, plan maintenance or replacement cycles, identify abnormal patterns before they become outages, and design architectures that degrade more gracefully.

In this way, the mathematics used to understand the longevity of neurons finds a natural parallel in understanding the longevity and behaviour of the systems we build.

The Big Idea

Whether we’re studying neurons or networks, one truth stands out: failure is rarely random; it is the product of accumulated, measurable risks. Survival analysis gives us a way to see beyond the moment and recognise how invisible trends evolve into future failures.

It’s fascinating to see biology and IT converge on the same mathematical tools. One is a living system evolved over billions of years; the other is human-made infrastructure only decades old. Yet both rely on long-term resilience, graceful degradation, and the ability to operate through continual stress.

Closing Thoughts

As our technological systems grow more distributed, autonomous, and long-lived, the parallels between biological and digital resilience become increasingly difficult to ignore. Neurons survive for decades by managing hidden risks that accumulate gradually, long before any obvious failure occurs. In the same way, modern IT infrastructure, whether physical components or cloud-native services, faces its own slow-burning hazards that shape long-term reliability. Survival analysis gives us a shared language for understanding these time-dependent vulnerabilities across domains. It reminds us that longevity isn’t just about what happens at the moment of failure, but about the entire history of stress, wear, and adaptation leading up to it. Studying how living systems navigate these challenges helps us to find better ways to predict, prepare for, and ultimately design around the ageing processes in the systems we build.

Copyright NetMinded, a trading name of SeeThru Networks ©