AI in Mortality Underwriting: Real Value and Marketing Claims

Almost every manager in the life settlement space now mentions artificial intelligence in their marketing materials. Most of the claims are vague: AI-enhanced underwriting, AI-driven mortality modeling, machine learning insights. Some of those claims describe real, useful applications of machine learning. Others describe spreadsheets with a new vocabulary.

This article tries to draw the line as honestly as we can. The intent is not to disparage anyone's methodology — we cannot evaluate other managers' systems from the outside — but to give an allocator the conceptual tools to ask the right questions and to recognize where AI is actually contributing value versus where the language has run ahead of the substance.

What a life expectancy report actually is

To understand where AI fits in this market, it helps to know what a traditional life expectancy report looks like. When an institutional buyer evaluates a life settlement policy, the central underwriting input is a life expectancy (LE) report produced by a specialist medical underwriting firm. The LE report distills the insured's medical history, age, sex, smoking status, and other relevant factors into an estimate of the mortality distribution: not a single prediction of when the person will die, but a probability distribution of mortality outcomes derived from large-sample actuarial tables adjusted for the specific insured.

Definition: life expectancy (LE) report. A document produced by a specialist medical underwriting firm summarizing the mortality distribution for a specific insured based on their medical records and applicable actuarial tables. The LE is typically expressed as a median number of months or years, with the underlying mortality curve available for use by the buyer's pricing model.
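To make the relationship between the mortality curve and the headline LE figure concrete, here is a minimal sketch. It assumes the adjustment for the specific insured takes the form of a single mortality multiplier applied to a baseline table, which is a simplification. The function names and the baseline numbers are hypothetical and are not drawn from any LE provider's methodology.

```python
from typing import List


def survival_curve(annual_qx: List[float], multiplier: float) -> List[float]:
    """Return S(0), S(1), ..., S(n): probability of surviving to the end of each year.

    annual_qx:  baseline probability of death within each successive year.
    multiplier: rating applied to the baseline (1.0 means standard mortality).
    """
    survival = [1.0]
    for qx in annual_qx:
        adjusted = min(1.0, qx * multiplier)  # a probability cannot exceed 1
        survival.append(survival[-1] * (1.0 - adjusted))
    return survival


def median_le_years(survival: List[float]) -> float:
    """Time at which survival first reaches 50%, interpolated linearly within the year."""
    for year in range(1, len(survival)):
        if survival[year] <= 0.5:
            prev, curr = survival[year - 1], survival[year]
            return (year - 1) + (prev - 0.5) / (prev - curr)
    raise ValueError("survival never falls to 50%; extend the baseline table")


def mean_le_years(survival: List[float]) -> float:
    """Expected years lived within the table horizon (trapezoid rule, truncated at the end)."""
    return sum((survival[i] + survival[i + 1]) / 2 for i in range(len(survival) - 1))


# Hypothetical baseline: made-up annual death probabilities that rise over time.
baseline_qx = [0.04 + 0.01 * t for t in range(30)]
curve = survival_curve(baseline_qx, multiplier=2.0)
print(f"median LE: {median_le_years(curve):.1f} years")
print(f"mean LE within horizon: {mean_le_years(curve):.1f} years")
```

The point of the sketch is that the median is one summary of the curve, not the curve itself. Two insureds with the same median can have quite different distributions around it, which is why a pricing model works from the full curve.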

The traditional LE process is human-intensive. Specialist clinicians and underwriters read the medical records, identify the conditions that drive mortality risk, apply rating debits and credits, and produce the final report. The process has been refined over decades and rests on a mature methodology grounded in actuarial science and clinical judgment. The Society of Actuaries publishes mortality tables, and the Actuarial Standards Board issues the standards of practice (most notably ASOP No. 48, Life Settlements Mortality) that govern professional practice.

Where machine learning genuinely adds value

Against this background, there are several places where modern machine learning techniques add real and verifiable value.

Scale and consistency

Human underwriting is slow, expensive, and variable. The same underwriter reading the same file at different times, or different underwriters reading the same file, will sometimes reach different conclusions. This is not a criticism of the profession; it is a feature of any judgment-heavy task. A machine learning model run with fixed weights and deterministic settings is fast and repeatable in the literal sense that the same input produces the same output. For buyers reviewing many policies, having a system that scores every policy on a consistent basis is a genuine improvement over the traditional alternative.

Pattern recognition across heterogeneous medical records

Medical records are notoriously heterogeneous. The same diagnosis can be expressed in different ways across providers; comorbidities are recorded inconsistently; medications are listed in different formats. Machine learning techniques — particularly those built around natural language processing — can extract structured information from unstructured records more reliably than ad hoc human processes. Once the information is structured, downstream modeling becomes more tractable. Peer-reviewed research published in journals including Communications Medicine and Scientific Reports has demonstrated that machine learning models trained on electronic medical record data can produce mortality predictions of useful accuracy, particularly for short-horizon (one-year) outcomes in hospitalized populations.
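As a toy illustration of what "structuring" a record means, the sketch below maps several ways of writing the same diagnosis onto one canonical label. It is deliberately a rule-based stand-in using regular expressions, not a machine learning model. The pattern table and the function name are hypothetical, and a production system would also need a clinical vocabulary and handling for negation ("no history of CHF"), which this sketch ignores.

```python
import re
from typing import Dict, List, Set

# Hypothetical synonym patterns, matched against lowercased text.
CANONICAL_PATTERNS: Dict[str, List[str]] = {
    "type_2_diabetes": [r"\bt2dm\b", r"\btype (2|ii) diabetes\b", r"\bniddm\b"],
    "congestive_heart_failure": [r"\bchf\b", r"\bcongestive heart failure\b"],
    "copd": [r"\bcopd\b", r"\bchronic obstructive pulmonary disease\b"],
}


def extract_conditions(note: str) -> Set[str]:
    """Return the canonical labels whose patterns appear anywhere in the note."""
    text = note.lower()
    found: Set[str] = set()
    for label, patterns in CANONICAL_PATTERNS.items():
        if any(re.search(pattern, text) for pattern in patterns):
            found.add(label)
    return found


# Two providers describing the same condition differently yield the same feature.
print(extract_conditions("Hx of T2DM and CHF."))
print(extract_conditions("Type II diabetes, well controlled."))
```

Once every record is reduced to the same set of labels, downstream comparison across policies no longer depends on how each provider happened to write things down.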

Discrepancy detection between independent assessments

One of the most practically useful applications of machine learning in this market is not as a replacement for traditional underwriting but as a second opinion. A buyer who runs a machine learning model alongside the LE reports it purchases gets a structured way to identify discrepancies: cases where the model and the LE report agree are easier to underwrite with confidence; cases where they disagree become candidates for additional clinical review. The combination is more powerful than either approach alone.
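The second-opinion idea can be sketched as a simple routing rule. The sketch assumes both assessments are summarized as a single LE figure in months and that disagreement is measured as a relative gap. The 15% threshold, the class, and the function names are hypothetical choices for illustration, not a recommended standard.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    policy_id: str
    le_report_months: float  # LE from the third-party report
    model_months: float      # independent estimate from the model


def relative_gap(a: Assessment) -> float:
    """Absolute difference as a fraction of the average of the two estimates."""
    midpoint = (a.le_report_months + a.model_months) / 2
    return abs(a.le_report_months - a.model_months) / midpoint


def route(a: Assessment, threshold: float = 0.15) -> str:
    """Send large disagreements to clinical review; let the rest follow the standard path."""
    return "clinical_review" if relative_gap(a) > threshold else "standard"


for case in [Assessment("A", 100.0, 104.0), Assessment("B", 60.0, 90.0)]:
    print(case.policy_id, f"{relative_gap(case):.2f}", route(case))
```

A symmetric gap measure is used so that neither source is treated as the reference. Which side was right is a question for the review, not for the flag.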

Portfolio-level analytics

Machine learning is well suited to portfolio-level questions that traditional LE reports do not address directly: concentration risk, correlated medical exposures across policies, the impact of premium projection assumptions on a book's expected duration, and similar questions. These applications are less about predicting individual mortality and more about understanding the structural properties of an entire portfolio.
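One simple way to quantify concentration risk is a Herfindahl-style index over exposure shares. This is a conventional summary statistic rather than a machine learning technique. The sketch below assumes each policy carries a single primary impairment category and weights it by face value; the category names and amounts are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def exposure_shares(policies: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """policies: (primary impairment category, face value). Returns share of face per category."""
    totals: Dict[str, float] = defaultdict(float)
    for category, face_value in policies:
        totals[category] += face_value
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("portfolio has no positive exposure")
    return {category: value / grand_total for category, value in totals.items()}


def herfindahl(shares: Dict[str, float]) -> float:
    """Sum of squared shares: 1/k for k equal categories, 1.0 for a single category."""
    return sum(share * share for share in shares.values())


# Hypothetical book, face values in millions.
book = [("cardiac", 2.0), ("oncology", 1.0), ("cardiac", 1.0)]
shares = exposure_shares(book)
print(shares, herfindahl(shares))
```

A rising index over time signals that new acquisitions are piling into exposures the book already holds, which is the kind of question an individual LE report cannot answer.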

Where machine learning does not, or does not yet, add value

There are also several places where AI claims tend to outrun the evidence. The honest version of the technology story acknowledges these limits.

Rare conditions and small subgroups

Machine learning models perform well when training data is plentiful and representative. They perform poorly when training data is sparse. For rare medical conditions, those that appear in only a small fraction of cases, model estimates of mortality carry wide uncertainty bands. For these cases, the appropriate posture is to rely on specialist clinical judgment supplemented by model output, not the reverse. Marketing language that implies AI can outperform clinicians on rare conditions is generally not supported by the evidence.
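The effect of sample size on uncertainty is easy to show with a standard Wilson score interval for an observed mortality rate. The sketch below compares the same observed rate in a small and a large group. The counts are invented for illustration, and the interval describes only sampling uncertainty in an observed rate, not every source of model error.

```python
import math
from typing import Tuple


def wilson_interval(deaths: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = deaths / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Same observed 20% rate, very different evidence behind it (made-up counts).
rare_low, rare_high = wilson_interval(deaths=4, n=20)
common_low, common_high = wilson_interval(deaths=400, n=2000)
print(f"rare condition:   {rare_low:.3f} to {rare_high:.3f}")
print(f"common condition: {common_low:.3f} to {common_high:.3f}")
```

With twenty cases the plausible range is several times wider than with two thousand, even though the point estimate is identical.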

Demographic gaps in training data

The peer-reviewed literature is clear that machine learning models trained on biased or unrepresentative data can produce biased outputs. In healthcare-adjacent applications, this is particularly important: training data may underrepresent certain demographic groups, certain conditions, or certain types of care. The result is that model performance is uneven across the population the model is being used on. Honest practice in this area involves continuous evaluation of model performance across subgroups, disclosure of limitations, and a willingness to override model output when the case at hand falls outside the conditions under which the model was validated.
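Evaluating performance across subgroups can be as simple as computing the same error metric separately for each group and reporting it alongside the group's sample size. The sketch below uses the Brier score (mean squared error of a predicted probability against a 0/1 outcome). The record layout, the subgroup labels, and the numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def brier_by_subgroup(
    records: Iterable[Tuple[str, float, int]]
) -> Dict[str, Tuple[int, float]]:
    """records: (subgroup, predicted probability of death within the horizon, outcome 0 or 1).

    Returns {subgroup: (sample size, Brier score)}; lower scores are better.
    """
    squared_error: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for subgroup, predicted, outcome in records:
        squared_error[subgroup] += (predicted - outcome) ** 2
        counts[subgroup] += 1
    return {g: (counts[g], squared_error[g] / counts[g]) for g in counts}


# Made-up validation records.
validation = [
    ("group_a", 0.5, 1),
    ("group_a", 0.5, 0),
    ("group_b", 1.0, 1),
]
for group, (n, score) in brier_by_subgroup(validation).items():
    print(group, n, round(score, 3))
```

Reporting the count next to the score matters: a good score on a handful of cases is weak evidence, and a subgroup with few cases is exactly where a willingness to override the model is most warranted.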

Novel and emerging medical situations

Machine learning models are necessarily backward-looking. They are trained on past outcomes. When the medical landscape changes — new therapies, new diagnostic categories, changing standards of care — model performance on the new cases is not guaranteed to match performance on the historical cases. Oncology is the most obvious example: a person diagnosed with certain cancers today has very different mortality prospects from a person diagnosed with the same condition twenty years ago, because the treatments have changed dramatically. Models that have not been updated to reflect current outcomes will systematically misprice newer cases. Users of any AI model in this domain — whether they trained it themselves or rely on the model vendor — need a clear view of how current the training is and a discipline of additional human review for cases that are likely outside the model's effective coverage.

False confidence

Machine learning models often produce point estimates with apparent precision. The third decimal place on a predicted mortality score can communicate more certainty than the underlying methodology supports. A discipline of skepticism — about model output, about the data behind it, about the conditions under which the model was validated — is essential. A user who treats model output as authoritative because it came from "the AI" is often worse off than a user who relies on conventional methods with appropriate humility about uncertainty.

Definition: model risk. The risk that a model produces systematically biased or incorrect outputs because of its design, training data, or use outside the conditions for which it was validated. Model risk is one of the most important and least discussed risks in any AI-enhanced underwriting framework.

Lessons from AI in other underwriting domains

It is useful to look briefly at how AI has played out in adjacent underwriting domains, because the patterns are informative.

In mortgage underwriting, machine learning models have become standard for credit risk scoring, fraud detection, and document processing. The combination of large datasets, well-defined outcomes (default or no default), and stable feature definitions has made the application productive. The honest critiques in mortgage underwriting concern fairness across demographic groups and the opacity of certain model decisions, rather than whether the models work at all.

In property and casualty insurance, machine learning is used widely for claims fraud detection, telematics-based driver risk scoring, and image-based damage assessment. The applications are mature and produce verifiable accuracy improvements over conventional methods. Again, the honest critiques tend to concern interpretability and fairness rather than underlying efficacy.

In medical risk assessment broadly — the domain closest to life settlement underwriting — the literature is more cautious. Multivariable mortality models published in peer-reviewed venues (including work documented in Communications Medicine and Scientific Reports) show that machine learning can improve on regression-based baselines in certain settings, particularly short-horizon mortality prediction in hospitalized populations. But the improvements are typically modest, the validation is condition-specific, and the deployment in production underwriting workflows requires extensive clinical oversight.

The pattern is consistent: AI adds the most value where data is plentiful, outcomes are well-defined, and the population is similar to the training population. It adds the least value, and creates the most risk, where any of those conditions fail. Life settlement underwriting falls in between. The data is reasonably plentiful for some conditions and sparse for others; the outcome is well-defined (death is unambiguous), but the actual quantity of interest is the distribution of its timing, which is harder to estimate; and the population is reasonably stable but evolving.

The role of human judgment

Our posture on this question is that AI works best in this market as an augmentation of human judgment rather than as a replacement for it. The reasons follow from the limits described above.

Specialist clinicians and medical underwriters bring two things that current machine learning systems do not. The first is genuine reasoning under uncertainty: the ability to look at an unusual case and think through what the relevant base rates and analogies are, rather than relying on whatever the training distribution happened to contain. The second is judgment about when to be skeptical: the ability to notice that something about a particular case does not fit the usual pattern and to flag it for additional investigation.

Machine learning systems contribute scale, consistency, structured extraction of information from messy records, and the second-opinion function described earlier. The combination of human clinical reasoning and machine consistency is, in our experience, more powerful than either approach alone.

This posture has consequences for how an underwriting system is built. It means model output is one input among several, not the final answer. It means policies where the model and the LE report agree get a different workflow from policies where they disagree. It means clinicians and underwriters have explicit authority to override model output when their judgment warrants. And it means continuous monitoring of how the system is performing — comparing realized outcomes against the modeled distributions the bid decisions were based on, and adjusting the discipline of human review and override accordingly.
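The monitoring step described above is commonly summarized as an actual-to-expected (A/E) ratio: deaths observed over a period divided by the deaths the modeled probabilities implied. The sketch below is a minimal version with a hypothetical tolerance band; the function names, the band, and the numbers are illustrative assumptions, and on a small book the ratio is noisy enough that a breach is a prompt for review rather than a verdict.

```python
from typing import Sequence


def actual_to_expected(predicted_probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Observed deaths divided by the sum of modeled death probabilities for the same period."""
    if len(predicted_probs) != len(outcomes):
        raise ValueError("one outcome is required per prediction")
    expected = sum(predicted_probs)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return sum(outcomes) / expected


def outside_band(ae_ratio: float, lower: float = 0.8, upper: float = 1.2) -> bool:
    """True when the ratio falls outside a (hypothetical) tolerance band."""
    return ae_ratio < lower or ae_ratio > upper


# Made-up one-year death probabilities and realized outcomes (1 = died).
probs = [0.25, 0.25, 0.5]
died = [1, 0, 1]
ratio = actual_to_expected(probs, died)
print(ratio, outside_band(ratio))
```

A ratio persistently below 1 means insureds are living longer than modeled, which is the direction that hurts a life settlement book, so the asymmetry of consequences may justify an asymmetric band in practice.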

Questions worth asking any manager that claims AI-enhanced underwriting

For an allocator evaluating a manager's claims, a small number of focused questions tend to separate substantive systems from marketing language.

• What specific tasks does the model perform, and how is its output used in the underwriting workflow? A vague answer suggests vague usage.
• What is the validation methodology? How is model performance measured, and against what baseline?
• How does the manager handle cases where the model disagrees with independent LE reports? Is there an explicit reconciliation process?
• How are rare conditions and demographic subgroups handled, given that machine learning performs less reliably on small samples?
• How frequently is the model retrained, and on what data?
• What is the governance around model output: who has the authority to override the model, and how is that authority exercised in practice?
• What is the manager's view on the limits of machine learning in this market, and what is the manager not claiming?

A manager who can answer these questions in detail, and who is candid about the limits of their system, is more likely to have a substantive AI capability than one whose answers tend toward "proprietary methodology" or "trade secret."

How we answer these questions ourselves

It would be inconsistent to write an article like this without addressing the same questions to our own practice. Our short answers are as follows; longer answers are available in conversation.

We use commercially available AI models rather than custom-trained ones. We have not trained a proprietary model and do not claim to have done so. The state-of-the-art commercial models available to the broader market are sophisticated, well-resourced, and continuously refined by their vendors at a scale and depth no individual asset manager could reasonably match. Using such models gives us the benefit of that investment without the burden of recreating it. The proprietary work product on our side is the integration: the structured-feature framework we use to represent medical records, the workflow that combines model output with third-party LE reports and human clinical review, the disciplines around when we trust the model and when we discount it, and the governance around overrides and case-by-case reconciliation. The model is the engine; the value is in how we have built the machine around it.

Concretely, the model contributes to four tasks in our workflow: parsing redacted medical records into structured features, generating an independent mortality estimate from those features, flagging policies whose third-party LE reports diverge meaningfully from the model's estimate, and identifying patterns across our existing portfolio that inform pricing on new acquisitions. Its output is one input into the bid decision, not the decision itself.

We validate the model's performance in our specific use along three principal lines: backtesting against medical records where mortality outcomes are now known, comparison of model predictions against realized outcomes within our existing portfolio, and cross-checking against third-party LE reports for policies we ultimately declined. The primary baseline against which we measure value is the third-party LE reports themselves, which we purchase and rely on for every policy we evaluate. We do not claim the model has superhuman accuracy on individual lives, because no model (ours, the third-party LE providers', or anyone else's) can predict individual mortality timing precisely. We measure value at the portfolio level and on the calibration of confidence around individual estimates.

Rare conditions and demographic subgroups where the underlying model is least reliable automatically receive enhanced human review, with the model's contribution to the bid decision appropriately discounted. Override authority on every case rests with the CIO and the portfolio manager; overrides require explicit justification recorded against the case, and overrides are reviewed periodically as a body to assess whether the team's judgment is calibrated against the model's — in either direction.

And on the limits: we do not claim our use of AI eliminates longevity risk, replaces specialist clinical judgment, gives us an information edge over carriers, or removes the need for third-party LE reports. None of those would be honest claims. What we do claim is that the model improves consistency and scale, provides an independent check on third-party LE reports, contributes to the discipline of declining policies that do not meet underwriting standards, and works as one of several safeguards in a deliberately layered process.

A short closing

The honest accounting of AI in mortality underwriting is that machine learning is making real, verifiable improvements in scale, consistency, document processing, discrepancy detection, and portfolio-level analytics. It is not replacing specialist clinical judgment, and the literature does not support the strongest claims about model accuracy on rare conditions, novel medical situations, or underrepresented subgroups. The technology is a useful tool integrated with discipline. It is not a replacement for the discipline itself.

Sea Point's posture, in summary, is that AI in this market is best treated as a useful tool inside a disciplined process, not a replacement for the process. The discipline is what makes the tool worth having. The honest framing for any conversation about AI in mortality underwriting is the one outlined here: a clear picture of what the technology does, what it does not yet do, and how the manager has built around it. A fuller treatment of how this fits into our overall investment approach is available at seapoint.capital/strategy/life-settlements.

Sea Point Capital works with qualified investors and their advisors interested in insurance-linked investment strategies. To learn more about our approach, we welcome the opportunity to speak directly.
