June 22, 2026·PalmAI-ProductTeam

Fusion vs. Fallback: What 'Multimodal' Really Means in Biometric Authentication

TL;DR

"Multimodal" has become a default selling point for authentication services, but the word hides a real fork. Combining two traits at the template level — fusing them into a single decision — is a different engineering act from chaining them in sequence, where the system tries face, then falls back to a fingerprint. The first is designed to be harder to spoof; the second is mostly designed to reduce friction. As deepfakes erode single-signal capture, that distinction is the one buyers should be asking about. Fusion is a deliberate choice, not a sensor count.


The Big Picture: "Multimodal" Is Doing Two Different Jobs

Read a dozen biometric authentication datasheets and you'll notice "multimodal" appears on almost all of them. It has quietly become table stakes — a box vendors tick rather than a property they explain. That's a problem, because the word is being used to describe two genuinely different architectures, and only one of them does what buyers think they're buying.

The honest version of multimodal isn't "we added more sensors." It's a specific claim about where two biometric signals meet. Do they combine into one verification decision — or do they sit in a line, where the system reaches for the second only when the first fails? Both get marketed as multimodal. They behave nothing alike under attack.

Worth separating the two before the term loses all meaning.


Why No Single Modality Is Enough

Start with the premise everyone agrees on: there is no perfect biometric trait.

Every unimodal system inherits the structural weaknesses of the one signal it reads. A peer-reviewed review in IET Biometrics puts it plainly: unimodal systems "suffer from several limitations such as low recognition accuracy, non-universality, [and] sensitivity" to noise and spoofing (IET Biometrics, Wiley). Face recognition struggles in poor lighting, and a photograph carries most of the same information the sensor reads. Fingerprint recognition fails on worn or wet fingers and leaves a latent copy on every surface you touch. Iris recognition is accurate but sensitive to capture conditions and intrusive for the user. None is universal; none is unspoofable.

This is why standards bodies treat fusion as the serious answer rather than a marketing flourish. NIST's own work on the subject frames multimodal design as a deliberate testing and engineering discipline, not a default upgrade (NIST: Multimodal Biometrics — Issues in Design and Testing). The field's instinct, in other words, is correct: combine traits. The question is how.


The Fork Nobody Names: Fusion vs. Fallback

Here is where the datasheets blur together and the engineering does not.

Academic work on multibiometrics is precise about this. Fusion can happen at several levels — sensor, feature, score, rank, and decision (Multibiometric fusion strategy: a review, ScienceDirect). The earlier the fusion — at the feature or template level — the more the two signals function as one verification event that an attacker has to defeat simultaneously. NIST's analysis of when and how to fuse two biometrics treats this as a design decision with real trade-offs, not a free accuracy bump (NIST: When to Fuse Two Biometrics).
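
To make "one verification event" concrete, here is a minimal Python sketch of feature-level fusion. It is an illustration under stated assumptions, not a description of any vendor's pipeline: the function names (fuse_features, verify), the toy three-number feature vectors, cosine similarity as the matcher and the 0.9 threshold are all hypothetical choices made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fuse_features(features_a, features_b):
    """Feature-level fusion: normalize each modality, concatenate, renormalize.

    The result is a single template; there is no per-modality decision.
    """
    return l2_normalize(l2_normalize(features_a) + l2_normalize(features_b))


def cosine_similarity(a, b):
    """Dot product of two unit-length vectors of equal size."""
    if len(a) != len(b):
        raise ValueError("templates must have the same length")
    return sum(x * y for x, y in zip(a, b))


def verify(enrolled, probe, threshold=0.9):
    """One comparison, one decision, over the fused template."""
    return cosine_similarity(enrolled, probe) >= threshold


# Toy vectors (assumed, not real biometric features).
enrolled = fuse_features([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
genuine = fuse_features([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
half_spoof = fuse_features([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])  # modality B is wrong

print(verify(enrolled, genuine))     # True
print(verify(enrolled, half_spoof))  # False: similarity is about 0.5
```

In this toy setup, matching only one of the two modalities lands at roughly half the similarity of a genuine attempt, so the single threshold rejects it. Real feature extractors, template sizes and thresholds would look very different.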

Now contrast that with what most "multimodal" consumer systems actually do. Try face. If face fails — bad light, a mask, low confidence — fall back to a PIN or a fingerprint. That's sequential fallback, and it's a usability pattern, not a security one. Each modality still stands alone; the attacker only has to beat whichever one is easiest on a given day. A system with five fallback options has five independent doors, not one reinforced one.

The practical test for any authentication service is one question: does an attacker have to defeat both signals at the same moment, or just the weakest available one? Fusion answers "both." Fallback answers "the weakest." Same marketing word, opposite security posture.
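
The same question can be asked of code. The sketch below contrasts the two decision rules on one attack: the attacker spoofs a single modality and lets the other fail. It rests on several assumptions: match scores on a 0 to 1 scale, a threshold of 0.8, and a weighted sum of scores (score-level fusion) standing in for template-level fusion because it fits in a few lines. The names Attempt, fallback_decision and fused_decision are hypothetical.

```python
from dataclasses import dataclass

THRESHOLD = 0.8  # hypothetical accept threshold on a 0..1 match score


@dataclass
class Attempt:
    face_score: float         # matcher score for the face sample
    fingerprint_score: float  # matcher score for the fingerprint sample


def fallback_decision(attempt, threshold=THRESHOLD):
    """Sequential fallback: any single modality that passes opens the door."""
    if attempt.face_score >= threshold:
        return True
    # Face failed (bad light, mask, low confidence): reach for the next one.
    return attempt.fingerprint_score >= threshold


def fused_decision(attempt, face_weight=0.5, threshold=THRESHOLD):
    """Score-level fusion: one decision over a weighted sum of both scores."""
    combined = (face_weight * attempt.face_score
                + (1.0 - face_weight) * attempt.fingerprint_score)
    return combined >= threshold


genuine = Attempt(face_score=0.90, fingerprint_score=0.85)
# Attacker spoofs only the fingerprint and lets the face check fail.
one_spoof = Attempt(face_score=0.10, fingerprint_score=0.95)

print(fallback_decision(genuine), fused_decision(genuine))      # True True
print(fallback_decision(one_spoof), fused_decision(one_spoof))  # True False
```

One caveat on the design: a plain weighted sum still lets a very strong score on one modality partly compensate for a weak score on the other, so the weights and threshold carry real security weight and need to be tuned and tested rather than assumed.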


Why the Fallback Shortcut Is Getting Exposed Now

This distinction used to be academic. Deepfakes made it operational.

When face was the strong modality and a PIN was the fallback, the fallback was the obvious weak point. What's changed is that the primary signal is now under direct attack. Identity-verification vendors report that each single liveness signal has a named failure mode: depth checks can be beaten by 3D masks, remote PPG by advanced deepfakes, and texture analysis degrades in poor lighting. The recommendation that follows is explicit: the strongest liveness systems "don't rely on any single signal" (Sumsub: How Fraudsters Bypass Facial Recognition). That's a vendor's own framing, and worth reading as such, but it lines up with where the standards have gone. ISO/IEC 30107-3 standardizes how presentation-attack detection is tested and reported, which makes it a measurable property rather than an unverified feature claim (ISO/IEC 30107-3:2023). That is a tacit acknowledgement that any single capture channel is attackable by default.

The World Economic Forum, hardly a biometrics vendor, lands in the same place: defending against deepfakes means correlating identity signals across channels rather than trusting any one (WEF: Unmasking Cybercrime, 2026). That is the multimodal thesis stated by an authority with no scanner to sell.

This is also the most honest place to locate our own position in the picture. Our palm recognition combines palm print and palm vein in a single gesture — two signals captured in one act and fused before a decision is made, not chained as fallbacks. The relevant property isn't a bigger accuracy number; it's a different attack surface. Vein patterns sit beneath the skin and can't be captured without consent and proximity, which means they simply aren't in the layer a camera — or a deepfake — can reach. Against the failure modes above, that's the structural argument for template-level fusion, and it's the one we'd ask any buyer to pressure-test rather than take on faith. (For transparency: our published dual-modal false-accept figure of <10⁻¹⁵, versus roughly 10⁻⁶ for single-modal iris recognition, is our own measurement, not a neutral benchmark.)

Two architectures sold under one word
| Dimension | Template-level fusion | Sequential fallback |
| --- | --- | --- |
| What the user does | One act, two signals captured together | Tries one modality, then another if it fails |
| What an attacker must defeat | Both signals simultaneously | Only the weakest available signal |
| Primary design goal | Spoof resistance | Reduced friction / fewer lockouts |
| Effect of adding modalities | Fewer independent attack paths | More independent attack paths |
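
The "effect of adding modalities" row can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses an idealized model, and every part of it is an assumption: each modality can be spoofed independently with some per-attempt probability, the attacker can reach every fallback option, and a fused system accepts only when all signals are defeated in the same attempt. The probabilities are illustrative values, not measurements of any real system.

```python
from math import prod


def fallback_spoof_probability(spoof_probs):
    """Chance that at least one independent door opens: 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in spoof_probs)


def fusion_spoof_probability(spoof_probs):
    """Chance that every signal is defeated in the same attempt: prod(p_i)."""
    return prod(spoof_probs)


# Illustrative per-modality spoof probabilities (assumed values).
two = [0.05, 0.01]
three = two + [0.02]

for probs in (two, three):
    print(len(probs),
          round(fallback_spoof_probability(probs), 6),
          round(fusion_spoof_probability(probs), 6))
```

Under these assumptions, adding a third modality pushes the fallback figure up and the fusion figure down. Real signals are rarely independent, and a fused matcher does not literally multiply probabilities, so treat this as a way to see the direction of the effect rather than its size.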

The Regulatory Pressure Is Building, Not Settled

There's a tempting fourth argument — that regulation is forcing the industry toward accountable, layered biometrics — and it's worth handling carefully. The EU AI Act does classify many biometric-identification uses as high-risk, with obligations phasing in through 2026 (State of Surveillance: EU AI Act explainer). But the timeline is genuinely in flux — there's an active "Digital Omnibus" debate about postponing some deadlines (BiometricUpdate: EU Parliament backs delaying AI Act deadlines). So the honest read isn't "a hard deadline is coming." It's that the direction of travel points toward auditable, consent-driven biometric design, even if the calendar slips. That's pressure, not a date.


Where Tencent PalmAI Fits in This Picture

For most buyers, the takeaway isn't "switch to palm." It's "interrogate the word." When an authentication service says multimodal, ask whether the signals fuse into one decision or queue up as fallbacks — and ask which signal an attacker actually has to beat.

Our own answer to that question is dual-modal fusion by design, deployed at scale rather than on a slide: roughly 100 million users as of 2025, and a 7-Eleven rollout that reached 1,500 stores in a single month. If you're evaluating biometric authentication for payment, KYC, or access control, that operating record is the part worth reading before the datasheet. Our KYCMax identity platform and PayMax payment platform are where the fusion approach shows up in production across industries.


What This Means for Decision Makers

| If you are… | Consider… | Timeline |
| --- | --- | --- |
| A CISO or fraud lead | Auditing whether your current "multimodal" stack fuses signals or merely falls back; the second leaves the weakest modality fully exposed to deepfakes. | Review this quarter |
| A product leader choosing an authentication service | Asking vendors, in writing, at which level fusion happens (feature/score/decision) and what an attacker must defeat simultaneously. | Before next procurement |
| A bank or payments strategist | Treating presentation-attack detection (ISO/IEC 30107-3) and cross-signal correlation as baseline requirements, not premium add-ons. | Next 6–12 months |
| A compliance or privacy lead | Building consent UX, template handling and deletion flows now; regulatory direction is clear even where the dates are not. | Now |

Frequently Asked Questions

Isn't any multimodal biometric authentication system more secure than a single one?

Not automatically. It depends on whether the modalities are fused into one decision or arranged as fallbacks. A fallback design can actually widen the attack surface, because an attacker only needs to beat whichever modality is easiest to spoof on a given attempt. Fusion at the feature or template level is the version that raises the bar, because both signals must be defeated at once.

What's the difference between fusion and fallback in plain terms?

Fusion captures two signals in a single act and combines them before deciding — like reading palm print and palm vein from one gesture. Fallback tries one modality, and only if it fails does it reach for another, such as face recognition that drops to a PIN. One is a security architecture; the other is mostly a usability convenience.

How does multimodal biometrics defend against deepfakes specifically?

Deepfakes target capture channels a camera can see — primarily biometric face recognition and surface-only liveness. Correlating an additional signal that a camera can't reach (for example, sub-dermal palm vein patterns) means a synthetic face alone isn't sufficient. The WEF's 2026 guidance points the same way: correlate signals across channels rather than trusting one.

Does adding iris or fingerprint recognition to face make it multimodal?

Only in name, unless the signals are fused. Stacking iris recognition or fingerprint recognition as additional fallback options doesn't deliver fusion's security benefit — it adds independent doors. The meaningful question is whether the system combines them into a single verification event.

How can a buyer tell fusion from fallback when evaluating a vendor?

Ask one question: must an attacker defeat both signals at the same moment, or just the weakest available one? Then ask at which level fusion occurs (sensor, feature, score, rank, or decision). Vendors doing real fusion can answer precisely; vendors doing fallback tend to answer with sensor counts.


About Tencent PalmAI

Tencent PalmAI is an AI-powered palm recognition service that combines palm print and palm vein identification into a single contactless biometric act.

In the multimodal conversation specifically, we're an example of the fusion side of the fork — two signals captured together and combined before a decision, deployed across payment, identity and access control rather than positioned as a fallback layer.

Learn more at palm.tencent.com
