TL;DR: authenticate with 3 independent factors (the user’s memory, the device, and an external FIDO2 key). Most importantly, make the auth factor reset flow equally secure: a user can initiate a reset of only one of the three factors within a given time window, with cool-off periods and a privilege downgrade, a designated reset buddy who verifies and authorizes the reset request, and a video-based KYC flow.
Recent hacks/leaks by LAPSUS have shown the weaknesses in multi-factor auth systems as implemented by large, reputed organizations. For a long time, I’ve been noodling on design principles for an auth system that is both technically secure and operationally effective. The LAPSUS hacks finally prompted me to write them up. In this post, I will outline the design of such a system.
Before we get into system design, let me first highlight the key requirements.
- We want the different authentication mechanisms (aka the auth factors in MFA) to be diverse and secure from both technical and operational standpoints.
- We don’t want single points of weakness, like human super-admins who can unilaterally reset credentials or create new accounts with high privileges.
- We don’t want fuzzy AI/ML systems built on fundamentally insecure signals giving a false sense of security.
- Wherever possible, we want automated bots to grant/revoke privileges based on deterministic rules.
Okay, with that out of the way, let’s jump into the key design elements.
First, we need to pick really secure auth factors.
A combination of these 3 factors of authentication, working together, can make the system resistant to both technical and social engineering attacks:
- Personal Memory Authentication: aka password authentication, backed by a password manager which itself is unlocked by a passphrase from the user’s memory.
- Device-binding Authentication: a private key held in the device’s attested hardware secure element, with a certificate issued by the auth service.
- Physical presence/touch authentication: a touch confirmation on an externally plugged-in FIDO2 key.
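To make "working together" concrete, here is a minimal sketch of the access decision: all three factors must verify, and no subset is enough. The `Factor` and `FactorResult` names are made up for illustration; how each factor is actually verified (password check, device certificate validation, FIDO2 assertion) is out of scope here.

```python
from dataclasses import dataclass
from enum import Enum


class Factor(Enum):
    MEMORY = "memory"      # passphrase / password-manager secret
    DEVICE = "device"      # hardware-backed device key and certificate
    PRESENCE = "presence"  # touch on an external FIDO2 key


@dataclass(frozen=True)
class FactorResult:
    factor: Factor
    verified: bool  # outcome of that factor's own verification step


def is_authenticated(results: list[FactorResult]) -> bool:
    """Grant access only if every one of the three factors verified."""
    verified = {r.factor for r in results if r.verified}
    return verified == set(Factor)
```

The rule is deliberately all-or-nothing: there is no fallback path where two strong factors make up for a missing third.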
Some of the insecure auth factors we should definitely avoid are:
- Any auth factor that depends on insecure channels like telephone networks (SMS), push notifications etc.
- Any auth factor that depends on the human user manually entering an OTP or tapping a confirmation/authorization button.
The device itself is foundational to secure access. Any serious professional environment should mandate devices with a dedicated hardware secure enclave, verified system boot-up and system integrity protection. The device and the OS should be able to provide verifiable attestations.
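As a rough sketch of what a deterministic device check could look like, assume the attestation has already been parsed and its signature verified elsewhere. Every field name below is hypothetical; real attestation formats differ by platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceAttestation:
    # All fields are hypothetical placeholders for illustration.
    signature_valid: bool       # attestation chains to a trusted hardware root
    hardware_backed_key: bool   # device key lives in the secure element
    verified_boot: bool         # boot chain was verified
    integrity_protection: bool  # system integrity protection is enabled


def device_is_trusted(att: DeviceAttestation) -> bool:
    """Every check must pass; there is no score and no threshold."""
    return all((
        att.signature_valid,
        att.hardware_backed_key,
        att.verified_boot,
        att.integrity_protection,
    ))
```

The point of the sketch is the shape of the rule: a conjunction of verifiable facts rather than a risk score.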
Reset Flows
Then, most importantly, reset flows should be secure. If any of the 3 auth factors needs to be reset, the reset flow should be equally secure. As we have seen in recent attacks, this could be the weakest link.
We need additional auth factors specifically for use during initial registration and during reset flows:
- 2-Person (buddy) system for initiation of user registration or reset flow.
- In-person Identity verification or Remote Identity verification with video.
At any given time, only one of the 3 auth factors is allowed to be reset, with pre-reset and post-reset cool-off periods.
The other two auth factors should be valid and working, and will be required to initiate the reset request. A notification of the reset request is sent to the user’s communication channels (mobile, email), and the request has to be seconded by the reset buddy. The reset buddy has to respond to the request within 15 minutes.
Then an automated system will conduct video KYC. Upon successfully completing all these steps, the requested auth factor can be reset by the user.
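The reset steps above can be sketched as a few deterministic checks. This is only an illustration: the `ResetRequest` type and the function names are invented here, the 15-minute window comes from the text, and the pre-reset cool-off is left out to keep the sketch short.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

BUDDY_RESPONSE_WINDOW = timedelta(minutes=15)  # from the post
FACTORS = frozenset({"memory", "device", "presence"})


@dataclass
class ResetRequest:
    target_factor: str           # the one factor being reset
    verified_factors: frozenset  # factors the user proved when initiating
    created_at: datetime
    buddy_approved_at: Optional[datetime] = None
    video_kyc_passed: bool = False


def can_initiate(req: ResetRequest) -> bool:
    """The other two factors must both be valid to start a reset."""
    if req.target_factor not in FACTORS:
        return False
    return req.verified_factors >= FACTORS - {req.target_factor}


def buddy_approved_in_time(req: ResetRequest) -> bool:
    """The reset buddy must second the request within the response window."""
    if req.buddy_approved_at is None:
        return False
    elapsed = req.buddy_approved_at - req.created_at
    return timedelta(0) <= elapsed <= BUDDY_RESPONSE_WINDOW


def may_reset(req: ResetRequest) -> bool:
    """All steps must succeed before the factor can be reset."""
    return can_initiate(req) and buddy_approved_in_time(req) and req.video_kyc_passed
```

For example, a request to reset the FIDO2 key that was initiated with the password and device factors, seconded after 10 minutes, and followed by a passed video KYC would satisfy `may_reset`; the same request seconded after 16 minutes would not.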
The pre-reset cool-off period gives the legitimate user a chance to notice and respond to any attempted account take-over (in case the designated reset buddy is compromised too).
The post-reset cool off period constraints are:
- User’s privileges are downgraded. High-privilege operations are not allowed.
- Additional auth factors cannot be reset.
Note: if the reset buddy is in a cool-off period themselves they cannot participate as a reset buddy.
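Here is a small sketch of how the cool-off rules might be enforced. The durations are placeholders I picked for illustration (the right values depend on the organization), and `Account` and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Placeholder durations: assumptions for this sketch, not recommendations.
PRE_RESET_COOL_OFF = timedelta(hours=24)
POST_RESET_COOL_OFF = timedelta(days=7)


@dataclass
class Account:
    user_id: str
    last_reset_at: Optional[datetime] = None  # when a factor was last reset


def earliest_reset_time(approved_at: datetime) -> datetime:
    """The reset takes effect only after the pre-reset cool-off has passed."""
    return approved_at + PRE_RESET_COOL_OFF


def in_post_reset_cool_off(account: Account, now: datetime) -> bool:
    if account.last_reset_at is None:
        return False
    return now < account.last_reset_at + POST_RESET_COOL_OFF


def allow_high_privilege(account: Account, now: datetime) -> bool:
    """Privileges are downgraded during the post-reset cool-off."""
    return not in_post_reset_cool_off(account, now)


def allow_factor_reset(account: Account, now: datetime) -> bool:
    """No additional factor can be reset during the post-reset cool-off."""
    return not in_post_reset_cool_off(account, now)


def can_act_as_buddy(buddy: Account, now: datetime) -> bool:
    """A buddy who is in a cool-off period cannot second a reset."""
    return not in_post_reset_cool_off(buddy, now)
```

All three restrictions hang off the same timestamp, so a bot can enforce them without any human judgment call.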
Users have to be retrained periodically to maintain their personal operational security.
Each of these mechanisms is critical for safe and secure operation.
Less effective mechanisms
There are commonly applied mechanisms that give a false sense of security. While these can be applied, and can stop less sophisticated attacks, we shouldn’t depend on them to provide real security. Some examples of these are:
- Determining user-location based on IP address, device timezone etc.
- Determining device integrity from fuzzy signals on less secure devices (without hardware attestation and without secure boot).
- Relaxing the strictness of Video IDV or any of the other flows for the sake of user experience, and relying on AI models based on questionable signals instead.