Introduction
eBay strives to excel at security and to identify new and improved mechanisms that let users access their accounts seamlessly while keeping fraudulent and malicious users at bay. This is a balancing act that every internet platform, major and minor, performs every day. Passwords are one such mechanism for securing a user’s account, but they have had limited success. They have always been a major pain point for users, who must maintain a unique, hard-to-guess combination of numbers, letters, and special characters. Ironically, this leads users to create passwords that are hard to remember but easy for hackers to brute force.
With new websites and platforms cropping up each day, remembering multiple passwords has become a daily struggle for users. Passwords are one of the weakest links in our attempt to secure user accounts, because many users either don’t choose a strong, complex password or reuse the same password across multiple services, making themselves vulnerable to phishing and other types of attacks.
To relieve users from ever needing to remember their password and to move towards the utopia of a “password-free” world, eBay has released the ability to log in with a one-time code, which is delivered to the user’s phone, a personal and important device that they own. Beyond the ease of getting disposable one-time codes, this also helps while traveling or when using a public computer or network, where users can log in with a one-time code rather than risk having their regular password hijacked via a key logger, malware, or a compromised network. More details about the feature are available in the release statement.
How does it work?
Any user can now use the “Sign in with a single-use code” link on the log-in page to request that a one-time code be delivered to their registered phone number via text messaging. Then the user can type in the code from the text message into the input field, securely getting into the account without the hassle of remembering or exposing the original password.
The one-time codes are short-lived and cannot be transferred between sessions, making them highly isolated and secure in comparison to regular passwords, which can be used across any number of devices.
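As a rough illustration of what "short-lived" and "not transferable between sessions" can mean in code, the sketch below binds a code to the issuing session and an expiry time. The class name `OneTimeCode`, the six-digit format, and the time-to-live parameter are assumptions made for this example; they do not describe eBay's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical one-time code bound to a single session with a fixed expiry. */
final class OneTimeCode {
    private static final SecureRandom RANDOM = new SecureRandom();

    private final String value;
    private final String sessionId;
    private final Instant expiresAt;

    private OneTimeCode(String value, String sessionId, Instant expiresAt) {
        this.value = value;
        this.sessionId = sessionId;
        this.expiresAt = expiresAt;
    }

    /** Issues a zero-padded 6-digit code (format assumed) valid for the given time-to-live. */
    static OneTimeCode issue(String sessionId, Instant now, Duration ttl) {
        String value = String.format("%06d", RANDOM.nextInt(1_000_000));
        return new OneTimeCode(value, sessionId, now.plus(ttl));
    }

    String value() {
        return value;
    }

    /** Accepts the code only from the issuing session and only before expiry. */
    boolean matches(String candidate, String candidateSessionId, Instant now) {
        return now.isBefore(expiresAt)
                && sessionId.equals(candidateSessionId)
                // Constant-time comparison of the code bytes
                && MessageDigest.isEqual(
                        value.getBytes(StandardCharsets.UTF_8),
                        candidate.getBytes(StandardCharsets.UTF_8));
    }
}
```

A code issued for one session is rejected when presented from another session or after the time-to-live has elapsed, even if the digits are correct.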
Behind the scenes
Given its criticality, the application architecture needs to be robust and secure, but also easy to manage and configure. The structure should make the code more readable, which in turn helps ensure high-quality code in the production environment. One important characteristic of the application architecture is that the finite state machine used for generating and validating the one-time code must conform to all the rules that were set up, such as expiry and retry attempts.
Finite state machine
A finite-state machine (FSM) is a mathematical model of computation used to design both computer programs and sequential logic circuits. It is conceived as an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time; the state it is in at any given time is called the current state. It can change from one state to another when initiated by a triggering event or condition; this is called a transition. A particular FSM is defined by a list of its states, and the triggering condition for each transition. (Source: Wikipedia contributors, “Finite-state machine,” Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Finite-state_machine&oldid=731346054, accessed August 18, 2016.)
A simple representation of a Turnstile state machine is shown below.
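The turnstile can also be written down as a small transition table. The following is a minimal sketch in plain Java, not tied to any framework; the type names `TurnstileState`, `TurnstileEvent`, and `Turnstile` are chosen for this example only.

```java
import java.util.EnumMap;
import java.util.Map;

enum TurnstileState { LOCKED, UNLOCKED }

enum TurnstileEvent { COIN, PUSH }

/** A turnstile modeled as a table of (current state, event) -> next state. */
final class Turnstile {
    private static final Map<TurnstileState, Map<TurnstileEvent, TurnstileState>> TRANSITIONS =
            new EnumMap<>(TurnstileState.class);

    static {
        Map<TurnstileEvent, TurnstileState> fromLocked = new EnumMap<>(TurnstileEvent.class);
        fromLocked.put(TurnstileEvent.COIN, TurnstileState.UNLOCKED); // paying unlocks the arm
        fromLocked.put(TurnstileEvent.PUSH, TurnstileState.LOCKED);   // pushing a locked arm does nothing

        Map<TurnstileEvent, TurnstileState> fromUnlocked = new EnumMap<>(TurnstileEvent.class);
        fromUnlocked.put(TurnstileEvent.COIN, TurnstileState.UNLOCKED); // extra coins change nothing
        fromUnlocked.put(TurnstileEvent.PUSH, TurnstileState.LOCKED);   // passing through locks it again

        TRANSITIONS.put(TurnstileState.LOCKED, fromLocked);
        TRANSITIONS.put(TurnstileState.UNLOCKED, fromUnlocked);
    }

    private TurnstileState current = TurnstileState.LOCKED;

    /** Applies the event and returns the new current state. */
    TurnstileState fire(TurnstileEvent event) {
        current = TRANSITIONS.get(current).get(event);
        return current;
    }

    TurnstileState current() {
        return current;
    }
}
```

Because every (state, event) pair has an entry in the table, the machine is always in exactly one well-defined state.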
Advantages
An FSM provides many advantages over traditional rule-based systems:
- States, transitions, events, and their conditions maintained in a modular and declarative fashion
- Pluggable nature of transitions and conditions
- Abstraction of states, actions, and their roles
- Clarity of behavior during firing of events
- Most importantly, ease of ensuring secure state management
To illustrate, imagine a user has requested a one-time code and received it on their phone. However, while entering the code, they mistyped it more times than the allowed number of attempts. In a rule engine or a simple if/else sequence, there is a high probability of introducing a bug, because there is no state maintenance, only rule/logic evaluations. Such a bug can allow the user to provide the correct code and log in successfully even after exceeding the allowed number of attempts.
However, in a state machine, once the user’s attempts exceed the limit, the state machine transitions to an AttemptExceeded/Failed state, making it impossible for the user to attempt a code validation even with the correct code. This structure guarantees that the shortcomings of sequential execution cannot occur in a state machine.
If-else code for validation
public int validateCode(long transId, String code) {
    int currentStatus = getPersistence().getCurrentStatus(transId);
    boolean status = false;
    // Validate only while the code is still outstanding
    if (currentStatus == SENT) {
        status = getSecureModule().validateCode(transId, code);
        if (status) {
            currentStatus = SUCCESS;
        } else {
            currentStatus = INVALID;
            // Too many wrong attempts overrides the plain INVALID result
            if (getPersistence().verifyAttemptsExceeded(transId)) {
                currentStatus = ATTEMPT_EXCEEDED;
            }
        }
    }
    return currentStatus;
}
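For contrast, here is a minimal hand-rolled sketch of the state-guarded alternative, in which a terminal state refuses further validation. The names `CodeState` and `CodeValidation`, the in-memory attempt counter, and the simplified set of states are assumptions for illustration; the production code described in this article uses a framework and a database instead.

```java
enum CodeState { SENT, SUCCESS, ATTEMPT_EXCEEDED }

/** Hypothetical validator in which terminal states ignore any further input. */
final class CodeValidation {
    private final String expectedCode;
    private final int maxAttempts;
    private int failedAttempts = 0;
    private CodeState state = CodeState.SENT;

    CodeValidation(String expectedCode, int maxAttempts) {
        this.expectedCode = expectedCode;
        this.maxAttempts = maxAttempts;
    }

    /** Evaluates the candidate only while the machine is still in SENT. */
    CodeState validate(String candidate) {
        if (state != CodeState.SENT) {
            return state; // terminal state: even the correct code is not evaluated
        }
        if (expectedCode.equals(candidate)) {
            state = CodeState.SUCCESS;
        } else if (++failedAttempts >= maxAttempts) {
            state = CodeState.ATTEMPT_EXCEEDED;
        }
        return state;
    }
}
```

Once `ATTEMPT_EXCEEDED` is reached, the guard at the top of `validate` makes it structurally impossible for a later correct code to produce `SUCCESS`.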
Open source frameworks
Rather than writing an FSM from scratch, we evaluated two widely used frameworks based on maturity and support: Spring StateMachine and squirrel-foundation. Spring StateMachine seemed better suited to a daemon application, such as Zookeeper, and was not ready for a multi-threaded environment, such as a web application [Issue#170]. In addition, squirrel-foundation provided the ability to add custom event types and actions, which are explained in more detail below. These factors led us to choose the squirrel-foundation framework to model the finite state machine.
State machine for one-time code
A simple representation of the state machine used to generate and validate the one-time code for phone is provided below.
As illustrated, the one-time code validation moves between different states of the machine based upon the Action of the user, the current state of the machine, and the conditions configured between them. For instance, when a user requests a code for the first time, if the send fails for some reason, the state machine moves the transaction to a final FAILED state, rendering the transaction inert and effectively terminated.
Elaborating on the use case discussed earlier, where the user exceeds the number of incorrect code attempts, if the user performs the action of Requests Code retry, the retry-limit-exceeded condition fires and the state machine moves the current state to FAILED, terminating the transaction and preventing the user from re-using the session or the code further.
Set-up and configuration
To explain the usage and set-up, the following snippets show some of the important parts of the state machine modeling code. They are neither complete nor compilable as is, and they are truncated for brevity.
Builder
The squirrel-foundation framework allows configuring the State Machine once and then creating a new instance of the state machine for each thread without incurring the expensive creation time again. The newSM() method is invoked with the beginning state of the current transaction to get the State Machine ready, and when events are fired, the State Machine takes care of identifying the next state to transfer to.
StateMachine Builder
public class PhoneSMBuilder {
    // Shared builder: configured once, reused to create per-request state machines
    private static StateMachineBuilder<PhoneStateMachine, PhoneStateId, PhoneEvent, PhoneContext> stateMachineBuilder;

    static {
        buildSM();
    }

    private static synchronized void buildSM() {
        if (null != stateMachineBuilder) {
            return;
        }
        stateMachineBuilder = StateMachineBuilderFactory.create(PhoneStateMachine.class, PhoneStateId.class, PhoneEvent.class, PhoneContext.class);
        stateMachineBuilder.setStateMachineConfiguration(StateMachineConfiguration.create());
    }
    ....
    // Creates a started state machine positioned at the transaction's current state
    public static PhoneStateMachine newSM(PhoneStateId initialState) {
        PhoneStateMachine stateMachine = stateMachineBuilder.newStateMachine(initialState);
        stateMachine.start();
        return stateMachine;
    }
}
Transitions and conditions
Each transition from one state to another is managed as an external transition and guarded by conditions. For the Phone StateMachine, the positive transitions are configured as shown below.
buildSM() – adding transitions
private static synchronized void buildSM() {
    .....
    stateMachineBuilder.externalTransition()
        .from(PhoneStateId.INITIAL).to(PhoneStateId.DELIVERED)
        .on(PhoneEvent.SEND_CODE)
        .when(checkEventActionExecutionResult());
    stateMachineBuilder.externalTransition()
        .from(PhoneStateId.DELIVERED).to(PhoneStateId.SUCCESS)
        .on(PhoneEvent.VALIDATE_CODE)
        .when(checkEventActionExecutionResult());
    ....
}

private static Condition<PhoneContext> checkEventActionExecutionResult() {
    return new AnonymousCondition<PhoneContext>() {
        @Override
        public boolean isSatisfied(PhoneContext context) {
            return context.isActionSuccess();
        }
    };
}
As illustrated, Condition<T> is an interface provided by the squirrel-foundation framework for configuring the conditions specific to a state-to-state transition. When isSatisfied() returns true, the transition guarded by that condition fires.
NOTE: If more than one Condition is satisfied for transitions from the same initial state to two different end states, an exception is thrown. It is imperative that each condition be mutually exclusive with the other conditions for the same initial state.
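The mutual-exclusivity requirement can be illustrated with a small framework-independent sketch. It models the behavior described in the note (one satisfied guard selects the target, two satisfied guards are an error) and does not reproduce squirrel-foundation's internals; the class `GuardedTransitions` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical set of guarded transitions leaving one state on one event. */
final class GuardedTransitions<S, C> {
    private static final class Entry<S, C> {
        final S target;
        final Predicate<C> guard;

        Entry(S target, Predicate<C> guard) {
            this.target = target;
            this.guard = guard;
        }
    }

    private final List<Entry<S, C>> entries = new ArrayList<>();

    /** Registers a target state reachable when the guard is satisfied. */
    GuardedTransitions<S, C> to(S target, Predicate<C> guard) {
        entries.add(new Entry<>(target, guard));
        return this;
    }

    /** Returns the single target whose guard passes, or null when no guard passes. */
    S resolve(C context) {
        S chosen = null;
        boolean found = false;
        for (Entry<S, C> entry : entries) {
            if (entry.guard.test(context)) {
                if (found) {
                    // Two guards passed for the same context: the configuration is ambiguous
                    throw new IllegalStateException(
                            "Ambiguous transition: " + chosen + " and " + entry.target);
                }
                chosen = entry.target;
                found = true;
            }
        }
        return chosen;
    }
}
```

Writing guards as complementary predicates (for example, "action succeeded" versus "action failed with a specific error") keeps the targets unambiguous, which is the discipline the note asks for.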
Even for error scenarios, such as expired or attempts exceeded, the state transition is simple to configure, maintain, and change, ensuring the isolation of the change and easy testability.
buildSM() – adding transitions and conditions
private static synchronized void buildSM() {
    .....
    stateMachineBuilder.externalTransition()
        .from(PhoneStateId.DELIVERED).to(PhoneStateId.EXPIRED)
        .on(PhoneEvent.VALIDATE_CODE)
        .when(codeExpiredCondition());
    stateMachineBuilder.externalTransition()
        .from(PhoneStateId.DELIVERED).to(PhoneStateId.FAILED)
        .on(PhoneEvent.VALIDATE_CODE)
        .when(failureCheckCondition());
}

private static Condition<PhoneContext> codeExpiredCondition() {
    return new AnonymousCondition<PhoneContext>() {
        @Override
        public boolean isSatisfied(PhoneContext context) {
            return (!context.isActionSuccess() && context.getError() == Errors.ExpiredCode);
        }
    };
}
Persistence
For persistence and initialization, the State Machine is backed by a database. The database supplies the initial state used to start the State Machine and stores the resolved state after an event is fired. The framework provides appropriate hooks, such as afterTransitionCompleted and afterTransitionDeclined, for persisting the states. Squirrel-foundation also provides a hook for unchecked exceptions, afterTransitionCausedException, which is useful for alerting and monitoring purposes.
PhoneStateMachine – adding afterTransitions
public class PhoneStateMachine extends AbstractStateMachine<PhoneStateMachine, PhoneStateId, PhoneEvent, PhoneContext> {
    @Override
    protected void afterTransitionCompleted(PhoneStateId fromState, PhoneStateId toState, PhoneEvent event, PhoneContext context) {
        persistStateTransition(fromState, toState, event, context);
    }

    @Override
    protected void afterTransitionDeclined(PhoneStateId fromState, PhoneEvent event, PhoneContext context) {
        persistStateTransition(fromState, null, event, context);
    }

    @Override
    protected void afterTransitionCausedException(PhoneStateId fromState, PhoneStateId toState, PhoneEvent event, PhoneContext context) {
        logger.error("Exception during SM transition", getLastException().getTargetException());
    }
}
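The load, fire, and persist cycle around these hooks can be sketched without the framework. In the example below, the `StateStore` interface, the in-memory implementation, and `PersistentRunner` are hypothetical stand-ins for the database layer and the state machine; a declined transition is modeled as a `null` result from the transition function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical persistence abstraction standing in for the database. */
interface StateStore<S> {
    S load(long transactionId);

    void save(long transactionId, S state);
}

/** In-memory store used only to keep the sketch self-contained. */
final class InMemoryStateStore<S> implements StateStore<S> {
    private final Map<Long, S> states = new HashMap<>();
    private final S initialState;

    InMemoryStateStore(S initialState) {
        this.initialState = initialState;
    }

    @Override
    public S load(long transactionId) {
        return states.getOrDefault(transactionId, initialState);
    }

    @Override
    public void save(long transactionId, S state) {
        states.put(transactionId, state);
    }
}

/** Reads the stored state, applies one event, and stores the resolved state. */
final class PersistentRunner<S, E> {
    private final StateStore<S> store;
    private final BiFunction<S, E, S> transition; // returns null when the event is declined

    PersistentRunner(StateStore<S> store, BiFunction<S, E, S> transition) {
        this.store = store;
        this.transition = transition;
    }

    S handle(long transactionId, E event) {
        S from = store.load(transactionId);
        S to = transition.apply(from, event);
        if (to == null) {
            return from; // declined: the stored state is left unchanged
        }
        store.save(transactionId, to);
        return to;
    }
}
```

Each request therefore starts from whatever state the previous request left behind, which is what allows a fresh state machine instance to be created per thread.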
Custom state machine modification
Although the squirrel-foundation framework satisfied our other needs, it offered no built-in structure for binding a pre-Action to an Event. For example, the code has to be sent to the user before the state machine is triggered for the INITIAL state. Similarly, the code should be validated against the database, and the relevant context results populated, before firing the state machine. This was achieved by creating a custom state machine and overriding the fire() method to perform the associated Action and then fire the StateMachine.
PhoneStateMachine – Adding Pre-Action
public class PhoneStateMachine extends AbstractStateMachine<PhoneStateMachine, PhoneStateId, PhoneEvent, PhoneContext> {
    @Override
    public void fire(PhoneEvent event, PhoneContext context) {
        try {
            // Run the pre-action bound to the event before the transition is evaluated
            event.getEventAction().execute(context);
            if (context.getError() != null) {
                logger.error("Event execution failed in SM due to:" + context.getError().name());
            }
        } catch (Exception e) {
            logger.error("Exception in firing event: " + event, e);
            context.setError(Errors.UnknownError);
            return;
        }
        super.fire(event, context);
    }
}
Squirrel-foundation provides the freedom of defining custom types for all necessary parameters such as Events, Actions, Context etc., which makes it possible for each PhoneEvent to be configured with an associated Action.
PhoneEvent
public enum PhoneEvent {
    SEND_CODE(PhoneEventAction.sendCodeAction),
    VALIDATE_CODE(PhoneEventAction.validateCodeAction);

    private final PhoneEventAction<PhoneContext> eventAction;

    private PhoneEvent(PhoneEventAction<PhoneContext> eventAction) {
        this.eventAction = eventAction;
    }

    public PhoneEventAction<PhoneContext> getEventAction() {
        return eventAction;
    }
}

public interface PhoneEventAction<T> {
    void execute(T context);

    // Interface fields are implicitly public static final
    PhoneEventAction<PhoneContext> sendCodeAction = new PhoneEventAction<PhoneContext>() {
        @Override
        public void execute(PhoneContext context) {
            logger.debug("sendCodeAction: send code");
        }
    };

    PhoneEventAction<PhoneContext> validateCodeAction = new PhoneEventAction<PhoneContext>() {
        @Override
        public void execute(PhoneContext context) {
            logger.debug("validateCodeAction: verify code");
        }
    };
}
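To show the pre-action pattern end to end, the sketch below wires an event enum to its action and runs that action before the transition is chosen. It is a simplified, framework-free model: all `Sketch*` names are hypothetical, the actions are stand-ins for SMS delivery and a database check, and a failed validation goes straight to FAILED, which is coarser than the conditions described earlier.

```java
import java.util.function.Consumer;

/** Hypothetical context shared between the pre-action and the transition. */
final class SketchContext {
    final String submittedCode;
    boolean actionSuccess;

    SketchContext(String submittedCode) {
        this.submittedCode = submittedCode;
    }
}

/** Each event carries the action that must run before the machine evaluates it. */
enum SketchEvent {
    // Stand-in for delivering the code by text message
    SEND_CODE(ctx -> ctx.actionSuccess = true),
    // Stand-in for checking the submitted code against the stored one
    VALIDATE_CODE(ctx -> ctx.actionSuccess = "123456".equals(ctx.submittedCode));

    private final Consumer<SketchContext> action;

    SketchEvent(Consumer<SketchContext> action) {
        this.action = action;
    }

    Consumer<SketchContext> action() {
        return action;
    }
}

enum SketchState { INITIAL, DELIVERED, SUCCESS, FAILED }

final class SketchMachine {
    private SketchState state = SketchState.INITIAL;

    /** Runs the event's pre-action, then picks the transition from its result. */
    SketchState fire(SketchEvent event, SketchContext ctx) {
        event.action().accept(ctx); // the pre-action populates the context first
        if (state == SketchState.INITIAL && event == SketchEvent.SEND_CODE) {
            state = ctx.actionSuccess ? SketchState.DELIVERED : SketchState.FAILED;
        } else if (state == SketchState.DELIVERED && event == SketchEvent.VALIDATE_CODE) {
            state = ctx.actionSuccess ? SketchState.SUCCESS : SketchState.FAILED;
        }
        return state;
    }
}
```

Keeping the action on the event means the caller only fires an event; it cannot forget to send or validate the code before the transition is evaluated.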
Conclusion
The state machine is a well-known structure for managing processing, and this is just one example of how complex logic can be represented in a simple, effective, and maintainable manner. This structure allows developers to manage changes and configure values effectively, with little room for bugs. As future enhancements, the State Machine is being extended with better listeners for effective logging, self-healing mechanisms in case of failures, and a changeover to RxJava for state persistence and logging.