State Rollback and Recovery

This document outlines the implementation of state rollback and recovery mechanisms in our MTG Commander game engine, addressing network disruptions and maintaining gameplay integrity despite unstable connections.

Overview
Rollback Architecture
State Snapshots
Deterministic Replay
RNG Synchronization for Rollbacks
Client-Side Prediction
Recovery Processes
Implementation Example

Overview

In networked gameplay, unstable connections can lead to state inconsistencies between the server and clients. The state rollback system allows the game to:

Detect state deviations
Revert to a previous valid state
Deterministically replay actions to catch up
Resume normal play without disrupting the game flow

This approach is particularly important for turn-based games like MTG Commander where the integrity of game state is critical.
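The four steps above can be illustrated with a deliberately tiny, engine-free sketch. `ToyState`, `LifeChange`, and `resync` are hypothetical names invented for this illustration; the real engine compares checksums rather than whole states and works on the Bevy resources shown later in this document.

```rust
/// Toy game state: a single life total (stand-in for the real `GameState`).
#[derive(Clone, Debug, PartialEq)]
pub struct ToyState {
    pub life: i32,
}

/// Toy action: a signed change to the life total.
#[derive(Clone, Copy, Debug)]
pub struct LifeChange(pub i32);

/// Apply one action. The result depends only on the state and the action.
pub fn apply(state: &mut ToyState, action: LifeChange) {
    state.life += action.0;
}

/// Detect a deviation, revert to the snapshot, and replay the logged actions.
/// Returns `true` if a rollback was performed.
pub fn resync(
    client: &mut ToyState,
    authoritative: &ToyState,
    snapshot: &ToyState,
    actions_since_snapshot: &[LifeChange],
) -> bool {
    // 1. Detect: nothing to do if the client already matches the server.
    if client == authoritative {
        return false;
    }
    // 2. Revert to the last valid snapshot.
    *client = snapshot.clone();
    // 3. Replay every action recorded after the snapshot, in order.
    for &action in actions_since_snapshot {
        apply(client, action);
    }
    // 4. The caller resumes normal play from the caught-up state.
    true
}
```

Because `apply` is a pure function of state and action, replaying the same log from the same snapshot always lands on the server's state.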

Rollback Architecture

Our rollback architecture follows these principles:

Server Authority: The server maintains the authoritative game state
State History: Both server and clients maintain a history of game states
Deterministic Replay: Actions can be replayed deterministically to reconstruct state
Input Buffering: Client inputs are buffered to handle resynchronization
Minimal Disruption: Rollbacks should be as seamless as possible to players
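As an illustration of the input buffering principle, here is a minimal sketch of what a client-side buffer might look like. The `InputBuffer` and `BufferedInput` types are assumptions made for this example; the document does not show the actual contents of `ClientInputBuffer`.

```rust
use std::collections::VecDeque;

/// Hypothetical buffered input: a client-assigned sequence number and a payload.
#[derive(Clone, Debug, PartialEq)]
pub struct BufferedInput {
    pub sequence_id: u64,
    pub payload: String,
}

/// Queue of inputs the server has not yet acknowledged.
#[derive(Default)]
pub struct InputBuffer {
    pending: VecDeque<BufferedInput>,
    next_sequence: u64,
}

impl InputBuffer {
    /// Queue an input and return the sequence number assigned to it.
    pub fn push(&mut self, payload: &str) -> u64 {
        let sequence_id = self.next_sequence;
        self.next_sequence += 1;
        self.pending.push_back(BufferedInput {
            sequence_id,
            payload: payload.to_string(),
        });
        sequence_id
    }

    /// Drop every input the server has acknowledged (sequence <= `acked`).
    pub fn acknowledge(&mut self, acked: u64) {
        while self.pending.front().map_or(false, |i| i.sequence_id <= acked) {
            self.pending.pop_front();
        }
    }

    /// Inputs that would need to be resent after a resynchronization.
    pub fn unacknowledged(&self) -> Vec<BufferedInput> {
        self.pending.iter().cloned().collect()
    }
}
```

After a rollback, the client can resend whatever `unacknowledged` returns, so inputs made during the disruption are not lost.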

Component Integration

// src/networking/state/rollback.rs
use bevy::prelude::*;
use bevy_prng::WyRand;
use bevy_rand::prelude::*;
use crate::networking::server::resources::GameServer;
use crate::game_engine::state::GameState;

/// Plugin for handling state rollbacks in networked games
pub struct StateRollbackPlugin;

impl Plugin for StateRollbackPlugin {
    fn build(&self, app: &mut App) {
        app.init_resource::<StateHistory>()
           .init_resource::<ClientInputBuffer>()
           // Events must be registered before systems can read or write them
           .add_event::<RollbackEvent>()
           // `.chain()` runs the systems in the listed order every frame, so a
           // rollback requested by an earlier system is applied in the same frame
           .add_systems(Update, (
               create_state_snapshots,
               detect_state_divergences,
               handle_rollback_requests,
               apply_rollback,
           ).chain());
    }
}

State Snapshots

The core of our rollback system is the ability to capture and restore game state snapshots:

/// Resource for tracking game state history
#[derive(Resource)]
pub struct StateHistory {
    /// Timestamped state snapshots
    pub snapshots: Vec<StateSnapshot>,
    /// Maximum number of snapshots to retain
    pub max_snapshots: usize,
    /// Time between state snapshots (in seconds)
    pub snapshot_interval: f32,
    /// Last snapshot time
    pub last_snapshot_time: f32,
}

impl Default for StateHistory {
    fn default() -> Self {
        Self {
            snapshots: Vec::new(),
            max_snapshots: 20, // Store up to 20 snapshots (~1 minute of gameplay at 3s intervals)
            snapshot_interval: 3.0, // Take a snapshot every 3 seconds
            last_snapshot_time: 0.0,
        }
    }
}

/// A complete snapshot of game state at a point in time
#[derive(Clone, Debug)]
pub struct StateSnapshot {
    /// Timestamp when this snapshot was created
    pub timestamp: f32,
    /// Unique sequence number
    pub sequence_id: u64,
    /// Serialized game state
    pub game_state: Vec<u8>,
    /// Serialized RNG state
    pub rng_state: Vec<u8>,
    /// Action sequence that led to this state
    pub action_sequence: Vec<ActionRecord>,
}

Creating Snapshots

/// System to periodically create game state snapshots
pub fn create_state_snapshots(
    mut state_history: ResMut<StateHistory>,
    game_state: Res<GameState>,
    global_rng: Res<GlobalEntropy<WyRand>>,
    time: Res<Time>,
    sequence_tracker: Res<ActionSequence>,
) {
    let now = time.elapsed_seconds();

    // Check if it's time for a new snapshot
    if now - state_history.last_snapshot_time < state_history.snapshot_interval {
        return;
    }

    // Skip the snapshot if the RNG state cannot be captured: a snapshot with
    // an empty RNG state could not be restored during a later rollback
    let Some(rng_state) = global_rng.try_serialize_state() else {
        warn!("Skipping snapshot: failed to serialize RNG state");
        return;
    };

    // Create new snapshot
    let snapshot = StateSnapshot {
        timestamp: now,
        sequence_id: sequence_tracker.current_sequence_id,
        game_state: serialize_game_state(&game_state),
        rng_state,
        action_sequence: sequence_tracker.recent_actions.clone(),
    };

    // Add to history
    state_history.snapshots.push(snapshot);
    state_history.last_snapshot_time = now;

    // Trim history if needed (snapshots are stored oldest first)
    if state_history.snapshots.len() > state_history.max_snapshots {
        state_history.snapshots.remove(0);
    }
}
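The trimming step can also be expressed with a double-ended queue, which evicts the oldest entry in constant time. The sketch below is a standalone illustration, not the engine's code: `MiniSnapshot` and `BoundedHistory` are hypothetical stand-ins for `StateSnapshot` and `StateHistory`.

```rust
use std::collections::VecDeque;

/// Minimal stand-in for `StateSnapshot`: only the sequence ID matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct MiniSnapshot {
    pub sequence_id: u64,
}

/// Snapshot history that evicts the oldest entry once the limit is exceeded.
pub struct BoundedHistory {
    snapshots: VecDeque<MiniSnapshot>,
    max_snapshots: usize,
}

impl BoundedHistory {
    pub fn new(max_snapshots: usize) -> Self {
        Self { snapshots: VecDeque::new(), max_snapshots }
    }

    /// Append a snapshot, then drop the oldest one if the history is too long.
    /// `pop_front` is O(1), whereas `Vec::remove(0)` shifts every element.
    pub fn push(&mut self, snapshot: MiniSnapshot) {
        self.snapshots.push_back(snapshot);
        if self.snapshots.len() > self.max_snapshots {
            self.snapshots.pop_front();
        }
    }

    /// The oldest snapshot still retained, if any.
    pub fn oldest(&self) -> Option<&MiniSnapshot> {
        self.snapshots.front()
    }

    pub fn len(&self) -> usize {
        self.snapshots.len()
    }

    pub fn is_empty(&self) -> bool {
        self.snapshots.is_empty()
    }
}
```

With only 20 snapshots the difference is negligible, so this matters mainly if the history limit is raised substantially.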

Deterministic Replay

To ensure consistent rollback behavior, all game actions must be deterministic and replayable:

use serde::{Deserialize, Serialize};

/// Record of a game action for replay purposes
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct ActionRecord {
    /// Unique sequence ID for this action
    pub sequence_id: u64,
    /// Player who initiated the action
    pub player_id: Entity,
    /// Timestamp when the action occurred
    pub timestamp: f32,
    /// The actual action
    pub action: GameAction,
}

/// Restore a snapshot and replay the actions recorded after it.
///
/// This is a plain helper rather than a Bevy system: events and action lists
/// are not valid system parameters, so a system passes them in explicitly.
pub fn replay_actions(
    commands: &mut Commands,
    game_state: &mut GameState,
    global_rng: &mut GlobalEntropy<WyRand>,
    snapshot: &StateSnapshot,
    actions: &[ActionRecord],
) {
    // Restore the game state and RNG to the rollback point
    deserialize_game_state(game_state, &snapshot.game_state);
    global_rng
        .deserialize_state(&snapshot.rng_state)
        .expect("Failed to restore RNG state");

    // Replay all actions that occurred after the rollback point, in order
    for action in actions {
        apply_action(commands, game_state, global_rng, action.clone());
    }
}
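Replay depends on being able to ask for every action recorded after a given sequence ID. A minimal sketch of such a log follows; `ActionLog` and `LoggedAction` are hypothetical simplifications, and the engine's real action history type may be organized differently.

```rust
/// Hypothetical trimmed-down action record: a sequence ID plus a description.
#[derive(Clone, Debug, PartialEq)]
pub struct LoggedAction {
    pub sequence_id: u64,
    pub description: String,
}

/// Append-only action log kept in ascending `sequence_id` order.
#[derive(Default)]
pub struct ActionLog {
    actions: Vec<LoggedAction>,
}

impl ActionLog {
    /// Record an action. Sequence IDs are expected to be strictly increasing.
    pub fn record(&mut self, sequence_id: u64, description: &str) {
        debug_assert!(self.actions.last().map_or(true, |a| a.sequence_id < sequence_id));
        self.actions.push(LoggedAction {
            sequence_id,
            description: description.to_string(),
        });
    }

    /// Actions with a sequence ID strictly greater than `sequence_id`, in order.
    pub fn get_actions_after(&self, sequence_id: u64) -> &[LoggedAction] {
        // The log is sorted, so a binary search finds the first newer action.
        let start = self.actions.partition_point(|a| a.sequence_id <= sequence_id);
        &self.actions[start..]
    }
}
```

Keeping the log sorted by sequence ID is what makes the lookup a binary search and guarantees that replay order matches the original order.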

RNG Synchronization for Rollbacks

The RNG state is critical for deterministic rollbacks. We extend our existing RNG synchronization to support rollbacks:

use std::collections::HashMap;

/// Resource to track RNG snapshots for rollback
#[derive(Resource)]
pub struct RngSnapshotHistory {
    /// History of RNG states indexed by sequence ID
    pub snapshots: HashMap<u64, Vec<u8>>,
    /// Maximum number of RNG snapshots to keep
    pub max_snapshots: usize,
}

impl Default for RngSnapshotHistory {
    fn default() -> Self {
        Self {
            snapshots: HashMap::new(),
            max_snapshots: 100,
        }
    }
}

/// System to capture RNG state before randomized actions
pub fn capture_rng_before_randomized_action(
    sequence_tracker: Res<ActionSequence>,
    global_rng: Res<GlobalEntropy<WyRand>>,
    mut rng_history: ResMut<RngSnapshotHistory>,
) {
    // Save the current RNG state before a randomized action
    if let Some(serialized_state) = global_rng.try_serialize_state() {
        rng_history.snapshots.insert(sequence_tracker.current_sequence_id, serialized_state);

        // Clean up old snapshots if needed
        if rng_history.snapshots.len() > rng_history.max_snapshots {
            // The smallest sequence ID is the oldest snapshot
            if let Some(oldest_key) = rng_history.snapshots.keys().min().copied() {
                rng_history.snapshots.remove(&oldest_key);
            }
        }
    }
}
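The property this section relies on is that a generator restored from saved bytes produces exactly the same values as the original would have. The following standalone sketch shows that round trip with a small SplitMix64-style generator; `TinyRng` is a stand-in invented for illustration and is not the `WyRand` generator or the `bevy_rand` serialization API.

```rust
/// Tiny deterministic generator (SplitMix64-style), used only for illustration.
#[derive(Clone, Debug, PartialEq)]
pub struct TinyRng {
    state: u64,
}

impl TinyRng {
    pub fn from_seed(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the state and return the next pseudo-random value.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Serialize the full generator state as 8 little-endian bytes.
    pub fn save(&self) -> Vec<u8> {
        self.state.to_le_bytes().to_vec()
    }

    /// Rebuild a generator from bytes produced by `save`.
    /// Returns `None` if the byte length is wrong.
    pub fn restore(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != 8 {
            return None;
        }
        let mut array = [0u8; 8];
        array.copy_from_slice(bytes);
        Some(Self { state: u64::from_le_bytes(array) })
    }
}
```

Returning `None` for malformed bytes lets the caller decide how to recover, for example by requesting a full state resync instead of panicking.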

Client-Side Prediction

To hide network latency from the player, clients can apply their own actions locally right away and track them as pending until the server confirms them:

/// Resource to track client-side prediction state
#[derive(Resource)]
pub struct PredictionState {
    /// Actions predicted but not yet confirmed
    pub pending_actions: Vec<ActionRecord>,
    /// Whether prediction is currently active
    pub is_predicting: bool,
    /// Last confirmed server sequence ID
    pub last_confirmed_sequence: u64,
}

/// System to apply client-side prediction
pub fn apply_client_prediction(
    mut commands: Commands,
    mut game_state: ResMut<GameState>,
    mut prediction: ResMut<PredictionState>,
    input: Res<Input<KeyCode>>,
    client: Res<GameClient>,
) {
    // Only predict for local player actions
    if let Some(local_player) = client.local_player {
        // Check if a new action was input
        if input.just_pressed(KeyCode::Space) {
            // Example: Predict a "pass turn" action
            let action = GameAction::PassTurn { player: local_player };

            // Apply prediction locally
            apply_action_local(&mut commands, &mut game_state, action.clone());

            // Record the prediction
            prediction.pending_actions.push(ActionRecord {
                sequence_id: prediction.last_confirmed_sequence + prediction.pending_actions.len() as u64 + 1,
                player_id: local_player,
                timestamp: 0.0, // Will be filled by server
                action,
            });

            // Send to server
            // ...
        }
    }
}
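The other half of prediction is reconciliation: when the server confirms a sequence ID, the client drops the predictions that are now confirmed and re-applies the rest on top of the confirmed state. The sketch below assumes this reconciliation scheme; `Prediction` and `Predicted` are hypothetical types that mirror the fields of `PredictionState` without depending on the engine.

```rust
/// Hypothetical predicted action: the expected sequence ID plus a label.
#[derive(Clone, Debug, PartialEq)]
pub struct Predicted {
    pub sequence_id: u64,
    pub label: String,
}

/// Client-side prediction bookkeeping.
#[derive(Default)]
pub struct Prediction {
    pub pending_actions: Vec<Predicted>,
    pub last_confirmed_sequence: u64,
}

impl Prediction {
    /// Record a locally applied action and return its predicted sequence ID.
    pub fn predict(&mut self, label: &str) -> u64 {
        let sequence_id =
            self.last_confirmed_sequence + self.pending_actions.len() as u64 + 1;
        self.pending_actions.push(Predicted {
            sequence_id,
            label: label.to_string(),
        });
        sequence_id
    }

    /// The server confirmed everything up to `confirmed`. Drop those
    /// predictions and return the ones still waiting, which the caller
    /// re-applies on top of the confirmed state.
    pub fn confirm(&mut self, confirmed: u64) -> &[Predicted] {
        self.pending_actions.retain(|p| p.sequence_id > confirmed);
        self.last_confirmed_sequence = self.last_confirmed_sequence.max(confirmed);
        &self.pending_actions
    }
}
```

If the server rejects a predicted action instead of confirming it, the client would discard the pending list and fall back to the rollback path described in the next section.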

Recovery Processes

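When the server detects a network issue (a state divergence, a reconnecting client, or a desync in a randomized outcome), it emits a RollbackEvent that names the snapshot to restore, the reason, and the affected clients: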

/// Event triggered when a rollback is needed
#[derive(Event)]
pub struct RollbackEvent {
    /// The snapshot to roll back to
    pub snapshot: StateSnapshot,
    /// Reason for the rollback
    pub reason: RollbackReason,
    /// Clients affected by this rollback
    pub affected_clients: Vec<ClientId>,
}

/// Reasons for triggering a rollback
#[derive(Debug, Clone, Copy)]
pub enum RollbackReason {
    /// State divergence detected
    StateDivergence,
    /// Client reconnected after disconnect
    ClientReconnection,
    /// Server-forced rollback
    ServerForced,
    /// Desync in randomized outcome
    RandomizationDesync,
}

/// System to handle client reconnection with state recovery
pub fn handle_client_reconnection(
    server: Res<GameServer>,
    mut server_events: EventReader<ServerEvent>,
    state_history: Res<StateHistory>,
    mut rollback_events: EventWriter<RollbackEvent>,
    client_states: Res<ClientStateTracker>,
) {
    for event in server_events.read() {
        if let ServerEvent::ClientConnected { client_id } = event {
            // A client that already maps to a player is reconnecting, not joining
            if server.client_player_map.get(client_id).is_none() {
                continue;
            }

            // Find last known state for this client
            if let Some(last_known_sequence) = client_states.get_last_sequence(*client_id) {
                // Find appropriate snapshot to roll back to
                if let Some(snapshot) = find_appropriate_snapshot(&state_history, last_known_sequence) {
                    // Trigger rollback just for this client
                    rollback_events.send(RollbackEvent {
                        snapshot: snapshot.clone(),
                        reason: RollbackReason::ClientReconnection,
                        affected_clients: vec![*client_id],
                    });
                }
            }
        }
    }
}
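The `find_appropriate_snapshot` helper called above is not shown in this document. One plausible reading, sketched here as an assumption, is that it returns the newest snapshot the reconnecting client had already seen. `Snap` is a hypothetical stand-in for `StateSnapshot`, and the function takes a plain slice instead of `StateHistory` to stay self-contained.

```rust
/// Minimal stand-in for `StateSnapshot`.
#[derive(Clone, Debug, PartialEq)]
pub struct Snap {
    pub sequence_id: u64,
}

/// Return the newest snapshot whose sequence ID does not exceed the client's
/// last known sequence. Assumes `snapshots` is ordered oldest to newest.
/// Returns `None` when the client is older than every retained snapshot.
pub fn find_appropriate_snapshot(snapshots: &[Snap], last_known_sequence: u64) -> Option<&Snap> {
    snapshots
        .iter()
        .rev()
        .find(|s| s.sequence_id <= last_known_sequence)
}
```

A `None` result means the client has been away longer than the retained history covers, in which case a full state transfer would be needed rather than a rollback.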

Implementation Example

Complete Rollback Process

This example shows a complete rollback process after detecting a state divergence:

/// System to detect and handle state divergences
pub fn detect_state_divergences(
    mut state_checksums: EventReader<StateChecksumEvent>,
    state_history: Res<StateHistory>,
    server: Option<Res<GameServer>>,
    mut rollback_events: EventWriter<RollbackEvent>,
) {
    // Only run on server
    if server.is_none() {
        return;
    }

    for checksum_event in state_checksums.read() {
        // Compare client checksum with server's expected checksum
        if checksum_event.client_checksum != checksum_event.expected_checksum {
            warn!("State divergence detected for client {:?} at sequence {}",
                  checksum_event.client_id, checksum_event.sequence_id);

            // Find appropriate snapshot to roll back to
            if let Some(snapshot) = find_rollback_snapshot(&state_history, checksum_event.sequence_id) {
                // Log the rollback before triggering it
                info!("Initiating rollback to sequence {} for client {:?}",
                      snapshot.sequence_id, checksum_event.client_id);

                // Trigger rollback for the affected client
                rollback_events.send(RollbackEvent {
                    snapshot: snapshot.clone(),
                    reason: RollbackReason::StateDivergence,
                    affected_clients: vec![checksum_event.client_id],
                });
            }
        }
    }
}

/// Find an appropriate snapshot for rollback
fn find_rollback_snapshot(history: &StateHistory, divergence_sequence: u64) -> Option<&StateSnapshot> {
    // Snapshots are stored oldest first, so search from the end to find the
    // most recent snapshot taken strictly before the divergence
    history.snapshots
        .iter()
        .rev()
        .find(|snapshot| snapshot.sequence_id < divergence_sequence)
}

/// Apply a rollback
pub fn apply_rollback(
    mut commands: Commands,
    mut game_state: ResMut<GameState>,
    mut global_rng: ResMut<GlobalEntropy<WyRand>>,
    mut rollback_events: EventReader<RollbackEvent>,
    action_history: Res<ActionHistory>,
) {
    for event in rollback_events.read() {
        // 1. Restore game state from snapshot
        deserialize_game_state(&mut game_state, &event.snapshot.game_state);

        // 2. Restore RNG state
        global_rng.deserialize_state(&event.snapshot.rng_state)
            .expect("Failed to restore RNG state");

        // 3. Find actions that need to be replayed
        let actions_to_replay = action_history.get_actions_after(event.snapshot.sequence_id);

        // 4. Replay actions
        for action in actions_to_replay {
            apply_action(&mut commands, &mut game_state, &mut global_rng, action.clone());
        }

        // 5. Notify clients of the rollback
        for client_id in &event.affected_clients {
            commands.add(SendRollbackNotification {
                client_id: *client_id,
                snapshot: event.snapshot.clone(),
                reason: event.reason,
            });
        }
    }
}
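Divergence detection compares a client checksum against the server's expected checksum, but the checksum function itself is not shown. The sketch below assumes a 64-bit FNV-1a hash over the serialized state; this is an illustrative choice, not necessarily what the engine uses. A hand-written hash is used because the algorithm behind the standard library's `DefaultHasher` is unspecified and may change between Rust releases, which makes it unsuitable for values compared across machines.

```rust
/// 64-bit FNV-1a hash over a byte slice.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    let mut hash = OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Checksum of a serialized state, bound to the sequence ID it was taken at
/// so that equal states at different points in the game do not compare equal.
pub fn state_checksum(sequence_id: u64, serialized_state: &[u8]) -> u64 {
    let mut input = sequence_id.to_le_bytes().to_vec();
    input.extend_from_slice(serialized_state);
    fnv1a_64(&input)
}
```

FNV-1a is fast and stable but not collision-resistant against a malicious client, so it suits desync detection rather than cheat prevention.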

Handling Randomized Actions During Rollback

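Randomized actions such as library shuffles and coin flips need special care during replay: they must draw from the restored RNG state so that they reproduce the original outcome: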

use rand::Rng; // brings `gen_bool` into scope

/// Apply an action during rollback replay
fn apply_action(
    commands: &mut Commands,
    game_state: &mut GameState,
    global_rng: &mut GlobalEntropy<WyRand>,
    action: ActionRecord,
) {
    match &action.action {
        GameAction::ShuffleLibrary { library, .. } => {
            // For randomized actions, we need to ensure deterministic outcomes.
            // Important: draw from the restored RNG, in the same order as the
            // original run, so the replayed shuffle yields the same library order
            let mut library_comp = game_state.get_library_mut(*library);
            library_comp.shuffle_with_rng(global_rng);
        },
        GameAction::FlipCoin { player } => {
            // Another example of randomized action.
            // The random result will be the same as the original action
            // if the RNG state is properly restored
            let result = global_rng.gen_bool(0.5);

            // Apply the result
            game_state.record_coin_flip(*player, result);
        },
        // Handle other action types
        _ => {
            // Apply non-randomized actions normally
            game_state.apply_action(&action.action);
        }
    }
}
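To make the determinism requirement concrete, the following standalone sketch shuffles a list twice from the same saved generator state and gets the same order both times. `shuffle_with` and `xorshift64` are hypothetical helpers written for this example; the engine's `shuffle_with_rng` and its RNG types are not reproduced here.

```rust
/// Fisher-Yates shuffle driven by a caller-supplied source of random `u64`s.
pub fn shuffle_with<T>(items: &mut [T], mut next_u64: impl FnMut() -> u64) {
    for i in (1..items.len()).rev() {
        // Pick j in 0..=i. The modulo introduces a slight bias, which is
        // acceptable for a sketch but not for a production shuffle.
        let j = (next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

/// One step of a xorshift64 generator. The state must be non-zero.
pub fn xorshift64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}
```

A possible usage: copy the generator state before the first shuffle, then start a second shuffle of an identical list from that copy. Both lists end up in the same order, which is exactly what replay after a rollback needs.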

Real-World Considerations

In practice, a rollback system needs to balance several considerations:

Snapshot Frequency: More frequent snapshots use more memory but allow more precise rollbacks
Rollback Visibility: How visible should rollbacks be to players?
Partial vs. Full Rollbacks: Sometimes only a portion of the state needs rollback
Action Batching: Batch multiple actions to minimize rollback frequency
Bandwidth Costs: State synchronization consumes bandwidth, so keep snapshot and resync payloads as small as possible

Optimizing for MTG Commander

For MTG Commander specifically:

Take snapshots at natural game boundaries (turn changes, phase changes)
Use incremental state updates between major decision points
Maintain separate RNG state for "hidden information" actions like shuffling
Prioritize server authority for rule enforcement and dispute resolution
Enable client prediction for responsive UI during network hiccups
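As a small illustration of the first recommendation, a snapshot trigger based on game boundaries rather than a timer might look like the sketch below. `Phase`, `TurnPosition`, and `should_snapshot` are hypothetical names, and the phase list is a simplified version of the turn structure; the engine's real phase type may differ.

```rust
/// Simplified turn structure, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    Beginning,
    PreCombatMain,
    Combat,
    PostCombatMain,
    Ending,
}

/// Where the game currently is: which turn and which phase.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TurnPosition {
    pub turn: u32,
    pub phase: Phase,
}

/// Take a snapshot whenever the turn or the phase changes, instead of on a
/// fixed timer, so rollbacks land on boundaries that make sense to players.
pub fn should_snapshot(previous: TurnPosition, current: TurnPosition) -> bool {
    previous.turn != current.turn || previous.phase != current.phase
}
```

Boundary-based snapshots can be combined with the timed snapshots described earlier, for example by keeping the timer as a fallback during very long phases.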

Rummage Documentation