Stopping a Lustre File System Tutorial

Stopping a Lustre file system in the right order prevents data corruption, keeps recovery clean on the next start, and lets each node release its resources. The order used here is: unmount the clients first (so they flush dirty data), then the OSTs on each OSS, then the MDT and MGS last. This guide targets Lustre 2.17.0 (January 2026) and is based on the Lustre Operations Manual (updated 2025). The same procedure applies to maintenance windows, upgrades, and full shutdowns. In production, integrate these steps with HA tools such as Pacemaker.

Prerequisites

Correct Shutdown Order

Step  Component       Reason
1     Clients         Flush dirty data/locks to servers; prevent eviction.
2     OSS/OSTs        Commit OST transactions; unmount after clients.
3     MDS/MDT/MGS     Final commit; MGS last if separate.
4     Unload Modules  Free resources; optional if rebooting.
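
The ordering in the table above can be sketched as a dry-run script that only prints the commands it would issue. The hostnames (node1 to node3), the use of ssh, and the mount points are assumptions taken from the example setup in this tutorial; adjust them to your cluster before running anything for real.

```shell
#!/bin/sh
# Dry-run sketch: print teardown commands in the order from the table.
# Hostnames and mount points are assumptions from the example 3-node setup.

# One "host mountpoint" pair per line, already in shutdown order.
UNMOUNT_ORDER="node3 /mnt/testfs
node2 /mnt/testfs-ost1
node2 /mnt/testfs-ost2
node1 /mnt/testfs-mdt0"

shutdown_plan() {
    # Steps 1-3: unmount clients, then OSTs, then the MDT/MGS.
    printf '%s\n' "$UNMOUNT_ORDER" | while read -r host mnt; do
        echo "ssh $host umount $mnt"
    done
    # Step 4: unload modules once nothing is mounted anywhere.
    for host in node3 node2 node1; do
        echo "ssh $host lustre_rmmod"
    done
}

shutdown_plan
```

Printing the plan first makes it easy to review the order before any node is touched.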

Step-by-Step Teardown

This walkthrough assumes a simple setup, such as the one from the 3-node tutorial: Node3 is the client, Node2 is the OSS, and Node1 is the MDS.

1. Unmount Clients

# On each client (e.g., Node3)
umount /mnt/testfs

# If busy: Force unmount
umount -f /mnt/testfs

# Verify on the MDS (e.g., Node1) that no clients are still connected
lshowmount -v
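
As a complementary check on the client itself, the following sketch lists any Lustre client mounts that remain. It relies on a heuristic that is an assumption here: client mounts show file system type "lustre" with a device of the form "<MGS NID>:/<fsname>", whereas server targets show a block device or dataset name. The function name is hypothetical.

```shell
#!/bin/sh
# Print the mount points of Lustre client mounts found in a
# /proc/mounts-style stream on stdin (fields: device mountpoint fstype ...).
# Assumption: client mounts use a "<NID>:/<fsname>" device string.
lustre_client_mounts() {
    awk '$3 == "lustre" && $1 ~ /:\// { print $2 }'
}

# Usage on a real client (prints nothing once all clients are unmounted):
#   lustre_client_mounts < /proc/mounts
```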

2. Unmount OSS/OSTs

# On OSS (e.g., Node2)
umount /mnt/testfs-ost1
umount /mnt/testfs-ost2

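# If an OST will not unmount, force it (connected clients are evicted)
umount -f /mnt/testfs-ost1
# Do not use lustre_rmmod here: it only unloads modules and
# fails while targets are still mounted

# Verify on the OSS itself (lfs df needs a mounted client)
mount -t lustre  # Should list nothing
lctl dl          # Should list no OST devices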
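
When an OSS hosts several OSTs, the unmounts above can be wrapped in a small loop that stops at the first failure. This is a minimal sketch: the function name and the DRY_RUN switch are hypothetical, and the mount points are the ones assumed in this tutorial. Running `umount -a -t lustre` is a common alternative that unmounts every Lustre mount on the node in one command.

```shell
#!/bin/sh
# Unmount each listed Lustre target in turn, stopping at the first failure.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

unmount_targets() {
    for mnt in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "umount $mnt"
        else
            umount "$mnt" || { echo "failed to unmount $mnt" >&2; return 1; }
        fi
    done
}

unmount_targets /mnt/testfs-ost1 /mnt/testfs-ost2
```

Set DRY_RUN=0 only after the printed commands look right for the node you are on.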

3. Unmount MDS/MDT/MGS

# On MDS (e.g., Node1)
umount /mnt/testfs-mdt0

# Force if needed
umount -f /mnt/testfs-mdt0

4. Unload Modules and Shutdown

# On all nodes
lustre_rmmod  # Or modprobe -r lustre lnet etc.

# Shutdown nodes
shutdown -h now
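
Before shutting a node down (or when skipping the reboot), it can help to confirm that the modules really were unloaded. The sketch below filters lsmod-style output for a handful of well-known Lustre and LNet module names; the list is deliberately partial and the function name is hypothetical.

```shell
#!/bin/sh
# Read lsmod-style output on stdin and print any core Lustre/LNet
# modules that are still loaded. The first line (column header) is skipped.
# The module list is a partial, illustrative selection.
lustre_modules_loaded() {
    awk 'NR > 1 && $1 ~ /^(lustre|lnet|libcfs|ptlrpc|obdclass|ksocklnd)$/ { print $1 }'
}

# Usage on a real node (prints nothing when the unload succeeded):
#   lsmod | lustre_modules_loaded
```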

Best Practices for Friendly Shutdown

Common Issues

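Issue           Fix
Device busy     List the processes holding the mount (fuser -vm /mnt/testfs), stop them (fuser -km /mnt/testfs), then retry umount; use umount -f only as a last resort.
Recovery stuck  Abort recovery on the affected target (lctl --device <target> abort_recovery) or mount it with -o abort_recov; clients that have not reconnected are evicted.
Modules stuck   Confirm nothing is still mounted (mount -t lustre), rerun lustre_rmmod, and reboot if the modules still will not unload.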
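
For the "device busy" case, a cautious pattern is to try a plain unmount first and, if it fails, report which processes hold the mount instead of forcing immediately. The function name below is hypothetical; `fuser -vm` comes from the psmisc package and is assumed to be installed.

```shell
#!/bin/sh
# Try a plain umount; on failure, show the processes using the mount
# and return non-zero so the caller can decide whether to force.
safe_umount() {
    mnt="$1"
    if umount "$mnt"; then
        return 0
    fi
    echo "plain umount of $mnt failed; processes using it:" >&2
    fuser -vm "$mnt" >&2 || true
    return 1
}

# Usage (as root): safe_umount /mnt/testfs || echo "resolve the holders, then retry"
```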

To restart, reverse the order: start the MGS/MDT first, then the OSS/OSTs, then mount the clients.