Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe Robert Love Last Updated: 21 Oct 2001 INTRODUCTION A preemptible kernel creates new locking issues. The issues are the same as those under SMP: concurrency and reentrancy. Thankfully, the Linux preemptible kernel model leverages existing SMP locking mechanisms. Thus, the kernel requires explicit additional locking for very few additional situations. This document is for all kernel hackers. Developing code in the kernel requires protecting these situations. As you will see, these situations would normally require a lock, where they not per-CPU. RULE #1: Per-CPU data structures need explicit protection Two similar problems arise. An example code snippet: struct this_needs_locking tux[NR_CPUS]; tux[smp_processor_id()] = some_value; /* task is preempted here... */ something = tux[smp_processor_id()]; First, since the data is per-CPU, it may not have explicit SMP locking, but require it otherwise. Second, when a preempted task is finally rescheduled, the previous value of smp_processor_id may not equal the current. You must protect these situations by disabling preemption around them. RULE #2: CPU state must be protected. Under preemption, the state of the CPU must be protected. This is arch- dependent, but includes CPU structures and state not preserved over a context switch. For example, on x86, entering and exiting FPU mode is now a critical section that must occur while preemption is disabled. Think what would happen if the kernel is executing a floating-point instruction and is then preempted. Remember, the kernel does not save FPU state except for user tasks. Therefore, upon preemption, the FPU registers will be sold to the lowest bidder. Thus, preemption must be disabled around such regions.i Note, some FPU functions are already explicitly preempt safe. For example, kernel_fpu_begin and kernel_fpu_end will disable and enable preemption. However, math_state_restore must be called with preemption disabled. SOLUTION Data protection under preemption is achieved by disabling preemption for the duration of the critical region. preempt_enable() decrement the preempt counter preempt_disable() increment the preempt counter preempt_enable_no_resched() decrement, but do not immediately preempt The functions are nestable. In other words, you can call preempt_disable n-times in a code path, and preemption will not be reenabled until the n-th call to preempt_enable. The preempt statements define to nothing if preemption is not enabled. Note that you do not need to explicitly prevent preemption if you are holding any locks or interrupts are disabled, since preemption is implicitly disabled in those cases. Example: cpucache_t *cc; /* this is per-CPU */ preempt_disable(); cc = cc_data(searchp); if (cc && cc->avail) { __free_block(searchp, cc_entry(cc), cc->avail); cc->avail = 0; } preempt_enable(); return 0; Notice how the preemption statements must encompass every reference of the critical variables. Another example: int buf[NR_CPUS]; set_cpu_val(buf); if (buf[smp_processor_id()] == -1) printf(KERN_INFO "wee!\n"); spin_lock(&buf_lock); /* ... */ This code is not preempt-safe, but see how easily we can fix it by simply moving the spin_lock up two lines.