--- zzzz-none-000/linux-3.10.107/Documentation/lockup-watchdogs.txt 2017-06-27 09:49:32.000000000 +0000 +++ scorpion-7490-727/linux-3.10.107/Documentation/lockup-watchdogs.txt 2021-02-04 17:41:59.000000000 +0000 @@ -12,7 +12,7 @@ will stay locked up. Alternatively, the kernel can be configured to panic; a sysctl, "kernel.softlockup_panic", a kernel parameter, "softlockup_panic" (see "Documentation/kernel-parameters.txt" for -details), and a compile option, "BOOTPARAM_HARDLOCKUP_PANIC", are +details), and a compile option, "BOOTPARAM_SOFTLOCKUP_PANIC", are provided for this. A 'hardlockup' is defined as a bug that causes the CPU to loop in @@ -20,8 +20,9 @@ details), without letting other interrupts have a chance to run. Similarly to the softlockup case, the current stack trace is displayed upon detection and the system will stay locked up unless the default -behavior is changed, which can be done through a compile time knob, -"BOOTPARAM_HARDLOCKUP_PANIC", and a kernel parameter, "nmi_watchdog" +behavior is changed, which can be done through a sysctl, +'hardlockup_panic', a compile time knob, "BOOTPARAM_HARDLOCKUP_PANIC", +and a kernel parameter, "nmi_watchdog" (see "Documentation/kernel-parameters.txt" for details). The panic option can be used in combination with panic_timeout (this @@ -61,3 +62,21 @@ administrators to configure the period of the hrtimer and the perf event. The right value for a particular environment is a trade-off between fast response to lockups and detection overhead. + +By default, the watchdog runs on all online cores. However, on a +kernel configured with NO_HZ_FULL, by default the watchdog runs only +on the housekeeping cores, not the cores specified in the "nohz_full" +boot argument. If we allowed the watchdog to run by default on +the "nohz_full" cores, we would have to run timer ticks to activate +the scheduler, which would prevent the "nohz_full" functionality +from protecting the user code on those cores from the kernel. +Of course, disabling it by default on the nohz_full cores means that +when those cores do enter the kernel, by default we will not be +able to detect if they lock up. However, allowing the watchdog +to continue to run on the housekeeping (non-tickless) cores means +that we will continue to detect lockups properly on those cores. + +In either case, the set of cores excluded from running the watchdog +may be adjusted via the kernel.watchdog_cpumask sysctl. For +nohz_full cores, this may be useful for debugging a case where the +kernel seems to be hanging on the nohz_full cores.