--- zzzz-none-000/linux-3.10.107/Documentation/timers/NO_HZ.txt 2017-06-27 09:49:32.000000000 +0000 +++ scorpion-7490-727/linux-3.10.107/Documentation/timers/NO_HZ.txt 2021-02-04 17:41:59.000000000 +0000 @@ -7,21 +7,59 @@ some types of computationally intensive high-performance computing (HPC) applications and for real-time applications. -There are two main contexts in which the number of scheduling-clock -interrupts can be reduced compared to the old-school approach of sending -a scheduling-clock interrupt to all CPUs every jiffy whether they need -it or not (CONFIG_HZ_PERIODIC=y or CONFIG_NO_HZ=n for older kernels): +There are three main ways of managing scheduling-clock interrupts +(also known as "scheduling-clock ticks" or simply "ticks"): -1. Idle CPUs (CONFIG_NO_HZ_IDLE=y or CONFIG_NO_HZ=y for older kernels). +1. Never omit scheduling-clock ticks (CONFIG_HZ_PERIODIC=y or + CONFIG_NO_HZ=n for older kernels). You normally will -not- + want to choose this option. + +2. Omit scheduling-clock ticks on idle CPUs (CONFIG_NO_HZ_IDLE=y or + CONFIG_NO_HZ=y for older kernels). This is the most common + approach, and should be the default. + +3. Omit scheduling-clock ticks on CPUs that are either idle or that + have only one runnable task (CONFIG_NO_HZ_FULL=y). Unless you + are running realtime applications or certain types of HPC + workloads, you will normally -not- want this option. + +These three cases are described in the following three sections, followed +by a third section on RCU-specific considerations, a fourth section +discussing testing, and a fifth and final section listing known issues. + + +NEVER OMIT SCHEDULING-CLOCK TICKS + +Very old versions of Linux from the 1990s and the very early 2000s +are incapable of omitting scheduling-clock ticks. It turns out that +there are some situations where this old-school approach is still the +right approach, for example, in heavy workloads with lots of tasks +that use short bursts of CPU, where there are very frequent idle +periods, but where these idle periods are also quite short (tens or +hundreds of microseconds). For these types of workloads, scheduling +clock interrupts will normally be delivered any way because there +will frequently be multiple runnable tasks per CPU. In these cases, +attempting to turn off the scheduling clock interrupt will have no effect +other than increasing the overhead of switching to and from idle and +transitioning between user and kernel execution. + +This mode of operation can be selected using CONFIG_HZ_PERIODIC=y (or +CONFIG_NO_HZ=n for older kernels). + +However, if you are instead running a light workload with long idle +periods, failing to omit scheduling-clock interrupts will result in +excessive power consumption. This is especially bad on battery-powered +devices, where it results in extremely short battery lifetimes. If you +are running light workloads, you should therefore read the following +section. + +In addition, if you are running either a real-time workload or an HPC +workload with short iterations, the scheduling-clock interrupts can +degrade your applications performance. If this describes your workload, +you should read the following two sections. -2. CPUs having only one runnable task (CONFIG_NO_HZ_FULL=y). -These two cases are described in the following two sections, followed -by a third section on RCU-specific considerations and a fourth and final -section listing known issues. - - -IDLE CPUs +OMIT SCHEDULING-CLOCK TICKS FOR IDLE CPUs If a CPU is idle, there is little point in sending it a scheduling-clock interrupt. After all, the primary purpose of a scheduling-clock interrupt @@ -59,10 +97,12 @@ dyntick-idle mode. -CPUs WITH ONLY ONE RUNNABLE TASK +OMIT SCHEDULING-CLOCK TICKS FOR CPUs WITH ONLY ONE RUNNABLE TASK If a CPU has only one runnable task, there is little point in sending it a scheduling-clock interrupt because there is no other task to switch to. +Note that omitting scheduling-clock ticks for CPUs with only one runnable +task implies also omitting them for idle CPUs. The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid sending scheduling-clock interrupts to CPUs with a single runnable task, @@ -81,14 +121,15 @@ "nohz_full=1,6-8" says that CPUs 1, 6, 7, and 8 are to be adaptive-ticks CPUs. Note that you are prohibited from marking all of the CPUs as adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain -online to handle timekeeping tasks in order to ensure that system calls -like gettimeofday() returns accurate values on adaptive-tick CPUs. -(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no -running user processes to observe slight drifts in clock rate.) -Therefore, the boot CPU is prohibited from entering adaptive-ticks -mode. Specifying a "nohz_full=" mask that includes the boot CPU will -result in a boot-time error message, and the boot CPU will be removed -from the mask. +online to handle timekeeping tasks in order to ensure that system +calls like gettimeofday() returns accurate values on adaptive-tick CPUs. +(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running +user processes to observe slight drifts in clock rate.) Therefore, the +boot CPU is prohibited from entering adaptive-ticks mode. Specifying a +"nohz_full=" mask that includes the boot CPU will result in a boot-time +error message, and the boot CPU will be removed from the mask. Note that +this means that your system must have at least two CPUs in order for +CONFIG_NO_HZ_FULL=y to do anything for you. Alternatively, the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter specifies that all CPUs other than the boot CPU are adaptive-ticks CPUs. This @@ -117,13 +158,9 @@ to the need to inform kernel subsystems (such as RCU) about the change in mode. -3. POSIX CPU timers on adaptive-tick CPUs may miss their deadlines - (perhaps indefinitely) because they currently rely on - scheduling-tick interrupts. This will likely be fixed in - one of two ways: (1) Prevent CPUs with POSIX CPU timers from - entering adaptive-tick mode, or (2) Use hrtimers or other - adaptive-ticks-immune mechanism to cause the POSIX CPU timer to - fire properly. +3. POSIX CPU timers prevent CPUs from entering adaptive-tick mode. + Real-time applications needing to take actions based on CPU time + consumption need to use other means of doing so. 4. If there are more perf events pending than the hardware can accommodate, they are normally round-robined so as to collect @@ -192,6 +229,29 @@ where you want them to run. +TESTING + +So you enable all the OS-jitter features described in this document, +but do not see any change in your workload's behavior. Is this because +your workload isn't affected that much by OS jitter, or is it because +something else is in the way? This section helps answer this question +by providing a simple OS-jitter test suite, which is available on branch +master of the following git archive: + +git://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git + +Clone this archive and follow the instructions in the README file. +This test procedure will produce a trace that will allow you to evaluate +whether or not you have succeeded in removing OS jitter from your system. +If this trace shows that you have removed OS jitter as much as is +possible, then you can conclude that your workload is not all that +sensitive to OS jitter. + +Note: this test requires that your system have at least two CPUs. +We do not currently have a good way to remove OS jitter from single-CPU +systems. + + KNOWN ISSUES o Dyntick-idle slows transitions to and from idle slightly. @@ -238,6 +298,11 @@ single runnable SCHED_FIFO task and multiple runnable SCHED_OTHER tasks, even though these interrupts are unnecessary. + And even when there are multiple runnable tasks on a given CPU, + there is little point in interrupting that CPU until the current + running task's timeslice expires, which is almost always way + longer than the time of the next scheduling-clock interrupt. + Better handling of these sorts of situations is future work. o A reboot is required to reconfigure both adaptive idle and RCU @@ -268,6 +333,16 @@ scheduling-clock interrupt going in order to support accurate timekeeping. -o If there are adaptive-ticks CPUs, there will be at least one - CPU keeping the scheduling-clock interrupt going, even if all - CPUs are otherwise idle. +o If there might potentially be some adaptive-ticks CPUs, there + will be at least one CPU keeping the scheduling-clock interrupt + going, even if all CPUs are otherwise idle. + + Better handling of this situation is ongoing work. + +o Some process-handling operations still require the occasional + scheduling-clock tick. These operations include calculating CPU + load, maintaining sched average, computing CFS entity vruntime, + computing avenrun, and carrying out load balancing. They are + currently accommodated by scheduling-clock tick every second + or so. On-going work will eliminate the need even for these + infrequent scheduling-clock ticks.