--- zzzz-none-000/linux-3.10.107/Documentation/filesystems/proc.txt 2017-06-27 09:49:32.000000000 +0000 +++ scorpion-7490-727/linux-3.10.107/Documentation/filesystems/proc.txt 2021-02-04 17:41:59.000000000 +0000 @@ -28,7 +28,7 @@ 1.6 Parallel port info in /proc/parport 1.7 TTY info in /proc/tty 1.8 Miscellaneous kernel statistics in /proc/stat - 1.9 Ext4 file system parameters + 1.9 Ext4 file system parameters 2 Modifying System Parameters @@ -42,6 +42,7 @@ 3.6 /proc//comm & /proc//task//comm 3.7 /proc//task//children - Information about task children 3.8 /proc//fdinfo/ - Information about opened file + 3.9 /proc//map_files - Information about memory mapped files 4 Configuring procfs 4.1 Mount options @@ -139,11 +140,14 @@ stat Process status statm Process memory status information status Process status in human readable form - wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan + wchan Present with CONFIG_KALLSYMS=y: it shows the kernel function + symbol the task is blocked in - or "0" if not blocked. pagemap Page table stack Report full stack trace, enable via CONFIG_STACKTRACE smaps a extension based on maps, showing the memory consumption of each mapping and flags associated with it + numa_maps an extension based on maps, showing the memory locality and + binding policy as well as mem usage (in pages) of each mapping. .............................................................................. For example, to get the status information of a process, all you have to do is @@ -171,6 +175,7 @@ VmLib: 1412 kB VmPTE: 20 kb VmSwap: 0 kB + HugetlbPages: 0 kB Threads: 1 SigQ: 0/28578 SigPnd: 0000000000000000 @@ -197,12 +202,12 @@ explained in Table 1-4. (for SMP CONFIG users) -For making accounting scalable, RSS related information are handled in -asynchronous manner and the vaule may not be very precise. To see a precise +For making accounting scalable, RSS related information are handled in an +asynchronous manner and the value may not be very precise. To see a precise snapshot of a moment, you can see /proc//smaps file and scan page table. It's slow but very precise. -Table 1-2: Contents of the status files (as of 2.6.30-rc7) +Table 1-2: Contents of the status files (as of 4.1) .............................................................................. Field Content Name filename of the executable @@ -210,6 +215,7 @@ in an uninterruptible wait, Z is zombie, T is traced or stopped) Tgid thread group ID + Ngid NUMA group ID (0 if none) Pid process id PPid process id of the parent process TracerPid PID of process tracing this process (0 if not) @@ -217,6 +223,10 @@ Gid Real, effective, saved set, and file system GIDs FDSize number of file descriptor slots currently allocated Groups supplementary group list + NStgid descendant namespace thread group ID hierarchy + NSpid descendant namespace process ID hierarchy + NSpgid descendant namespace process group ID hierarchy + NSsid descendant namespace session ID hierarchy VmPeak peak virtual memory size VmSize total program size VmLck locked memory size @@ -227,14 +237,16 @@ VmExe size of text segment VmLib size of shared library code VmPTE size of page table entries + VmPMD size of second level page tables VmSwap size of swap usage (the number of referred swapents) + HugetlbPages size of hugetlb memory portions Threads number of threads SigQ number of signals queued/max. number for queue SigPnd bitmap of pending signals for the thread ShdPnd bitmap of shared pending signals for the process SigBlk bitmap of blocked signals SigIgn bitmap of ignored signals - SigCgt bitmap of catched signals + SigCgt bitmap of caught signals CapInh bitmap of inheritable capabilities CapPrm bitmap of permitted capabilities CapEff bitmap of effective capabilities @@ -300,8 +312,8 @@ pending bitmap of pending signals blocked bitmap of blocked signals sigign bitmap of ignored signals - sigcatch bitmap of catched signals - wchan address where process went to sleep + sigcatch bitmap of caught signals + 0 (place holder, used to be the wchan address, use /proc/PID/wchan instead) 0 (place holder) 0 (place holder) exit_signal signal to send to parent thread on exit @@ -334,7 +346,7 @@ a7cb1000-a7cb2000 ---p 00000000 00:00 0 a7cb2000-a7eb2000 rw-p 00000000 00:00 0 a7eb2000-a7eb3000 ---p 00000000 00:00 0 -a7eb3000-a7ed5000 rw-p 00000000 00:00 0 [stack:1001] +a7eb3000-a7ed5000 rw-p 00000000 00:00 0 a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6 a8008000-a800a000 r--p 00133000 03:00 4222 /lib/libc.so.6 a800a000-a800b000 rw-p 00135000 03:00 4222 /lib/libc.so.6 @@ -366,7 +378,6 @@ [heap] = the heap of the program [stack] = the stack of the main process - [stack:1001] = the stack of the thread with tid 1001 [vdso] = the "virtual dynamic shared object", the kernel system call handler @@ -374,10 +385,8 @@ The /proc/PID/task/TID/maps is a view of the virtual memory from the viewpoint of the individual tasks of a process. In this file you will see a mapping marked -as [stack] if that task sees it as a stack. This is a key difference from the -content of /proc/PID/maps, where you will see all mappings that are being used -as stack by all of those tasks. Hence, for the example above, the task-level -map, i.e. /proc/PID/task/TID/maps for thread 1001 will look like this: +as [stack] if that task sees it as a stack. Hence, for the example above, the +task-level map, i.e. /proc/PID/task/TID/maps for thread 1001 will look like this: 08048000-08049000 r-xp 00000000 03:00 8312 /opt/test 08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test @@ -414,25 +423,41 @@ Private_Dirty: 0 kB Referenced: 892 kB Anonymous: 0 kB +AnonHugePages: 0 kB +Shared_Hugetlb: 0 kB +Private_Hugetlb: 0 kB Swap: 0 kB +SwapPss: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB -Locked: 374 kB -VmFlags: rd ex mr mw me de +Locked: 0 kB +VmFlags: rd ex mr mw me dw the first of these lines shows the same information as is displayed for the mapping in /proc/PID/maps. The remaining lines show the size of the mapping (size), the amount of the mapping that is currently resident in RAM (RSS), the process' proportional share of this mapping (PSS), the number of clean and -dirty private pages in the mapping. Note that even a page which is part of a -MAP_SHARED mapping, but has only a single pte mapped, i.e. is currently used -by only one process, is accounted as private and not as shared. "Referenced" -indicates the amount of memory currently marked as referenced or accessed. +dirty private pages in the mapping. + +The "proportional set size" (PSS) of a process is the count of pages it has +in memory, where each page is divided by the number of processes sharing it. +So if a process has 1000 pages all to itself, and 1000 shared with one other +process, its PSS will be 1500. +Note that even a page which is part of a MAP_SHARED mapping, but has only +a single pte mapped, i.e. is currently used by only one process, is accounted +as private and not as shared. +"Referenced" indicates the amount of memory currently marked as referenced or +accessed. "Anonymous" shows the amount of memory that does not belong to any file. Even a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE and a page is modified, the file page is replaced by a private anonymous copy. -"Swap" shows how much would-be-anonymous memory is also used, but out on -swap. +"AnonHugePages" shows the ammount of memory backed by transparent hugepage. +"Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by +hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical +reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field. +"Swap" shows how much would-be-anonymous memory is also used, but out on swap. +"SwapPss" shows proportional swap share of this mapping. +"Locked" indicates whether the mapping is locked in memory or not. "VmFlags" field deserves a separate description. This member represents the kernel flags associated with the particular virtual memory area in two letter encoded @@ -457,9 +482,9 @@ ac - area is accountable nr - swap space is not reserved for the area ht - area uses huge tlb pages - nl - non-linear mapping ar - architecture specific flag dd - do not include area into core dump + sd - soft-dirty flag mm - mixed map area hg - huge page advise flag nh - no-huge page advise flag @@ -473,7 +498,8 @@ enabled. The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG -bits on both physical and virtual pages associated with a process. +bits on both physical and virtual pages associated with a process, and the +soft-dirty bit on pte (see Documentation/vm/soft-dirty.txt for details). To clear the bits for all the pages associated with the process > echo 1 > /proc/PID/clear_refs @@ -482,12 +508,51 @@ To clear the bits for the file mapped pages associated with the process > echo 3 > /proc/PID/clear_refs + +To clear the soft-dirty bit + > echo 4 > /proc/PID/clear_refs + +To reset the peak resident set size ("high water mark") to the process's +current value: + > echo 5 > /proc/PID/clear_refs + Any other value written to /proc/PID/clear_refs will have no effect. The /proc/pid/pagemap gives the PFN, which can be used to find the pageflags using /proc/kpageflags and number of times a page is mapped using /proc/kpagecount. For detailed explanation, see Documentation/vm/pagemap.txt. +The /proc/pid/numa_maps is an extension based on maps, showing the memory +locality and binding policy, as well as the memory usage (in pages) of +each mapping. The output follows a general format where mapping details get +summarized separated by blank spaces, one mapping per each file line: + +address policy mapping details + +00400000 default file=/usr/local/bin/app mapped=1 active=0 N3=1 kernelpagesize_kB=4 +00600000 default file=/usr/local/bin/app anon=1 dirty=1 N3=1 kernelpagesize_kB=4 +3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 N0=24 N3=2 kernelpagesize_kB=4 +320621f000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4 +3206220000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4 +3206221000 default anon=1 dirty=1 N3=1 kernelpagesize_kB=4 +3206800000 default file=/lib64/libc-2.12.so mapped=59 mapmax=21 active=55 N0=41 N3=18 kernelpagesize_kB=4 +320698b000 default file=/lib64/libc-2.12.so +3206b8a000 default file=/lib64/libc-2.12.so anon=2 dirty=2 N3=2 kernelpagesize_kB=4 +3206b8e000 default file=/lib64/libc-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4 +3206b8f000 default anon=3 dirty=3 active=1 N3=3 kernelpagesize_kB=4 +7f4dc10a2000 default anon=3 dirty=3 N3=3 kernelpagesize_kB=4 +7f4dc10b4000 default anon=2 dirty=2 active=1 N3=2 kernelpagesize_kB=4 +7f4dc1200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N3=1 kernelpagesize_kB=2048 +7fff335f0000 default stack anon=3 dirty=3 N3=3 kernelpagesize_kB=4 +7fff3369d000 default mapped=1 mapmax=35 active=0 N3=1 kernelpagesize_kB=4 + +Where: +"address" is the starting address for the mapping; +"policy" reports the NUMA memory policy set for the mapping (see vm/numa_memory_policy.txt); +"mapping details" summarizes mapping data such as mapping type, page usage counters, +node locality page counters (N0 == node0, N1 == node1, ...) and the kernel page +size, in KB, that is backing the mapping up. + 1.2 Kernel data --------------- @@ -541,7 +606,7 @@ sys See chapter 2 sysvipc Info of SysVIPC Resources (msg, sem, shm) (2.4) tty Info of tty drivers - uptime System uptime + uptime Wall clock since boot, combined idle time of all cpus version Kernel version video bttv info of video resources (2.4) vmallocinfo Show vmalloced areas @@ -756,11 +821,9 @@ > cat /proc/meminfo -The "Locked" indicates whether the mapping is locked in memory or not. - - MemTotal: 16344972 kB MemFree: 13634064 kB +MemAvailable: 14836172 kB Buffers: 3656 kB Cached: 1195708 kB SwapCached: 0 kB @@ -793,6 +856,14 @@ MemTotal: Total usable ram (i.e. physical ram minus a few reserved bits and the kernel binary code) MemFree: The sum of LowFree+HighFree +MemAvailable: An estimate of how much memory is available for starting new + applications, without swapping. Calculated from MemFree, + SReclaimable, the size of the file LRU lists, and the low + watermarks in each zone. + The estimate takes into account that the system needs some + page cache to function well, and that not all reclaimable + slab will be reclaimable, due to items being in use. The + impact of those factors will vary from system to system. Buffers: Relatively temporary storage for raw disk blocks shouldn't get tremendously large (20MB or so) Cached: in-memory cache for files read from the disk (the @@ -839,7 +910,8 @@ if strict overcommit accounting is enabled (mode 2 in 'vm.overcommit_memory'). The CommitLimit is calculated with the following formula: - CommitLimit = ('vm.overcommit_ratio' * Physical RAM) + Swap + CommitLimit = ([total RAM pages] - [total huge TLB pages]) * + overcommit_ratio / 100 + [total swap pages] For example, on a system with 1G of physical RAM and 7G of swap with a `vm.overcommit_ratio` of 30 it would yield a CommitLimit of 7.3G. @@ -849,16 +921,15 @@ The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been "used" by them as of yet. A process which malloc()'s 1G - of memory, but only touches 300M of it will only show up - as using 300M of memory even if it has the address space - allocated for the entire 1G. This 1G is memory which has - been "committed" to by the VM and can be used at any time - by the allocating application. With strict overcommit - enabled on the system (mode 2 in 'vm.overcommit_memory'), - allocations which would exceed the CommitLimit (detailed - above) will not be permitted. This is useful if one needs - to guarantee that processes will not fail due to lack of - memory once that memory has been successfully allocated. + of memory, but only touches 300M of it will show up as + using 1G. This 1G is memory which has been "committed" to + by the VM and can be used at any time by the allocating + application. With strict overcommit enabled on the system + (mode 2 in 'vm.overcommit_memory'),allocations which would + exceed the CommitLimit (detailed above) will not be permitted. + This is useful if one needs to guarantee that processes will + not fail due to lack of memory once that memory has been + successfully allocated. VmallocTotal: total size of vmalloc memory area VmallocUsed: amount of vmalloc area which is used VmallocChunk: largest contiguous block of vmalloc area which is free @@ -1202,9 +1273,9 @@ since the system first booted. For a quick look, simply cat the file: > cat /proc/stat - cpu 2255 34 2290 22625563 6290 127 456 0 0 - cpu0 1132 34 1441 11311718 3675 127 438 0 0 - cpu1 1123 0 849 11313845 2614 0 18 0 0 + cpu 2255 34 2290 22625563 6290 127 456 0 0 0 + cpu0 1132 34 1441 11311718 3675 127 438 0 0 0 + cpu1 1123 0 849 11313845 2614 0 18 0 0 0 intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...] ctxt 1990473 btime 1062191376 @@ -1231,8 +1302,9 @@ The "intr" line gives counts of interrupts serviced since boot time, for each of the possible system interrupts. The first column is the total of all -interrupts serviced; each subsequent column is the total for that particular -interrupt. +interrupts serviced including unnumbered architecture specific interrupts; +each subsequent column is the total for that particular numbered interrupt. +Unnumbered interrupts are not shown, only summed into the total. The "ctxt" line gives the total number of context switches across all CPUs. @@ -1256,7 +1328,7 @@ 1.9 Ext4 file system parameters ------------------------------- +------------------------------- Information about mounted ext4 file systems can be found in /proc/fs/ext4. Each mounted filesystem will have a directory in @@ -1530,16 +1602,16 @@ --------------------------------------------------------------- When a process is dumped, all anonymous memory is written to a core file as long as the size of the core file isn't limited. But sometimes we don't want -to dump some memory segments, for example, huge shared memory. Conversely, -sometimes we want to save file-backed memory segments into a core file, not -only the individual files. +to dump some memory segments, for example, huge shared memory or DAX. +Conversely, sometimes we want to save file-backed memory segments into a core +file, not only the individual files. /proc//coredump_filter allows you to customize which memory segments will be dumped when the process is dumped. coredump_filter is a bitmask of memory types. If a bit of the bitmask is set, memory segments of the corresponding memory type are dumped, otherwise they are not dumped. -The following 7 memory types are supported: +The following 9 memory types are supported: - (bit 0) anonymous private memory - (bit 1) anonymous shared memory - (bit 2) file-backed private memory @@ -1548,20 +1620,22 @@ effective only if the bit 2 is cleared) - (bit 5) hugetlb private memory - (bit 6) hugetlb shared memory + - (bit 7) DAX private memory + - (bit 8) DAX shared memory Note that MMIO pages such as frame buffer are never dumped and vDSO pages are always dumped regardless of the bitmask status. - Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only - effected by bit 5-6. + Note that bits 0-4 don't affect hugetlb or DAX memory. hugetlb memory is + only affected by bit 5-6, and DAX is only affected by bits 7-8. -Default value of coredump_filter is 0x23; this means all anonymous memory -segments and hugetlb private memory are dumped. +The default value of coredump_filter is 0x33; this means all anonymous memory +segments, ELF header pages and hugetlb private memory are dumped. If you don't want to dump all shared memory segments attached to pid 1234, -write 0x21 to the process's proc file. +write 0x31 to the process's proc file. - $ echo 0x21 > /proc/1234/coredump_filter + $ echo 0x31 > /proc/1234/coredump_filter When a new process is created, the process inherits the bitmask status from its parent. It is useful to set up coredump_filter before the program runs. @@ -1634,18 +1708,25 @@ if precise results are needed. -3.7 /proc//fdinfo/ - Information about opened file +3.8 /proc//fdinfo/ - Information about opened file --------------------------------------------------------------- This file provides information associated with an opened file. The regular -files have at least two fields -- 'pos' and 'flags'. The 'pos' represents -the current offset of the opened file in decimal form [see lseek(2) for -details] and 'flags' denotes the octal O_xxx mask the file has been -created with [see open(2) for details]. +files have at least three fields -- 'pos', 'flags' and mnt_id. The 'pos' +represents the current offset of the opened file in decimal form [see lseek(2) +for details], 'flags' denotes the octal O_xxx mask the file has been +created with [see open(2) for details] and 'mnt_id' represents mount ID of +the file system containing the opened file [see 3.5 /proc//mountinfo +for details]. A typical output is pos: 0 flags: 0100002 + mnt_id: 19 + +All locks associated with a file descriptor are shown in its fdinfo too. + +lock: 1: FLOCK ADVISORY WRITE 359 00:13:11691 0 EOF The files such as eventfd, fsnotify, signalfd, epoll among the regular pos/flags pair provide additional information particular to the objects they represent. @@ -1654,6 +1735,7 @@ ~~~~~~~~~~~~~ pos: 0 flags: 04002 + mnt_id: 9 eventfd-count: 5a where 'eventfd-count' is hex value of a counter. @@ -1662,6 +1744,7 @@ ~~~~~~~~~~~~~~ pos: 0 flags: 04002 + mnt_id: 9 sigmask: 0000000000000200 where 'sigmask' is hex value of the signal mask associated @@ -1671,6 +1754,7 @@ ~~~~~~~~~~~ pos: 0 flags: 02 + mnt_id: 9 tfd: 5 events: 1d data: ffffffffffffffff where 'tfd' is a target file descriptor number in decimal form, @@ -1704,6 +1788,7 @@ pos: 0 flags: 02 + mnt_id: 9 fanotify flags:10 event-flags:0 fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003 fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4 @@ -1720,6 +1805,47 @@ While the first three lines are mandatory and always printed, the rest is optional and may be omitted if no marks created yet. + Timerfd files + ~~~~~~~~~~~~~ + + pos: 0 + flags: 02 + mnt_id: 9 + clockid: 0 + ticks: 0 + settime flags: 01 + it_value: (0, 49406829) + it_interval: (1, 0) + + where 'clockid' is the clock type and 'ticks' is the number of the timer expirations + that have occurred [see timerfd_create(2) for details]. 'settime flags' are + flags in octal form been used to setup the timer [see timerfd_settime(2) for + details]. 'it_value' is remaining time until the timer exiration. + 'it_interval' is the interval for the timer. Note the timer might be set up + with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value' + still exhibits timer's remaining time. + +3.9 /proc//map_files - Information about memory mapped files +--------------------------------------------------------------------- +This directory contains symbolic links which represent memory mapped files +the process is maintaining. Example output: + + | lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so + | lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so + | lr-------- 1 root root 64 Jan 27 11:24 333c820000-333c821000 -> /usr/lib64/ld-2.18.so + | ... + | lr-------- 1 root root 64 Jan 27 11:24 35d0421000-35d0422000 -> /usr/lib64/libselinux.so.1 + | lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls + +The name of a link represents the virtual memory bounds of a mapping, i.e. +vm_area_struct::vm_start-vm_area_struct::vm_end. + +The main purpose of the map_files is to retrieve a set of memory mapped +files in a fast way instead of parsing /proc//maps or +/proc//smaps, both of which contain many more records. At the same +time one can open(2) mappings from the listings of two processes and +comparing their inode numbers to figure out which anonymous memory areas +are actually shared. ------------------------------------------------------------------------------ Configuring procfs