Last edit: <20020111.1200.49> This file has information specific to the i386 kgdb option. Other platforms with the kgdb option may behave in a similar fashion. New features: ============ 20011220.0050.35 Major enhancement with this version is the ability to hold one or more cpus in an SMP system while allowing the others to continue. Also, by default only the current cpu is enabled on single step commands (please note that gdb issues single step commands at times other than when you use the si command). Another change is to collect some useful information in a global structure called "kgdb_info". You should be able to just: p kgdb_info although I have seen cases where the first time this is done gdb just prints the first member but prints the whole structure if you then enter CR (carriage return or enter). This also works: p *&kgdb_info Here is a sample: (gdb) p kgdb_info $4 = {called_from = 0xc010732c, entry_tsc = 32804123790856, errcode = 0, vector = 3, print_debug_info = 0} "Called_from" is the return address from the current entry into kgdb. Sometimes it is useful to know why you are in kgdb, for example, was it an NMI or a real break point? The simple way to interrogate this return address is: l *0xc010732c which will print the surrounding few lines of source code. "Entry_tsc" is the cpu TSC on entry to kgdb (useful to compare to the kgdb_ts entries). "errcode" and "vector" are other entry parameters which may be helpful on some traps. "print_debug_info" is the internal debugging kgdb print enable flag. Yes, you can modify it. In SMP systems kgdb_info also includes the "cpus_waiting" structure and "hold_on_step": (gdb) p kgdb_info $7 = {called_from = 0xc0112739, entry_tsc = 1034936624074, errcode = 0, vector = 2, print_debug_info = 0, hold_on_sstep = 1, cpus_waiting = {{ task = 0x0, hold = 0, regs = 0x0}, {task = 0xc71b8000, hold = 0, regs = 0xc71b9f70}, {task = 0x0, hold = 0, regs = 0x0}, {task = 0x0, hold = 0, regs = 0x0}, {task = 0x0, hold = 0, regs = 0x0}, {task = 0x0, hold = 0, regs = 0x0}, {task = 0x0, hold = 0, regs = 0x0}, {task = 0x0, hold = 0, regs = 0x0}}} "Cpus_waiting" has an entry for each cpu other than the current one that has been stopped. Each entry contains the task_struct address for that cpu, the address of the regs for that task and a hold flag. All these have the proper typing so that, for example: p *kgdb_info.cpus_waiting[1].regs will print the registers for cpu 1. "Hold_on_sstep" is a new feature with this version and comes up set or true. What is means is that whenever kgdb is asked to single step all other cpus are held (i.e. not allowed to execute). The flag applies to all but the current cpu and, again, can be changed: p kgdb_info.hold_on_sstep=0 restores the old behavior of letting all cpus run during single stepping. Likewise, each cpu has a "hold" flag, which if set, locks that cpu out of execution. Note that this has some risk in cases where the cpus need to communicate with each other. If kgdb finds no cpu available on exit, it will push a message thru gdb and stay in kgdb. Note that it is legal to hold the current cpu as long as at least one cpu can execute. 20010621.1117.09 This version implements an event queue. Events are signaled by calling a function in the kgdb stub and may be examined from gdb. See EVENTS below for details. This version also tighten up the interrupt and SMP handling to not allow interrupts on the way to kgdb from a breakpoint trap. It is fine to allow these interrupts for user code, but not system debugging. Version ======= This version of the kgdb package was developed and tested on kernel version 2.4.16. It will not install on any earlier kernels. It is possible that it will continue to work on later versions of 2.4 and then versions of 2.5 (I hope). Debugging Setup =============== Designate one machine as the "development" machine. This is the machine on which you run your compiles and which has your source code for the kernel. Designate a second machine as the "target" machine. This is the machine that will run your experimental kernel. The two machines will be connected together via a serial line out one or the other of the COM ports of the PC. You will need a modem eliminator and the appropriate cables. Decide on which tty port you want the machines to communicate, then cable them up back-to-back using the null modem. COM1 is /dev/ttyS0 and COM2 is /dev/ttyS1. You should test this connection with the two machines prior to trying to debug a kernel. Once you have it working, on the TARGET machine, enter: setserial /dev/ttyS0 (or what ever tty you are using) and record the port and the irq addresses. On the DEVELOPMENT machine you need to apply the patch for the kgdb hooks. You have probably already done that if you are reading this file. On your DEVELOPMENT machine, go to your kernel source directory and do "make Xconfig" where X is one of "x", "menu", or "". If you are configuring in the standard serial driver, it must not be a module. Either yes or no is ok, but making the serial driver a module means it will initialize after kgdb has set up the UART interrupt code and may cause a failure of the control C option discussed below. The configure question for the serial driver is under the "Character devices" heading and is: "Standard/generic (8250/16550 and compatible UARTs) serial support" Go down to the kernel debugging menu item and open it up. Enable the kernel kgdb stub code by selecting that item. You can also choose to turn on the "-ggdb -O1" compile options. The -ggdb causes the compiler to put more debug info (like local symbols) in the object file. On the i386 -g and -ggdb are the same so this option just reduces to "O1". The -O1 reduces the optimization level. This may be helpful in some cases, be aware, however, that this may also mask the problem you are looking for. The baud rate. Default is 115200. What ever you choose be sure that the host machine is set to the same speed. I recommend the default. The port. This is the I/O address of the serial UART that you should have gotten using setserial as described above. The standard com1 port (3f8) using irq 4 is default . Com2 is 2f8 which by convention uses irq 3. The port irq (see above). Stack overflow test. This option makes a minor change in the trap, system call and interrupt code to detect stack overflow and transfer control to kgdb if it happens. (Some platforms have this in the base line code, but the i386 does not.) You can also configure the system to recognize the boot option "console=kgdb" which if given will cause all console output during booting to be put thru gdb as well as other consoles. This option requires that gdb and kgdb be connected prior to sending console output so, if they are not, a breakpoint is executed to force the connection. This will happen before any kernel output (it is going thru gdb, right), and will stall the boot until the connection is made. You can also configure in a patch to SysRq to enable the kGdb SysRq. This request generates a breakpoint. Since the serial port irq line is set up after any serial drivers, it is possible that this command will work when the control C will not. Save and exit the Xconfig program. Then do "make clean" , "make dep" and "make bzImage" (or whatever target you want to make). This gets the kernel compiled with the "-g" option set -- necessary for debugging. You have just built the kernel on your DEVELOPMENT machine that you intend to run on your TARGET machine. To install this new kernel, use the following installation procedure. Remember, you are on the DEVELOPMENT machine patching the kernel source for the kernel that you intend to run on the TARGET machine. Copy this kernel to your target machine using your usual procedures. I usually arrange to copy development: /usr/src/linux/arch/i386/boot/bzImage to /vmlinuz on the TARGET machine via a LAN based NFS access. That is, I run the cp command on the target and copy from the development machine via the LAN. Run Lilo (see "man lilo" for details on how to set this up) on the new kernel on the target machine so that it will boot! Then boot the kernel on the target machine. On the DEVELOPMENT machine, create a file called .gdbinit in the directory /usr/src/linux. An example .gdbinit file looks like this: shell echo -e "\003" >/dev/ttyS0 set remotebaud 38400 (or what ever speed you have chosen) target remote /dev/ttyS0 Change the "echo" and "target" definition so that it specifies the tty port that you intend to use. Change the "remotebaud" definition to match the data rate that you are going to use for the com line. You are now ready to try it out. Boot your target machine with "kgdb" in the boot command i.e. something like: lilo> test kgdb or if you also want console output thru gdb: lilo> test kgdb console=kgdb You should see the lilo message saying it has loaded the kernel and then all output stops. The kgdb stub is trying to connect with gdb. Start gdb something like this: On your DEVELOPMENT machine, cd /usr/src/linux and enter "gdb vmlinux". When gdb gets the symbols loaded it will read your .gdbinit file and, if everything is working correctly, you should see gdb print out a few lines indicating that a breakpoint has been taken. It will actually show a line of code in the target kernel inside the kgdb activation code. The gdb interaction should look something like this: linux-dev:/usr/src/linux# gdb vmlinux GDB is free software and you are welcome to distribute copies of it under certain conditions; type "show copying" to see the conditions. There is absolutely no warranty for GDB; type "show warranty" for details. GDB 4.15.1 (i486-slackware-linux), Copyright 1995 Free Software Foundation, Inc... breakpoint () at i386-stub.c:750 750 } (gdb) You can now use whatever gdb commands you like to set breakpoints. Enter "continue" to start your target machine executing again. At this point the target system will run at full speed until it encounters your breakpoint or gets a segment violation in the kernel, or whatever. If you have the kgdb console enabled when you continue, gdb will print out all the console messages. The above example caused a breakpoint relatively early in the boot process. For the i386 kgdb it is possible to code a break instruction as the first C-language point in init/main.c, i.e. as the first instruction in start_kernel(). This could be done as follows: #include breakpoint(); This breakpoint() is really a function that sets up the breakpoint and single-step hardware trap cells and then executes a breakpoint. Any early hard coded breakpoint will need to use this function. Once the trap cells are set up they need not be set again, but doing it again does not hurt anything, so you don't need to be concerned about which breakpoint is hit first. Once the trap cells are set up (and the kernel sets them up in due course even if breakpoint() is never called) the macro: BREAKPOINT; will generate an inline breakpoint. This may be more useful as it stops the processor at the instruction instead of in a function a step removed from the location of interest. In either case must be included to define both breakpoint() and BREAKPOINT. Triggering kgdbstub at other times ================================== Often you don't need to enter the debugger until much later in the boot or even after the machine has been running for some time. Once the kernel is booted and interrupts are on, you can force the system to enter the debugger by sending a control C to the debug port. This is what the first line of the recommended .gdbinit file does. This allows you to start gdb any time after the system is up as well as when the system is already at a break point. (In the case where the system is already at a break point the control C is not needed, however, it will be ignored by the target so no harm is done. Also note the the echo command assumes that the port speed is already set. This will be true once gdb has connected, but it is best to set the port speed before you run gdb.) Another simple way to do this is to put the following file in you ~/bin directory: #!/bin/bash echo -e "\003" > /dev/ttyS0 Here, the ttyS0 should be replaced with what ever port you are using. The "\003" is control-C. Once you are connected with gdb, you can enter control-C at the command prompt. An alternative way to get control to the debugger is to enable the kGdb SysRq command. Then you would enter Alt-SysRq-g (all three keys at the same time, but push them down in the order given). To refresh your memory of the available SysRq commands try Alt-SysRq-=. Actually any undefined command could replace the "=", but I like to KNOW that what I am pushing will never be defined. Debugging hints =============== You can break into the target machine at any time from the development machine by typing ^C (see above paragraph). If the target machine has interrupts enabled this will stop it in the kernel and enter the debugger. There is unfortunately no way of breaking into the kernel if it is in a loop with interrupts disabled, so if this happens to you then you need to place exploratory breakpoints or printk's into the kernel to find out where it is looping. The exploratory breakpoints can be entered either thru gdb or hard coded into the source. This is very handy if you do something like: if () BREAKPOINT; There is a copy of an e-mail in the Documentation/i386/kgdb/ directory (debug-nmi.txt) which describes how to create an NMI on an ISA bus machine using a paper clip. I have a sophisticated version of this made by wiring a push button switch into a PC104/ISA bus adapter card. The adapter card nicely furnishes wire wrap pins for all the ISA bus signals. When you are done debugging the kernel on the target machine it is a good idea to leave it in a running state. This makes reboots faster, bypassing the fsck. So do a gdb "continue" as the last gdb command if this is possible. To terminate gdb itself on the development machine and leave the target machine running, first clear all breakpoints and continue, then type ^Z to suspend gdb and then kill it with "kill %1" or something similar. If gdbstub Does Not Work ======================== If it doesn't work, you will have to troubleshoot it. Do the easy things first like double checking your cabling and data rates. You might try some non-kernel based programs to see if the back-to-back connection works properly. Just something simple like cat /etc/hosts >/dev/ttyS0 on one machine and cat /dev/ttyS0 on the other will tell you if you can send data from one machine to the other. Make sure it works in both directions. There is no point in tearing out your hair in the kernel if the line doesn't work. All of the real action takes place in the file /usr/src/linux/arch/i386/kernel/kgdb_stub.c. That is the code on the target machine that interacts with gdb on the development machine. In gdb you can turn on a debug switch with the following command: set remotedebug This will print out the protocol messages that gdb is exchanging with the target machine. Another place to look is /usr/src/arch/i386/lib/kgdb_serial.c This is the code that talks to the serial port on the target side. There might be a problem there. In particular there is a section of this code that tests the UART which will tell you what UART you have if you define "PRNT" (just remove "_off" from the #define PRNT_off). To view this report you will need to boot the system without any beakpoints. This allows the kernel to run to the point where it calls kgdb to set up interrupts. At this time kgdb will test the UART and print out the type it finds. (You need to wait so that the printks are actually being printed. Early in the boot they are cached, waiting for the console to be enabled. Also, if kgdb is entered thru a breakpoint it is possible to cause a dead lock by calling printk when the console is locked. The stub, thus avoids doing printks from break points especially in the serial code.) At this time, if the UART fails to do the expected thing, kgdb will print out (using printk) information on what failed. (These messages will be buried in all the other boot up messages. Look for lines that start with "gdb_hook_interrupt:". You may want to use dmesg once the system is up to view the log. If this fails or if you still don't connect, review your answers for the port address. Use: setserial /dev/ttyS0 to get the current port and irq information. This command will also tell you what the system found for the UART type. The stub recognizes the following UART types: 16450, 16550, and 16550A If you are really desperate you can use printk debugging in the kgdbstub code in the target kernel until you get it working. In particular, there is a global variable in /usr/src/linux/arch/i386/kernel/kgdb_stub.c named "remote_debug". Compile your kernel with this set to 1, rather than 0 and the debug stub will print out lots of stuff as it does what it does. Likewise there are debug printks in the kgdb_serial.c code that can be turned on with simple changes in the macro defines. Debugging Loadable Modules ========================== This technique comes courtesy of Edouard Parmelan When you run gdb, enter the command source gdbinit-modules This will read in a file of gdb macros that was installed in your kernel source directory when kgdb was installed. This file implements the following commands: mod-list Lists the loaded modules in the form mod-print-symbols Prints all the symbols in the indicated module. mod-add-symbols Loads the symbols from the object file and associates them with the indicated module. After you have loaded the module that you want to debug, use the command mod-list to find the of your module. Then use that address in the mod-add-symbols command to load your module's symbols. From that point onward you can debug your module as if it were a part of the kernel. The file gdbinit-modules also contains a command named mod-add-lis as an example of how to construct a command of your own to load your favorite module. The idea is to "can" the pathname of the module in the command so you don't have to type so much. Threads ======= Each process in a target machine is seen as a gdb thread. gdb thread related commands (info threads, thread n) can be used. ia-32 hardware breakpoints ========================== kgdb stub contains support for hardware breakpoints using debugging features of ia-32(x86) processors. These breakpoints do not need code modification. They use debugging registers. 4 hardware breakpoints are available in ia-32 processors. Each hardware breakpoint can be of one of the following three types. 1. Execution breakpoint - An Execution breakpoint is triggered when code at the breakpoint address is executed. As limited number of hardware breakpoints are available, it is advisable to use software breakpoints ( break command ) instead of execution hardware breakpoints, unless modification of code is to be avoided. 2. Write breakpoint - A write breakpoint is triggered when memory location at the breakpoint address is written. A write or can be placed for data of variable length. Length of a write breakpoint indicates length of the datatype to be watched. Length is 1 for 1 byte data , 2 for 2 byte data, 3 for 4 byte data. 3. Access breakpoint - An access breakpoint is triggered when memory location at the breakpoint address is either read or written. Access breakpoints also have lengths similar to write breakpoints. IO breakpoints in ia-32 are not supported. Since gdb stub at present does not use the protocol used by gdb for hardware breakpoints, hardware breakpoints are accessed through gdb macros. gdb macros for hardware breakpoints are described below. hwebrk - Places an execution breakpoint hwebrk breakpointno address hwwbrk - Places a write breakpoint hwwbrk breakpointno length address hwabrk - Places an access breakpoint hwabrk breakpointno length address hwrmbrk - Removes a breakpoint hwrmbrk breakpointno exinfo - Tells whether a software or hardware breakpoint has occurred. Prints number of the hardware breakpoint if a hardware breakpoint has occurred. Arguments required by these commands are as follows breakpointno - 0 to 3 length - 1 to 3 address - Memory location in hex digits ( without 0x ) e.g c015e9bc SMP support ========== When a breakpoint occurs or user issues a break ( Ctrl + C ) to gdb client, all the processors are forced to enter the debugger. Current thread corresponds to the thread running on the processor where breakpoint occurred. Threads running on other processor(s) appear similar to other non running threads in the 'info threads' output. With in the kgdb stub there is a structure "waiting_cpus" in which kgdb records the values of "current" and "regs" for each cpu other than the one that hit the breakpoint. "current" is a pointer to the task structure for the task that cpu is running, while "regs" points to the saved registers for the task. This structure can be examined with the gdb "p" command. ia-32 hardware debugging registers on all processors are set to same values. Hence any hardware breakpoints may occur on any processor. gdb troubleshooting =================== 1. gdb hangs Kill it. restart gdb. Connect to target machine. 2. gdb cannot connect to target machine (after killing a gdb and restarting another) If the target machine was not inside debugger when you killed gdb, gdb cannot connect because the target machine won't respond. In this case echo "Ctrl+C"(ASCII 3) in the serial line. e.g. echo -e "\003" > /dev/ttyS1 This forces that target machine into debugger after which you can connect. 3. gdb cannot connect even after echoing Ctrl+C into serial line Try changing serial line settings min to 1 and time to 0 e.g. stty min 1 time 0 < /dev/ttyS1 Try echoing again check serial line speed and set it to correct value if required e.g. stty ispeed 115200 ospeed 115200 < /dev/ttyS1 EVENTS ====== Ever want to know the order of things happening? Which cpu did what and when? How did the spinlock get the way it is? Then events are for you. Events are defined by calls to an event collection interface and saved for later examination. In this case, kgdb events are saved by a very fast bit of code in kgdb which is fully SMP and interrupt protected and they are examined by using gdb to display them. Kgdb keeps only the last N events, where N must be a power of two and is defined at configure time. Events are signaled to kgdb by calling: kgdb_ts(data0,data1) For each call kgdb records each call in an array along with other info. Here is the array def: struct kgdb_and_then_struct { #ifdef CONFIG_SMP int on_cpu; #endif long long at_time; int from_ln; char * in_src; void *from; int with_if; int data0; int data1; }; For SMP machines the cpu is recorded, for all machines the TSC is recorded (gets a time stamp) as well as the line number and source file the call was made from. The address of the (from), the "if" (interrupt flag) and the two data items are also recorded. The macro kgdb_ts casts the types to int, so you can put any 32-bit values here. There is a configure option to select the number of events you want to keep. A nice number might be 128, but you can keep up to 1024 if you want. The number must be a power of two. An "andthen" macro library is provided for gdb to help you look at these events. It is also possible to define a different structure for the event storage and cast the data to this structure. For example the following structure is defined in kgdb: struct kgdb_and_then_struct2 { #ifdef CONFIG_SMP int on_cpu; #endif long long at_time; int from_ln; char * in_src; void *from; int with_if; struct task_struct *t1; struct task_struct *t2; }; If you use this for display, the data elements will be displayed as pointers to task_struct entries. You may want to define your own structure to use in casting. You should only change the last two items and you must keep the structure size the same. Kgdb will handle these as 32-bit ints, but within that constraint you can define a structure to cast to any 32-bit quantity. This need only be available to gdb and is only used for casting in the display code. Final Items =========== I picked up this code from Amit S. Kale and enhanced it. If you make some really cool modification to this stuff, or if you fix a bug, please let me know. George Anzinger Amit S. Kale (First kgdb by David Grothe ) (modified by Tigran Aivazian ) Putting gdbstub into the kernel config menu. (modified by Scott Foehner ) Hooks for entering gdbstub at boot time. (modified by Amit S. Kale ) Threads, ia-32 hw debugging, mp support, console support, nmi watchdog handling. (modified by George Anzinger ) Extended threads to include the idle threads. Enhancements to allow breakpoint() at first C code. Use of module_init() and __setup() to automate the configure. Enhanced the cpu "collection" code to work in early bring up.