GDB is killing my inferior process
GDB is killing my inferior. The inferior is a long-running (20-30 minute) benchmark. GDB and the inferior are both running under my uid. It runs fine for a while, and then my signal handler is called with a siginfo_t instance with si_signo = 11, si_errno = 0, and si_code = 0; _sifields._kill.si_pid = (gdb-pid), _sifields._kill.si_uid = (my-uid).
I read this as GDB having decided to send a kill signal to my inferior process. Under what circumstances will GDB do this?
This is not a real SIGSEGV (even though si_signo would suggest that it is), since si_code is 0 and si_pid and si_uid are set. My inferior is a multi-threaded C++ application with a custom signal handler that handles GPFs when the application hits a memory barrier that has been set up to protect certain ranges of memory. When I run under GDB I set
handle SIGSEGV noprint
to ensure that GDB passes the SIGSEGV signals relating to the memory barrier on to my application for handling. That part seems to be working fine: SIGSEGVs with a nonzero si_code in the siginfo_t struct are handled properly (after verifying that the faulting address in siginfo->_sifields._sigfault.si_addr is within a protected range of memory).
But a SIGSEGV with a si_code of 0 indicates, as far as I can tell, that the inferior is being killed, and the _sifields._kill fields, which overlay the _sifields._sigfault fields, support this interpretation: GDB is killing my inferior process.
I just don't understand what causes GDB to do this.
An update on this: it looks like GDB is sending a SIGSTOP to the inferior. If I look at $_siginfo at the point of failure I see:
(gdb) p $_siginfo
$2 = {
  si_signo = 5,
  si_errno = 0,
  si_code = 128,
  _sifields = {
    _pad = {0, 0, -1054653696, 57, 97635496, 0, 5344160, 0, 47838328, 0, -154686444, 32767, 47838328, 0, 4514687, 0, 0, 0, 49642032, 0, 50016832, 0, 49599376, 1, 0, 0, 92410096, 0},
    _kill = {
      si_pid = 0,
      si_uid = 0
    },
    _timer = {
      si_tid = 0,
      si_overrun = 0,
      si_sigval = {
        sival_int = -1054653696,
        sival_ptr = 0x39c1234300
      }
    },
    _rt = {
      si_pid = 0,
      si_uid = 0,
      si_sigval = {
        sival_int = -1054653696,
        sival_ptr = 0x39c1234300
      }
    },
    _sigchld = {
      si_pid = 0,
      si_uid = 0,
      si_status = -1054653696,
      si_utime = 419341262248738873,
      si_stime = 22952992424591360
    },
    _sigfault = {
      si_addr = 0x0
    },
    _sigpoll = {
      si_band = 0,
      si_fd = -1054653696
    }
  }
}
But the signal handler sees this (somewhat obfuscated with ***, because I am working in a clean-room environment):
(gdb) bt
#0  ***signalhandler (signal=11, siginfo=0x7fff280083f0, contextinfo=0x7fff280082c0) at ***signal.c:***
...
(gdb) setsig 0x7fff280083f0
[signo=11; code=0; addr=0xbb900007022] ((siginfo_t*) 0x7fff280083f0)
...
(gdb) p *((siginfo_t*) 0x7fff280083f0)
$4 = {
  si_signo = 11,
  si_errno = 0,
  si_code = 0,
  _sifields = {
    _pad = {28706, 3001, -515511096, 32767, -233916640, 32767, -228999566, 32767, 671122824, 32767, -468452105, 1927272, 1, 0, -515510808, 32767, 0, 32767, 37011703, 0, -515511024, 32767, 37011703, 32767, 2, 32767, 1000000000, 0},
    _kill = {
      si_pid = 28706,
      si_uid = 3001
    },
    _timer = {
      si_tid = 28706,
      si_overrun = 3001,
      si_sigval = {
        sival_int = -515511096,
        sival_ptr = 0x7fffe145ecc8
      }
    },
    _rt = {
      si_pid = 28706,
      si_uid = 3001,
      si_sigval = {
        sival_int = -515511096,
        sival_ptr = 0x7fffe145ecc8
      }
    },
    _sigchld = {
      si_pid = 28706,
      si_uid = 3001,
      si_status = -515511096,
      si_utime = 140737254438688,
      si_stime = 140737259355762
    },
    _sigfault = {
      si_addr = 0xbb900007022
    },
    _sigpoll = {
      si_band = 12889196884002,
      si_fd = -515511096
    }
  }
}
(gdb) shell ps -ef | grep gdb
***   28706 28704  0 Jun26 pts/17   00:00:02 /usr/bin/gdb -q ***
(gdb) shell echo $UID
3001
So the signal handler sees a siginfo_t struct with si_signo = 11 (SIGSEGV), si_code = 0 (kill), si_pid = 28706 (gdb), and si_uid = 3001 (me). And GDB reports a siginfo_t with si_signo = 5 (SIGSTOP).
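The `addr=0xbb900007022` value printed by `setsig` above is consistent with that reading: it is what you get when the two 32-bit fields `si_pid` and `si_uid` are read back through the 64-bit `si_addr` member that overlays them. A small sketch of the arithmetic, assuming a little-endian 64-bit Linux target (the function name `pid_uid_as_addr` is made up for illustration):

```c
#include <stdint.h>

/*
 * Value seen when the 32-bit si_pid and si_uid members are reinterpreted
 * as the 64-bit si_addr that shares their storage. Assumes little-endian
 * layout with si_pid at the lower address, as on x86-64 Linux.
 */
uint64_t pid_uid_as_addr(uint32_t pid, uint32_t uid)
{
    return ((uint64_t)uid << 32) | pid;
}
```

With the values from the session, `pid_uid_as_addr(28706, 3001)` is 0xbb900007022 (28706 is 0x7022 and 3001 is 0xbb9), which also equals the `si_band` value 12889196884002 shown in the dump.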
It may be that the inferior process is performing some low-level handling of the original SIGSTOP and sending it up the chain as a kill. But it is the original SIGSTOP that I don't understand and want to eliminate.
I should add that I am setting the following directives before starting the inferior (and it makes no difference whether the handle SIGSTOP directive is set or not):
handle SIGSEGV noprint
handle SIGSTOP nostop print ignore
Does this shed any light on the problem? This is killing me. Also, if there is no insight here, can anyone suggest other forums that might be helpful to post to?
(gdb) show version
GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6_0.1)
Copyright (C) 2010 Free Software Foundation, Inc.
I am running on a 1.8GHz 16-core/32-thread Xeon, 4x E7520, Nehalem-based server. I get the same result regardless of whether hyperthreading is enabled or disabled.
Under Linux, si_signo = 11 would indicate that GDB is propagating a SIGSEGV. See signal(7) for the signal numbers.
try:
(gdb) handle SIGSEGV nopass
Signal        Stop      Print   Pass to program Description
SIGSEGV       Yes       Yes     No              Segmentation fault
Try casting the third argument of the signal handler function you register with sigaction() to (ucontext_t *) and dumping the CPU registers. The instruction pointer in particular may provide a clue:
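#define _GNU_SOURCE   /* exposes REG_RIP / REG_EIP in <sys/ucontext.h> */
#include <signal.h>
#include <ucontext.h>

void my_sigsegv_handler(int signo, siginfo_t *info, void *context)
{
    ucontext_t *u = (ucontext_t *)context;
    /* dump u->uc_mcontext.gregs[REG_RIP] (or REG_EIP on 32-bit x86) */
}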
Then pass the instruction pointer to info symbol in GDB to see which function it falls in.
To understand what's happening, I'd try to pin down:
- Exactly which signal is seen by the process: is it SIGSEGV, as indicated by the si_signo member of siginfo_t? What is the first argument of the signal handler function registered with sigaction()? (Those two things not matching is unlikely, but not impossible with PTRACE_SETSIGINFO.)
- Either GDB intercepted a signal the kernel was sending to the process and injected the signal again, or it decided to send a signal itself. Try to determine which. This can be done by running GDB under another GDB and breaking on kill, tkill, tgkill, and ptrace if $rdi == PTRACE_KILL (sounds time-consuming, I know).