Using gdb: ignore the initial segmentation fault until the environment crash is reproduced

pooyair · October 12, 2012, 12:57pm

I use a binary named polo that takes some parameters, so for debugging I normally do this:

I wrote a watchdog script for my app (polo). It checks every second whether polo is running and starts it if it is not. The problem: if my app keeps hitting a segmentation fault for a while (say 15 seconds), the watchdog restarts it immediately and repeatedly (more than 10 times in those 15 seconds: it segfaults again, the watchdog starts it again, and so on) until the whole environment crashes. My STB goes to a black screen and I have to restart it.
I have already fixed that segmentation fault, but as a next step I also want to fix the bug that causes the environment crash. A segmentation fault in some other part of my app should not crash the environment and force me to restart every time.
My question is: how can I debug an application that is controlled by a watchdog script? I can leave the segmentation fault bug in place (to reproduce the environment crash, which is what I need to investigate), but the segmentation fault happens before the environment crashes, so it stops my app from ever reaching the crash. Any ideas would be highly appreciated.

DGPickett · October 12, 2012, 3:05pm

Watchdog scripts should move the core file aside before restarting. I like compressing it into a file named with the date-time in another directory, perhaps /tmp, so the files get cleaned up if they grow too large.

Beyond that, I like to scan for core on prod and dev boxes, ls -l it, copy it to /tmp/core.YYYY-MMDD-HHMMSS so it is not overwritten, automatically locate the main code using 'file' and common PATHs, run gdb for a stack trace (where), sending an email to the group with as much info as I could get, so they know one side effect of their activities is this core dump, which might be missed otherwise. Then I compress it in the background and sleep a second to ensure unique naming. A marker file keeps track of my last scan time, so I do not pick up the same files over and over.

But then, I am more into diagnosis by post mortem than running in debug mode. I am not sure what the environment has to do with the SEGV; that is usually a programmer with too much trust in his inputs.

jim_mcnamara · October 12, 2012, 3:07pm

Here is what you want to do:

Compile polo with the cc option -g.
Run it however you want in your home directory.
When the segmentation fault happens, STOP at this point.

gdb polo core
(gdb) bt

bt (short for backtrace) is the gdb command that prints the call stack.
It shows the line of your code that has the problem: the first frame in the bt output that points into polo's own source is the line with the problem.

For example:

(gdb) bt
 ..........
 .......... ignore these lines
 strcpy in myfunction()  polo.c:42

DGPickett · October 12, 2012, 4:22pm

Yes, -g and an unstripped binary make it much easier to unscrew.

pooyair · October 12, 2012, 4:58pm

Thanks for the replies.

As explained in the first post of the thread, a normal gdb backtrace is not helpful here. I already know what the segmentation fault log says; my question is not how to catch the segmentation fault or fix it, since I know how.
My question is this: if the segmentation fault occurs more than 12 times, my environment (or maybe my kernel) crashes. It could be a kernel bug on my STB. So compiling with -g and catching the first SEGV is not what I want (I even know the fix for this SEGV); I am deliberately leaving it in place in order to investigate the kernel (or environment) crash.

If I use the -g option and run under gdb, I hit the segmentation fault before I can get to the kernel (or environment) crash. That is not what I want, since it stops me from finding out why the kernel crashes.

I compiled polo with the cc option -ggdb.
Could somebody please tell me the difference between the -g and -ggdb cc options?

jim_mcnamara · October 13, 2012, 12:18am

You cannot debug kernel code from gdb. You have to use a kernel debugger.

I would suggest that you are probably corrupting the kernel with repeated segfaults.

You probably are not aware, but originally UNIX would panic (crash with a core dump) when any process had a segfault. UNIX is not meant to have bad code violate memory time after time after time. That said, I kinda doubt it is a bug in the true sense of the word.

Your code is acting more like a virus.

What does the system log say about errors? Do you get a core when the system crashes? You probably did get a system dump. You can analyze that system core.

What OS do you have?

Understand that what I am about to suggest will let you do what you ask, but it may eventually trash your OS:
Block (ignore) the SIGSEGV signal and set up a signal handler that re-establishes the ignore. Then let your code run through the bad code over and over until the system dies. Be sure to turn on full system dumps: some OSes let you turn system core dumps off, and you want them on. That may take many GB of disk space. Then go after the kernel bug with the correct tool.

I do not know your OS so I cannot give you a better answer.

pooyair · October 13, 2012, 4:17am

@Jim
The OS is STLinux (SH4 CPU), kernel 2.6.23.17, and it runs with BusyBox.
This article describes kernel debugging with STLinux:
Debugging the kernel | STLinux

But I am not sure whether kernel debugging is enabled on my STB (I mean, whether the STB's kernel was compiled with kernel debugging support). That does not stop me, though, since I have installed STLinux on my PC: if I could enable system core dumps on the STB, maybe I could debug the dump on my PC. How can I get a system core dump from my STLinux STB? Any link would be appreciated, and I will follow its instructions.
Could the tail command be helpful in my case?

P.S.: For debugging purposes, I run my STB over NFS, with the log shown in a terminal.