Martin Vidner

Showing posts with label debugging. Show all posts

Jan 11, 2019

Backtrace from a core dump with GDB: pitfalls

At work I mostly program in Ruby, but our system (YaST) has a C/C++ part
too. So I get to debug C crashes infrequently enough to forget some
tricks. Here I write them down for the next time.

Contents and tl;dr

$ ulimit -c unlimited so that a core dump is saved
$ sudo coredumpctl dump -o core so that it is saved to a file
$ gdb -c core is wrong, gdb program core is right
(gdb) info sh mylib
(gdb) select a stack frame before querying variables or macros
$ CFLAGS="-g3 ..." for debugging macros

(Un)helpful crash dump

When Ruby crashes, it produces a very verbose dump of information. It is not
much helpful in our case:

$ rake run[fail]
/sbin/yast2 src/clients/fail.rb
/usr/lib64/ruby/vendor_ruby/2.5.0/yast/yast.rb:204: [BUG] Segmentation fault at 0x0000000000000008
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux-gnu]

-- Control frame information -----------------------------------------------
c:0011 p:---- s:0066 e:000065 CFUNC  :call_yast_function
c:0010 p:0061 s:0058 e:000057 BLOCK  /usr/lib64/ruby/vendor_ruby/2.5.0/yast/yast.rb:204 [FINISH]
c:0009 p:0023 s:0054 e:000053 METHOD /local/home-mvidner/svn/yast/samba-server/src/clients/fail.rb:5
[...]
c:0002 p:0489 s:0011 E:000c18 EVAL   /usr/lib/YaST2/bin/y2start:58 [FINISH]
c:0001 p:0000 s:0003 E:001300 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
/usr/lib/YaST2/bin/y2start:58:in `<main>'
/usr/lib64/ruby/vendor_ruby/2.5.0/yast/wfm.rb:195:in `CallFunction'
[...]
/usr/lib64/ruby/vendor_ruby/2.5.0/yast/yast.rb:204:in `call_yast_function'

-- Machine register context ------------------------------------------------
 RIP: 0x00007fa423bf9677 RBP: 0x00007fff30f5ff20 RSP: 0x00007fff30f5fe10
[...]
 R14: 0x00007fff30f5ff48 R15: 0x00007fff30f5fe10 EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
/usr/lib64/libruby2.5.so.2.5(rb_print_backtrace+0x15) [0x7fa428543f15]
/usr/lib64/libruby2.5.so.2.5(0x1c814c) [0x7fa42854414c]
/usr/lib64/libruby2.5.so.2.5(0x93e44) [0x7fa42840fe44]
/usr/lib64/libruby2.5.so.2.5(0x158eb2) [0x7fa4284d4eb2]
/lib64/libc.so.6(0x7fa427ff8160) [0x7fa427ff8160]
/usr/lib64/YaST2/plugin/libpy2lang0_ruby.so(_ZN5YRuby9callInnerENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES5_7YCPList8constPtrI4TypeS8_E+0x4c7) [0x7fa423bf9677]
[...]
/usr/bin/ruby.ruby2.5(_start+0x2a) [0x40086a]

-- Other runtime information -----------------------------------------------

* Loaded script: /usr/lib/YaST2/bin/y2start

* Loaded features:

    0 enumerator.so
    1 thread.rb
    2 rational.so
    3 complex.so
    4 /usr/lib64/ruby/2.5.0/x86_64-linux-gnu/enc/encdb.so
[...]
   87 /usr/lib64/ruby/vendor_ruby/2.5.0/yast/y2start_helpers.rb
   88 /local/home-mvidner/svn/yast/samba-server/src/modules/Fail.rb

* Process memory map:

00400000-00401000 r-xp 00000000 08:01 7766                               /usr/bin/ruby.ruby2.5
[...]
7fa42363b000-7fa42365c000 r--s 00000000 08:01 6998                       /usr/lib64/YaST2/plugin/libpy2lang_ruby.so
[...]
7fa428a5f000-7fa428a60000 rw-p 00026000 08:01 1310795                    /lib64/ld-2.26.so
7fa428a60000-7fa428a61000 rw-p 00000000 00:00 0 
7fff30767000-7fff30f66000 rw-p 00000000 00:00 0                          [stack]
7fff30ff1000-7fff30ff4000 r--p 00000000 00:00 0                          [vvar]
7fff30ff4000-7fff30ff6000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html

/sbin/yast2: line 455: 25102 Aborted                 (core dumped) $ybindir/y2start $module "$@" "$SELECTED_GUI" $Y2_GEOMETRY $Y2UI_ARGS

"(core dumped)" is lying

It means to say: Per /usr/lib/sysctl.d/50-coredump.conf the core has been sent
to /usr/lib/systemd/systemd-coredump which obeyed "ulimit -c 0" and did not
save it.
So let's

use the ulimit -c unlimited command (bash shell builtin, actually)
Let it crash again
sudo coredumpctl dump -o core

Also note that without the dump subcommand, the core dump will be deleted
after a day. (Why?)

(To avoid systemd-coredump, you can use
sudo /usr/sbin/sysctl kernel.core_pattern="core.%p"; thanks lslezak)

gdb -c core is not useful without the executable

When we do that, the backtrace is very unhelpful:

(gdb) bt
#0  0x00007eff7d0140e0 in ?? ()
#1  0x0000000000000400 in ?? ()
#2  0xffffffffffffffff in ?? ()
#3  0xffffffffffffffff in ?? ()
#4  0xffffffffffffffff in ?? ()
#5  0xffffffffffffffff in ?? ()
#6  0xffffffffffffffff in ?? ()
#7  0xffffffffffffffff in ?? ()
#8  0xffffffffffffffff in ?? ()
#9  0xffffffffffffffff in ?? ()
#10 0xffffffffffffffff in ?? ()
#11 0xffffffffffffffff in ?? ()
#12 0xffffffffffffffff in ?? ()
#13 0xffffffffffffffff in ?? ()
#14 0x0000000000000000 in ?? ()

For some reason, gdb will not load the executable file even though it is named
in the core file. Do it by hand:

gdb program core

$ gdb /usr/bin/ruby.ruby2.5 core

[... lots of text that we'll get back to in the next section ...]
(gdb) bt
#0  0x00007eff7d0140e0 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007eff7d01579f in __GI_abort () at abort.c:100
#2  0x00007eff7d42be51 in die () at error.c:552
#3  0x00007eff7d42be51 in rb_bug_context (ctx=ctx@entry=0xcebac0, fmt=fmt@entry=0x7eff7d5925c4 "Segmentation fault at %p") at error.c:582
#4  0x00007eff7d4f0eb2 in sigsegv (sig=11, info=0xcebbf0, ctx=0xcebac0) at signal.c:928
#5  0x00007eff7d014160 in <signal handler called> () at /lib64/libc.so.6
#6  0x00007eff78c15677 in YRuby::callInner(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, YCPList, constPtr<Type, Type>) () at /usr/lib64/YaST2/plugin/libpy2lang_ruby.so
#7  0x00007eff78c19d80 in Y2RubyFunction::evaluateCall() () at /usr/lib64/YaST2/plugin/libpy2lang_ruby.so

Note that the frames 0 to 4 show the source code locations (file name and line
number) as well as the argument names and values. It's because gdb has found
the debugging information for glibc and the Ruby library:

(gdb) info sh libc
From                To                  Syms Read   Shared Object Library
0x00007eff7cffe770  0x00007eff7d14365c  Yes         /lib64/libc.so.6
0x00007eff7c981cd0  0x00007eff7c9895e8  Yes         /lib64/libcrypt.so.1
0x00007eff73c256f0  0x00007eff73c27007  Yes (*)     /usr/lib64/libcap.so.2
0x00007eff6972ca10  0x00007eff69731b65  Yes         /usr/lib64/qt5/plugins/platforminputcontexts/libcomposeplatforminputcontextplugin.so
(*): Shared library is missing debugging information.
(gdb) info sh ruby
From                To                  Syms Read   Shared Object Library
0x00007eff7d3c73c0  0x00007eff7d566ea8  Yes         /usr/lib64/libruby2.5.so.2.5
0x00007eff7c433850  0x00007eff7c4344c0  Yes (*)     /usr/lib64/ruby/2.5.0/x86_64-linux-gnu/enc/encdb.so
[...]
0x00007eff7b9ec2c0  0x00007eff7b9f4cb0  Yes (*)     /usr/lib64/ruby/vendor_ruby/2.5.0/x86_64-linux-gnu/yastx.so
0x00007eff78c0fc00  0x00007eff78c1e740  Yes (*)     /usr/lib64/YaST2/plugin/libpy2lang_ruby.so
0x00007eff789f82a0  0x00007eff78a001a5  Yes (*)     /usr/lib64/ruby/vendor_ruby/2.5.0/x86_64-linux-gnu/yast/builtinx.so
(*): Shared library is missing debugging information.

info sh is short for info sharedlibrary. If you omit the library name filter, the
length of the list will easily obscure the "(*)" footnote.

Add debuginfo

On openSUSE (and SUSE Linux Enterprise), the debugging symbols are in separate
RPM packages:

zypper in yast2-ruby-bindings-debug{info,source}

Or take the hint from the gdb message that we omitted in the previous section:

Missing separate debuginfo for /usr/lib64/YaST2/plugin/libpy2lang_ruby.so
Try: zypper install -C "debuginfo(build-id)=8c6a1d943255c6f935e029140e75b49b1b7c22fb"

After restarting gdb:

(gdb) bt
[...]
#5  0x00007eff7d014160 in <signal handler called> () at /lib64/libc.so.6
#6  0x00007eff78c15677 in rb_array_len (a=8) at /usr/include/ruby-2.5.0/ruby/ruby.h:2057
#7  0x00007eff78c15677 in YRuby::callInner(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, YCPList, constPtr<Type, Type>) (this=this@entry=0xfefcc0, module_name=..., function=..., argList=..., wanted_result_type=...) at /usr/src/debug/yast2-ruby-bindings-4.0.6-lp150.1.1.x86_64/src/binary/YRuby.cc:220
[...]

In frame #6, rb_array_len (a=8), is the a argument equal to nil?
Let's check:

(gdb) p Qnil
No symbol "Qnil" in current context.

That's misleading! Although Qnil is defined in a header, and you've been
taught to prefer static const int to macros because of macros being
"global", they are in fact not as much global as for gdb to see them now. We
have to select a frame to have a context of a compilation unit.

(gdb) f 6
#6  rb_array_len (a=8) at /usr/include/ruby-2.5.0/ruby/ruby.h:2057
2057        return (RBASIC(a)->flags & RARRAY_EMBED_FLAG) ?
(gdb) p Qnil
No symbol "Qnil" in current context.

WTF?! Well, having debugging information loaded is not enough, the debuginfo
must have an elevated level that includes the macros.

Debuginfo with macros

Instead of the -g flag to gcc, I had to specify -g3 in CFLAGS and
rebuild and reinstall. After that, finally:

(gdb) f 6
#6  rb_array_len (a=8) at /usr/include/ruby-2.5.0/ruby/ruby.h:2057
2057        return (RBASIC(a)->flags & RARRAY_EMBED_FLAG) ?
(gdb) p Qnil
$1 = 8
(gdb) f 7
#7  YRuby::callInner (this=this@entry=0xfefcc0, module_name=..., function=..., argList=..., wanted_result_type=...) at /usr/src/debug/yast2-ruby-bindings-4.0.6-lp150.1.1.x86_64/src/binary/YRuby.cc:220
220         VALUE backtrace = RARRAY_LEN(trace)>0 ? rb_ary_entry(trace, 0) : rb_str_new2("Unknown");

So a = 8 and Qnil is 8, so we have confirmed that the trace array being
nil is causing the crash.

(For the record, this is bsc#1119690.)

Nov 16, 2010

dbus-dump

dbus-dump is a tool to capture D-Bus messages in a libpcap capture file.

It takes an idea from dbus-scrape, which processes a strace output of dbus-monitor, and takes it further by stracing dbus-daemon, thus not relying on any eavesdropping (mis)configuration.

The intended purpose is to establish the libpcap capture format as a base for debugging tools like

Thanks to Will Thompson for mentioning the pcap idea.

Usage

$ sudo strace -p `pgrep -f 'dbus-daemon --system'` \
    -s 3000 -ttt -xx -o foo.strace
$ ./dbus-dump foo.strace foo.pcap
$ ./dbus-pcap-parse foo.pcap
Tue Nov 16 12:56:47 +0100 2010 #<DBus::Message:0xb741f340
 @body_length=0,
 @destination="fi.epitest.hostap.WPASupplicant",
 @error_name=nil,
 @flags=0,
 @interface="fi.epitest.hostap.WPASupplicant.Interface",
 @member="scan",
 @message_type=1,
 @params=[],
 @path="/fi/epitest/hostap/WPASupplicant/Interfaces/180",
 @protocol=1,
 @reply_serial=nil,
 @sender=":1.7132",
 @serial=88639,
 @signature="">
Tue Nov 16 12:56:47 +0100 2010 #<DBus::Message:0xb741b060
 @body_length=4,
 @destination=":1.7132",
[...]>

Dependencies

It is written in Ruby. The pcap format is handled by a small bundled module. dbus-dump has no other dependencies. dbus-pcap-parse uses ruby-dbus.

Bugs

This is an early proof-of-concept release, serving to introduce the libpcap format.

The main problem of dbus-dump is duplicating the messages, seeing them both when the daemon receives them and when it sends them (multiple times, for the signals).

The other tools haven't caught up yet:

$ /usr/sbin/tcpdump -r foo.pcap
reading from file foo.pcap, link-type 231
tcpdump: unknown data link type 231

May 2, 2008

D-Bus Spying

I found how to make dbus-monitor show me more than just signals on the system bus. The documentation in dbus-daemon(1) is very terse about that. This is probably insecure, although I don't know of an interesting exploit.

--- /etc/dbus-1/system.conf.orig        2008-05-02 17:08:03.000000000 +0200
+++ /etc/dbus-1/system.conf     2008-05-02 17:08:30.000000000 +0200
@@ -55,6 +55,9 @@
   <allow send_requested_reply="true">
   <allow receive_requested_reply="true">
 </allow>
+  <policy context="mandatory">
+    <allow receive_interface="*" eavesdrop="true">
+  </policy>

 <!-- Config files are placed here that among other things, punch
 holes in the above policy for specific services. -->

Then you need to tell the daemon to reread the config:

# pkill -HUP -u messagebus

BTW I still haven't figured out how to monitor method_return. I guess receive_requested_reply="false" should do it but not sure how. Does anyone know?

Apr 21, 2008

TWM tip

TWM is the ancient window manager that you can use when troubleshooting the problems with X. It has always annoyed me that it can't make up its own mind and you have to place the windows manually when they first appear.

Solution:

cp /usr/share/X11/twm/system.twmrc ~
echo RandomPlacement >> ~/.twmrc

TODO: make it the default in openSUSE.

Apr 18, 2008

openSUSE 11.0 Beta 1: Is that the murderer's name and address in the victim's hand?

As Beta 1 is getting mirrored, I am excited to tell you about obscure parts of the installation that I hope you will never see.

YaST does not crash. Never. Well, only rarely. Especially not during the initial installation. Seldom. So what do we do when it does? We want the backtrace and the debugging logs. So before 11.0 we would ask you to please let it crash again, but with some special options to produce the backtrace and the debugging info. Spoils the fun.

Since Alpha 2 YaST will take care of the backtrace. Since Beta 1, it will also produce tons of the debugging messages, but throw most of them away and only show the last few of them in case of a crash. Reporting a bug has become much easier!

This has actually helped us already in debugging one crash that blocked the release of Beta 1, bug 380283. If you want to see for yourself, search the log for "got signal". The next line says "Liberating suppressed debugging messages" and afterwards there is "Back trace".

References: FATE#302167, FATE#302166.

Mar 26, 2007

YaST Log Summarizer

When dealing with YaST logs, it happens every now and then that you wade through the voluminous output only to discover that you are looking at the wrong run and the bug happened three days after the start of the log file.

So I wrote a simple log summarizer y2logpids, that shows the first and last lines for each process ID in the log. And as a side feature, it also measures time differences for the case where the reporter says "and then it hangs for 30 minutes".

$ y2logpids -h
Usage:
y2logpids -h|--help
y2logpids [-m] [-d [<seconds>]] [log-files]

Options and Arguments:
-m, --maximum
Show the maximum time difference between log lines, within a single
PID.

-d [seconds], --difference[=seconds]
Show where the time difference is larger than seconds (60 if
unspecified), within a single PID

$ wc -l logs/255012-y2log
23501 logs/255012-y2log
$ y2logpids logs/255012-y2log
2007-03-20 11:13:31  e111(4034) [liby2] genericfrontend.cc(main):200 Launched YaST2 component 'y2base' 'inst_suse_register' 'qt' ''
2007-03-20 11:13:45  e111(4034) [media] MediaManager.cc(restoreAutoMounter):248 Restored HAL volume handling (automounter)
2007-03-20 11:14:15  e111(4222) [liby2] genericfrontend.cc(main):200 Launched YaST2 component 'y2base' 'inst_suse_register' 'qt' ''
2007-03-20 11:29:07  e111(4222) [media] MediaManager.cc(restoreAutoMounter):248 Restored HAL volume handling (automounter)
2007-03-20 11:32:51  e111(5265) [liby2] genericfrontend.cc(main):200 Launched YaST2 component 'y2base' 'online_update' 'qt' ''
2007-03-20 11:33:12  e111(5265) [media] MediaManager.cc(restoreAutoMounter):248 Restored HAL volume handling (automounter)
2007-03-20 11:34:07  e111(5310) [liby2] genericfrontend.cc(main):200 Launched YaST2 component 'y2base' 'online_update' 'qt' ''
2007-03-20 12:02:02  e111(5310) [YCP] OnlineUpdateCallbacks.ycp:50 PatchProgressCallback 100
2007-03-20 12:02:18  e111(7003) [liby2] genericfrontend.cc(main):200 Launched YaST2 component 'y2base' 'print-product' 'qt' ''
2007-03-20 12:03:51  e111(7003) [zypp] ZYppFactory.cc(unLockFile):139 unlocked
2007-03-20 12:03:55  e111(5310) [zypp] ExternalProgram.cc(checkStatus):333 pid 5881 successfully completed
2007-03-20 12:24:22  e111(5310) [media] MediaManager.cc(restoreAutoMounter):248 Restored HAL volume handling (automounter)
2007-03-20 12:35:41  e111(8715) [liby2] genericfrontend.cc(main):200 Launched YaST2 component 'y2base' 'online_update' 'qt' ''
2007-03-20 12:38:51  e111(8715) [media] MediaManager.cc(restoreAutoMounter):248 Restored HAL volume handling (automounter)

Download the script.

Mar 8, 2007

Argument Debugging in Bash

Debugging bash scripts? Need to see what arguments you get? See the
difference between echo "$@" and (the bash builtin) printf "%q".
First we set some tricky arguments and then set them again, pasting
the output of the inspection command.

As you can see, plain echo degrades quickly:

e5:~ # set -- arg1 "arg two" back\\slash "'single'" '"double"'
e5:~ # echo "$@"
arg1 arg two back\slash 'single' "double"
e5:~ # set -- arg1 arg two back\slash 'single' "double"
e5:~ # echo "$@"
arg1 arg two backslash single double

But printf %q keeps working:

e5:~ # set -- arg1 "arg two" back\\slash "'single'" '"double"'
e5:~ # printf "%q " "$@"; echo
arg1 arg\ two back\\slash \'single\' \"double\"
e5:~ # set -- arg1 arg\ two back\\slash \'single\' \"double\"
e5:~ # printf "%q " "$@"; echo
arg1 arg\ two back\\slash \'single\' \"double\"