iDRAC Virtual Console, Linux and segfaults

While experimenting with Arch Linux, I found that I couldn’t run the iDRAC Virtual Console (firmware version 1.66.65), no matter which Java version I used. For those unaware, the iDRAC Virtual Console is a Java application for out-of-band management of Dell servers. Whenever I tried to run it, all I got was a pop-up window saying “Connecting to Virtual Console server”, followed by a nasty message on the console:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x82bfa4f0, pid=15893, tid=2219502400
#
# JRE version: Java(TM) SE Runtime Environment (8.0_45-b14) (build 1.8.0_45-b14)
# Java VM: Java HotSpot(TM) Server VM (25.45-b02 mixed mode linux-x86 )
# Problematic frame:
# C  0x82bfa4f0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/kempniu/hs_err_pid15893.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

Not very helpful at first glance, but let’s check the hs_err_pid15893.log file:

...
Stack: [0x8445e000,0x844af000],  sp=0x844ace0c,  free space=315k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  0x82bfa4f0
C  [libVMAPI_DLL.so+0xdd7e4]  ValidateX509Certificate(char*, int, int*, char*)+0xe4
C  [libVMAPI_DLL.so+0x994ff]  Java_com_avocent_app_security_X509CertificateJNI_ValidateX509Certificate+0xda
j  com.avocent.app.security.X509CertificateJNI.ValidateX509Certificate([BI[ILjava/lang/String;[C)I+0
j  com.avocent.app.security.X509CertificateJNI.validateX509Certificate([B[ILjava/lang/String;)I+22
j  com.avocent.app.security.OpenSSLTrustManager.checkServerTrusted([Ljava/security/cert/X509Certificate;Ljava/lang/String;)V+828
j  sun.security.ssl.AbstractTrustManagerWrapper.checkServerTrusted([Ljava/security/cert/X509Certificate;Ljava/lang/String;Ljava/net/Socket;)V+6
j  sun.security.ssl.ClientHandshaker.serverCertificate(Lsun/security/ssl/HandshakeMessage$CertificateMsg;)V+163
j  sun.security.ssl.ClientHandshaker.processMessage(BI)V+237
j  sun.security.ssl.Handshaker.processLoop()V+96
j  sun.security.ssl.Handshaker.process_record(Lsun/security/ssl/InputRecord;Z)V+24
j  sun.security.ssl.SSLSocketImpl.readRecord(Lsun/security/ssl/InputRecord;Z)V+357
j  sun.security.ssl.SSLSocketImpl.performInitialHandshake()V+84
j  sun.security.ssl.SSLSocketImpl.startHandshake(Z)V+13
j  sun.security.ssl.SSLSocketImpl.startHandshake()V+2
j  com.avocent.d.a.a.a(Ljava/net/Socket;I)Ljava/net/Socket;+1334
j  com.avocent.d.a.a.a(BLjava/net/Socket;)Ljava/net/Socket;+79
j  com.avocent.d.a.a.a()Ljava/net/Socket;+3
j  com.avocent.d.c.b.a(Ljava/lang/String;II)V+75
j  com.avocent.a.a.t.g()V+162
j  com.avocent.a.a.t.a(Ljava/lang/String;IILjavax/net/ssl/X509TrustManager;)Lcom/avocent/a/a/i;+53
j  com.avocent.app.c.j.m()V+264
j  com.avocent.app.c.j.d()V+572
j  com.avocent.idrac.kvm.a.d()V+1
j  com.avocent.idrac.kvm.Main.a([Ljava/lang/String;)V+59
j  com.avocent.idrac.kvm.Main.main([Ljava/lang/String;)V+77
...

Okay, this sheds some light on things. We learned that:

  • the error is triggered by native code (inside libVMAPI_DLL.so, which is shipped with the application as you can see in viewer.jnlp), not Java code,
  • the error is triggered while validating an X.509 certificate (ValidateX509Certificate).
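The two frames at the JNI boundary in the log are the same call seen from both sides: the JVM locates the native implementation of X509CertificateJNI.ValidateX509Certificate by looking up a mangled symbol name in libVMAPI_DLL.so. Below is a minimal C sketch of the JNI “short name” mangling rules for plain ASCII names; jni_short_name is a hypothetical helper, and Unicode escapes (_0xxxx) as well as the “long name” form used for overloaded methods are deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/* Append s to out at *pos, escaping it per the JNI short-name rules for
 * ASCII names: '.' and '/' become "_", '_' becomes "_1", ';' becomes "_2"
 * and '[' becomes "_3". Returns -1 if out is too small. */
static int jni_append(char *out, size_t cap, size_t *pos, const char *s)
{
    for (; *s; s++) {
        char plain[2] = { *s, '\0' };
        const char *rep = plain;

        if (*s == '.' || *s == '/') rep = "_";
        else if (*s == '_')         rep = "_1";
        else if (*s == ';')         rep = "_2";
        else if (*s == '[')         rep = "_3";

        size_t n = strlen(rep);
        if (*pos + n >= cap)        /* keep room for the terminating NUL */
            return -1;
        memcpy(out + *pos, rep, n);
        *pos += n;
    }
    out[*pos] = '\0';
    return 0;
}

/* Build "Java_<mangled class>_<mangled method>" for a native method that
 * is not overloaded. Hypothetical helper; returns 0 on success. */
int jni_short_name(const char *cls, const char *method, char *out, size_t cap)
{
    const char prefix[] = "Java_";
    size_t pos = sizeof prefix - 1;

    if (cap <= pos)
        return -1;
    memcpy(out, prefix, pos + 1);       /* copies the NUL as well */

    if (jni_append(out, cap, &pos, cls) < 0)
        return -1;
    if (pos + 1 >= cap)
        return -1;
    out[pos++] = '_';                   /* separator between class and method */
    out[pos] = '\0';
    return jni_append(out, cap, &pos, method);
}
```

For com.avocent.app.security.X509CertificateJNI and ValidateX509Certificate this yields Java_com_avocent_app_security_X509CertificateJNI_ValidateX509Certificate, the native frame name seen in hs_err_pid15893.log.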

Let’s look into this with GDB. First, save the viewer.jnlp file used to launch the application, then run javaws on it under GDB:

gdb -ex "run" --args javaws viewer.jnlp

Let’s see what we get:

...
Reading symbols from javaws...(no debugging symbols found)...done.
Starting program: /usr/lib/jvm/java-8-jre/jre/bin/javaws viewer.jnlp
[Inferior 1 (process 18023) exited normally]
(gdb) #
...

Understandably, javaws forks a new process and then exits. Let’s try again, ordering GDB to retain control over forked processes while letting them all run asynchronously:

gdb -ex "set detach-on-fork off" -ex "set pagination off" -ex "set non-stop on" -ex "run" --args javaws viewer.jnlp

This time, we get:

...
[New Thread 0x86e97b40 (LWP 18163)]
[New Thread 0x86e16b40 (LWP 18164)]
[New Thread 0x86dc5b40 (LWP 18165)]
[Thread 0x86dc5b40 (LWP 18165) exited]
Reading symbols from /usr/lib/libgcc_s.so.1...done.
[New Thread 0x86dc5b40 (LWP 18168)]
[New Thread 0x867cbb40 (LWP 18169)]
[New Thread 0x85385b40 (LWP 18170)]
[New Thread 0x85334b40 (LWP 18171)]
[New Thread 0x852e3b40 (LWP 18172)]
[New Thread 0x85292b40 (LWP 18173)]
[New Thread 0x85241b40 (LWP 18174)]
[New Thread 0x851f0b40 (LWP 18175)]
[New Thread 0x8519fb40 (LWP 18176)]

Program received signal SIGSEGV, Segmentation fault.
0xa7ed7c3f in ?? ()
[New Thread 0x85083b40 (LWP 18177)]
[New Thread 0x85032b40 (LWP 18178)]
[Thread 0x85083b40 (LWP 18177) exited]
[Thread 0x85334b40 (LWP 18171) exited]
[Thread 0x85385b40 (LWP 18170) exited]
[Thread 0x85032b40 (LWP 18178) exited]
...

Okay, we got a SIGSEGV, but this is not the one we’re looking for, as the “Starting application…” pop-up hasn’t even appeared yet. Let’s ignore segfaults for now and instead break on the function we saw in the error log:

gdb -ex "set detach-on-fork off" -ex "set pagination off" -ex "set non-stop on" -ex "handle all nostop" -ex "set breakpoint pending on" -ex "break ValidateX509Certificate" -ex "run" --args javaws viewer.jnlp

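This time, a whole lot of segfaults are raised, but Java handles them just fine: the HotSpot JVM deliberately triggers SIGSEGV internally (for example for implicit null checks and safepoint polling) and recovers from such faults in its own signal handler. After a while, we reach the desired function call: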

...
Program received signal SIGSEGV, Segmentation fault.
Reading symbols from /usr/lib/libstdc++.so.6...done.
[New Thread 0x8399db40 (LWP 18278)]

Breakpoint 1, 0x83b23884 in ValidateX509Certificate(char*, int, int*, char*)@plt () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
...
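A process can survive a segfault because it may install its own SIGSEGV handler. The following C sketch only illustrates that general idea, assuming Linux with glibc; probe_read is a hypothetical helper, and HotSpot’s real handler works differently (roughly speaking, it inspects the faulting instruction and resumes execution at a known location instead of using siglongjmp).

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Where to resume after a fault; only meaningful while probe_read() runs. */
static sigjmp_buf recover_point;

static void segv_handler(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);   /* jump back instead of letting the process die */
}

/* Try to read one byte at addr. Returns 1 and stores the byte in *out if
 * the read succeeded, 0 if it raised SIGSEGV (which is caught and survived). */
int probe_read(const volatile char *addr, char *out)
{
    struct sigaction sa, old;
    volatile int ok = 0;            /* volatile: its value must survive siglongjmp */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    /* savemask = 1 restores the signal mask, so SIGSEGV is unblocked again */
    if (sigsetjmp(recover_point, 1) == 0) {
        *out = *addr;               /* may fault */
        ok = 1;
    }

    sigaction(SIGSEGV, &old, NULL); /* put the previous handler back */
    return ok;
}
```

Reading from a PROT_NONE page faults, yet probe_read simply returns 0 and the process keeps running.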

Remember GDB is running asynchronously, so let’s switch to the correct process and thread before we do anything:

(gdb) info inferiors
  Num  Description       Executable        
  4    process 18218     /usr/lib/jvm/java-8-jre/jre/bin/java 
* 1    <null>            /usr/lib/jvm/java-8-jre/jre/bin/javaws 
(gdb) inferior 4
[Switching to inferior 4 [process 18218] (/usr/lib/jvm/java-8-jre/jre/bin/java)]
[Switching to thread 73 (Thread 0x8399db40 (LWP 18278))] (running)
(gdb) info threads
  Id   Target Id         Frame 
* 73   Thread 0x8399db40 (LWP 18278) "java" (running)
  72   Thread 0x83faeb40 (LWP 18277) "java" (running)
  68   Thread 0x83fffb40 (LWP 18273) "java" (running)
  67   Thread 0x84f83b40 (LWP 18272) "java" (running)
  66   Thread 0x84e90b40 (LWP 18271) "java" (running)
  65   Thread 0x84f32b40 (LWP 18270) "java" (running)
  64   Thread 0x85292b40 (LWP 18269) "java" (running)
  63   Thread 0x84ee1b40 (LWP 18268) "java" 0x83b23884 in ValidateX509Certificate(char*, int, int*, char*)@plt () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
  60   Thread 0x84397b40 (LWP 18265) "java" (running)
  55   Thread 0x85385b40 (LWP 18260) "java" (running)
  54   Thread 0x84e3fb40 (LWP 18259) "java" (running)
...
(gdb) thread 63
[Switching to thread 63 (Thread 0x84ee1b40 (LWP 18268))]
#0  0x83b23884 in ValidateX509Certificate(char*, int, int*, char*)@plt () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so

Let’s clear the breakpoint at ValidateX509Certificate and instead order GDB to break when the next segfault happens:

(gdb) delete 1
(gdb) handle SIGSEGV stop
Signal        Stop      Print   Pass to program Description
SIGSEGV       Yes       Yes     Yes             Segmentation fault
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x836cd4f0 in ?? ()

Let’s see where we landed:

(gdb) bt 5
#0  0x836cd4f0 in ?? ()
#1  0x83a701f6 in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
#2  0x83a707e4 in ValidateX509Certificate(char*, int, int*, char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
#3  0x83a2c4ff in Java_com_avocent_app_security_X509CertificateJNI_ValidateX509Certificate () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
#4  0xa7dd8d26 in ?? ()
(More stack frames follow...)
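GDB prints ?? for frame #0 because it can’t attribute 0x836cd4f0 to any symbol; one plausible reason is that the address doesn’t lie inside any loaded object at all. From inside a process, glibc’s dladdr() answers a similar question. GDB relies on its own symbol tables rather than dladdr(), so this C sketch (object_containing is a hypothetical helper) is only an analogy:

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Return the path of the loaded object (executable or shared library)
 * whose mapping contains addr, or NULL if no loaded object covers it. */
const char *object_containing(const void *addr)
{
    Dl_info info;

    if (dladdr(addr, &info) == 0)   /* 0 means: not inside any loaded object */
        return NULL;
    return info.dli_fname;
}
```

An address resolved inside libc yields libc’s path, while a stray address yields NULL, the in-process counterpart of “?? ()”.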

Bingo, this looks like the segfault we were looking for. The backtrace also gives us an extra hint: the faulting address was reached from InitSSLContext, which is called by ValidateX509Certificate. Let’s restart GDB, breaking on InitSSLContext this time:

gdb -ex "set detach-on-fork off" -ex "set pagination off" -ex "set non-stop on" -ex "handle all nostop" -ex "set breakpoint pending on" -ex "break InitSSLContext" -ex "run" --args javaws viewer.jnlp

Let’s see where this gets us:

...
[New Thread 0x83faeb40 (LWP 18577)]
Reading symbols from /usr/lib/libstdc++.so.6...done.
[New Thread 0x8399db40 (LWP 18579)]

Breakpoint 1, 0x83b70006 in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so

As InitSSLContext seems to be the last function called before the segfault happens, it’s reasonable to assume that its code is faulty. Let’s disassemble it.

(gdb) set logging file InitSSLContext.txt
(gdb) set logging on
Copying output to InitSSLContext.txt.
(gdb) disas
Dump of assembler code for function _ZL14InitSSLContextPc:
   0x83b70002 <+0>:     push   %ebp
   0x83b70003 <+1>:     mov    %esp,%ebp
   0x83b70005 <+3>:     push   %ebx
=> 0x83b70006 <+4>:     sub    $0x244,%esp
   0x83b7000c <+10>:    call   0x83b28337 <__i686.get_pc_thunk.bx>
...
   0x83b70441 <+1087>:  pop    %ebx
   0x83b70442 <+1088>:  pop    %ebp
   0x83b70443 <+1089>:  ret    
End of assembler dump.
(gdb) set logging off
Done logging to InitSSLContext.txt.

Now, let’s step instruction by instruction until we hit the faulty one:

(gdb) nexti
0x83b7000c in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
(gdb)
0x83b70011 in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
(gdb)
0x83b70017 in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
...
(gdb)
0x83b701ec in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
(gdb)
0x83b701f2 in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
(gdb)
0x83b701f4 in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
(gdb)

Program received signal SIGSEGV, Segmentation fault.
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x836804f0, pid=18518, tid=2230844224
#
...

Bingo! We now know which instruction leads to the segfault (the one at 0x83b701f4). As we don’t have access to the source code of this function, let’s try to retrace the path the application took through it. To do that, I saved all the program counters printed by GDB while stepping to a file (steps.txt) and then used a simple one-liner to extract the disassembled instruction for each of them from InitSSLContext.txt:

awk '{print $1}' steps.txt | while read PC; do grep -P "^\s+${PC}" InitSSLContext.txt; done > flow.txt

Then, to get a rough idea of what this function does, let’s check which other functions it calls:

$ grep "call" flow.txt
   0x8ecb4030 <+46>:        call   0x8ec67454 <X509_STORE_new@plt>
   0x8ecb4080 <+126>:       call   0x8ec66944 <_ZN8VMCTrace4OutAEPKcz@plt>
   0x8ecb4096 <+148>:       call   0x8ec693d4 <dlopen@plt>
   0x8ecb410d <+267>:       call   0x8ec682d4 <dlsym@plt>
   0x8ecb4169 <+359>:       call   0x8ec682d4 <dlsym@plt>
   0x8ecb41bb <+441>:       call   0x8ec67314 <dlclose@plt>
   0x8ecb41d0 <+462>:       call   0x8ec66944 <_ZN8VMCTrace4OutAEPKcz@plt>
   0x8ecb41f4 <+498>:       call   *%eax

This explains a lot. We see dlopen, then two dlsym calls, then dlclose, and finally a call to an address stored in the EAX register, which is where the segfault happens. As dlsym returns the address of a symbol inside a dynamically loaded library, it’s reasonable to assume that the actual bug is calling dlclose before calling the function whose address was retrieved with dlsym. That would result in a segfault as soon as dlclose actually unmaps the library from the process’ address space, which happens once the library’s reference count drops to zero. But let’s make sure instead of speculating.
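For contrast, here is a C sketch of the ordering InitSSLContext should follow: resolve a symbol with dlsym(), call it while the handle is still open, and only then call dlclose(). It assumes a glibc system where libm is available as libm.so.6; call_cos_dynamically is a hypothetical helper.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Call cos() from libm through dlopen()/dlsym(). The pointer returned by
 * dlsym() is only guaranteed to be valid while the handle is open, so the
 * call happens BEFORE dlclose(). Returns 0 on success, -1 on failure. */
int call_cos_dynamically(double x, double *result)
{
    void *handle = dlopen("libm.so.6", RTLD_NOW);   /* glibc soname, an assumption */
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    double (*cos_fn)(double) = (double (*)(double))dlsym(handle, "cos");
    if (!cos_fn) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    *result = cos_fn(x);   /* use the symbol while the library is still loaded */
    dlclose(handle);       /* only now is it safe to drop our reference */
    return 0;
}
```

Swapping the last two statements reproduces the pattern found in libVMAPI_DLL.so; whether it crashes then depends on whether dlclose() really unmapped the library.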

Let’s run GDB once again, this time setting breakpoints right after each dlsym call inside InitSSLContext to check their return values, plus one on the indirect call itself. After breaking inside InitSSLContext, let’s do the following:

(gdb) disas
Dump of assembler code for function _ZL14InitSSLContextPc:
   0x839b4002 <+0>:     push   %ebp
   0x839b4003 <+1>:     mov    %esp,%ebp
...
   0x839b410d <+267>:   call   0x839682d4 <dlsym@plt>
   0x839b4112 <+272>:   mov    %eax,%edx
...
   0x839b4169 <+359>:   call   0x839682d4 <dlsym@plt>
   0x839b416e <+364>:   mov    %eax,%edx
...
   0x839b41f4 <+498>:   call   *%eax
...
End of assembler dump.
(gdb) break *0x839b4112
Breakpoint 2 at 0x839b4112
(gdb) break *0x839b416e
Breakpoint 3 at 0x839b416e
(gdb) break *0x839b41f4
Breakpoint 4 at 0x839b41f4
(gdb) cont
Continuing.

Breakpoint 2, 0x839b4112 in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
(gdb) info register eax
eax            0x836624f0   -2090457872
(gdb) cont
Continuing.

Breakpoint 3, 0x839b416e in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
(gdb) info register eax
eax            0x836624b0   -2090457936
(gdb) cont
Continuing.

Breakpoint 4, 0x839b41f4 in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
(gdb) info register eax
eax            0x836624f0   -2090457872
(gdb) x $eax
0x836624f0: Cannot access memory at address 0x836624f0

Bingo! After dlclose has been called, the code attempts to call the address returned by the first dlsym call, and that memory is no longer accessible. We found the bug!
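GDB’s “Cannot access memory” tells us it could not read that address, and the most likely reason here is that the address is simply no longer mapped. On Linux, a similar check can be made from inside a process by scanning /proc/self/maps. A minimal C sketch, with is_mapped as a hypothetical helper:

```c
#include <stdint.h>
#include <stdio.h>

/* Return 1 if addr lies inside any mapping listed in /proc/self/maps,
 * 0 if it does not, -1 if the file cannot be opened. Linux-specific. */
int is_mapped(uintptr_t addr)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;

    char line[4096];
    int found = 0;
    while (!found && fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        /* Each line starts with "start-end" in hex, followed by permissions etc. */
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
            addr >= start && addr < end)
            found = 1;
    }
    fclose(maps);
    return found;
}
```

Once a library has really been unmapped by dlclose(), addresses inside it stop showing up in this list.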

We now know the application is broken, but that doesn’t solve the problem. Let’s find out which library the dlopen call inside InitSSLContext loads: run GDB again and do the following after breaking inside InitSSLContext:

(gdb) disas
Dump of assembler code for function _ZL14InitSSLContextPc:
...
   0x83bb7085 <+131>:   movl   $0x1,0x4(%esp)
   0x83bb708d <+139>:   lea    -0x4910a(%ebx),%eax
   0x83bb7093 <+145>:   mov    %eax,(%esp)
   0x83bb7096 <+148>:   call   0x83b6c3d4 <dlopen@plt>
...
End of assembler dump.
(gdb) break *0x83bb7096
Breakpoint 2 at 0x83bb7096
(gdb) cont
Continuing.

Breakpoint 2, 0x83bb7096 in InitSSLContext(char*) () from /home/kempniu/.java/deployment/cache/6.0/33/63d85021-5704f730-n/libVMAPI_DLL.so
(gdb) x/s $eax
0x83c9ef3e: "/usr/lib/libssl.so"
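Read as C, the four instructions above appear to amount to a single call. On 32-bit x86 with the cdecl convention, arguments are passed on the stack, the first at (%esp) and the second at 0x4(%esp); the lea computes the address of the path string relative to %ebx, the PIC base set up by __i686.get_pc_thunk.bx; and the constant 1 equals RTLD_LAZY in glibc’s dlfcn.h. A hedged reconstruction follows (open_system_libssl is a hypothetical name, and the error message is added for illustration only):

```c
#include <dlfcn.h>
#include <stdio.h>

/* Approximate C equivalent of:
 *   movl $0x1,0x4(%esp)          second argument: flags = 1 (RTLD_LAZY)
 *   lea  -0x4910a(%ebx),%eax     address of "/usr/lib/libssl.so"
 *   mov  %eax,(%esp)             first argument: filename
 *   call dlopen@plt
 */
void *open_system_libssl(void)
{
    void *handle = dlopen("/usr/lib/libssl.so", RTLD_LAZY);
    if (!handle)
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return handle;   /* NULL when the symlink is missing (or dlopen is blocked) */
}
```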

/usr/lib/libssl.so is a symlink to libssl.so.1.0.0 on my system, so I tried my luck and simply removed it before running the iDRAC Virtual Console again. To my amazement, this time it worked correctly! So it seems the application can at least handle a failure to open /usr/lib/libssl.so.

We have a workaround, but removing symlinks that other applications on your system may depend on can hardly be called a solution. Instead, let’s use some LD_PRELOAD magic to replace glibc’s dlopen with our own version, which returns NULL when asked to load /usr/lib/libssl.so (inspired by an entry from Peteris Krumins’ blog).

WARNING: The fix below is not Dell-approved. Use at your own risk.

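#define _GNU_SOURCE

#include <string.h>
#include <dlfcn.h>

/* Interposed dlopen(): refuse to load /usr/lib/libssl.so, a failure the
 * iDRAC Virtual Console appears to handle gracefully, so that
 * libVMAPI_DLL.so never calls into a libssl it has already dlclose()d.
 * Every other request is passed through to the real dlopen(). */
void *dlopen(const char *filename, int flags) {
    void *(*original_dlopen)(const char *, int);

    if (filename && !strcmp(filename, "/usr/lib/libssl.so"))
        return NULL;

    /* RTLD_NEXT finds the next definition of dlopen after this library,
     * i.e. the real one. */
    original_dlopen = (void *(*)(const char *, int))dlsym(RTLD_NEXT, "dlopen");
    if (!original_dlopen)
        return NULL;
    return original_dlopen(filename, flags);
}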

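Save it as idracfix.c and compile it (the resulting library must match the JVM’s architecture, which is 32-bit x86 here):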

gcc -Wall -fPIC -shared -o idracfix.so idracfix.c -ldl

and launch the application once again while setting LD_PRELOAD:

LD_PRELOAD=./idracfix.so javaws viewer.jnlp

Voilà! We have a working fix! Pretty nice for a closed-source issue. Time to report it to Dell!

Update: This problem has been fixed in firmware version 2.10.10.10, which supports both iDRAC 7 and iDRAC 8. The fix presented above is therefore obsolete: simply upgrade your iDRAC’s firmware to get rid of the issue.
