Segmentation Fault in Ruby
For developers, segmentation faults can feel like a sudden nightmare—cryptic errors that crash your application out of nowhere. This frustration is amplified when they show up in high-level languages like Ruby, where memory management is typically handled behind the scenes. Recently, while running my Ruby application, I experienced a segmentation fault caused by a gem. The crash not only halted my program but also left me facing a daunting debugging challenge. In this post, I’ll talk about how I identified the issue, debugged it, and eventually found a solution.
What is a Segmentation Fault?
Ruby gems are great for adding functionality and speeding up development, but their native extensions can sometimes lead to issues like segmentation faults.
A segmentation fault is an error that occurs when a program attempts to access memory that it is not authorized to use. This mechanism acts as a safeguard, preventing the program from inadvertently corrupting memory and causing complex, hard-to-trace bugs.
It’s like trying to grab food from a table you weren’t invited to — an action that gets you abruptly ejected from the party.
This type of error behaves similarly across most programming languages that allow direct memory management. Although Ruby is a high-level language that automatically manages memory, Ruby gems that handle native extensions often interface directly with C code for performance, which can sometimes lead to low-level issues like segmentation faults.
Unexpected Crash
Seeing [BUG] Segmentation fault flash on the screen felt like hitting a wall at full speed. The stack trace looked more like a maze than a roadmap.
When a segmentation fault occurs, the application crashes abruptly during execution and prints a crash report that is usually very long, cryptic, and difficult to read. To make sense of the crash report, I saved it to a .txt file for detailed analysis. This allowed me to break it down block by block to understand what was happening during the crash.
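As a rough sketch of that block-by-block breakdown, the snippet below splits a saved report on its "-- Title ---" header lines. The method name split_crash_report and the "Header" label for the lines before the first section are my own choices for illustration, and the sample text is a shortened copy of the report.

```ruby
# Section headers in the report look like:
#   -- Control frame information ---------------------
SECTION_HEADER = /^-- (.+?) -+\s*$/

# Returns a Hash of section title => array of lines.
# Lines before the first header are stored under "Header" (an arbitrary label).
def split_crash_report(text)
  sections = { "Header" => [] }
  current = "Header"
  text.each_line do |line|
    if (match = line.match(SECTION_HEADER))
      current = match[1]
      sections[current] = []
    else
      sections[current] << line.chomp
    end
  end
  sections
end

sample = <<~REPORT
  [BUG] Segmentation fault at 0x0000000100000010
  ruby 2.5.9p229 (2021-04-05 revision 67939) [x86_64-linux]
  -- Control frame information -----------------------------------------------
  c:0001 p:---- s:0003 e:000002 (none) [FINISH]
  -- Machine register context ------------------------------------------------
  RIP: 0x00007a6ba3002bcb RBP: 0x00007a6ba0e50e98 RSP: 0x00007a6ba0e50c28
REPORT

sections = split_crash_report(sample)
sections.each { |title, lines| puts "#{title}: #{lines.size} line(s)" }
```

For a real report, the same method could be called with File.read on whatever file the report was saved to.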
The first block of this report points to a segmentation fault during code execution, suggesting that an attempt was made to access a restricted memory area.
[BUG] Segmentation fault at 0x0000000100000010
ruby 2.5.9p229 (2021-04-05 revision 67939) [x86_64-linux]
The next block in the output provides information about the control frame at the point of failure. Control frames play a vital role in debugging by showing the program’s execution flow and the active call stack. However, in this case, the control frame details provided minimal information about the root cause of the segmentation fault.
-- Control frame information -----------------------------------------------
c:0001 p:---- s:0003 e:000002 (none) [FINISH]
The third block contains the machine register context, a snapshot of the CPU’s state at the time of the crash. It lists the value stored in each register, which can sometimes help pinpoint the operation or instruction that led to the fault.
-- Machine register context ------------------------------------------------
RIP: 0x00007a6ba3002bcb RBP: 0x00007a6ba0e50e98 RSP: 0x00007a6ba0e50c28
RAX: 0x0000000100000010 RBX: 0x00005e04500b0748 RCX: 0x0000000000000000
RDX: 0x0000000100000010 RDI: 0x00005e044a1fde10 RSI: 0x00007a6ba0e50e10
R8: 0x0000000000000010 R9: 0x00005e044486f780 R10: 0x0000000000000010
R11: 0x0000000000000246 R12: 0x0000000000000000 R13: 0x00007ffc8d99d3af
R14: 0x00007a6ba0e51700 R15: 0x00005e044e12e100 EFL: 0x0000000000010202
The fourth block delivers more detailed information about the origin of the fault, including memory mapping and loaded libraries. This section was crucial in pinpointing the gem responsible for the crash.
-- Other runtime information -----------------------------------------------
* Loaded script: /usr/local/bundle/bin/rspec
* Loaded features:
0 enumerator.so
1 thread.rb
2 rational.so
3 complex.so
4 /usr/local/lib/ruby/2.5.0/x86_64-linux/enc/encdb.so
...
3938 /usr/local/bundle/gems/aws-sdk-core-2.6.36/lib/aws-sdk-core/query/param.rb
* Process memory map:
5e0444057000-5e0444058000 r--p 00000000 00:2d 397917 /usr/local/bin/ruby
5e0444058000-5e0444059000 r-xp 00001000 00:2d 397917 /usr/local/bin/ruby
5e0444059000-5e044405a000 r--p 00002000 00:2d 397917 /usr/local/bin/ruby
5e044405a000-5e044405b000 r--p 00002000 00:2d 397917 /usr/local/bin/ruby
5e044405b000-5e044405c000 rw-p 00003000 00:2d 397917 /usr/local/bin/ruby
5e044486e000-5e04519fa000 rw-p 00000000 00:00 0 [heap]
7a6ba287c000-7a6ba287d000 r--p 00003000 00:2d 530789 /usr/local/bundle/gems/concurrent-ruby-ext-1.0.5/lib/concurrent/extension.so
7a6ba2873000-7a6ba2874000 rw-p 00007000 00:2d 398875 /usr/local/lib/ruby/2.5.0/x86_64-linux/psych.so
7a6ba281d000-7a6ba281e000 rw-p 00013000 00:2d 526854 /usr/local/bundle/gems/bson-4.12.0/lib/bson_native.so
7a6ba2541000-7a6ba2543000 rw-p 0005b000 00:2d 537805 /usr/local/bundle/gems/openssl-2.2.3/lib/openssl.so
7a6ba2103000-7a6ba2104000 rw-p 00008000 00:2d 529796 /usr/local/bundle/gems/json-1.8.6/lib/json/ext/parser.so
7a6ba1e85000-7a6ba1e86000 rw-p 0000a000 00:2d 527825 /usr/local/bundle/gems/byebug-11.1.3/lib/byebug/byebug.so
7a6ba1a5a000-7a6ba1a5c000 rw-p 0006d000 00:2d 526967 /usr/local/bundle/gems/google-protobuf-3.9.2-x86_64-linux/lib/google/2.5/protobuf_c.so
7a6ba17e5000-7a6ba17e8000 rw-p 00289000 00:2d 532071 /usr/local/bundle/gems/grpc-1.24.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so
7a6ba11d2000-7a6ba11d6000 rw-p 00228000 00:2d 529081 /usr/local/bundle/gems/nokogiri-1.10.4/lib/nokogiri/nokogiri.so
...
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
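One way to connect the register block to the memory map, which is not something I did during this investigation, is to look up which mapped region contains the instruction pointer (RIP). The sketch below assumes the usual Linux memory-map line layout (address range, permissions, offset, device, inode, path). The method name mapping_for is hypothetical, and the sample contains only two lines copied from the excerpt above. The region that holds this report's RIP is in the elided part of the map, so the lookup finds nothing here.

```ruby
# Matches lines such as:
#   5e0444058000-5e0444059000 r-xp 00001000 00:2d 397917 /usr/local/bin/ruby
MAP_LINE = /\A(\h+)-(\h+)\s+\S+\s+\h+\s+\S+\s+\d+\s+(\S+)/

# Returns the path of the mapping that contains the address, or nil.
def mapping_for(address, memory_map)
  memory_map.each_line do |line|
    match = MAP_LINE.match(line.strip)
    next unless match

    # The end address of a mapping is exclusive.
    range = match[1].hex...match[2].hex
    return match[3] if range.cover?(address)
  end
  nil
end

MEMORY_MAP = <<~MAP
  5e0444058000-5e0444059000 r-xp 00001000 00:2d 397917 /usr/local/bin/ruby
  7a6ba287c000-7a6ba287d000 r--p 00003000 00:2d 530789 /usr/local/bundle/gems/concurrent-ruby-ext-1.0.5/lib/concurrent/extension.so
MAP

rip = "0x00007a6ba3002bcb".hex # RIP value from the register block
puts mapping_for(rip, MEMORY_MAP).inspect

# An illustrative address (not taken from the report) inside the first mapping.
puts mapping_for(0x5e0444058010, MEMORY_MAP).inspect
```

Run against the full memory map from a saved report, the first lookup would name the file whose code was executing when the fault occurred.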
Identifying the origin of the failure
The crash report above appeared while I was upgrading a Rails application from 7.0 to 7.1. I used the bundle update --conservative command to avoid unnecessary gem upgrades when bundling the application with the new Rails version, but Bundler still upgraded a few other gems automatically to keep their versions compatible.
So my first step was to check whether any of the gems Bundler had upgraded automatically could have stayed on their previous versions. However, when I ran the application again, the segmentation fault was still there.
After researching this kind of failure further, I tried to better understand the crash report. The fourth block, shown above, was key to finding the upgraded gem that was causing the issue.
In the * Process memory map: section of this fourth block, there were 400 lines of libraries and code that had been loaded before the segmentation fault occurred. Within those 400 lines, I identified eight gems that were relevant to the issue (json appears twice because it loads two extension files).
/usr/local/bundle/gems/nokogiri-1.10.4/lib/nokogiri/nokogiri.so
/usr/local/bundle/gems/grpc-1.24.0-x86_64-linux/src/ruby/lib/grpc/2.5/grpc_c.so
/usr/local/bundle/gems/google-protobuf-3.9.2-x86_64-linux/lib/google/2.5/protobuf_c.so
/usr/local/bundle/gems/byebug-11.1.3/lib/byebug/byebug.so
/usr/local/bundle/gems/json-1.8.6/lib/json/ext/generator.so
/usr/local/bundle/gems/json-1.8.6/lib/json/ext/parser.so
/usr/local/bundle/gems/openssl-2.2.3/lib/openssl.so
/usr/local/bundle/gems/bson-4.12.0/lib/bson_native.so
/usr/local/bundle/gems/concurrent-ruby-ext-1.0.5/lib/concurrent/extension.so
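Picking these entries out of hundreds of lines by hand is tedious, so the filtering can be sketched in Ruby. The method name native_extension_gems is hypothetical, and the pattern assumes gems are installed under a gems/ directory with native extensions ending in .so, as in this report. The sample lines are copied from the memory map excerpt above.

```ruby
# Captures the "name-version" directory of any path that points into a gems/
# directory and ends in a shared object (.so).
GEM_EXTENSION = %r{/gems/([^/]+)/\S*\.so\b}

# Returns the sorted, de-duplicated gem directories that loaded a .so file.
def native_extension_gems(report_text)
  report_text.each_line.map { |line| line[GEM_EXTENSION, 1] }.compact.uniq.sort
end

memory_map = <<~MAP
  5e0444058000-5e0444059000 r-xp 00001000 00:2d 397917 /usr/local/bin/ruby
  7a6ba287c000-7a6ba287d000 r--p 00003000 00:2d 530789 /usr/local/bundle/gems/concurrent-ruby-ext-1.0.5/lib/concurrent/extension.so
  7a6ba2873000-7a6ba2874000 rw-p 00007000 00:2d 398875 /usr/local/lib/ruby/2.5.0/x86_64-linux/psych.so
  7a6ba281d000-7a6ba281e000 rw-p 00013000 00:2d 526854 /usr/local/bundle/gems/bson-4.12.0/lib/bson_native.so
  7a6ba2103000-7a6ba2104000 rw-p 00008000 00:2d 529796 /usr/local/bundle/gems/json-1.8.6/lib/json/ext/parser.so
MAP

gems = native_extension_gems(memory_map)
puts gems
```

The Ruby binary and psych.so are skipped because they are not under a gems/ directory, which leaves only the gem-provided extensions as candidates.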
After isolating the gems listed in the crash report, I tested the application by rolling back each gem individually. This process led me to suspect the concurrent-ruby gem.
I reviewed the segmentation fault issues reported in the repositories of the gems listed in the crash report. Eventually, I discovered a related issue in the concurrent-ruby gem. Although the reported scenario involved a different Ruby version, the crash report closely resembled mine. This led me to a recent pull request that resolved the issue by introducing safer handling of thread-local variable finalizers.
I checked which version of the gem included this PR. In my case, all I needed to do was upgrade concurrent-ruby from 1.0.5 to any version higher than v1.1.6.pre1 to get these changes and overcome the segmentation fault.
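To double-check which versions count as "higher than v1.1.6.pre1", RubyGems' own version classes can do the comparison. This is a minimal sketch: the constant FIX_REQUIREMENT and the method includes_fix? are names I made up for illustration, while Gem::Requirement and Gem::Version ship with Ruby.

```ruby
require "rubygems"

# The constraint described above: any version higher than 1.1.6.pre1.
FIX_REQUIREMENT = Gem::Requirement.new("> 1.1.6.pre1")

# True when the given version string satisfies the constraint.
def includes_fix?(version)
  FIX_REQUIREMENT.satisfied_by?(Gem::Version.new(version))
end

%w[1.0.5 1.1.6.pre1 1.1.6].each do |version|
  puts "#{version}: #{includes_fix?(version)}"
end
```

RubyGems sorts a prerelease such as 1.1.6.pre1 before the final 1.1.6 release, so 1.1.6 already satisfies the constraint. The same "> 1.1.6.pre1" string could be used as the version constraint for the gem in a Gemfile.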
Diving into Debugging Ruby C Extensions
If you’re familiar with C or want to dig deeper into debugging Ruby gems with native extensions, the A Rubyist’s Walk Along the C-side series by Peter Zhu is an excellent starting point.
In this series, Peter dives into the nuances of debugging native extensions in Ruby and offers practical techniques for identifying and resolving issues in C code. For instance, Peter explains how to use gdb to step through C code and identify memory access issues in Ruby extensions.
Conclusion
Segmentation faults can be tricky, but they’re also an opportunity to learn more about Ruby’s underlying architecture. By using crash reports effectively, exploring community resources, and applying a structured debugging process, you can tackle even the most challenging bugs. Embrace the challenge of diving into the code—you may uncover the problem and find a solution on your own. With the right mindset, these challenges can become valuable learning experiences.
Need help upgrading your Rails application? Send us a message!