How to Debug Non-Deterministic Test Failures with RSpec
I recently wrote a spec for metric_fu which accidentally turned out to be non-deterministic (a flaky spec!). I had no idea why it was randomly failing. This article explains the process I followed to debug the issue.
The test that I wrote was testing the integration between metric_fu and reek. When I started, I wasn't familiar with either of those projects' codebases, so bear with me.
This is the code that randomly worked/failed:
context "with real output, not mocked nor doubled" do
let(:result) do
{
file_path: "spec/support/samples/alfa.rb",
code_smells: [
{
lines: [1],
message: "has unused parameter 'echo'",
method: "bravo",
type: "UnusedParameters"
}
]
}
end
before do
@generator = MetricFu::ReekGenerator.new(dirs_to_reek: ["spec/support/samples"])
@generator.emit
end
it "returns real data" do
@matches = @generator.analyze
expect(@matches.first).to eq(result)
end
end
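For reference, the fixture only needs a method that never uses one of its parameters for reek to flag it. Something along these lines is enough (an illustrative sketch; the actual alfa.rb in the repository may differ):

def bravo(echo)
  puts "charlie" # 'echo' is never used, so reek reports "has unused parameter 'echo'"
end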
Why did I write that spec? I'm glad you asked.
After bumping the dependency for reek I noticed that generator_spec.rb was still passing just fine, which was a little suspicious. This led me to believe that there was no real "integration spec" for a very simple scenario.
If you don’t have a CI failure to refer to, a quick and basic way to test a flaky spec is by using this bash line:
while bundle exec rspec spec; do :; done
That line will run bundle exec rspec spec in a loop until RSpec fails and returns a non-zero exit code.
Failures:

  1) MetricFu::ReekGenerator analyze method with real output, not mocked nor doubled returns real data
     Failure/Error: expect(@matches.first).to eq(result)

       expected: {:code_smells=>[{:lines=>[2, 3, 4], :message=>"takes parameters ['echo', 'foxtrot', 'golf'] to 3 meth...f'", :method=>"Alfa#delta", :type=>"UnusedParameters"}], :file_path=>"spec/support/samples/alfa.rb"}
            got: nil

       (compared using ==)
     # ./spec/metric_fu/metrics/reek/generator_spec.rb:174:in `block (4 levels) in <top (required)>'
     # ./spec/support/timeout.rb:6:in `block in <top (required)>'
Sometimes the result returned by reek is an empty result set!
Non-deterministic results have always been hard to debug for me. Fortunately, metric_fu uses RSpec and already had the right setup for me to start digging into the problem.
This is what its RSpec configuration looks like:
RSpec.configure do |config|
  config.filter_run focus: true
  config.run_all_when_everything_filtered = true

  # Skip specs tagged `:slow` unless SLOW_SPECS is set
  config.filter_run_excluding :slow unless ENV["SLOW_SPECS"]

  # End specs on first failure if FAIL_FAST is set
  config.fail_fast = ENV.include?("FAIL_FAST")

  config.order = :rand
  # ...
The important part there is config.order = :rand, the line that randomizes the order in which test scenarios are executed. If you don't do this in your specs, your scenarios will only ever run in one very specifically defined order.
The problem with a fixed order is that some scenarios might be leaking state into other scenarios without you knowing it. This is a ticking time bomb: sooner or later you will write a scenario that alters the execution order and starts causing random-looking problems.
I strongly recommend you always run your RSpec tests in random order.
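To make the risk concrete, here is a minimal, hypothetical sketch (not code from metric_fu) of two examples that only pass when they run in one specific order:

# A made-up module holding memoized "global" state
module Settings
  def self.dir
    @dir ||= Dir.pwd
  end

  def self.dir=(value)
    @dir = value
  end
end

RSpec.describe "order-dependent examples" do
  it "relies on the default directory" do
    expect(Settings.dir).to eq(Dir.pwd)
  end

  it "changes the setting and never restores it" do
    Settings.dir = "/tmp"
    expect(Settings.dir).to eq("/tmp")
  end
end

In the file-defined order the first example always runs before the second one and both pass. With config.order = :rand, any seed that happens to run the mutating example first makes the other one fail, which is exactly the kind of random-looking problem described above.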
Moving on, when you specify random ordering in your RSpec configuration, its output will show the seed number at the end. For example:
...
Failed examples:
rspec ./spec/metric_fu/metrics/reek/generator_spec.rb:181 # MetricFu::ReekGenerator analyze method with real output, not mocked nor doubled returns real data
Randomized with seed 39100
Then you can use that seed number to replicate the execution order that RSpec used to run each test scenario:
rspec spec --seed 39100
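If the failing order involves a lot of scenarios, RSpec also ships with a --bisect option that takes the same seed and repeatedly re-runs subsets of the suite until it finds a minimal set of examples that still reproduces the failure:

rspec spec --seed 39100 --bisect

It pairs nicely with the seed-based reproduction shown above when the suite is too big to reason about by hand.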
If you combine that seed with byebug (or pry!), you can start debugging the flakiness with the right state.
[180, 189] in /fastruby/metric_fu/spec/metric_fu/metrics/reek/generator_spec.rb
180:
181: it "returns real data" do
182: @matches = @generator.analyze
183:
184: byebug
=> 185: expect(@matches.first).to eq(result)
186: end
187: end
188:
189: context "without reek warnings" do
(byebug) dirs_to_reek = @generator.options[:dirs_to_reek]
["spec/support/samples"]
(byebug) Dir[File.join(dir, "**", "*.rb")]
[]
(byebug) dir
"spec/support/samples"
(byebug) Dir.pwd
"/Users/etagwerker/Projects/fastruby/metric_fu/spec/dummy"
At this point, RSpec has executed my test scenarios in the particular order that causes the flaky spec to fail. That is good news! (Being able to reproduce a flaky spec is always great!)
As you can see, there is something fishy going on. I added my support file to spec/support, but RSpec's working directory is path/to/metric_fu/spec/dummy.
The weird part is that sometimes Dir.pwd is path/to/metric_fu and other times it's path/to/metric_fu/spec/dummy.
Within the configuration file (spec/spec_helper.rb) I find these lines:
def run_dir
  File.expand_path("dummy", File.dirname(__FILE__))
end

config.before(:suite) do
  MetricFu.run_dir = run_dir
end
My working hypothesis now is that run_dir is sometimes one thing and other times another, depending on the execution order of RSpec's test scenarios. I find that MetricFu already defines a namespaced method called run_dir:
module MetricFu
  # ...
  def run_dir
    @run_dir ||= Dir.pwd
  end
end
So now my hypothesis for a fix is that changing the method name might resolve the flakiness. I go ahead and rename the method in spec_helper.rb:
def dummy_dir
  File.expand_path("dummy", File.dirname(__FILE__))
end

config.before(:suite) do
  MetricFu.run_dir = dummy_dir
end
I run my test suite:
rspec spec --seed 39100
Unfortunately that did not solve the problem. The flakiness remained:
Failures:

  1) MetricFu::ReekGenerator analyze method with real output, not mocked nor doubled returns real data
     Failure/Error: expect(@matches.first).to eq(result)

       expected: {:code_smells=>[{:lines=>[2, 3, 4], :message=>"takes parameters ['echo', 'foxtrot', 'golf'] to 3 meth...f'", :method=>"Alfa#delta", :type=>"UnusedParameters"}], :file_path=>"spec/support/samples/alfa.rb"}
            got: nil

       (compared using ==)
     # ./spec/metric_fu/metrics/reek/generator_spec.rb:174:in `block (4 levels) in <top (required)>'
     # ./spec/support/timeout.rb:6:in `block in <top (required)>'
I check once again, and Dir.pwd is sometimes one thing and other times another.
So, as usual, the flakiness seems to be related to "leaky state". After some digging I'm 80% sure that the leaky state is coming from MetricFu.run_dir.
Any sort of modification to that "global" variable in one scenario will leak into other scenarios if you are not careful.
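A small illustration of what that means in practice (a sketch, not code from metric_fu's test suite):

# One scenario changes the "global" setting, either directly or indirectly by being
# the first one to read it from a different working directory (remember that
# run_dir memoizes Dir.pwd)...
MetricFu.run_dir = "spec/dummy"

# ...and every scenario that runs afterwards silently inherits that value:
MetricFu.run_dir # => "spec/dummy"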
So I decide to address this problem in two different ways:
- Provide a native way to run MetricFu within a directory
- Stop switching directories between scenarios
MetricFu.with_run_dir
I add this utility method:
module MetricFu
  def with_run_dir(dir, &block)
    old_dir = run_dir
    self.run_dir = dir
    block.call
  ensure
    # Always restore the previous run_dir, even if the block raises
    self.run_dir = old_dir
  end
end
This allows me to run MetricFu report generation code within a specific directory and not leak state.
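For example, a caller can now do something like this (a hypothetical call site; the real ones are in the pull request linked at the end of this article):

MetricFu.with_run_dir("spec/dummy") do
  # anything that reads MetricFu.run_dir in here sees "spec/dummy"
end
# ...and the previous value is restored afterwards, even if the block raises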
Stop Switching Directories
I get rid of the problematic before(:suite) block.
This generates a couple of failures in the test suite, which I decide to solve by calling the MetricFu.with_run_dir method when needed and by moving files from spec/dummy to spec/support.
Final Thoughts
These are some tips you can use next time you run into a non-deterministic failure. I know that “leaky state” is not the only root cause of flaky specs, but it’s probably the most common root cause.
I can now confidently say that I know a little more than nothing about reek and metric_fu's source code. Here is the pull request with all the changes:
https://github.com/metricfu/metric_fu/pull/306
Check out the next section if you want to learn more!
Resources
Others have written about this topic extensively, so if you are interested in knowing more about flaky tests, I recommend you read further:
- https://building.buildkite.com/5-ways-weve-improved-flakey-test-debugging-4b3cfb9f27c8
- https://samsaffron.com/archive/2019/05/15/tests-that-sometimes-fail
- https://www.codewithjason.com/common-causes-flickering-flapping-flaky-tests/