Skip to content

Commit f49f06a

Browse files
authored
Update ruby docs for backtrace debugging when CI node hangs (#185)
* update(ruby troubleshooting): add a legacy CI hangs faq for knapsack_pro < 7.7.0
1 parent 0c60524 commit f49f06a

File tree

1 file changed

+120
-22
lines changed

1 file changed

+120
-22
lines changed

docusaurus/docs/ruby/troubleshooting.mdx

Lines changed: 120 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -90,43 +90,141 @@ Some users reported frozen CI nodes with:
9090

9191
### Why does the CI node hang? (backtrace debugging)
9292

93-
Tests freeze and CI node timeouts (the Ruby process is killed by a CI provider). Add the following script to `spec_helper.rb`. Thanks to that, you will see the backtrace showing what code line causes the hanging problem. Backtrace is visible only if the Ruby process got kill with `USR1` signal. Learn more from this [GitHub issue](https://github.com/rspec/rspec-rails/issues/1353#issuecomment-93173691).
94-
95-
```ruby
96-
# source https://github.com/rspec/rspec-rails/issues/1353#issuecomment-93173691
97-
puts "Process pid: #{Process.pid}"
93+
Knapsack Pro would print Ruby threads to the output when tests freeze and CI node timeouts (the Ruby process is killed by a CI provider).
94+
See the backtrace(s) showing what code line causes the hanging problem. The backtrace is visible only if the Ruby process got killed with one of the OS signals like `HUP`, `INT`, `TERM`, `ABRT`, `QUIT`, `USR1`, or `USR2`.
95+
96+
```text title="Tests output"
97+
================================================================================
98+
Start logging 2 detected threads.
99+
Use the following backtrace(s) to find the line of code that got stuck
100+
if the CI node hung and terminated your tests.
101+
102+
Main thread backtrace:
103+
knapsack_pro-ruby/lib/knapsack_pro/runners/queue/base_runner.rb:83:in `backtrace'
104+
knapsack_pro-ruby/lib/knapsack_pro/runners/queue/base_runner.rb:83:in `block in log_threads'
105+
knapsack_pro-ruby/lib/knapsack_pro/runners/queue/base_runner.rb:75:in `each'
106+
knapsack_pro-ruby/lib/knapsack_pro/runners/queue/base_runner.rb:75:in `log_threads'
107+
knapsack_pro-ruby/lib/knapsack_pro/runners/queue/base_runner.rb:62:in `block (2 levels) in trap_signals'
108+
# highlight-start
109+
knapsack_pro-ruby/spec_integration/b_spec.rb:7:in `kill'
110+
# highlight-end
111+
knapsack_pro-ruby/spec_integration/b_spec.rb:7:in `block (3 levels) in <top (required)>'
112+
.rvm/gems/ruby-3.3.4/gems/rspec-core-3.13.0/lib/rspec/core/example.rb:263:in `instance_exec'
113+
...
114+
115+
Non-main thread inspect: #<Thread:0x000000011e18f6e0 spec_integration/a_spec.rb:5 sleep>
116+
Non-main thread backtrace:
117+
# highlight-start
118+
knapsack_pro-ruby/spec_integration/a_spec.rb:6:in `sleep'
119+
# highlight-end
120+
knapsack_pro-ruby/spec_integration/a_spec.rb:6:in `block (3 levels) in <top (required)>'
121+
122+
123+
End logging threads.
124+
================================================================================
125+
```
98126

99-
trap 'USR1' do
100-
threads = Thread.list
127+
The above-highlighted lines of code got stuck. See them in the following spec files.
101128

102-
puts
103-
puts "=" * 80
104-
puts "Received USR1 signal; printing all #{threads.count} thread backtraces."
129+
The main thread was killed due to the following `b_spec.rb:7` line:
105130

106-
threads.each do |thr|
107-
description = thr == Thread.main ? "Main thread" : thr.inspect
108-
puts
109-
puts "#{description} backtrace: "
110-
puts thr.backtrace.join("\n")
131+
```ruby title="b_spec.rb" showLineNumbers
132+
describe 'B1_describe' do
133+
describe 'B1.1_describe' do
134+
xit 'B1.1.1 test example' do
135+
expect(1).to eq 1
136+
end
137+
it 'B1.1.2 test example' do
138+
# highlight-start
139+
Process.kill("INT", Process.pid)
140+
# highlight-end
141+
end
111142
end
112-
113-
puts "=" * 80
114143
end
115144
```
116145

117-
If the CI provider does not kill the frozen process, you can do it by running the below command in the terminal (when you are ssh-ed into the CI node):
146+
The non-main thread was stuck at the time due to the following `sleep` in the `a_spec.rb:6` line:
118147

119-
```bash
120-
kill -USR1 <process pid>
148+
```ruby title="a_spec.rb" showLineNumbers
149+
describe 'A_describe' do
150+
it 'A1 test example' do
151+
expect(1).to eq 1
152+
153+
Thread.new do
154+
# highlight-start
155+
sleep 10
156+
# highlight-end
157+
end
158+
end
159+
end
121160
```
122161

123-
Alternatively, you can use the `timeout` program to send the USR1 signal after Knapsack runs too long.
162+
Use the above approach to discover the root issue of hanging tests in your test suite.
163+
164+
:::tip
165+
166+
You can use [the `timeout` program](https://man7.org/linux/man-pages/man1/timeout.1.html) to send the `USR1` signal after Knapsack runs too long if the CI provider does not kill the frozen process.
124167

125168
```bash
126169
timeout --signal=USR1 600 bundle exec rake "knapsack_pro:queue:rspec[--format d]"
127170
```
128171

129-
Notice that `600` is in seconds. It sends the signal after 10 minutes of running the command. You may want to adjust that number to ensure the USR1 signal is sent after the process is stuck and not before.
172+
Notice that `600` is in seconds. It sends the signal after 10 minutes of running the command. You may want to adjust that number to ensure the `USR1` signal is sent after the process is stuck and not before.
173+
For example, you could use 600 seconds (10 minutes) if you know that your tests take no more than 7 minutes to run.
174+
175+
:::
176+
177+
:::tip
178+
179+
Alternatively, you can check your CI provider documentation to configure a timeout. For example, Github Actions uses:
180+
181+
* [`jobs.<job_id>.timeout-minutes`](https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idtimeout-minutes) - The maximum number of minutes to let a job run before GitHub automatically cancels it.
182+
* [`jobs.<job_id>.steps[*].timeout-minutes`](https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idstepstimeout-minutes) - The maximum number of minutes to run the step before killing the process.
183+
184+
:::
185+
186+
<details>
187+
<summary>For legacy versions of `knapsack_pro` older than 7.7.0, please click here.</summary>
188+
189+
Tests freeze and CI node timeouts (the Ruby process is killed by a CI provider). Add the following script to `spec_helper.rb`. Thanks to that, you will see the backtrace showing what code line causes the hanging problem. Backtrace is visible only if the Ruby process got kill with `USR1` signal. Learn more from this [GitHub issue](https://github.com/rspec/rspec-rails/issues/1353#issuecomment-93173691).
190+
191+
```ruby
192+
# source https://github.com/rspec/rspec-rails/issues/1353#issuecomment-93173691
193+
puts "Process pid: #{Process.pid}"
194+
195+
trap 'USR1' do
196+
threads = Thread.list
197+
198+
puts
199+
puts "=" * 80
200+
puts "Received USR1 signal; printing all #{threads.count} thread backtraces."
201+
202+
threads.each do |thr|
203+
description = thr == Thread.main ? "Main thread" : thr.inspect
204+
puts
205+
puts "#{description} backtrace: "
206+
puts thr.backtrace.join("\n")
207+
end
208+
209+
puts "=" * 80
210+
end
211+
```
212+
213+
If the CI provider does not kill the frozen process, you can do it by running the below command in the terminal (when you are ssh-ed into the CI node):
214+
215+
```bash
216+
kill -USR1 <process pid>
217+
```
218+
219+
Alternatively, you can use the `timeout` program to send the USR1 signal after Knapsack runs too long.
220+
221+
```bash
222+
timeout --signal=USR1 600 bundle exec rake "knapsack_pro:queue:rspec[--format d]"
223+
```
224+
225+
Notice that `600` is in seconds. It sends the signal after 10 minutes of running the command. You may want to adjust that number to ensure the USR1 signal is sent after the process is stuck and not before.
226+
227+
</details>
130228

131229
## Knapsack Pro does not work on a forked repository
132230

0 commit comments

Comments
 (0)