-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathPart2_LinuxKernelBootOptions.html
556 lines (469 loc) · 32 KB
/
Part2_LinuxKernelBootOptions.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>External Interrupts in the x86 system. Part 2. Linux kernel boot options</title>
</head>
<body>
<h1>External Interrupts in the x86 system. Part 2. Linux kernel boot options</h1>
<p>In the <a href="https://habr.com/ru/post/446312/">last part</a> we discussed evolution of the interrupt delivery process from the devices in the x86 system (PIC → APIC → MSI), general theory, and all the necessary terminology.<p/>
<p>In this practical part we will look at how to roll back to the use of obsolete methods of interrupt delivery in Linux, and in particular we will look at Linux kernel boot options:<p/>
<ul>
<li>pci=nomsi</li>
<li>noapic</li>
<li>nolapic</li>
</ul>
<p>Also we will look at the order in which the OS looks for interrupt routing tables (ACPI/MPtable/$PIR) and what the impact is from the following boot options:<p/>
<ul>
<li>pci=noacpi</li>
<li>acpi=noirq</li>
<li>acpi=off</li>
</ul>
<p>You've probably used some combination of these options when one of the devices in your system hasn't worked correctly because of an interrupt problem. We'll go through these options and find out what they do and how they change the kernel '/proc/interrupts' interface output.</p>
<cut />
<h3>Boot without any extra options</h3>
<p>In this article for our interrupt investigation we will be using custom board with the Intel Haswell i7 CPU with the LynxPoint-LP chipset which runs <a href="https://www.coreboot.org/">coreboot</a>.<p/>
<p>We will be getting information about interrupts in the Linux system through the command:</p>
<source lang="bash"><pre>cat /proc/interrupts</pre></source>
<p>Here is the output when the kernel was booted without any external options:</p>
<source lang="bash">
<pre>
CPU0 CPU1 CPU2 CPU3
0: 15 0 0 0 IO-APIC-edge timer
1: 0 1 0 1 IO-APIC-edge i8042
8: 0 0 0 1 IO-APIC-edge rtc0
9: 0 0 0 0 IO-APIC-fasteoi acpi
12: 0 0 0 1 IO-APIC-edge
23: 16 247 7 10 IO-APIC-fasteoi ehci_hcd:usb1
56: 0 0 0 0 PCI-MSI-edge aerdrv,PCIe PME
57: 0 0 0 0 PCI-MSI-edge aerdrv,PCIe PME
58: 0 0 0 0 PCI-MSI-edge aerdrv,PCIe PME
59: 0 0 0 0 PCI-MSI-edge aerdrv,PCIe PME
60: 0 0 0 0 PCI-MSI-edge aerdrv,PCIe PME
61: 0 0 0 0 PCI-MSI-edge aerdrv,PCIe PME
62: 3118 1984 972 3454 PCI-MSI-edge ahci
63: 1 0 0 0 PCI-MSI-edge eth59
64: 2095 57 4 832 PCI-MSI-edge eth59-rx-0
65: 6 18 1 1309 PCI-MSI-edge eth59-rx-1
66: 13 512 2 1 PCI-MSI-edge eth59-rx-2
67: 10 61 232 2 PCI-MSI-edge eth59-rx-3
68: 169 0 0 0 PCI-MSI-edge eth59-tx-0
69: 14 14 4 205 PCI-MSI-edge eth59-tx-1
70: 11 491 3 0 PCI-MSI-edge eth59-tx-2
71: 20 19 134 50 PCI-MSI-edge eth59-tx-3
72: 0 0 0 0 PCI-MSI-edge eth58
73: 2 1 0 152 PCI-MSI-edge eth58-rx-0
74: 3 150 2 0 PCI-MSI-edge eth58-rx-1
75: 2 34 117 2 PCI-MSI-edge eth58-rx-2
76: 153 0 2 0 PCI-MSI-edge eth58-rx-3
77: 4 0 2 149 PCI-MSI-edge eth58-tx-0
78: 4 149 2 0 PCI-MSI-edge eth58-tx-1
79: 4 0 117 34 PCI-MSI-edge eth58-tx-2
80: 153 0 2 0 PCI-MSI-edge eth58-tx-3
81: 66 106 2 101 PCI-MSI-edge snd_hda_intel
82: 928 5657 262 224 PCI-MSI-edge i915
83: 545 56 32 15 PCI-MSI-edge snd_hda_intel
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 4193 3644 3326 3499 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 Performance monitoring interrupts
IWI: 290 233 590 111 IRQ work interrupts
RTR: 3 0 0 0 APIC ICR read retries
RES: 1339 2163 2404 1946 Rescheduling interrupts
CAL: 607 537 475 559 Function call interrupts
TLB: 163 202 164 251 TLB shootdowns
TRM: 48 48 48 48 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 3 3 3 3 Machine check polls
ERR: 0
MIS: 0
</pre>
</source>
<p>File '/proc/interrupts' is the procfs Linux interface to the interrupt subsystem, and it presents a table about the number of interrupts on every CPU core in the system in the following form:</p>
<ul>
<li>First column: interrupt number</li>
<li>CPUx columns: interrupt counters for every CPU core in the system</li>
<li>Next column: interrupt type:
<ul>
<li>IO-APIC-edge - edge-triggered interrupt for the I/O APIC controller</li>
<li>IO-APIC-fasteoi - level-triggered interrupt for the I/O APIC controller</li>
<li>PCI-MSI-edge - MSI interrupt</li>
<li>XT-PIC-XT-PIC - interrupt for the PIC controller (we will see it later)</li>
</ul></li>
<li>Last column: device (driver) associated with this interrupt</li>
</ul>
<p>Everything here is like it is supposed to be in the modern system. For the devices and drivers which support MSI/MSI-X, this is the type of interrupt that they use. The rest of the interrupt routing is done through the APIC controller.<p/>
<p>Simplistically, the interrupt routing schematics can be drawn like this: (red lines are active routing paths and black lines are unused routing paths)<p/>
<img src="https://habrastorage.org/webt/pb/ej/7c/pbej7cfkiyscux6fzyttfbi99my.png" />
<p>A device that supports MSI/MSI-X interrupts should have that particular capability listed in its <a href="https://en.wikipedia.org/wiki/PCI_configuration_space">PCI configuration space</a>.<p/>
<p>As an example of that let's look at a little fragment of the lspci output for the devices that declare they use MSI/MSI-X. In our case it is a SATA controller (interrupt 'ahci'), two ethernet controllers (interrupts 'eth58*' and 'eth59*'), graphical controller ('i915'), and two HD Audio controllers ('snd_hda_intel').
<source lang="bash"><pre>lspci -v</pre></source>
<source lang="bash"><pre>
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
...
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [a4] PCI Advanced Features
Kernel driver in use: i915
00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 09
...
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Kernel driver in use: snd_hda_intel
00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
...
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Kernel driver in use: snd_hda_intel
00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode] (rev 04) (prog-if 01 [AHCI 1.0])
...
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Power Management version 3
Capabilities: [a8] SATA HBA v1.0
Kernel driver in use: ahci
05:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
...
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
Capabilities: [a0] Express Endpoint, MSI 00
Kernel driver in use: igb
05:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
...
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
Capabilities: [a0] Express Endpoint, MSI 00
Kernel driver in use: igb</pre></source>
<p>As we see, all of these devices either have a string "MSI: Enable+" or "MSI-X: Enable+".<p/>
<p>Let's downgrade our system! For a start let's boot with the kernel option 'pci=nomsi'.</p>
<h3>pci=nomsi</h3>
<p>Because of this option MSI interrupts become IO-APIC/XT-PIC depending on the interrupt controller in use.</p>
<p>In this case the priority choice is still modern APIC controller, so the interrupt picture will be:</p>
<img src="https://habrastorage.org/webt/er/ft/jy/erftjybrd7rm61bmmmhcs3qfzni.png" />
<p>Output of /proc/interrupts:<p/>
<source lang="bash">
<pre> CPU0 CPU1 CPU2 CPU3
0: 15 0 0 0 IO-APIC-edge timer
1: 0 1 0 1 IO-APIC-edge i8042
8: 0 0 1 0 IO-APIC-edge rtc0
9: 0 0 0 0 IO-APIC-fasteoi acpi
12: 0 0 0 1 IO-APIC-edge
16: 1314 5625 342 555 IO-APIC-fasteoi i915, snd_hda_intel, eth59
17: 5 0 1 34 IO-APIC-fasteoi eth58
21: 2882 2558 963 2088 IO-APIC-fasteoi ahci
22: 26 81 2 170 IO-APIC-fasteoi snd_hda_intel
23: 23 369 8 8 IO-APIC-fasteoi ehci_hcd:usb1
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 3011 3331 2435 2617 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 Performance monitoring interrupts
IWI: 197 228 544 85 IRQ work interrupts
RTR: 3 0 0 0 APIC ICR read retries
RES: 1708 2349 1821 1569 Rescheduling interrupts
CAL: 520 554 509 555 Function call interrupts
TLB: 187 181 205 179 TLB shootdowns
TRM: 102 102 102 102 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 2 2 2 2 Machine check polls
ERR: 0
MIS: 0
</pre></source>
<p>As expected, all MSI/MSI-X interrupts have disappeared. Instead of them devices now use interrupts of 'IO-APIC-fasteoi' type.<p/>
<p>Let us draw our attention to the fact that earlier, before enabling this kernel boot option, each of the 'eth58' and 'eth59' had nine interrupts! But now each of them has only one interrupt. Recall that without the MSI, one function in the PCI device can have only one interrupt!<p/>
<p>Here is a little info from the 'dmesg' command about the ethernet controllers' initialization:</p>
<p>- boot without the 'pci=nomsi' option:</p>
<source lang="bash"><pre>
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
acpi:acpi_pci_irq_enable: igb 0000:05:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
igb 0000:05:00.0: irq 63 for MSI/MSI-X
igb 0000:05:00.0: irq 64 for MSI/MSI-X
igb 0000:05:00.0: irq 65 for MSI/MSI-X
igb 0000:05:00.0: irq 66 for MSI/MSI-X
igb 0000:05:00.0: irq 67 for MSI/MSI-X
igb 0000:05:00.0: irq 68 for MSI/MSI-X
igb 0000:05:00.0: irq 69 for MSI/MSI-X
igb 0000:05:00.0: irq 70 for MSI/MSI-X
igb 0000:05:00.0: irq 71 for MSI/MSI-X
igb 0000:05:00.0: irq 63 for MSI/MSI-X
igb 0000:05:00.0: irq 64 for MSI/MSI-X
igb 0000:05:00.0: irq 65 for MSI/MSI-X
igb 0000:05:00.0: irq 66 for MSI/MSI-X
igb 0000:05:00.0: irq 67 for MSI/MSI-X
igb 0000:05:00.0: irq 68 for MSI/MSI-X
igb 0000:05:00.0: irq 69 for MSI/MSI-X
igb 0000:05:00.0: irq 70 for MSI/MSI-X
igb 0000:05:00.0: irq 71 for MSI/MSI-X
igb 0000:05:00.0: added PHC on eth0
igb 0000:05:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:05:00.0: eth0: (PCIe:5.0Gb/s:Width x1) 00:15:d5:03:00:2a
igb 0000:05:00.0: eth0: PBA No: 106300-000
igb 0000:05:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
acpi:acpi_pci_irq_enable: igb 0000:05:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
igb 0000:05:00.1: irq 72 for MSI/MSI-X
igb 0000:05:00.1: irq 73 for MSI/MSI-X
igb 0000:05:00.1: irq 74 for MSI/MSI-X
igb 0000:05:00.1: irq 75 for MSI/MSI-X
igb 0000:05:00.1: irq 76 for MSI/MSI-X
igb 0000:05:00.1: irq 77 for MSI/MSI-X
igb 0000:05:00.1: irq 78 for MSI/MSI-X
igb 0000:05:00.1: irq 79 for MSI/MSI-X
igb 0000:05:00.1: irq 80 for MSI/MSI-X
igb 0000:05:00.1: irq 72 for MSI/MSI-X
igb 0000:05:00.1: irq 73 for MSI/MSI-X
igb 0000:05:00.1: irq 74 for MSI/MSI-X
igb 0000:05:00.1: irq 75 for MSI/MSI-X
igb 0000:05:00.1: irq 76 for MSI/MSI-X
igb 0000:05:00.1: irq 77 for MSI/MSI-X
igb 0000:05:00.1: irq 78 for MSI/MSI-X
igb 0000:05:00.1: irq 79 for MSI/MSI-X
igb 0000:05:00.1: irq 80 for MSI/MSI-X
igb 0000:05:00.1: added PHC on eth1
igb 0000:05:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:05:00.1: eth1: (PCIe:5.0Gb/s:Width x1) 00:15:d5:03:00:2b
igb 0000:05:00.1: eth1: PBA No: 106300-000
igb 0000:05:00.1: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
</pre></source>
<p>- boot with the 'pci=nomsi' option:</p>
<source lang="bash"><pre>
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
acpi:acpi_pci_irq_enable: igb 0000:05:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
igb 0000:05:00.0: added PHC on eth0
igb 0000:05:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:05:00.0: eth0: (PCIe:5.0Gb/s:Width x1) 00:15:d5:03:00:2a
igb 0000:05:00.0: eth0: PBA No: 106300-000
igb 0000:05:00.0: Using legacy interrupts. 1 rx queue(s), 1 tx queue(s)
acpi:acpi_pci_irq_enable: igb 0000:05:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
igb 0000:05:00.1: added PHC on eth1
igb 0000:05:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:05:00.1: eth1: (PCIe:5.0Gb/s:Width x1) 00:15:d5:03:00:2b
igb 0000:05:00.1: eth1: PBA No: 106300-000
igb 0000:05:00.1: Using legacy interrupts. 1 rx queue(s), 1 tx queue(s)</pre></source>
<p>Because of the decreased number of interrupts per device, enabling this option can lead to a significant performance limitation of the device driver, and that is not even counting that according to the Intel research <a href="https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/msg-signaled-interrupts-paper.pdf">'Reducing Interrupt Latency Through the Use of Message Signaled Interrupts'</a>, MSI interrupts 3 times faster than the IO-APIC interrupts and 5 times faster than the PIC interrupts.<p/>
<h3>noapic</h3>
<p>This option disables I/O APIC. MSI interrupts can still find their way to all of the CPUs, but the rest of interrupts from the devices can go only to CPU0, because PIC is only connected to CPU0. However, LAPIC is working and all other CPUs can still work and handle interrupts.<p/>
<img src="https://habrastorage.org/webt/uw/tf/rt/uwtfrt5lgbpfenyjwb3u_fc4yvq.png" />
<source lang="bash"><pre> CPU0 CPU1 CPU2 CPU3
0: 5 0 0 0 XT-PIC-XT-PIC timer
1: 2 0 0 0 XT-PIC-XT-PIC i8042
2: 0 0 0 0 XT-PIC-XT-PIC cascade
8: 1 0 0 0 XT-PIC-XT-PIC rtc0
9: 0 0 0 0 XT-PIC-XT-PIC acpi
12: 172 0 0 0 XT-PIC-XT-PIC ehci_hcd:usb1
56: 0 0 0 0 PCI-MSI-edge aerdrv, PCIe PME
57: 0 0 0 0 PCI-MSI-edge aerdrv, PCIe PME
58: 0 0 0 0 PCI-MSI-edge aerdrv, PCIe PME
59: 0 0 0 0 PCI-MSI-edge aerdrv, PCIe PME
60: 0 0 0 0 PCI-MSI-edge aerdrv, PCIe PME
61: 0 0 0 0 PCI-MSI-edge aerdrv, PCIe PME
62: 2833 2989 1021 811 PCI-MSI-edge ahci
63: 0 1 0 0 PCI-MSI-edge eth59
64: 301 52 9 3 PCI-MSI-edge eth59-rx-0
65: 12 24 3 178 PCI-MSI-edge eth59-rx-1
66: 14 85 6 2 PCI-MSI-edge eth59-rx-2
67: 17 24 307 1 PCI-MSI-edge eth59-rx-3
68: 70 18 8 10 PCI-MSI-edge eth59-tx-0
69: 7 0 0 23 PCI-MSI-edge eth59-tx-1
70: 15 227 2 2 PCI-MSI-edge eth59-tx-2
71: 18 6 27 2 PCI-MSI-edge eth59-tx-3
72: 0 0 0 0 PCI-MSI-edge eth58
73: 1 0 0 27 PCI-MSI-edge eth58-rx-0
74: 1 22 0 5 PCI-MSI-edge eth58-rx-1
75: 1 0 22 5 PCI-MSI-edge eth58-rx-2
76: 23 0 0 5 PCI-MSI-edge eth58-rx-3
77: 1 0 0 27 PCI-MSI-edge eth58-tx-0
78: 1 22 0 5 PCI-MSI-edge eth58-tx-1
79: 1 0 22 5 PCI-MSI-edge eth58-tx-2
80: 23 0 0 5 PCI-MSI-edge eth58-tx-3
81: 187 17 70 7 PCI-MSI-edge snd_hda_intel
82: 698 1647 247 129 PCI-MSI-edge i915
83: 438 135 16 59 PCI-MSI-edge snd_hda_intel
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 1975 2499 2245 1474 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 Performance monitoring interrupts
IWI: 132 67 429 91 IRQ work interrupts
RTR: 3 0 0 0 APIC ICR read retries
RES: 1697 2178 1903 1541 Rescheduling interrupts
CAL: 561 496 534 567 Function call interrupts
TLB: 229 254 170 137 TLB shootdowns
TRM: 78 78 78 78 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 2 2 2 2 Machine check polls
ERR: 0
MIS: 0</pre></source>
<p>As we see, all IO-APIC-* interrupts have turned into XT-PIC-XT-PIC, and all of these interrupts have been routed to CPU0 only. MSI interrupts on the other hand have remained unchanged and go to all of the CPUs.<p/>
<h3>nolapic</h3>
<p>This kernel boot option disables LAPIC. MSI interrupts can't work without LAPIC, and I/O APIC can't work without LAPIC either. All of the device interrupts can only go to the PIC, and it works with the CPU0 only. And without LAPIC the rest of the CPUs besides CPU0 won't work.
<img src="https://habrastorage.org/webt/rm/jd/4t/rmjd4tr9wxvkghj5hzb5wu7fswi.png" />
<p>Output of /proc/interrupts:<p/>
<source lang="bash"><pre> CPU0
0: 6416 XT-PIC-XT-PIC timer
1: 2 XT-PIC-XT-PIC i8042
2: 0 XT-PIC-XT-PIC cascade
3: 5067 XT-PIC-XT-PIC aerdrv, aerdrv, PCIe PME, PCIe PME, i915, snd_hda_intel, eth59
4: 32 XT-PIC-XT-PIC aerdrv, aerdrv, PCIe PME, PCIe PME, eth58
5: 0 XT-PIC-XT-PIC aerdrv, PCIe PME
6: 0 XT-PIC-XT-PIC aerdrv, PCIe PME
8: 1 XT-PIC-XT-PIC rtc0
9: 0 XT-PIC-XT-PIC acpi
11: 274 XT-PIC-XT-PIC snd_hda_intel
12: 202 XT-PIC-XT-PIC ehci_hcd:usb1
15: 7903 XT-PIC-XT-PIC ahci
NMI: 0 Non-maskable interrupts
LOC: 0 Local timer interrupts
SPU: 0 Spurious interrupts
PMI: 0 Performance monitoring interrupts
IWI: 0 IRQ work interrupts
RTR: 0 APIC ICR read retries
RES: 0 Rescheduling interrupts
CAL: 0 Function call interrupts
TLB: 0 TLB shootdowns
TRM: 0 Thermal event interrupts
THR: 0 Threshold APIC interrupts
MCE: 0 Machine check exceptions
MCP: 1 Machine check polls
ERR: 0
MIS: 0
</pre></source>
<h3>Combinations of options:</h3>
<p>Actually there is only one combination for the new variant of routing: "noapic pci=nomsi". In this case all interrupts from the devices only go to the CPU0 through the PIC controller. But the LAPIC system is still working, so all the other CPUs can work and handle interrupts.<p/>
<p>You cannot combine any other options with "nolapic" since it makes I/O APIC and MSI unaccessible. Therefore, if you've ever added Linux kernel boot options like "noapic nolapic" (or the most common case "acpi=off noapic nolapic") it seems like you've written some extra letters.<p/>
<p>Finally, here is the result of the options "noapic pci=nomsi" to our interrupt routing picture:<p/>
<img src="https://habrastorage.org/webt/jx/bu/is/jxbuissu4c453b_5g-4pcs6_d-w.png" />
<p>And the output of /proc/interrupts is:<p/>
<source lang="bash"><pre> CPU0 CPU1 CPU2 CPU3
0: 5 0 0 0 XT-PIC-XT-PIC timer
1: 2 0 0 0 XT-PIC-XT-PIC i8042
2: 0 0 0 0 XT-PIC-XT-PIC cascade
3: 5072 0 0 0 XT-PIC-XT-PIC i915, snd_hda_intel, eth59
4: 32 0 0 0 XT-PIC-XT-PIC eth58
8: 1 0 0 0 XT-PIC-XT-PIC rtc0
9: 0 0 0 0 XT-PIC-XT-PIC acpi
11: 281 0 0 0 XT-PIC-XT-PIC snd_hda_intel
12: 200 0 0 0 XT-PIC-XT-PIC ehci_hcd:usb1
15: 7930 0 0 0 XT-PIC-XT-PIC ahci
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 2595 2387 2129 1697 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 Performance monitoring interrupts
IWI: 159 90 482 135 IRQ work interrupts
RTR: 3 0 0 0 APIC ICR read retries
RES: 1568 1666 1810 1833 Rescheduling interrupts
CAL: 431 556 549 558 Function call interrupts
TLB: 124 184 156 274 TLB shootdowns
TRM: 116 116 116 116 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 2 2 2 2 Machine check polls
ERR: 0
MIS: 0</pre></source>
<h3>Interrupt routing tables and the options "acpi=noirq", "pci=noacpi", "acpi=off"</h3>
<p>How does the operating system get information about the device interrupt routing? The BIOS prepares such info for the OS in the form of:</p>
<ul>
<li>ACPI tables (_PIC/_PRT functions)</li>
<li>_MP_ table (MPtable)</li>
<li>$PIR table</li>
<li>Registers 0x3C/0x3D of the device's PCI configuration space</li>
</ul>
<p>It is worth to note for the MSI interrupts declaration that the BIOS doesn't need to do anything extra (beside declaring the use of the LAPIC): all the aforementioned routing information is needed only for the APIC/PIC interrupt lines.</p>
<p>Tables in the list above are presented in the order of priority. Let's examine it in detail.<p/>
<p>Let's assume the BIOS has presented all this data and we boot our OS without any extra boot options:</p>
<ul>
<li>OS finds ACPI tables.</li>
<li>ОS executes ACPI function "_PIC", passing it the argument stating that the boot should happen in APIC mode. Here there is function code that usually saves the chosen mode in a variable (for example, PICM=1).</li>
<li>To access interrupt routing info the OS calls ACPI function "_PRT". This checks the PICM variable and returns routing for the APIC mode case.</li>
</ul>
<p>In the case when we boot with the option <b>noapic</b>:</p>
<ul>
<li>OS finds ACPI tables</li>
<li>ОS executes ACPI function "_PIC", passing it the argument stating that the boot should happen in PIC mode. Here there is function code that usually saves the chosen mode in a variable (for example, PICM=0)</li>
<li>To access interrupt routing info the OS calls ACPI function "_PRT". This checks the PICM variable and returns routing for the PIC mode case.</li>
</ul>
<p>If ACPI tables aren't present or interrupt routing with ACPI is disabled through the option <b>acpi=noirq</b> or <b>pci=noacpi</b> (or ACPI subsystem is completely disabled with the <b>acpi=off</b> option), then the OS looks for the MPtable (_MP_) to get all the interrupt routing information:<p/>
<ul>
<li>OS can't find/doesn't look at the ACPI tables</li>
<li>OS finds MPtable (_MP_)</li>
</ul>
<p>If ACPI tables aren't present or interrupt routing with ACPI is disabled through the option <b>acpi=noirq</b> or <b>pci=noacpi</b> (or ACPI subsystem is completely disabled with the <b>acpi=off</b> option), and if the MPtable (_MP_) is not present either (or there is a boot option <b>noapic</b> or <b>nolapic</b>):</p>
<ul>
<li>OS can't find/doesn't look at the ACPI tables</li>
<li>OS can't find/doesn't look at the MPtable (_MP_)</li>
<li>OS finds $PIR table</li>
</ul>
<p>If there is no $PIR table or it is not full, then the OS will look at the registers 0x3C/0x3D of the device's PCI configuration space to guess interrupt routing.</p>
<p>Here is a picture summarizing all of this:<p/>
<img src="https://habrastorage.org/webt/rj/9s/mk/rj9smkhk7c_ou_fzskxasgeriw8.png" />
<p>One should remember that not every BIOS provides all of these three tables (ACPI/MPtable/$PIR), so if you've passed an option to your bootloader (e.g. GRUB) that disables the use of ACPI or ACPI and MPtable for the interrupt routing, it is possible that your system won't boot.<p/>
<p><b>Note 1</b>: In the case when we try to boot in APIC mode with the option 'acpi=noirq' and without MPtable present, the picture of interrupts will be like in the case of normal booting with only the 'noapic' option. The operating system will go to PIC mode by itself. In the case when you try to boot without any ACPI tables at all ('acpi=off') and without MPtable present, then the picture will be like this:</p>
<source lang="bash"><pre> CPU0
0: 6 XT-PIC-XT-PIC timer
1: 2 XT-PIC-XT-PIC i8042
2: 0 XT-PIC-XT-PIC cascade
8: 0 XT-PIC-XT-PIC rtc0
12: 373 XT-PIC-XT-PIC ehci_hcd:usb1
16: 0 PCI-MSI-edge PCIe PME
17: 0 PCI-MSI-edge PCIe PME
18: 0 PCI-MSI-edge PCIe PME
19: 0 PCI-MSI-edge PCIe PME
20: 0 PCI-MSI-edge PCIe PME
21: 0 PCI-MSI-edge PCIe PME
22: 8728 PCI-MSI-edge ahci
23: 1 PCI-MSI-edge eth59
24: 1301 PCI-MSI-edge eth59-rx-0
25: 113 PCI-MSI-edge eth59-tx-0
26: 0 PCI-MSI-edge eth58
27: 45 PCI-MSI-edge eth58-rx-0
28: 45 PCI-MSI-edge eth58-tx-0
29: 1280 PCI-MSI-edge snd_hda_intel
NMI: 2 Non-maskable interrupts
LOC: 24076 Local timer interrupts
SPU: 0 Spurious interrupts
PMI: 2 Performance monitoring interrupts
IWI: 2856 IRQ work interrupts
RTR: 0 APIC ICR read retries
RES: 0 Rescheduling interrupts
CAL: 0 Function call interrupts
TLB: 0 TLB shootdowns
TRM: 34 Thermal event interrupts
THR: 0 Threshold APIC interrupts
MCE: 0 Machine check exceptions
MCP: 2 Machine check polls
ERR: 0
MIS: 0
</pre></source>
<p>This happens because without the ACPI MADT table (<a href="https://wiki.osdev.org/MADT">Multiple APIC Description Table</a>) and the necessary info from the MPtable, the operating system doesn't know APIC identifiers (APIC IDs) for the other CPUs and can't work with them. But the LAPIC of the main CPU0 works because we haven't disabled it, and MSI interrupts can still go to it. So the interrupt picture would be:<p/>
<img src="https://habrastorage.org/webt/b-/pz/hx/b-pzhxxyugr5iv203wysxfs7-mo.png" />
<p><b>Note 2</b>: In general, interrupt routing with the use of ACPI in an APIC case should match the interrupt routing with the MPtable. Also, the interrupt routing with the use of ACPI in a PIC case should match the interrupt routing with the $PIR table. Therefore the '/proc/interrupts' output should not differ. But in my investigation I've noticed one strange fact. For some reason in the case of interrupt routing through the MPtable there is a cascade interrupt "XT-PIC-XT-PIC cascade" in the output:<p/>
<source lang="bash"><pre> CPU0 CPU1 CPU2 CPU3
0: 15 0 0 0 IO-APIC-edge timer
1: 2 0 0 0 IO-APIC-edge i8042
2: 0 0 0 0 XT-PIC-XT-PIC cascade
8: 0 1 0 0 IO-APIC-edge rtc0
9: 0 0 0 0 IO-APIC-edge acpi
...
</pre></source>
<p>It is a little bit strange that it happens like that, but it seems like <a href="https://elixir.bootlin.com/linux/latest/source/Documentation/x86/i386/IO-APIC.txt"> the kernel source documentation</a> says that it is OK.<p/>
<h3>Сonclusion</h3>
<p>In conclusion we review for one more time the discussed options.<p/>
<p>Interrupt controller choice options:<p>
<ul>
<li><b>pci=nomsi</b> - MSI interrupts become IO-APIC/XT-PIC depending on the interrupt controller in use.</li>
<li><b>noapic</b> - Disables I/O APIC. MSI interrupts can still go to all the other CPUs, the rest of the device interrupts can only go to the PIC, and it works with the CPU0 only. But LAPIC still works and other CPUs can work and handle interrupts.</li>
<li><b>noapic pci=nomsi</b> - All of the device interrupts can only go to the PIC, and it works with the CPU0 only. But LAPIC works and other CPUs can work and handle interrupts.</li>
<li><b>nolapic</b> - Disables LAPIC. MSI interrupts can't work without LAPIC, and I/O APIC can't work without LAPIC. All of the device interrupts can only go to the PIC, and it works with the CPU0 only. And without LAPIC the rest of the CPUs besides CPU0 won't work.</li>
</ul>
<p>Interrupt tables priority options:<p>
<ul>
<li><b>no options</b> - routing through the APIC with the help of ACPI tables</li>
<li><b>noapic</b> - routing through the PIC with the help of ACPI tables</li>
<li><b>acpi=noirq</b> (<b>pci=noacpi</b>/<b>acpi=off</b>) - routing through the APIC with the help of MPtable</li>
<li><b>acpi=noirq</b> (<b>pci=noacpi</b>/<b>acpi=off</b>) <b>noapic</b> (<b>nolapic</b>) - routing through the PIC with the help of $PIR</li>
</ul>
In the next part we will look at how coreboot configures the chipset for the interrupt routing.
</body>
</html>