Skip to content
This repository has been archived by the owner on Nov 4, 2022. It is now read-only.

Watchdog failure stenotype abort after 2 minutes (Debian 10) #228

Open
dspruell-s01 opened this issue Sep 1, 2020 · 15 comments
Open

Watchdog failure stenotype abort after 2 minutes (Debian 10) #228

dspruell-s01 opened this issue Sep 1, 2020 · 15 comments

Comments

@dspruell-s01
Copy link

  • Debian GNU/Linux 10 (buster)
  • Linux 4.19.0-10-amd64
  • go version go1.11.6 linux/amd64

Using latest stenographer, can get through install and setup and systemd starts correctly, but seems to die (or be killed).

$ sudo systemctl status stenographer
 stenographer.service - packet capture to disk
   Loaded: loaded (/lib/systemd/system/stenographer.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-09-01 14:44:46 UTC; 2min 59s ago
 Main PID: 10793 (stenographer)
    Tasks: 13 (limit: 4915)
   Memory: 21.1M
   CGroup: /system.slice/stenographer.service
           ├─10793 /usr/bin/stenographer
           └─10829 [stenotype]

Sep 01 14:44:46 astraea systemd[1]: Started packet capture to disk.
Sep 01 14:46:47 astraea stenographer[10793]: 2020-09-01T14:46:47.287721Z T:fffff7 [util.cc:117] WATCHDOG FAILURE: Thread 0 ABORTABORTABORT
Sep 01 14:46:47 astraea stenographer[10793]: /usr/bin/stenotype(+0xb0d6) [0x56553817f0d6]
                                             /usr/bin/stenotype(+0x96e2) [0x56553817d6e2]
                                             /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbbb2f) [0x7f8f8876bb2f]
                                             /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f8f88a8efa3]
Sep 01 14:46:47 astraea stenographer[10793]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f8f8844b4cf]
Sep 01 14:46:47 astraea stenographer[10793]: 2020/09/01 14:46:47 Stenotype stopped after 2m1.002707097s: stenotype wait failed: signal: aborted
...
Sep 01 15:26:58 astraea stenographer[10793]: 2020-09-01T15:26:58.396742Z T:e0a8f7 [util.cc:117] WATCHDOG FAILURE: Thread 0
                                             ABORTABORTABORT
Sep 01 15:26:58 astraea stenographer[10793]: /usr/bin/stenotype(+0xb0d6) [0x55c80320c0d6]
                                             /usr/bin/stenotype(+0x96e2) [0x55c80320a6e2]
                                             /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbbb2f) [0x7f7e626b1b2f]
                                             /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f7e629d4fa3]
                                             /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f7e623914cf]
Sep 01 15:26:58 astraea stenographer[10793]: 2020/09/01 15:26:58 Stenotype stopped after 2m0.559905173s: stenotype wait failed: signal: aborted
Sep 01 15:28:58 astraea stenographer[10793]: 2020-09-01T15:28:58.937431Z T:9bfff7 [util.cc:117] WATCHDOG FAILURE: Thread 0
                                             ABORTABORTABORT
Sep 01 15:28:58 astraea stenographer[10793]: /usr/bin/stenotype(+0xb0d6) [0x55742f5110d6]
                                             /usr/bin/stenotype(+0x96e2) [0x55742f50f6e2]
                                             /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbbb2f) [0x7fca2234fb2f]
                                             /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7fca22672fa3]
                                             /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fca2202f4cf]
Sep 01 15:28:59 astraea stenographer[10793]: 2020/09/01 15:28:59 Stenotype stopped after 2m0.551980822s: stenotype wait failed: signal: aborted

Unsure if related, but over course of execution the kernel log contains many of these:

[ 1864.826642] audit: type=1326 audit(1598971487.263:8): auid=4294967295 uid=106 gid=112 ses=4294967295 subj==unconfined pid=10800 comm="stenotype" exe="/usr/bin/stenotype" sig=31 arch=c000003e syscall=257 compat=0 ip=0x7f8f88a98d0e code=0x0
...
[ 4276.756724] audit: type=1326 audit(1598973899.366:49): auid=4294967295 uid=106 gid=112 ses=4294967295 subj==unconfined pid=11116 comm="stenotype" exe="/usr/bin/stenotype" sig=31 arch=c000003e syscall=231 compat=0 ip=0x7f7e6235e9d6 code=0x0
@satta
Copy link
Contributor

satta commented Sep 2, 2020

Did you install from the Debian buster package (if so, what version?) or did you build stenographer/stenotype yourself?

@dspruell-s01
Copy link
Author

Did you install from the Debian buster package (if so, what version?) or did you build stenographer/stenotype yourself?

Have tried both now. (Didn't realize there was a package in repo, thanks. The top of INSTALL.md leads one to think this is still a future thing). The original post was building myself. I've since retried the setup from the buster package and the result appears the same. Package versions:

ii  stenographer        0.0~git20180422.0.73ce5dd-1+b2 amd64        full-packet-capture utility -- server
ii  stenographer-client 0.0~git20180422.0.73ce5dd-1    all          full-packet-capture utility -- clients

Logging:

Sep 02 19:48:04 astraea systemd[1]: Started packet capture to disk.
Sep 02 19:50:05 astraea stenographer[1786]: 2020-09-02T19:50:05.237657Z T:b55787 [util.cc:117] WATCHDOG FAILURE: Thread 0
                                            ABORTABORTABORT
Sep 02 19:50:05 astraea stenographer[1786]: /usr/sbin/stenotype(+0xb136) [0x558aa0cb3136]
                                            /usr/sbin/stenotype(+0x9756) [0x558aa0cb1756]
                                            /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbbb2f) [0x7f533719ab2f]
                                            /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f53374bdfa3]
                                            /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f5336e7a4cf]
Sep 02 19:50:05 astraea stenographer[1786]: 2020/09/02 19:50:05 Stenotype stopped after 2m0.525674512s: stenotype wait failed: signal: aborted
Sep 02 19:52:05 astraea stenographer[1786]: 2020-09-02T19:52:05.772114Z T:f7fff7 [util.cc:117] WATCHDOG FAILURE: Thread 0
                                            ABORTABORTABORT
Sep 02 19:52:05 astraea stenographer[1786]: /usr/sbin/stenotype(+0xb136) [0x55e69b758136]
                                            /usr/sbin/stenotype(+0x9756) [0x55e69b756756]
                                            /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbbb2f) [0x7faf7e839b2f]
                                            /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7faf7eb5cfa3]
                                            /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7faf7e5194cf]
Sep 02 19:52:05 astraea stenographer[1786]: 2020/09/02 19:52:05 Stenotype stopped after 2m0.539895563s: stenotype wait failed: signal: aborted
Sep 02 19:54:06 astraea stenographer[1786]: 2020-09-02T19:54:06.293815Z T:3e9067 [util.cc:117] WATCHDOG FAILURE: Thread 0
                                            ABORTABORTABORT
Sep 02 19:54:06 astraea stenographer[1786]: /usr/sbin/stenotype(+0xb136) [0x564d672f8136]
                                            /usr/sbin/stenotype(+0x9756) [0x564d672f6756]
                                            /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbbb2f) [0x7f62c0528b2f]
                                            /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f62c084bfa3]
                                            /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f62c02084cf]
Sep 02 19:54:06 astraea stenographer[1786]: 2020/09/02 19:54:06 Stenotype stopped after 2m0.511912014s: stenotype wait failed: signal: aborted

System is currently test environment under VMware ESXi. 8x Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz, 64 GB mem.

# /etc/stenographer/config 
{
  "Threads": [
    { "PacketsDirectory": "/var/lib/stenographer/thread0/packets"
    , "IndexDirectory": "/var/lib/stenographer/thread0/index"
    , "MaxDirectoryFiles": 30000
    , "DiskFreePercentage": 10
    }
  ]
  , "StenotypePath": "/usr/sbin/stenotype"
  , "Interface": "ens192"
  , "Port": 1234
  , "Host": "127.0.0.1"
  , "Flags": []
  , "CertPath": "/etc/stenographer/certs"
}

Capture interface:

ens192: <BROADCAST,MULTICAST,NOARP,ALLMULTI,PROMISC,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:6f:f9:b5 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    19288537   39509    0       0       0       103     
    TX: bytes  packets  errors  dropped carrier collsns 
    1786       23       0       0       0       0

driver: vmxnet3
version: 1.4.16.0-k-NAPI
firmware-version: 
expansion-rom-version: 
bus-info: 0000:0b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

@satta
Copy link
Contributor

satta commented Sep 8, 2020

Have you tried adding "--seccomp=none" to the "Flags" section in the configuration file?
Also, does the interface receive traffic? Just asking because of #94

@dspruell-s01
Copy link
Author

Have you tried adding "--seccomp=none" to the "Flags" section in the configuration file?

I had not tried that, and doing so seemingly resolves this. After restarting with the Flag set ("Flags": ["--seccomp=none"]), I see the process remain running and data files stored.

$ sudo ls -lRA /var/lib/stenographer/
/var/lib/stenographer/:
total 4
drwx------ 4 stenographer stenographer 4096 Sep  2 19:39 thread0

/var/lib/stenographer/thread0:
total 8
drwx------ 2 stenographer stenographer 4096 Sep  8 18:45 index
drwx------ 2 stenographer stenographer 4096 Sep  8 18:45 packets

/var/lib/stenographer/thread0/index:
total 788
-rw------- 1 stenographer stenographer  53558 Sep  8 18:32 1599589913628559
-rw------- 1 stenographer stenographer  52220 Sep  8 18:33 1599589974776521
...

/var/lib/stenographer/thread0/packets:
total 84992
-rw------- 1 stenographer stenographer 5242880 Sep  8 18:32 1599589913628559
-rw------- 1 stenographer stenographer 6291456 Sep  8 18:33 1599589974776521
...

Also, does the interface receive traffic? Just asking because of #94

Probably moot now, but yes - a small trickle from a lab in this case.

Would it be possible to document more information/guidance about the seccomp flag and its impact in Stenographer relating to cases like this?

@gconnell
Copy link
Contributor

gconnell commented Sep 8, 2020

If the switch to --seccomp=none resolved this issue, that almost definitely means that a non-whitelisted Linux syscall is being called as part of the startup of Stenotype. If you'd be interested in finding a complete fix for this, we'd need to do so via an strace of the stenotype binary, to determine exactly which new syscall is being called.

Short description of what's going on:

Seccomp allows a process to say "I'm only going to use the following N syscalls", to define the complete set of behaviors which the Linux kernel should expect and allow from the userspace process. Stenographer (specifically Stenotype) uses this to define boundaries it must abide by in case of compromise due to a maliciously crafted packet. However, unfortunately over time this set of syscalls tends to drift, via changes to glibc, the linux kernel, etc. The list at https://github.com/google/stenographer/blob/master/stenotype/stenotype.cc lines 294-399 are the currently blessed set.

If we can, running stenotype with --seccomp=trace should result in an ENOSYS error which should be findable in strace output. A simple approach to this is to:

  • run as you did when things broke, and when stenotype stalls, run ps auxww | grep stenotype to get the command line arguments it's running with
  • add --seccomp=trace to that, and run with: sudo strace -f -o /tmp/strace_out stenotype <arguments>
  • attach the resulting /tmp/strace_out file to this bug

Or, you can look for the ENOSYS error in /tmp/strace_out and try to trace it back to the specific syscall being made.

@dspruell-s01
Copy link
Author

@gconnell Thanks for the input! I will capture the trace data.

Is that by chance the same thing that the kernel is picking up in the audit logs here?

Sep  2 22:12:46 astraea kernel: [22616.709325] audit: type=1326 audit(1599084766.102:153): auid=4294967295 uid=106 gid=112 ses=4294967295 subj==unconfined pid=2410 comm="stenotype" exe="/usr/sbin/stenotype" sig=31 arch=c000003e syscall=231 compat=0 ip=0x7f88753bc9d6 code=0x0

It repeatedly logged these syscall numbers:

syscall=231
syscall=257

@dspruell-s01
Copy link
Author

This seems to have been the result:

$ sudo strace -f -o /tmp/strace_out /usr/sbin/stenotype --threads=1 --iface=ens192 --dir=/tmp/stenographer447647775 --seccomp=trace
2020-09-08T21:10:35.663205Z T:2245d7 [stenotype.cc:462] CHECK(SUCCEEDED(__check_success_error__)) output.Rotate(file_dirname, micros, flag_preallocate_file_mb << 20): Function not implemented <- open
ABORTABORTABORT
/usr/sbin/stenotype(+0xb136) [0x562b15d20136]
/usr/sbin/stenotype(+0x265af) [0x562b15d3b5af]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbbb2f) [0x7f7ea30d9b2f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f7ea33fcfa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f7ea2db94cf]
Segmentation fault

$ echo $?
139

Seems to be these, but attaching full output.

1245  openat(AT_FDCWD, "/tmp/stenographer447647775/PKT0/.1599599435662987", O_WRONLY|O_CREAT|O_DSYNC|O_DIRECT, 0600) = -1 ENOSYS (Function not implemented)
1245  openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = -1 ENOSYS (Function not implemented)

stenotype_strace.txt

@gconnell
Copy link
Contributor

gconnell commented Sep 8, 2020

Ah, perfect. Given that, I expect this is actually the case of Debian being behind head. IIUC, 9c5b80d#diff-0ec2ed7e09c9fb335892ac49999a491e should be the fix necessary to support this.

I don't personally handle rebuilding on Debian, and I'm actually not sure who/what does. But with the idea that maybe tagging head might cause a release to go out, I'm going to throw a new version tag up and see what happens.

@satta
Copy link
Contributor

satta commented Sep 8, 2020

I'm the maintainer of stenographer in Debian, so thanks for the hint! Debian unstable (and also buster-backports) has a stenographer version from May 29, so an update will be in order. Will get it into unstable ASAP, and it will be uploading a backport for buster soon after it hits testing. Unfortunately there can be no updates to buster stable.

@gconnell
Copy link
Contributor

gconnell commented Sep 8, 2020

v1.0.1 should be a good tag to build against. Thanks for Debian-ing!

@dspruell-s01
Copy link
Author

Likewise, thanks guys! And I got to learn a few things as well.

@satta
Copy link
Contributor

satta commented Sep 8, 2020

Oh, I didn't realize there were tagged releases now! Even better, will switch to tracking those -- that would also give us notifications about new upstream releases.

@gconnell
Copy link
Contributor

gconnell commented Sep 8, 2020

I am the worst release tagger known to man, which pretty much means that I tag a release whenever someone opens an Issue to ask me to do so. That said, I'll try to make this a thing that happens ;)

@satta
Copy link
Contributor

satta commented Sep 8, 2020

Good to know! :)

In unstable now, will migrate to testing in ~2 days if all goes well: https://tracker.debian.org/news/1174872/accepted-stenographer-101-1-source-into-unstable/

@satta
Copy link
Contributor

satta commented Oct 3, 2020

1.0.1 is now in testing and buster-backports.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants