
mssql repeatedly crashing under kernel 6.7.0 #3689

Closed
1 task done
Athanasius opened this issue Jan 20, 2024 · 12 comments · Fixed by #3904

@Athanasius

Steps To Reproduce

I wondered why something else on my home server was suddenly failing this morning. It turned out that /home was full: the bitwarden mssql container was repeatedly crashing and filling the partition with crash dumps/logs.

This only started after I rebooted this morning onto a 6.7.0 kernel. It's fine back on a 6.6.8 kernel now, with the original database files.

It's perfectly possible I made a bad decision on some new 6.7.0 kernel config option.
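
For anyone chasing the same symptom, something along these lines will show what is filling the partition (the path and depth here are just illustrative):

$ du -xh --max-depth=3 /home 2>/dev/null | sort -h | tail -n 20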

Expected Result

The mssql container to continue running.

Actual Result

The mssql container repeatedly crashes.

I would have tried restoring a database backup, but couldn't (easily) do that without the container running.

I also tried backing up the live mssql/data directory, deleting its contents, and starting again; the same sort of crash resulted.
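
Roughly what that looked like, assuming the stock ./bwdata layout and bitwarden.sh wrapper (adjust paths for your install):

$ ./bitwarden.sh stop
$ tar -czf mssql-data-backup.tar.gz -C ./bwdata/mssql data
$ rm -rf ./bwdata/mssql/data/*
$ ./bitwarden.sh start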

Screenshots or Videos

No response

Additional Context

This is with a self-compiled 6.7.0 kernel.

Here's the .config for the compile. I couldn't immediately spot any culprit when diffing this with my v6.6.8 .config:
kernel-config-v6.7.0.txt

And the 6.6.8 .config for comparison:
kernel-config-v6.6.8.txt
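
(To repeat the comparison, the kernel tree's scripts/diffconfig helper works, or a plain sorted diff; filenames as attached above:)

$ ./scripts/diffconfig kernel-config-v6.6.8.txt kernel-config-v6.7.0.txt
$ diff <(sort kernel-config-v6.6.8.txt) <(sort kernel-config-v6.7.0.txt)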

A crash.json file from one of the dump directories:

{
    "reason": "0x00000001",
    "processName": "sqlservr",
    "pid": "42",
    "instanceId": "dd8cd82b-efd7-43f1-be37-7bfb1811ece1",
    "crashId": "80680a56-1f30-489d-857b-3f8dd7dcbb66",
    "threadState": "0x00007fbe0fcb56a8",
    "threadId": "157",
    "libosThreadId": "0x1e8",
    "buildStamp": "e5dea205d0938e2848fb2509856a7e8f30783e6d5f62d0c88355e288de0db89a",
    "buildNum": "212470",
    "signal": "6",
    "signalText": "SIGABRT",
    "stack": [
        "0x0000558630379dd1",
        "0x00005586303784f0",
        "0x0000558630377af1",
        "0x00007fbe142cb090",
        "0x00007fbe142cb00b",
        "0x00007fbe142aa859",
        "0x00005586302ef692"
    ],
    "stackText": [
        "<unknown>",
        "<unknown>",
        "<unknown>",
        "killpg+0x40",
        "gsignal+0xcb",
        "abort+0x12b",
        "<unknown>"
    ],
    "last_errno": "2",
    "last_errno_text": "No such file or directory",
    "distribution": "Ubuntu 20.04.4 LTS",
    "processors": "8",
    "total_memory": "33547169792",
    "timestamp": "Sat Jan 20 11:14:16 2024"
}
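
With a pile of these dump directories, a jq one-liner like this summarises each crash.json (the glob assumes they all sit together under the mssql log directory; adjust to wherever the dumps land on your host):

$ jq -r '[.timestamp, .signalText, .last_errno_text] | @tsv' core.sqlservr.*/crash.json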

Without the core dump, which is 1.9 GiB, here's one of the dump directories:
core.sqlservr.01_20_2024_11_14_17.42.d-NO_CORE_DUMP.tar.gz

Build Version

2024.1.1

Environment

Self-Hosted

Environment Details

OS: Debian 12.4 (bookworm/stable)
Kernel: 6.7.0 - compiled myself, see above for the .config
bitwarden.sh version 2024.1.1
Docker version 25.0.0, build e758fe5
Docker Compose version v2.24.1

Issue Tracking Info

  • I understand that work is tracked outside of Github. A PR will be linked to this issue should one be opened to address it, but Bitwarden doesn't use fields like "assigned", "milestone", or "project" to track progress.
@Athanasius Athanasius added the bug label Jan 20, 2024
@Athanasius
Author

And no sooner do I open this than I see that kernel 6.7.1 is now released. I'll try that soon to see if it helps.

@Athanasius
Author

The issue is still present on 6.7.1, but not on 6.6.13 (I figured I might as well end up on the latest 6.6.x if need be).

@chetwisniewski

Same issue. I am using Arch as a host and I downgraded to 6.6.7 and it is working again. This is not specific to BitWarden as it is also reported on the AUR for the mssql package. https://aur.archlinux.org/packages/mssql-server
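
(For reference, rolling back on Arch is typically a pacman -U from the local package cache, assuming the older kernel package is still cached; the exact filename will vary:)

$ sudo pacman -U /var/cache/pacman/pkg/linux-6.6.7*.pkg.tar.zst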

@Athanasius
Author

> Same issue. I am using Arch as a host and I downgraded to 6.6.7 and it is working again. This is not specific to BitWarden as it is also reported on the AUR for the mssql package. aur.archlinux.org/packages/mssql-server

Phew, not just me then. I figured it was probably an issue specific to mssql, but chose to report here as it could have been some combination of the newer kernel and configuration options bitwarden utilises.

@sso-bitwarden

sso-bitwarden commented Jan 22, 2024

Thank you for your report. This looks like an issue between the mssql container and the OS/kernel. In this case, we can only wait until a fix is available before users can upgrade to the latest kernel.

Regards,

Customer Success Team

@chetwisniewski

It's Bitwarden's container. So the real question is: will Bitwarden file an issue with Microsoft, the LKML, or both?

@sso-bitwarden

It seems there is already an issue open on mssql-docker's GitHub:

microsoft/mssql-docker#868

@CryptoSiD

The problem is still there with kernel 6.7.2

@chetwisniewski

MS seems to only be fixing the current version. So will Bitwarden update to SQL '22 or ...?

@sso-bitwarden

Bitwarden will run MS SQL 2022 starting with the next release.

#3580

@vijaymodha

vijaymodha commented Feb 27, 2024

Unfortunately, the issue persists after the bitwarden/mssql update to MSSQL 2022, and with kernel v6.7.5.

OS Info:

$ uname -r
6.7.5-200.fc39.x86_64
$ cat /etc/fedora-release
Fedora release 39 (Thirty Nine)
$ docker images | grep mssql
bitwarden/mssql                  2024.2.2                 6b98c1e93b1c   6 days ago     1.59GB
mcr.microsoft.com/mssql/server   2019-CU24-ubuntu-20.04   b5316516906d   2 months ago   1.47GB
mcr.microsoft.com/mssql/server   2022-CU11-ubuntu-22.04   ffdd6981a89e   3 months ago   1.58GB
mcr.microsoft.com/mssql/server   2022-CU10-ubuntu-22.04   86b87ec5e60a   3 months ago   1.57GB


Running bitwarden/mssql:2024.2.2:

$ docker run --rm -it -e ACCEPT_EULA=y bitwarden/mssql:2024.2.2
This program has encountered a fatal error and cannot continue running at Tue Feb 27 00:55:28 2024
The following diagnostic information is available:

         Reason: 0x00000001
         Signal: SIGABRT - Aborted (6)
          Stack:
                 IP               Function
                 ---------------- --------------------------------------
                 00005571445afce1 std::__1::bad_function_call::~bad_function_call()+0x96661
                 00005571445af6a6 std::__1::bad_function_call::~bad_function_call()+0x96026
                 00005571445aec2f std::__1::bad_function_call::~bad_function_call()+0x955af
                 00007fd817842520 __sigaction+0x50
                 00007fd8178969fc pthread_kill+0x12c
                 00007fd817842476 raise+0x16
                 00007fd8178287f3 abort+0xd3
                 0000557144580d96 std::__1::bad_function_call::~bad_function_call()+0x67716
                 00005571445bd5b4 std::__1::bad_function_call::~bad_function_call()+0xa3f34
                 00005571445eb318 std::__1::bad_function_call::~bad_function_call()+0xd1c98
                 00005571445eb0fa std::__1::bad_function_call::~bad_function_call()+0xd1a7a
                 000055714458720a std::__1::bad_function_call::~bad_function_call()+0x6db8a
                 0000557144586e80 std::__1::bad_function_call::~bad_function_call()+0x6d800
        Process: 44 - sqlservr
         Thread: 135 (application thread 0x184)
    Instance Id: 056fee9f-db2a-48f9-a280-65edf3e521f7
       Crash Id: 6a513c6c-ba13-4311-86a1-45c59829e555
    Build stamp: a9299dd605c652a3cea4246273441bcfaf48afb4b482ab9dc43771eecaf6600b
   Distribution: Ubuntu 22.04.3 LTS
     Processors: 6
   Total Memory: 16764084224 bytes
      Timestamp: Tue Feb 27 00:55:28 2024
     Last errno: 2
Last errno text: No such file or directory
Capturing a dump of 44
Successfully captured dump: /var/opt/mssql/log/core.sqlservr.2_27_2024_0_55_28.44
Executing: /opt/mssql/bin/handle-crash.sh with parameters
     handle-crash.sh
     /opt/mssql/bin/sqlservr
     44
     /opt/mssql/bin
     /var/opt/mssql/log/

     056fee9f-db2a-48f9-a280-65edf3e521f7
     6a513c6c-ba13-4311-86a1-45c59829e555

     /var/opt/mssql/log/core.sqlservr.2_27_2024_0_55_28.44

Ubuntu 22.04.3 LTS
Capturing core dump and information to /var/opt/mssql/log...
/bin/cat: /proc/44/maps: Permission denied
^Ccat: /proc/44/environ: Permission denied
find: '/proc/44': No such file or directory
find: '/proc/44': No such file or directory
find: '/proc/44': No such file or directory
find: '/proc/44': No such file or directory
dmesg: read kernel buffer failed: Operation not permitted
timeout: failed to run command 'journalctl': No such file or directory
timeout: failed to run command 'journalctl': No such file or directory
Tue Feb 27 00:55:31 UTC 2024 Capturing program information
Dump already generated: /var/opt/mssql/log/core.sqlservr.2_27_2024_0_55_28.44, moving to /var/opt/mssql/log/core.sqlservr.44.temp/core.sqlservr.44.gdmp
Moving logs to /var/opt/mssql/log/core.sqlservr.44.temp/log/paldumper-debug.log
Tue Feb 27 00:55:31 UTC 2024 Capturing program binaries
Tue Feb 27 00:55:31 UTC 2024 Not compressing the dump files, moving instead to: /var/opt/mssql/log/core.sqlservr.02_27_2024_00_55_29.44.d

Running mcr.microsoft.com/mssql/server:2022-CU11-ubuntu-22.04 directly, the issue seems to be upstream:

$ docker run --rm -it -e ACCEPT_EULA=y mcr.microsoft.com/mssql/server:2022-CU11-ubuntu-22.04
SQL Server 2022 will run as non-root by default.
This container is running as user mssql.
To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
This program has encountered a fatal error and cannot continue running at Tue Feb 27 00:57:11 2024
The following diagnostic information is available:

         Reason: 0x00000001
         Signal: SIGABRT - Aborted (6)
          Stack:
                 IP               Function
                 ---------------- --------------------------------------
                 0000563621faece1 std::__1::bad_function_call::~bad_function_call()+0x96661
                 0000563621fae6a6 std::__1::bad_function_call::~bad_function_call()+0x96026
                 0000563621fadc2f std::__1::bad_function_call::~bad_function_call()+0x955af
                 00007efee8242520 __sigaction+0x50
                 00007efee82969fc pthread_kill+0x12c
                 00007efee8242476 raise+0x16
                 00007efee82287f3 abort+0xd3
                 0000563621f7fd96 std::__1::bad_function_call::~bad_function_call()+0x67716
                 0000563621fbc5b4 std::__1::bad_function_call::~bad_function_call()+0xa3f34
                 0000563621fea318 std::__1::bad_function_call::~bad_function_call()+0xd1c98
                 0000563621fea0fa std::__1::bad_function_call::~bad_function_call()+0xd1a7a
                 0000563621f8620a std::__1::bad_function_call::~bad_function_call()+0x6db8a
                 0000563621f85e80 std::__1::bad_function_call::~bad_function_call()+0x6d800
        Process: 9 - sqlservr
         Thread: 99 (application thread 0x180)
    Instance Id: 4f963587-d91e-4f3c-8eca-4e781a1c7ec9
       Crash Id: aed92722-583a-4ef9-9f97-6c1a249ad28f
    Build stamp: a9299dd605c652a3cea4246273441bcfaf48afb4b482ab9dc43771eecaf6600b
   Distribution: Ubuntu 22.04.3 LTS
     Processors: 6
   Total Memory: 16764084224 bytes
      Timestamp: Tue Feb 27 00:57:11 2024
     Last errno: 2
Last errno text: No such file or directory
Capturing a dump of 9
Successfully captured dump: /var/opt/mssql/log/core.sqlservr.2_27_2024_0_57_11.9
Executing: /opt/mssql/bin/handle-crash.sh with parameters
     handle-crash.sh
     /opt/mssql/bin/sqlservr
     9
     /opt/mssql/bin
     /var/opt/mssql/log/

     4f963587-d91e-4f3c-8eca-4e781a1c7ec9
     aed92722-583a-4ef9-9f97-6c1a249ad28f

     /var/opt/mssql/log/core.sqlservr.2_27_2024_0_57_11.9

Ubuntu 22.04.3 LTS
Capturing core dump and information to /var/opt/mssql/log...
/bin/cat: /proc/9/maps: Permission denied
^Ccat: /proc/9/environ: No such file or directory
find: '/proc/9': No such file or directory
find: '/proc/9': No such file or directory
find: '/proc/9': No such file or directory
find: '/proc/9': No such file or directory
dmesg: read kernel buffer failed: Operation not permitted
timeout: failed to run command 'journalctl': No such file or directory
timeout: failed to run command 'journalctl': No such file or directory
Tue Feb 27 00:57:13 UTC 2024 Capturing program information
Dump already generated: /var/opt/mssql/log/core.sqlservr.2_27_2024_0_57_11.9, moving to /var/opt/mssql/log/core.sqlservr.9.temp/core.sqlservr.9.gdmp
Moving logs to /var/opt/mssql/log/core.sqlservr.9.temp/log/paldumper-debug.log
Tue Feb 27 00:57:14 UTC 2024 Capturing program binaries
Tue Feb 27 00:57:14 UTC 2024 Not compressing the dump files, moving instead to: /var/opt/mssql/log/core.sqlservr.02_27_2024_00_57_12.9.d

@djsmith85 djsmith85 added the upstream An issue with a dependency that needs to get addressed upstream label Feb 27, 2024
@vijaymodha

Seems upstream is fixed: microsoft/mssql-docker#868 (comment).

$ uname -r
6.7.9-200.fc39.x86_64
$ docker pull mcr.microsoft.com/mssql/server:2022-CU12-ubuntu-22.04
$ docker run --rm -it -e ACCEPT_EULA=y mcr.microsoft.com/mssql/server:2022-CU12-ubuntu-22.04
...
2024-03-14 21:05:52.25 spid22s     Using 'dbghelp.dll' version '4.0.5'
2024-03-14 21:05:52.31 spid23s     Recovery is complete. This is an informational message only. No user action is required.
2024-03-14 21:05:52.43 spid31s     The default language (LCID 0) has been set for engine and full-text services.
2024-03-14 21:05:52.99 spid31s     The tempdb database has 6 data file(s).
^C2024-03-14 21:06:52.85 spid23s     Always On: The availability replica manager is going offline because SQL Server is shutting down. This is an informational message only. No user action is required.
2024-03-14 21:06:52.86 spid23s     SQL Server shutdown due to Ctrl-C or Ctrl-Break signal. This is an informational message only. No user action is required.
2024-03-14 21:06:53.87 spid23s     SQL Server Agent service is not running.
2024-03-14 21:06:53.88 spid23s     SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.

Fingers crossed for the next Bitwarden self-hosted release!
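
Once a release ships with the CU12-based image, picking it up should just be the usual update cycle with the stock bitwarden.sh, run from the install directory:

$ ./bitwarden.sh updateself
$ ./bitwarden.sh update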
