JSON-RPC Crashes with 2.11 #7532
Comments
I was not able to run Icinga 2 under GDB or attach to it:
After disabling the notification feature and some minutes running:
I have the exact same issue. Today I upgraded from 2.10 to 2.11 and the service started to crash. I also noticed that memory usage is way higher than before.
@bunghi please share logs and details; "exactly" the same won't help us...
The Icinga2 environment looks like this:
This morning we upgraded icinga2 on all 15 servers (masters + zone endpoints). Since then, both endpoints in one of the zones crashed (that zone has a lot of hosts). Crash reports:
Since the upgrade this morning it has crashed 5 times with an Out of Memory kernel error. Memory usage after the upgrade:
Memory and CPU usage are expected to rise with the introduction of user-land threads via Boost Coroutines; that is separate from this issue. Also, the JSON error between the main and spawn helper processes is a new issue; please move it into a dedicated issue, as requested in https://github.com/Icinga/icinga2/issues/7531#issuecomment-534547311
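To illustrate why the footprint grows, here is a minimal sketch (an illustration only, not Icinga 2's actual code) of Boost.Asio's stackful coroutines: every spawned coroutine owns its own user-land stack while it is alive, so many concurrent cluster connections raise baseline memory usage compared to a plain thread pool.

```cpp
// Minimal sketch, not Icinga 2 code: each boost::asio::spawn() creates a
// stackful coroutine with its own user-land stack, so many concurrent
// connections increase memory usage even while most coroutines are suspended.
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <chrono>
#include <iostream>

int main() {
    boost::asio::io_context io;

    for (int i = 0; i < 1000; ++i) {
        // The coroutine suspends on the timer instead of blocking an OS
        // thread, but it still holds its own coroutine stack while suspended.
        boost::asio::spawn(io, [&io](boost::asio::yield_context yield) {
            boost::asio::steady_timer timer(io, std::chrono::milliseconds(100));
            timer.async_wait(yield);
        });
    }

    io.run();
    std::cout << "1000 coroutines completed\n";
    return 0;
}
```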
@dnsmichi It's not because of the dependencies, otherwise this config would crash:
It could be related to JsonEncode, seen in various other places as well, and possibly to how memory is allocated and later freed when encoding dictionaries.
Hi, today it crashed again after a while. Maybe this output helps:
@lippserd @Al2Klimov My suspicion is that this is related to the JSON library's encode/decode, likewise object serialization, and a possible leak in there. I haven't run Valgrind yet, but that would be the next thing to try.
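To make the suspicion concrete, here is a deliberately simplified and purely hypothetical C++ sketch; EncodeDictionary and SendClusterMessage are invented names, not Icinga 2 APIs. It shows the kind of encode/free mismatch that Valgrind's --leak-check=full would report as "definitely lost" blocks growing with every cluster message.

```cpp
// Hypothetical sketch, not Icinga 2 code: an encoder returns a heap buffer
// that the caller owns, and the send path forgets to release it, so one
// buffer leaks per message sent.
#include <cstring>
#include <string>

// Invented helper: serializes a dictionary-like value into a buffer the
// caller must delete[].
char* EncodeDictionary(const std::string& json) {
    char* buffer = new char[json.size() + 1];
    std::memcpy(buffer, json.c_str(), json.size() + 1);
    return buffer;
}

void SendClusterMessage(const std::string& json) {
    char* encoded = EncodeDictionary(json);
    // ... the buffer would be written to the peer socket here ...
    // Missing `delete[] encoded;` leaks one allocation per message.
    (void)encoded;
}

int main() {
    for (int i = 0; i < 100000; ++i) {
        SendClusterMessage("{\"method\":\"event::Heartbeat\",\"params\":{}}");
    }
    return 0;
}
```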
ref/NC/644339
ref/NC/644553
I am experiencing exactly the same crash output as @bunghi posted above. I have a 2-node master cluster and around 200 agent instances. Prior to the upgrade to 2.11 the masters were stable; now both nodes crash several times a day.
I've installed gdb on the masters to see if it provides any useful details. Mark
ref/NC/647127
Yesterday (Nov 14th 2019) we experienced the same crash as @bunghi mentioned. Both masters run 2.11.2-1.xenial.
I'll take care of that.
e930efd has purred like a cat for two days. Green light for v2.12rc1.
I have a similar crash with a large number of endpoints that have been removed from the config but are still attempting to connect. There are about 400 endpoints hammering away, and icinga2 can only stay up for around 20 hours until I get this error:
Will the latest snapshot discussed here fix this, or should I lodge a separate bug report? This is a sample of the debug log:
These messages happen over and over; the debug log gets very large quickly. There are about 10,000 hosts behind the collective endpoints, by the way. Not sure that makes a difference, but some of those endpoints would have a ton of updates not sent.
@davekempe Please try v2.11.3 once it has been released (later today).
Could all of you please test v2.11.3 and tell us whether it has fixed your particular problem?
Hey, sorry, I was going to get back but the bug was closed. Happy to report the issue is fixed. I was able to simulate the problem reliably, as it happened every 24 hours in our environment if we removed the endpoints via automation. After the update it has been fine, with no crashes.
Unfortunately, we still have the problem.
@hardoverflow ... and it still crashes?
@Al2Klimov @N-o-X Was the master branch (snapshot packages) affected by this bug? If so, the master branch is currently not fixed, since the fixing changes were merged directly into the 2.11 release branch. The master branch should be tested prior to a 2.12 release to ensure the bug is fixed there too.
... and tested successfully.
@Al2Klimov Once the cluster is running, it keeps running. The error occurs sporadically when deploying: after a while, the ConfigMaster crashed. The second master is still running.
We also observe the following network bandwidth for masters and satellites.
Please share the output of:
@hardoverflow Could you please upload core dumps here?
@lippserd Done. Can you confirm?
Confirmed. All of you: if you give us core dumps, please gzip them, and if you request them, ask for them gzipped. Not all of us necessarily have enterprise downlinks due to COVID-19.
This includes the following fixes:
nlohmann/json#1436 > For a deeply-nested JSON object, the recursive implementation of the json_value::destroy function causes stack overflow.
nlohmann/json#1708
nlohmann/json#1722
Stack size: nlohmann/json#1693 (comment)
Integer overflow: nlohmann/json#1447
UTF-8, json dump out of bounds: nlohmann/json#1445
Possibly influences #7532
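As a rough illustration of the first fix listed above (a sketch assuming the bundled nlohmann/json library; the nesting depth of 100000 is an arbitrary number chosen only to make the point): building a deeply nested document is cheap, but destroying it with the old recursive json_value::destroy used one stack frame per nesting level and could overflow the stack.

```cpp
// Sketch of the failure mode behind nlohmann/json#1436: a deeply nested JSON
// value whose destruction recursed once per nesting level in older versions
// of the library and could therefore overflow the stack.
#include <nlohmann/json.hpp>
#include <iostream>

int main() {
    nlohmann::json doc;
    nlohmann::json* current = &doc;

    // Build an object nested 100000 levels deep (arbitrary illustrative depth).
    for (int depth = 0; depth < 100000; ++depth) {
        (*current)["nested"] = nlohmann::json::object();
        current = &(*current)["nested"];
    }

    std::cout << "built a deeply nested document\n";
    // With the pre-fix recursive destroy, the destructor running at the end
    // of main() recursed once per level; the fixed library avoids that.
    return 0;
}
```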
Task List
Mitigations
#7532 (comment)
Analysis
Related issues
#7569
#7687
#7624
#7470
ref/NC/636691
ref/NC/644339
ref/NC/644553
ref/NC/647127
ref/NC/652035
ref/NC/652071
ref/NC/652087
Original Report
The setup is a dual-master system which was upgraded to 2.11 around noon yesterday.
In the late evening, crashes started to appear and are now consistent. The system ran on 2.11-rc1 before.
The user started upgrading agents to 2.11; this may be related.
ref/NC/636691
Latest crash
Alternative crashes