Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASL 3 Crash on app_rpt/rpt_channel.c #422

Open
mkmer opened this issue Nov 6, 2024 · 16 comments
Open

ASL 3 Crash on app_rpt/rpt_channel.c #422

mkmer opened this issue Nov 6, 2024 · 16 comments

Comments

@mkmer
Copy link

mkmer commented Nov 6, 2024

Describe the bug
Repeater appears to "crash" randomly.

To Reproduce
Steps to reproduce the behavior:
Not sure what steps cause it to crash - maybe time?
At the "failure" the repeater was NOT active.

Expected behavior
Recovery (which it does appear to do fine)

Screenshots
Logs:

it appears the extirpation of the newkey timer starts the crash (maybe).

[2024-11-03 08:00:12.039] WARNING[604240][C-000001aa] app_rpt/rpt_channel.c: Failed to send text IAX2/68.58.182.228:4569-3685 on !NEWKEY1!
[2024-11-03 08:00:12.041] WARNING[421560][C-000001aa] channel.c: PBX may not have been terminated properly on 'IAX2/68.58.182.228:4569-3685'
[2024-11-03 08:00:12.059] ERROR[604240][C-000001aa] app_rpt/rpt_channel.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fb718003b30 (0)
[2024-11-03 08:00:12.059] ERROR[604240][C-000001aa] : Got 11 backtrace records
# 0: /usr/sbin/asterisk(+0x6dc4d) [0x55832e4e0c4d]
# 1: /usr/sbin/asterisk(__ao2_lock+0xac) [0x55832e4e0d5c]
# 2: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x2357c) [0x7fb73962357c]
# 3: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x137f2) [0x7fb7396137f2]
# 4: /usr/sbin/asterisk(pbx_exec+0xce) [0x55832e5a373e]
# 5: /usr/sbin/asterisk(+0x124f75) [0x55832e597f75]
# 6: /usr/sbin/asterisk(+0x126d4c) [0x55832e599d4c]
# 7: /usr/sbin/asterisk(+0x1281eb) [0x55832e59b1eb]
# 8: /usr/sbin/asterisk(+0x1afd4d) [0x55832e622d4d]
# 9: /lib/x86_64-linux-gnu/libc.so.6(+0x89144) [0x7fb73d2f9144]
#10: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7fb73d3797dc]

[2024-11-03 08:00:12.059] ERROR[604240][C-000001aa] channel.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fb718003b30 (0)
[2024-11-03 08:00:12.060] ERROR[604240][C-000001aa] : Got 13 backtrace records
# 0: /usr/sbin/asterisk(+0x6dc4d) [0x55832e4e0c4d]
# 1: /usr/sbin/asterisk(__ao2_lock+0xac) [0x55832e4e0d5c]
# 2: /usr/sbin/asterisk(ast_sendtext_data+0x75) [0x55832e519385]
# 3: /usr/sbin/asterisk(ast_sendtext+0x75) [0x55832e519965]
# 4: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x23587) [0x7fb739623587]
# 5: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x137f2) [0x7fb7396137f2]
# 6: /usr/sbin/asterisk(pbx_exec+0xce) [0x55832e5a373e]
# 7: /usr/sbin/asterisk(+0x124f75) [0x55832e597f75]
# 8: /usr/sbin/asterisk(+0x126d4c) [0x55832e599d4c]
# 9: /usr/sbin/asterisk(+0x1281eb) [0x55832e59b1eb]
#10: /usr/sbin/asterisk(+0x1afd4d) [0x55832e622d4d]
#11: /lib/x86_64-linux-gnu/libc.so.6(+0x89144) [0x7fb73d2f9144]
#12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7fb73d3797dc]

[2024-11-03 08:00:12.060] ERROR[604240][C-000001aa] channel.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fb718003b30 (0
)
[2024-11-03 08:00:12.060] ERROR[604240][C-000001aa] : Got 13 backtrace records
# 0: /usr/sbin/asterisk(+0x6dc4d) [0x55832e4e0c4d]
# 1: /usr/sbin/asterisk(__ao2_unlock+0x8a) [0x55832e4e0e1a]
# 2: /usr/sbin/asterisk(ast_sendtext_data+0x5bf) [0x55832e5198cf]
# 3: /usr/sbin/asterisk(ast_sendtext+0x75) [0x55832e519965]
# 4: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x23587) [0x7fb739623587]
# 5: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x137f2) [0x7fb7396137f2]
# 6: /usr/sbin/asterisk(pbx_exec+0xce) [0x55832e5a373e]
# 7: /usr/sbin/asterisk(+0x124f75) [0x55832e597f75]
# 8: /usr/sbin/asterisk(+0x126d4c) [0x55832e599d4c]
# 9: /usr/sbin/asterisk(+0x1281eb) [0x55832e59b1eb]
#10: /usr/sbin/asterisk(+0x1afd4d) [0x55832e622d4d]
#11: /lib/x86_64-linux-gnu/libc.so.6(+0x89144) [0x7fb73d2f9144]
#12: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7fb73d3797dc]

[2024-11-03 08:00:12.060] WARNING[604240][C-000001aa] app_rpt/rpt_channel.c: Failed to send text  on !NEWKEY1!
[2024-11-03 08:00:12.060] ERROR[604240][C-000001aa] app_rpt/rpt_channel.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fb718003b30 (0)
[2024-11-03 08:00:12.061] ERROR[604240][C-000001aa] : Got 10 backtrace records
# 0: /usr/sbin/asterisk(+0x6dc4d) [0x55832e4e0c4d]
# 1: /usr/sbin/asterisk(__ao2_unlock+0x8a) [0x55832e4e0e1a]
# 2: /usr/lib/x86_64-linux-gnu/asterisk/modules/app_rpt.so(+0x137f2) [0x7fb7396137f2]
# 3: /usr/sbin/asterisk(pbx_exec+0xce) [0x55832e5a373e]
# 4: /usr/sbin/asterisk(+0x124f75) [0x55832e597f75]
# 5: /usr/sbin/asterisk(+0x126d4c) [0x55832e599d4c]
# 6: /usr/sbin/asterisk(+0x1281eb) [0x55832e59b1eb]
# 7: /usr/sbin/asterisk(+0x1afd4d) [0x55832e622d4d]
# 8: /lib/x86_64-linux-gnu/libc.so.6(+0x89144) [0x7fb73d2f9144]
# 9: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7fb73d3797dc]

[2024-11-03 08:00:17.610] Asterisk 20.9.3+asl3-3.0.6-1.deb12 built by builder @ allstarlink.org on a x86_64 running Linux on 2024-10-05 13:19:13 UTC

Software versions (listed in asl-menu, option 4)

  • asl3: 3.0.6-1
  • asl3-asterisk: Asterisk 20.9.3+asl3-3.0.6-1.deb12

Have you run a software update and rebooted?

  • YES

What is the platform - e.g. Raspberry Pi 4, Raspberry Pi 5, Virtual Machine, Desktop, etc.

  • :Platform
  • Debian 6.1.112-1 (2024-09-30) x86_64 GNU/Linux

Additional context
Add any other context about the problem here.
I may have truncated a bit more "important" info:

[2024-11-05 21:43:26.318] WARNING[1082641][C-00000009] app_rpt/rpt_channel.c: Failed to send text IAX2/68.58.182.228:4569-11436 on !NEWKEY1!
[2024-11-05 21:43:26.339] WARNING[1082641][C-00000009] app_rpt/rpt_channel.c: Failed to send text IAX2/68.58.182.228:4569-11436 on !NEWKEY1!
[2024-11-05 21:43:28.219] WARNING[1082629] app_rpt.c: 0x7f3c74009c20 newkeytimer expired on connected node, setting newkey from 2 to 0.
[2024-11-05 21:43:57.359] WARNING[1082673][C-0000000a] app_rpt/rpt_channel.c: Failed to send text IAX2/68.58.182.228:4569-5324 on !NEWKEY1!
[2024-11-05 21:43:57.359] WARNING[1082629][C-0000000a] channel.c: PBX may not have been terminated properly on 'IAX2/68.58.182.228:4569-5324'
[2024-11-05 21:43:57.379] ERROR[1082673][C-0000000a] app_rpt/rpt_channel.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7f3c700016d0 (0)

@mkmer
Copy link
Author

mkmer commented Nov 6, 2024

This may be related to #127 and #274 . They only appeared in our system after the latest release.

@Allan-N
Copy link
Collaborator

Allan-N commented Nov 6, 2024

Can you install the asl3-asterisk-dbgsym and asl3-asterisk-modules-dbgsym packages to your system. This will augment any crash reports with some helpful information.

sudo apt install -y asl3-asterisk-dbgsym asl3-asterisk-modules-dbgsym

@KB4MDD
Copy link
Collaborator

KB4MDD commented Nov 6, 2024

After installing the symbols, please run /var/lib/asterisk/scripts/ast_coredumper /core

The results are recorded in /tmp. Please post them to the issue. This may give us additional information.

@InterLinked1
Copy link
Member

I just looked at this for a second, but the direct cause of the crash seems to be l->chan being NULL here: https://github.com/AllStarLink/app_rpt/blob/master/apps/app_rpt.c#L6761

I don't think it should be possible for it to be NULL there, so in that case there is likely some other bug upstream of that here.

@mkmer
Copy link
Author

mkmer commented Nov 7, 2024

I think the thing crashing is indicated here:

[2024-11-06 19:31:41.574] WARNING[1240639] app_rpt.c: 0x7f4ebc0098d0 newkeytimer expired on connected node, setting newkey from 2 to 0.
[2024-11-06 19:32:00.676] WARNING[1240723][C-0000000a] app_rpt/rpt_channel.c: Failed to send text IAX2/68.58.182.228:1031-3252 on !NEWKEY1!

when newkey "failed" the next event is "improper termination":

[2024-11-03 08:00:12.041] WARNING[421560][C-000001aa] channel.c: PBX may not have been terminated properly on 'IAX2/68.58.182.228:4569-3685'
[2024-11-03 08:00:12.059] ERROR[604240][C-000001aa] app_rpt/rpt_channel.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fb718003b30 (0)

@InterLinked1
Copy link
Member

I think the thing crashing is indicated here:

[2024-11-06 19:31:41.574] WARNING[1240639] app_rpt.c: 0x7f4ebc0098d0 newkeytimer expired on connected node, setting newkey from 2 to 0.
[2024-11-06 19:32:00.676] WARNING[1240723][C-0000000a] app_rpt/rpt_channel.c: Failed to send text IAX2/68.58.182.228:1031-3252 on !NEWKEY1!

when newkey "failed" the next event is "improper termination":

[2024-11-03 08:00:12.041] WARNING[421560][C-000001aa] channel.c: PBX may not have been terminated properly on 'IAX2/68.58.182.228:4569-3685'
[2024-11-03 08:00:12.059] ERROR[604240][C-000001aa] app_rpt/rpt_channel.c: FRACK!, Failed assertion bad magic number 0x0 for object 0x7fb718003b30 (0)

The improper termination event is expected, actually, as some work needs to be done still to get rid of that. The FRACK message is indicative of the problem and in the backtrace lines up with the line number mentioned.

The newkeytimer warning message is also an issue, but I don't recall it always causing a crash in the past, not exactly sure now. They probably are related in some way and we'll take a look into that.

@mkmer
Copy link
Author

mkmer commented Nov 12, 2024

I believe this is also related to #385.
After a bit of testing, it appears any long term links will cause the above crash. Additionally, it appears that after some amount of "ASL uptime", any inbound connection will cause an immediate crash/restart.

@rob297
Copy link

rob297 commented Dec 12, 2024

I am having the same trouble. It only takes about 5 to 6 days and we start getting the errors on new connections to the hub. This started happening on the Raspberry Pi we loaded from the image about 6 months ago. The Pi working fine for several weeks and after an update around 3 months ago we started having the issue. We are now running Debian 12 on a 64bit platform and installed asl3 using the repo. We have roughly 25 to 40 nodes connected at a time. If i disable the hub node 28528, the few connections on the other hubs seem to work fine. However when I enable 28528 (main) and all the connections start trying to establish and its snowballs. Only a reboot will fix the issue.

@KB4MDD
Copy link
Collaborator

KB4MDD commented Dec 12, 2024

@rob297 Please look above for the instructions for installing the symbols and obtaining crash dump information.

@mkmer
Copy link
Author

mkmer commented Dec 12, 2024

It may also be a useful idea to check the system memory (free command) when the events occur. It's on my list to check next time it starts acting up. I'm leaning towards a memory leak of some type that requires a full reboot system reboot to fully "recover".

@rob297
Copy link

rob297 commented Dec 12, 2024

@rob297 Please look above for the instructions for installing the symbols and obtaining crash dump information.

Have it ready for next crash, Will post. Thank you.

@Allan-N
Copy link
Collaborator

Allan-N commented Dec 12, 2024

Have it ready for next crash, Will post. Thank you.

If you added the symbol packages then you should be able to run ast_coredumper on the core file that is already on your system.

@rob297
Copy link

rob297 commented Dec 12, 2024

We have a net tonight so I will probably have it the morning. It seems to happen when there are a lot of inbound connections at one time. I disabled our main hub 28528 and with about 10 connections it seemed to be ok. But when I enabled 28528 and the 27 connections came in it crashed.

I dont have the core file anymore. Yes I know I should have kept it.

@rob297
Copy link

rob297 commented Dec 13, 2024

Here are the output file from the core dump. I included the output log as well. After a full reboot of the server all nodes reconnect without any issues. Thank you for your time.

core-asterisk-2024-12-12T15-00-50Z-info.txt
core-asterisk-2024-12-12T15-00-50Z-locks.txt
core-asterisk-2024-12-12T15-00-50Z-full.txt
core-asterisk-2024-12-12T15-00-50Z-brief.txt
core-asterisk-2024-12-12T15-00-50Z-thread1.txt
output.log

@rob297
Copy link

rob297 commented Dec 26, 2024

Update,
I was able to play around with the number of incoming connections to the system. We currently have 17 repeaters on a VPN along with users from the internet. Around 34 repeaters will connect on a restart. I was able to configure IAX to only allow 1 connection from the subnet of our VPN. All of the internet connections are able to establish. There are some IAX errors but eventually they are able to connect. I update the IAX [callnumberlimits] from 1 to 5 and do an iax reload. Repeaters connect without any issue. I continue a reload letting 5 connect until they are all connected. The system is stable. I now have a way to safely restart the server. Only other option is to connect outbound from the server to the VPN nodes.

Added to iax.conf

[callnumberlimits]
192.168.11.0/255.255.255.0 = 5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants