Skip to content

Conversation

wmasilva
Copy link
Contributor

try to fix #2529

Passing the member_session to the originate, when the dialstring is return to the originate it hangups the call, not sure if is the right solution, but it solves the issue about making the call to the agent.


2024-07-11 18:10:00.706322 99.20% [INFO] switch_cpp.cpp:1466 fetching dialstring for [email protected] 

2024-07-11 18:10:00.706322 99.20% [INFO] switch_cpp.cpp:1466 user.lua start sleep
2024-07-11 18:10:01.546299 99.17% [DEBUG] switch_rtp.c:1930 rtcp_stats_init: audio ssrc[1649471371] base_seq[0]
2024-07-11 18:10:01.546299 99.17% [DEBUG] switch_rtp.c:7695 Correct audio ip/port confirmed.
2024-07-11 18:10:01.566796 99.17% [DEBUG] sofia.c:7493 Channel sofia/main/[email protected] entering state [ready][200]
2024-07-11 18:10:04.166829 99.00% [NOTICE] sofia.c:1065 Hangup sofia/main/[email protected] [CS_EXECUTE] [NORMAL_CLEARING]
2024-07-11 18:10:04.166829 99.00% [DEBUG] mod_hash.c:293 Usage for total_account:[email protected] is now 0
2024-07-11 18:10:04.166829 99.00% [DEBUG] mod_callcenter.c:3262 Member 1002 <1002> abandoned waiting in queue prueba
2024-07-11 18:10:04.166829 99.00% [DEBUG] mod_callcenter.c:3295 Member "1002" <1002> exit queue prueba due to BREAK_OUT
2024-07-11 18:10:04.166829 99.00% [DEBUG] mod_callcenter.c:3390 Queue has 0 waiting calls.
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_session.c:2979 sofia/main/[email protected] skip receive message [APPLICATION_EXEC_COMPLETE] (channel is hungup already)
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:647 (sofia/main/[email protected]) State EXECUTE going to sleep
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:581 (sofia/main/[email protected]) Running State Change CS_HANGUP (Cur 1 Tot 9)
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:844 (sofia/main/[email protected]) Callstate Change RING_WAIT -> HANGUP
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:846 (sofia/main/[email protected]) State HANGUP
2024-07-11 18:10:04.166829 99.00% [DEBUG] mod_sofia.c:469 Channel sofia/main/[email protected] hanging up, cause: NORMAL_CLEARING
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:59 sofia/main/[email protected] Standard HANGUP, cause: NORMAL_CLEARING
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:846 (sofia/main/[email protected]) State HANGUP going to sleep
2024-07-11 18:10:04.166829 99.00% [INFO] switch_cpp.cpp:1466 execute on hangup 
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:777 Hangup Command with no Session lua(execute_on_hangup.lua):

2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:616 (sofia/main/[email protected]) State Change CS_HANGUP -> CS_REPORTING
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:581 (sofia/main/[email protected]) Running State Change CS_REPORTING (Cur 1 Tot 9)
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:932 (sofia/main/[email protected]) State REPORTING
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:168 sofia/main/[email protected] Standard REPORTING, cause: NORMAL_CLEARING
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:932 (sofia/main/[email protected]) State REPORTING going to sleep
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_state_machine.c:607 (sofia/main/[email protected]) State Change CS_REPORTING -> CS_DESTROY
2024-07-11 18:10:04.166829 99.00% [DEBUG] switch_core_session.c:1744 Session 9 (sofia/main/[email protected]) Locked, Waiting on external entities
2024-07-11 18:10:10.686908 98.97% [INFO] switch_cpp.cpp:1466 user.lua done sleep
2024-07-11 18:10:10.686908 98.97% [INFO] switch_cpp.cpp:1466 user.lua dialstring [cm_limit_id=account:[email protected],cm_limit_total=4,cm_limit_inbound=-1,cm_limit_outbound=-1,cm_limit_busy=-1]sofia/main/sip:[email protected]:5064;fs_nat=yes 

2024-07-11 18:10:10.686908 98.97% [DEBUG] switch_cpp.cpp:1210 sofia/main/[email protected] destroy/unlink session from object
2024-07-11 18:10:10.726908 98.97% [DEBUG] switch_ivr_originate.c:2301 Parsing global variables
2024-07-11 18:10:10.726908 98.97% [DEBUG] switch_ivr_originate.c:2863 Parsing session specific variables
2024-07-11 18:10:10.726908 98.97% [DEBUG] switch_ivr_originate.c:4056 Originate Resulted in Error Cause: 16 [NORMAL_CLEARING]
2024-07-11 18:10:10.726908 98.97% [NOTICE] switch_ivr_originate.c:3059 Cannot create outgoing channel of type [user] cause: [NORMAL_CLEARING]
2024-07-11 18:10:10.726908 98.97% [DEBUG] switch_ivr_originate.c:4056 Originate Resulted in Error Cause: 16 [NORMAL_CLEARING]
2024-07-11 18:10:10.726908 98.97% [DEBUG] mod_callcenter.c:2139 Agent agent-1000 Origination Canceled : NORMAL_CLEARING

attach the call log after applying the change.
freeswitch_afterpatch.log

I notice in past commits that the member_session was included in the originate and then removed:
included:
b4ada1b
removed:
ec44f5a

i try to reproduce the issue "Do not kick member out on timeout if originating to an agent" after my change, and i couldn't, the call is ok.

@wmasilva
Copy link
Contributor Author

wmasilva commented Jul 12, 2024

doing more test, i found a scenario that is not solved:

callcenter strategy: ring-all , 2 agents.
call 2 agents: 1 agent/user i have timeout, 2 agent/user don't have timeout (will ring immediately). 2 agent answer the call. the 1 agent will received the call after the timeout and when answering is alone (zombie call).

attache call log:
freeswitch_zombie_afteranswercall.log

I could in my lua script when calling the 1 agent, after the timeout, check if the call is already answered and hangup/cancel the originate.

i think the best solution would be implemented in callcenter a way of detecting hangup or answered and cancel the originate if is not returning. any idea?

Thanks

@wmasilva
Copy link
Contributor Author

I understood how the call is hangup when other agent answers the call:

"switch_core_session_hupall_matching_var("cc_member_pre_answer_uuid", h->member_uuid, SWITCH_CAUSE_LOSE_RACE);"

In this scenario the call is not canceled because there is not yet a channel to hangup... FS is waiting in the originate and when the sleep is over the originate continues the execution.. creating the channel and placing the call to the agent.

@wmasilva
Copy link
Contributor Author

with my last commit, it hangup the call if it is already connected.
Is there a way of cancelling the switch_ivr_originate?

@wmasilva
Copy link
Contributor Author

wmasilva commented May 12, 2025

recently with this change i got another issue that could be related... but not sure.

Using ring-all strategic sometimes the call is place twice to the same destination ignoring the agent dialstring.
To track the issue, added to mod_callcenter.c an extra log with agent's dialstring, so i got:

304283f9-0890-4317-8e39-3523ee396ce4 2025-05-07 09:56:02.409964 95.07% [DEBUG] mod_callcenter.c:2024 Make the call to agent cc-callcenter_agent-Agent2 originate dialstr: [network_addr=127.0.0.1,leg_timeout=30,ignore_display_updates=true][^^:export_vars=ignore_display_updates]user/[email protected]

304283f9-0890-4317-8e39-3523ee396ce4 2025-05-07 09:56:02.409964 95.07% [DEBUG] mod_callcenter.c:2024 Make the call to agent cc-callcenter_agent-Agent4 originate dialstr: [network_addr=127.0.0.1,leg_timeout=30,ignore_display_updates=true][^^:export_vars=ignore_display_updates]user/[email protected]

Mod_callcenter is reading correctly the dial string, but in switch_originate i got the debug from my script with the same userid:

2025-05-07 09:56:02.409964 95.07% [INFO] switch_cpp.cpp:1466 304283f9-0890-4317-8e39-3523ee396ce4 /scripts/fs/user.lua:34 user.lua: starting (2100,mad.local)

2025-05-07 09:56:02.409964 95.07% [INFO] switch_cpp.cpp:1466 304283f9-0890-4317-8e39-3523ee396ce4 /scripts/fs/user.lua:34 user.lua: starting (2100,mad.local)

EDIT: directory and user.lua script:

<param name="dial-string" value="${lua(user.lua ${dialed_user} ${dialed_domain})}" />

-- globals
userid = argv[1]
domain = argv[2]

if ((userid == nil) or (domain == nil)) then
        cm.error("missing variables userid or domain")
        return
end

debug.info("user.lua: starting (" .. userid .. "," .. domain .. ")")

Any idea on how can i trace this in switch_ivr_originate?

Thanks.

@wmasilva
Copy link
Contributor Author

wmasilva commented Sep 3, 2025

I think I found the root cause of the problem.
When the session is used in switch_originate_ivr from mod_callcenter, the channel variables dialer_user and dialed_domain are being replaced when resolving the user endpoint inside the function user_outgoing_channel.

I’m testing the following patch to see if it solves the issue.


diff --git a/src/mod/applications/mod_dptools/mod_dptools.c b/src/mod/applications/mod_dptools/mod_dptools.c
index 23fd2125ef..7fe369e99e 100644
--- a/src/mod/applications/mod_dptools/mod_dptools.c
+++ b/src/mod/applications/mod_dptools/mod_dptools.c
@@ -4432,16 +4432,19 @@ static switch_call_cause_t user_outgoing_channel(switch_core_session_t *session,
                }
 
                if (session) {
-                       channel = switch_core_session_get_channel(session);
-                       if ((varval = switch_channel_get_variable(channel, SWITCH_CALL_TIMEOUT_VARIABLE))
-                               || (var_event && (varval = switch_event_get_header(var_event, "leg_timeout")))) {
-                               timelimit = atoi(varval);
-                       }
+                       if (switch_core_session_read_lock(session) == SWITCH_STATUS_SUCCESS) {
+                               channel = switch_core_session_get_channel(session);
+                               if ((varval = switch_channel_get_variable(channel, SWITCH_CALL_TIMEOUT_VARIABLE))
+                                       || (var_event && (varval = switch_event_get_header(var_event, "leg_timeout")))) {
+                                       timelimit = atoi(varval);
+                               }
 
-                       switch_channel_set_variable(channel, "dialed_user", dialed_user);
-                       switch_channel_set_variable(channel, "dialed_domain", domain);
+                               switch_channel_set_variable(channel, "dialed_user", dialed_user);
+                               switch_channel_set_variable(channel, "dialed_domain", domain);
 
-                       d_dest = switch_channel_expand_variables(channel, dest);
+                               d_dest = switch_channel_expand_variables(channel, dest);
+                               switch_core_session_rwunlock(session);
+                       }
 
                } else {
                        switch_event_t *event = NULL;

@wmasilva
Copy link
Contributor Author

wmasilva commented Sep 10, 2025

The changes I made in mod_dptools are not working as expected. The channel variable gets replaced, and when the user dialstring is expanded, the dialed_user ends up using the previous (replaced) value, which is incorrect.

I've put some logs to see the behaviour:


3b7721e7-e4eb-4046-8a40-fa411fa6bf65 2025-09-10 13:16:14.879956 91.20% [WARNING] mod_dptools.c:4417 user_outgoing_channel dialed_user 2103
3b7721e7-e4eb-4046-8a40-fa411fa6bf65 2025-09-10 13:16:14.879956 91.20% [WARNING] mod_dptools.c:4417 user_outgoing_channel dialed_user 2100
3b7721e7-e4eb-4046-8a40-fa411fa6bf65 2025-09-10 13:16:14.879956 91.20% [WARNING] mod_dptools.c:4417 user_outgoing_channel dialed_user 2104

2025-09-10 13:16:14.879956 91.20% [INFO] switch_cpp.cpp:1466 3b7721e7-e4eb-4046-8a40-fa411fa6bf65 /scripts/fs/user.lua:34 user.lua: starting (2103,mad.local)
2025-09-10 13:16:14.879956 91.20% [INFO] switch_cpp.cpp:1466 3b7721e7-e4eb-4046-8a40-fa411fa6bf65 /scripts/fs/user.lua:34 user.lua: starting (2104,mad.local)
2025-09-10 13:16:14.879956 91.20% [INFO] switch_cpp.cpp:1466 3b7721e7-e4eb-4046-8a40-fa411fa6bf65 /scripts/fs/user.lua:34 user.lua: starting (2104,mad.local)

Any idea how to solve this and keep the session in the originate from mod_callcenter.c
Thanks.

@wmasilva
Copy link
Contributor Author

diff --git a/src/mod/applications/mod_dptools/mod_dptools.c b/src/mod/applications/mod_dptools/mod_dptools.c
index 23fd2125ef..e84376da47 100644
--- a/src/mod/applications/mod_dptools/mod_dptools.c
+++ b/src/mod/applications/mod_dptools/mod_dptools.c
@@ -3687,6 +3687,7 @@ static struct {
 	switch_mutex_t *pickup_mutex;
 	switch_hash_t *mutex_hash;
 	switch_mutex_t *mutex_mutex;
+	switch_mutex_t *user_mutex;
 } globals;
 
 /* pickup channel */
@@ -4348,11 +4349,16 @@ static switch_call_cause_t user_outgoing_channel(switch_core_session_t *session,
 		goto done;
 	}
 
+	switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_WARNING, "user_outgoing_channel destination_number %s\n", outbound_profile->destination_number);
+
 	user = strdup(outbound_profile->destination_number);
 
 	if (!user)
 		goto done;
 
+	switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_WARNING, "user_outgoing_channel user %s\n", user);
+
+
 	if ((domain = strchr(user, '@'))) {
 		*domain++ = '\0';
 	} else {
@@ -4407,6 +4413,9 @@ static switch_call_cause_t user_outgoing_channel(switch_core_session_t *session,
 
 	dialed_user = (char *)switch_xml_attr(x_user, "id");
 
+	switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_WARNING, "user_outgoing_channel dialed_user %s\n", dialed_user);
+
+
 	if (var_event) {
 		switch_event_add_header_string(var_event, SWITCH_STACK_BOTTOM, "dialed_user", dialed_user);
 		switch_event_add_header_string(var_event, SWITCH_STACK_BOTTOM, "dialed_domain", domain);
@@ -4432,6 +4441,7 @@ static switch_call_cause_t user_outgoing_channel(switch_core_session_t *session,
 		}
 
 		if (session) {
+			switch_mutex_lock(globals.user_mutex);
 			channel = switch_core_session_get_channel(session);
 			if ((varval = switch_channel_get_variable(channel, SWITCH_CALL_TIMEOUT_VARIABLE))
 				|| (var_event && (varval = switch_event_get_header(var_event, "leg_timeout")))) {
@@ -4442,6 +4452,8 @@ static switch_call_cause_t user_outgoing_channel(switch_core_session_t *session,
 			switch_channel_set_variable(channel, "dialed_domain", domain);
 
 			d_dest = switch_channel_expand_variables(channel, dest);
+			switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_WARNING, "user_outgoing_channel session d_dest %s\n", d_dest);
+			switch_mutex_unlock(globals.user_mutex);
 
 		} else {
 			switch_event_t *event = NULL;
@@ -4473,6 +4485,7 @@ static switch_call_cause_t user_outgoing_channel(switch_core_session_t *session,
 			myflags |= SOF_NOBLOCK;
 		}
 
+switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_WARNING, "user_outgoing_channel final d_dest %s\n", d_dest);
 
 		switch_snprintf(stupid, sizeof(stupid), "user/%s", dialed_user);
 		if (switch_stristr(stupid, d_dest)) {
@@ -6521,6 +6534,7 @@ SWITCH_MODULE_SHUTDOWN_FUNCTION(mod_dptools_shutdown)
 	switch_mutex_destroy(globals.pickup_mutex);
 	switch_core_hash_destroy(&globals.pickup_hash);
 	switch_mutex_destroy(globals.mutex_mutex);
+	switch_mutex_destroy(globals.user_mutex);
 	switch_core_hash_destroy(&globals.mutex_hash);
 
 	return SWITCH_STATUS_SUCCESS;
@@ -6560,6 +6574,8 @@ SWITCH_MODULE_LOAD_FUNCTION(mod_dptools_load)
 	switch_core_hash_init(&globals.mutex_hash);
 	switch_mutex_init(&globals.mutex_mutex, SWITCH_MUTEX_NESTED, globals.pool);
 
+	switch_mutex_init(&globals.user_mutex, SWITCH_MUTEX_NESTED, globals.pool);
+
 	/* connect my internal structure to the blank pointer passed to me */
 	*module_interface = switch_loadable_module_create_module_interface(pool, modname);

@wmasilva
Copy link
Contributor Author

Using a global mutex this actually work, but it could be a bad idea... about using a new channel mutex, what do you think?

I'm testing this version:


diff --git a/src/mod/applications/mod_dptools/mod_dptools.c b/src/mod/applications/mod_dptools/mod_dptools.c
index 23fd2125ef..66f1f2913c 100644
--- a/src/mod/applications/mod_dptools/mod_dptools.c
+++ b/src/mod/applications/mod_dptools/mod_dptools.c
@@ -4316,6 +4316,32 @@ static switch_call_cause_t group_outgoing_channel(switch_core_session_t *session
 }
 
 
+typedef struct {
+    switch_mutex_t *outgoing_mutex;
+} mod_dptools_session_t;
+
+
+static mod_dptools_session_t *user_outgoing_get_session_data(switch_core_session_t *session)
+{
+    switch_channel_t *channel = switch_core_session_get_channel(session);
+    mod_dptools_session_t *sdata = NULL;
+
+    if (!channel) {
+        return NULL;
+    }
+
+    sdata = (mod_dptools_session_t *) switch_channel_get_private(channel, "mod_dptools.session");
+
+    if (!sdata) {
+        /* allocate in session pool so it's auto-freed */
+        sdata = (mod_dptools_session_t *) switch_core_session_alloc(session, sizeof(*sdata));
+        sdata->outgoing_mutex = NULL;
+        switch_mutex_init(&sdata->outgoing_mutex, SWITCH_MUTEX_NESTED, switch_core_session_get_pool(session));
+        switch_channel_set_private(channel, "mod_dptools.session", sdata);
+    }
+
+    return sdata;
+}
 
 /* fake chan_user */
 switch_endpoint_interface_t *user_endpoint_interface;
@@ -4425,6 +4451,7 @@ static switch_call_cause_t user_outgoing_channel(switch_core_session_t *session,
                switch_originate_flag_t myflags = SOF_NONE;
                char *cid_name_override = NULL;
                char *cid_num_override = NULL;
+               mod_dptools_session_t *sdata = NULL;
 
                if (var_event) {
                        cid_name_override = switch_event_get_header(var_event, "origination_caller_id_name");
@@ -4433,6 +4460,13 @@ static switch_call_cause_t user_outgoing_channel(switch_core_session_t *session,
 
                if (session) {
                        channel = switch_core_session_get_channel(session);
+                       sdata = user_outgoing_get_session_data(session);
+
+                       if (sdata && sdata->outgoing_mutex) {
+                               switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_DEBUG, "User/%s@%s locking outgoing mutex\n", dialed_user, domain);
+                               switch_mutex_lock(sdata->outgoing_mutex);
+                       }
+
                        if ((varval = switch_channel_get_variable(channel, SWITCH_CALL_TIMEOUT_VARIABLE))
                                || (var_event && (varval = switch_event_get_header(var_event, "leg_timeout")))) {
                                timelimit = atoi(varval);
@@ -4442,6 +4476,12 @@ static switch_call_cause_t user_outgoing_channel(switch_core_session_t *session,
                        switch_channel_set_variable(channel, "dialed_domain", domain);
 
                        d_dest = switch_channel_expand_variables(channel, dest);
+                       switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_DEBUG, "User/%s@%s d_dest: %s\n", dialed_user, domain, d_dest);
+                       if (sdata && sdata->outgoing_mutex) {
+                               switch_log_printf(SWITCH_CHANNEL_SESSION_LOG(session), SWITCH_LOG_DEBUG, "User/%s@%s unlocking outgoing mutex\n", dialed_user, domain);
+                               switch_mutex_unlock(sdata->outgoing_mutex);
+                       }
+
 
                } else {
                        switch_event_t *event = NULL;

@andywolk
Copy link
Contributor

This new mutex looks safer (assuming the fix is actually correct).

@wmasilva
Copy link
Contributor Author

wmasilva commented Sep 23, 2025

thanks.
Yes the correction looks ok, the lua script i see the correct dialed_user when replacing the dial-string param:

`

`

From the log i see the look/unlock sequence ok. i couldn't detect any deadlock or race condition to replace the dialed_user variable.
the only issue i see is that if one user introduces a delay all the users have to wait for that user, but It is expected to happen because the originator (shared to all threads) is lock.

I test simultaneous group call and is the same behaviour without my changes, so i guess that is ok.

In my test scenario i have:
incoming call to 133 -> callcenter -> ringall (agent1 - user/2001, agent2 - user/2002, agent3 - user/2003)

when calling 2001, the lua script simulates a delay of 5s.

case 1: hangup before any agent answer: if the agent is ringing the call is canceled, and after the sleep the other agents don't received the zombie call.

EDITED: case 2: hangup after an agent answer: the others agents in ring state are canceled, except the one that is in sleeping after the sleep the agents is called and it ring, it hangup immediately when the agent answer or on timeout.

… and prevent user endpoint race condition when expanding dial-string dialed_user variable in same session
@wmasilva wmasilva force-pushed the mod_callcenter-fix_agent_zombie_call branch from 4a01bfe to eefa209 Compare September 24, 2025 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mod_callcenter: zombie call to agent
2 participants