Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azure.iot.device.common.transport_exceptions.ConnectionFailedError #1145

Open
ramumr06 opened this issue May 25, 2023 · 13 comments
Open

azure.iot.device.common.transport_exceptions.ConnectionFailedError #1145

ramumr06 opened this issue May 25, 2023 · 13 comments
Assignees

Comments

@ramumr06
Copy link

ramumr06 commented May 25, 2023

Hi All,
I have built (docker based image) Python based Sample Edge module and deployed on to my ARM based Edge device i.e Gateway. This edge module thrown following errors When i executed "iotedge restart gwmodule" command.

root@tcs-imx6:~# iotedge logs gwmodule
IoT Hub Client for Python
Subscribe for input failed. Not enabling feature
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 396, in connect
host=self._hostname, port=8883, keepalive=self._keep_alive
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 914, in connect
return self.reconnect()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1044, in reconnect
sock = self._create_socket_connection()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3685, in _create_socket_connection
return socket.create_connection(addr, timeout=self._connect_timeout, source_address=source)
File "/usr/local/lib/python3.7/socket.py", line 707, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "/usr/local/lib/python3.7/socket.py", line 752, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "./main.py", line 102, in
main()
File "./main.py", line 78, in main
client = create_client()
File "./main.py", line 55, in create_client
client.on_message_received = receive_message_handler
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/sync_clients.py", line 772, in on_message_received
"on_message_received", pipeline_constant.INPUT_MSG, value
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/sync_clients.py", line 504, in _generic_receive_handler_setter
self._enable_feature(feature_name)
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/sync_clients.py", line 113, in _enable_feature
callback.wait_for_completion()
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/evented_callback.py", line 70, in wait_for_completion
raise self.exception
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 193, in _run_op
self.transport.connect(password=password)
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 418, in connect
raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-2, 'Name or service not known')
#########################
Few of my workspace details.. I mentioned PythonModule in Last line of the deployment.json . Do i need to change it to "SampleModule" .
tcs@tcs-Latitude-5590:/new-workspace/azure-build/EdgeSolution/modules/SampleModule$ ls
Dockerfile.amd64 Dockerfile.arm32v7 Dockerfile.arm32v7-OLD Dockerfile.arm64v8.debug module.json requirements.txt
Dockerfile.amd64.debug Dockerfile.arm32v7.debug Dockerfile.arm64v8 main.py other-main.py
tcs@tcs-Latitude-5590:
/new-workspace/azure-build/EdgeSolution/modules/SampleModule$ cat ../../config/deployment.arm32v7.json
{
"modulesContent": {
"$edgeAgent": {
"properties.desired": {
"schemaVersion": "1.4",
"runtime": {
"type": "docker",
"settings": {
"minDockerVersion": "v1.25",
"loggingOptions": "",
"registryCredentials": {
"ipsscr": {
"username": "ipsscr",
"password": "******",
"address": "#######"
}
}
}
},
"systemModules": {
"edgeAgent": {
"type": "docker",
"settings": {
"image": "mcr.microsoft.com/azureiotedge-agent:1.4",
"createOptions": "{}"
}
},
"edgeHub": {
"type": "docker",
"status": "running",
"restartPolicy": "always",
"settings": {
"image": "mcr.microsoft.com/azureiotedge-hub:1.4",
"createOptions": "{"HostConfig":{"PortBindings":{"5671/tcp":[{"HostPort":"5671"}],"8883/tcp":[{"HostPort":"8883"}],"443/tcp":[{"HostPort":"443"}]}}}"
}
}
},
"modules": {
"SampleModule": {
"version": "1.0",
"type": "docker",
"status": "running",
"restartPolicy": "always",
"settings": {
"image": "ipsscr.azurecr.io/gwmodule:0.0.1-arm32v7",
"createOptions": "{}"
}
},
"SimulatedTemperatureSensor": {
"version": "1.0",
"type": "docker",
"status": "running",
"restartPolicy": "always",
"settings": {
"image": "mcr.microsoft.com/azureiotedge-simulated-temperature-sensor:1.4",
"createOptions": "{}"
}
}
}
}
},
"$edgeHub": {
"properties.desired": {
"schemaVersion": "1.4",
"routes": {
"SampleModuleToIoTHub": "FROM /messages/modules/SampleModule/outputs/
INTO $upstream",
"sensorToSampleModule": "FROM /messages/modules/SimulatedTemperatureSensor/outputs/temperatureOutput INTO BrokeredEndpoint("/modules/SampleModule/inputs/input1")"
},
"storeAndForwardConfiguration": {
"timeToLiveSecs": 7200
}
}
},
"PythonModule": {
"properties.desired": {
"TemperatureThreshold": 25
}
}
}
}

@cartertinney
Copy link
Member

cartertinney commented May 25, 2023

@ramumr06

Try adding an asyncio.sleep() for 5 seconds at the beginning of your main() block - I don't know which samples you were looking at, but at one point Edge changed behavior so that some modules in Python have issues running immediately when a docker container starts and broke some of the samples as a result.

If that doesn't help, you'll need to provide more information for us to investigate further - your code, as well as the version of the azure-iot-device SDK you are using.

@cartertinney cartertinney self-assigned this May 25, 2023
@ramumr06
Copy link
Author

Thanks you cartertinney. I will try with adding async.sleep(5) and let you know the out come. Mean while here is my code and deployment files and versions.
Here are some info about work space and execution environment.
Machine: arm32v7
OS: Yocto Linux
As of now I am not running edge module called "SimulatedTemperatureSensor" in my gateway.

Version Info

azure-iot-device~=2.7.0
root@tcs-imx6:# iotedge version
iotedge 1.1.15
root@tcs-imx6:
# iotedge list
NAME STATUS DESCRIPTION CONFIG
edgeAgent running Up 2 days mcr.microsoft.com/azureiotedge-agent:1.4
edgeHub running Up 2 days mcr.microsoft.com/azureiotedge-hub:1.4
##############
main.py file

Copyright (c) Microsoft. All rights reserved.

Licensed under the MIT license. See LICENSE file in the project root for

full license information.

import asyncio
import sys
import signal
import threading
from azure.iot.device import IoTHubModuleClient
import json

global counters

TEMPERATURE_THRESHOLD = 25
TWIN_CALLBACKS = 0
RECEIVED_MESSAGES = 0

Event indicating client stop

stop_event = threading.Event()

def create_client():
client = IoTHubModuleClient.create_from_edge_environment()

# Define function for handling received messages
async def receive_message_handler(message):
    global RECEIVED_MESSAGES
    print("Message received")
    size = len(message.data)
    message_text = message.data.decode('utf-8')
    print("    Data: <<<{data}>>> & Size={size}".format(data=message.data, size=size))
    print("    Properties: {}".format(message.custom_properties))
    RECEIVED_MESSAGES += 1
    print("Total messages received: {}".format(RECEIVED_MESSAGES))

    if message.input_name == "input1":
        message_json = json.loads(message_text)
        if "machine" in message_json and "temperature" in message_json["machine"] and message_json["machine"]["temperature"] > TEMPERATURE_THRESHOLD:
            message.custom_properties["MessageType"] = "Alert"
            print("ALERT: Machine temperature {temp} exceeds threshold {threshold}".format(
                temp=message_json["machine"]["temperature"], threshold=TEMPERATURE_THRESHOLD
            ))
            await client.send_message_to_output(message, "output1")

# Define function for handling received twin patches
async def receive_twin_patch_handler(twin_patch):
    global TEMPERATURE_THRESHOLD
    global TWIN_CALLBACKS
    print("Twin Patch received")
    print("     {}".format(twin_patch))
    if "TemperatureThreshold" in twin_patch:
        TEMPERATURE_THRESHOLD = twin_patch["TemperatureThreshold"]
    TWIN_CALLBACKS += 1
    print("Total calls confirmed: {}".format(TWIN_CALLBACKS))

try:
    # Set handler on the client
    client.on_message_received = receive_message_handler
    client.on_twin_desired_properties_patch_received = receive_twin_patch_handler
except:
    # Cleanup if failure occurs
    client.shutdown()
    raise

return client

async def run_sample(client):
# Customize this coroutine to do whatever tasks the module initiates
# e.g. sending messages
while True:
await asyncio.sleep(1000)

def main():
asyncio.sleep(5)
if not sys.version >= "3.5.3":
raise Exception( "The sample requires python 3.5.3+. Current version of Python: %s" % sys.version )
print ( "IoT Hub Client for Python" )

# NOTE: Client is implicitly connected due to the handler being set on it
client = create_client()

# Define a handler to cleanup when module is is terminated by Edge
def module_termination_handler(signal, frame):
    print ("IoTHubClient sample stopped by Edge")
    stop_event.set()

# Set the Edge termination handler
signal.signal(signal.SIGTERM, module_termination_handler)

# Run the sample
loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(run_sample(client))
except Exception as e:
    print("Unexpected error %s " % e)
    raise
finally:
    print("Shutting down IoT Hub Client...")
    loop.run_until_complete(client.shutdown())
    loop.close()

if name == "main":
main()

deployment.arm32v7.json
{
"modulesContent": {
"$edgeAgent": {
"properties.desired": {
"schemaVersion": "1.4",
"runtime": {
"type": "docker",
"settings": {
"minDockerVersion": "v1.25",
"loggingOptions": "",
"registryCredentials": {
"ipsscr": {
"username": "ipsscr",
"password": "pVpFTFmu5A+Si5FWnqPGeN/VNCHC51UK",
"address": "ipsscr.azurecr.io"
}
}
}
},
"systemModules": {
"edgeAgent": {
"type": "docker",
"settings": {
"image": "mcr.microsoft.com/azureiotedge-agent:1.4",
"createOptions": "{}"
}
},
"edgeHub": {
"type": "docker",
"status": "running",
"restartPolicy": "always",
"settings": {
"image": "mcr.microsoft.com/azureiotedge-hub:1.4",
"createOptions": "{"HostConfig":{"PortBindings":{"5671/tcp":[{"HostPort":"5671"}],"8883/tcp":[{"HostPort":"8883"}],"443/tcp":[{"HostPort":"443"}]}}}"
}
}
},
"modules": {
"SampleModule": {
"version": "1.0",
"type": "docker",
"status": "running",
"restartPolicy": "always",
"settings": {
"image": "ipsscr.azurecr.io/gwmodule:0.0.1-arm32v7",
"createOptions": "{}"
}
},
"SimulatedTemperatureSensor": {
"version": "1.0",
"type": "docker",
"status": "running",
"restartPolicy": "always",
"settings": {
"image": "mcr.microsoft.com/azureiotedge-simulated-temperature-sensor:1.4",
"createOptions": "{}"
}
}
}
}
},
"$edgeHub": {
"properties.desired": {
"schemaVersion": "1.4",
"routes": {
"SampleModuleToIoTHub": "FROM /messages/modules/SampleModule/outputs/* INTO $upstream",
"sensorToSampleModule": "FROM /messages/modules/SimulatedTemperatureSensor/outputs/temperatureOutput INTO BrokeredEndpoint("/modules/SampleModule/inputs/input1")"
},
"storeAndForwardConfiguration": {
"timeToLiveSecs": 7200
}
}
},
"SampleModule": {
"properties.desired": {
"TemperatureThreshold": 25
}
}
}
}

@ramumr06
Copy link
Author

Full Failure Log follows...

root@tcs-imx6:~# iotedge logs gwmodule
IoT Hub Client for Python
Subscribe for input failed. Not enabling feature
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 396, in connect
host=self._hostname, port=8883, keepalive=self._keep_alive
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 914, in connect
return self.reconnect()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1044, in reconnect
sock = self._create_socket_connection()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3685, in _create_socket_connection
return socket.create_connection(addr, timeout=self._connect_timeout, source_address=source)
File "/usr/local/lib/python3.7/socket.py", line 707, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "/usr/local/lib/python3.7/socket.py", line 752, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "./main.py", line 102, in
main()
File "./main.py", line 78, in main
client = create_client()
File "./main.py", line 55, in create_client
client.on_message_received = receive_message_handler
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/sync_clients.py", line 772, in on_message_received
"on_message_received", pipeline_constant.INPUT_MSG, value
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/sync_clients.py", line 504, in _generic_receive_handler_setter
self._enable_feature(feature_name)
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/sync_clients.py", line 113, in _enable_feature
callback.wait_for_completion()
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/evented_callback.py", line 70, in wait_for_completion
raise self.exception
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 193, in _run_op
self.transport.connect(password=password)
File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 418, in connect
raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-2, 'Name or service not known')

@ramumr06
Copy link
Author

Hi Cartertinney,
I have assigned my device name to hostname:" " in /etc/iotedge/config.yaml file. Then one of my edge module able to connect and send the telemetry data to IoTHub. This is "SimulatedTemperatureSensor" from Microsoft market registry. I have built one more edge module using VS code (in Python) and deployed on to Edge device from my private Registry. But I see below errors from edgeHub. Details about my edge module pasted above. These are the logs..

<6> 2023-05-30 09:49:20.780 +00:00 [INF] - Client tcsgateway1/gwmodule in device scope authenticated locally.
<6> 2023-05-30 09:49:20.815 +00:00 [INF] - Successfully generated identity for clientId tcsgateway1/gwmodule and username Kochi-IOT.azure-devices.net/tcsgateway1/gwmodule/?api-version=2019-10-01&DeviceClientType=azure-iot-device-iothub-py%2F2.7.1%283.7.16%3BLinux%20%231%20SMP%20Mon%20Jun%201%2017%3A39%3A15%20UTC%202020%3Barmv7l%29
<6> 2023-05-30 09:49:20.860 +00:00 [INF] - "ClientAuthenticated, tcsgateway1/gwmodule, 16aee1aa"
<6> 2023-05-30 09:49:21.100 +00:00 [INF] - New device connection for device tcsgateway1/gwmodule
<6> 2023-05-30 09:49:21.103 +00:00 [INF] - Client tcsgateway1/gwmodule connected to edgeHub, processing existing subscriptions.
<6> 2023-05-30 09:49:21.157 +00:00 [INF] - Bind device proxy for device tcsgateway1/gwmodule
<6> 2023-05-30 09:49:21.164 +00:00 [INF] - Binding message channel for device Id tcsgateway1/gwmodule
<6> 2023-05-30 09:49:21.359 +00:00 [INF] - Processing subscriptions ModuleMessages, DesiredPropertyUpdates for client tcsgateway1/gwmodule.
<6> 2023-05-30 09:49:21.362 +00:00 [INF] - Processing pending subscriptions for tcsgateway1/gwmodule
<6> 2023-05-30 09:49:21.395 +00:00 [INF] - Set subscriptions from session state for tcsgateway1/gwmodule
<6> 2023-05-30 09:49:21.429 +00:00 [INF] - Attempting to connect to IoT Hub for client tcsgateway1/gwmodule via AMQP...
<6> 2023-05-30 09:49:21.471 +00:00 [INF] - Processing subscriptions ModuleMessages, DesiredPropertyUpdates for client tcsgateway1/gwmodule.
<6> 2023-05-30 09:49:21.492 +00:00 [INF] - Set subscriptions from session state for tcsgateway1/gwmodule
<6> 2023-05-30 09:49:22.074 +00:00 [INF] - Cloud connection for tcsgateway1/gwmodule is True
<6> 2023-05-30 09:49:22.075 +00:00 [INF] - Connection status for tcsgateway1/gwmodule changed to ConnectionEstablished
<6> 2023-05-30 09:49:22.075 +00:00 [INF] - Client tcsgateway1/gwmodule connected to cloud, processing existing subscriptions.
<6> 2023-05-30 09:49:22.075 +00:00 [INF] - Skipping tcsgateway1/gwmodule for subscription processing, as it is currently being processed.
<6> 2023-05-30 09:49:22.076 +00:00 [INF] - Created cloud proxy for client tcsgateway1/gwmodule via AMQP, with client operation timeout 20 seconds.
<6> 2023-05-30 09:49:22.082 +00:00 [INF] - Initialized cloud proxy a2a5d2d3-b65a-4be4-bc8c-0bda355dc03c for tcsgateway1/gwmodule
<6> 2023-05-30 09:49:22.083 +00:00 [INF] - Created cloud connection for client tcsgateway1/gwmodule
<6> 2023-05-30 09:49:22.226 +00:00 [INF] - Processing pending subscriptions for tcsgateway1/gwmodule
<6> 2023-05-30 09:49:23.028 +00:00 [INF] - Updated reported properties for tcsgateway1/$edgeHub
<4> 2023-05-30 09:49:27.932 +00:00 [WRN] - "Closing connection for device: tcsgateway1/gwmodule, Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Channel closed., "
<6> 2023-05-30 09:49:27.939 +00:00 [INF] - Disposing MessagingServiceClient for device Id tcsgateway1/gwmodule because of exception - Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Channel closed.
<6> 2023-05-30 09:49:27.962 +00:00 [INF] - Setting device proxy inactive for device Id
tcsgateway1/gwmodule
<6> 2023-05-30 09:49:27.982 +00:00 [INF] - Removing device connection for device tcsgateway1/gwmodule with removeCloudConnection flag 'True'.
<6> 2023-05-30 09:49:28.496 +00:00 [INF] - Closing receiver in cloud proxy a2a5d2d3-b65a-4be4-bc8c-0bda355dc03c for tcsgateway1/gwmodule
<6> 2023-05-30 09:49:28.509 +00:00 [INF] - Closed cloud proxy a2a5d2d3-b65a-4be4-bc8c-0bda355dc03c for tcsgateway1/gwmodule
<6> 2023-05-30 09:49:28.515 +00:00 [INF] - Device connection removed for device tcsgateway1/gwmodule
<6> 2023-05-30 09:49:28.554 +00:00 [INF] - Remove device connection for device tcsgateway1/gwmodule
<6> 2023-05-30 09:49:30.775 +00:00 [INF] - Updated reported properties for tcsgateway1/$edgeHub

@cartertinney
Copy link
Member

cartertinney commented May 30, 2023

@ramumr06

You need to await the asyncio.sleep() invocation, as it is a coroutine. Unless you await it, it will not actually help.

Additionally, in the future, please format your code and logs by using adding ``` before and after. That's three ` characters.

@ramumr06
Copy link
Author

ramumr06 commented May 31, 2023

Edge Module getting disconnected with " <4> 2023-05-31 10:03:38.355 +00:00 [WRN] - "Closing connection for device: tcsgateway1/gwmodule, Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Channel closed., "
<6> 2023-05-31 10:03:38.359 +00:00 [INF] - Disposing MessagingServiceClient for device Id tcsgateway1/gwmodule because of exception - Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Channel closed."

Logs from edgeHub follows..


<6> 2023-05-31 11:32:35.769 +00:00 [INF] - Client tcsgateway1/gwmodule in device scope authenticated locally.
<6> 2023-05-31 11:32:35.775 +00:00 [INF] - Successfully generated identity for clientId tcsgateway1/gwmodule and username Kochi-IOT.azure-devices.net/tcsgateway1/gwmodule/?api-version=2019-10-01&DeviceClientType=azure-iot-device-iothub-py%2F2.7.1%283.7.16%3BLinux%20%231%20SMP%20Mon%20Jun%201%2017%3A39%3A15%20UTC%202020%3Barmv7l%29
<6> 2023-05-31 11:32:35.776 +00:00 [INF] - "ClientAuthenticated, tcsgateway1/gwmodule, 1508b504"
<6> 2023-05-31 11:32:35.781 +00:00 [INF] - New device connection for device tcsgateway1/gwmodule
<6> 2023-05-31 11:32:35.781 +00:00 [INF] - Client tcsgateway1/gwmodule connected to edgeHub, processing existing subscriptions.
<6> 2023-05-31 11:32:35.797 +00:00 [INF] - Bind device proxy for device tcsgateway1/gwmodule
<6> 2023-05-31 11:32:35.797 +00:00 [INF] - Binding message channel for device Id tcsgateway1/gwmodule
<6> 2023-05-31 11:32:35.801 +00:00 [INF] - Attempting to connect to IoT Hub for client tcsgateway1/gwmodule via AMQP...
<6> 2023-05-31 11:32:35.801 +00:00 [INF] - Processing pending subscriptions for tcsgateway1/gwmodule
<6> 2023-05-31 11:32:35.820 +00:00 [INF] - Processing subscriptions ModuleMessages, DesiredPropertyUpdates for client tcsgateway1/gwmodule.
<6> 2023-05-31 11:32:35.827 +00:00 [INF] - Set subscriptions from session state for tcsgateway1/gwmodule
<6> 2023-05-31 11:32:35.863 +00:00 [INF] - Processing subscriptions ModuleMessages, DesiredPropertyUpdates for client tcsgateway1/gwmodule.
<6> 2023-05-31 11:32:35.869 +00:00 [INF] - Set subscriptions from session state for tcsgateway1/gwmodule
<4> 2023-05-31 11:32:36.500 +00:00 [WRN] - "Closing connection for device: tcsgateway1/gwmodule, Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Channel closed., "
<6> 2023-05-31 11:32:36.501 +00:00 [INF] - Disposing MessagingServiceClient for device Id tcsgateway1/gwmodule because of exception - Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Channel closed.
<6> 2023-05-31 11:32:36.501 +00:00 [INF] - Setting device proxy inactive for device Id tcsgateway1/gwmodule
<6> 2023-05-31 11:32:36.501 +00:00 [INF] - Removing device connection for device tcsgateway1/gwmodule with removeCloudConnection flag 'True'.
<6> 2023-05-31 11:32:36.501 +00:00 [INF] - Device connection removed for device tcsgateway1/gwmodule
<6> 2023-05-31 11:32:36.503 +00:00 [INF] - Remove device connection for device tcsgateway1/gwmodule
<6> 2023-05-31 11:32:37.101 +00:00 [INF] - Cloud connection for tcsgateway1/gwmodule is True
<6> 2023-05-31 11:32:37.101 +00:00 [INF] - Connection status for tcsgateway1/gwmodule changed to ConnectionEstablished
<6> 2023-05-31 11:32:37.101 +00:00 [INF] - Client tcsgateway1/gwmodule connected to cloud, processing existing subscriptions.
<6> 2023-05-31 11:32:37.102 +00:00 [INF] - Skipping tcsgateway1/gwmodule for subscription processing, as it is currently being processed.
<6> 2023-05-31 11:32:37.102 +00:00 [INF] - Created cloud proxy for client tcsgateway1/gwmodule via AMQP, with client operation timeout 20 seconds.
<6> 2023-05-31 11:32:37.102 +00:00 [INF] - Initialized cloud proxy 294a5a44-815f-4f58-934f-753661324c2f for tcsgateway1/gwmodule
<6> 2023-05-31 11:32:37.103 +00:00 [INF] - Created cloud connection for client tcsgateway1/gwmodule
<6> 2023-05-31 11:32:37.685 +00:00 [INF] - Processing pending subscriptions for tcsgateway1/gwmodule
<6> 2023-05-31 11:32:37.685 +00:00 [INF] - Processing pending subscriptions for tcsgateway1/gwmodule
<6> 2023-05-31 11:32:37.685 +00:00 [INF] - Processing pending subscriptions for tcsgateway1/gwmodule

Logs from Edge Module "gwmodule" follows...

IoT Hub Client for Python
Set handler on the client
Client create done
Unexpected error This event loop is already running 
Shutting down IoT Hub Client...
Traceback (most recent call last):
  File "./main.py", line 101, in main
    loop.run_until_complete(run_sample(client))
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 563, in run_until_complete
    self._check_runnung()
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 523, in _check_runnung
    raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./main.py", line 115, in <module>
    asyncio.run(main())
  File "/usr/local/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "./main.py", line 108, in main
    loop.run_until_complete(client.shutdown())
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 563, in run_until_complete
    self._check_runnung()
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 523, in _check_runnung
    raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
sys:1: RuntimeWarning: coroutine 'run_sample' was never awaited
sys:1: RuntimeWarning: coroutine 'execute_patch_for_async.<locals>.shutdown' was never awaited
Task was destroyed but it is pending!
task: <Task pending coro=<AsyncHandlerManager._receiver_handler_runner() running at /usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_handler_manager.py:43> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/lib/python3.7/asyncio/futures.py:351, <TaskWakeupMethWrapper object at 0xb5bd45f0>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
Task was destroyed but it is pending!
task: <Task pending coro=<AsyncHandlerManager._receiver_handler_runner() running at /usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_handler_manager.py:43> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/lib/python3.7/asyncio/futures.py:351, <TaskWakeupMethWrapper object at 0xb5bd4670>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
Task was destroyed but it is pending!
task: <Task pending coro=<_AsyncQueueProxy.get() running at /usr/local/lib/python3.7/site-packages/janus/__init__.py:451> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0xb5bd46d0>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
Task was destroyed but it is pending!
task: <Task pending coro=<_AsyncQueueProxy.get() running at /usr/local/lib/python3.7/site-packages/janus/__init__.py:451> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0xb5bd4cd0>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
(sdkprg) root@tcs-imx6:/var/lib/docker/overlay2/0822e7fd6e247a33ef62b80f78e25911afadc56f4aed473dd80fbd9ef8e238f6/diff/app# 
(sdkprg) root@tcs-imx6:/var/lib/docker/overlay2/0822e7fd6e247a33ef62b80f78e25911afadc56f4aed473dd80fbd9ef8e238f6/diff/app# 

My Python Code for Edge Module follows..

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See LICENSE file in the project root for
# full license information.

import asyncio
import sys
import signal
import threading
from azure.iot.device.aio import IoTHubModuleClient
import json
import time

# global counters
TEMPERATURE_THRESHOLD = 25
TWIN_CALLBACKS = 0
RECEIVED_MESSAGES = 0

# Event indicating client stop
stop_event = threading.Event()

def create_client():
    client = IoTHubModuleClient.create_from_edge_environment()

    # Define function for handling received messages
    async def receive_message_handler(message):
        global RECEIVED_MESSAGES
        print("Message received")
        size = len(message.data)
        message_text = message.data.decode('utf-8')
        print("    Data: <<<{data}>>> & Size={size}".format(data=message.data, size=size))
        print("    Properties: {}".format(message.custom_properties))
        RECEIVED_MESSAGES += 1
        print("Total messages received: {}".format(RECEIVED_MESSAGES))

        if message.input_name == "input1":
            message_json = json.loads(message_text)
            if "machine" in message_json and "temperature" in message_json["machine"] and message_json["machine"]["temperature"] > TEMPERATURE_THRESHOLD:
                message.custom_properties["MessageType"] = "Alert"
                print("ALERT: Machine temperature {temp} exceeds threshold {threshold}".format(
                    temp=message_json["machine"]["temperature"], threshold=TEMPERATURE_THRESHOLD
                ))
                await client.send_message_to_output(message, "output1")

    # Define function for handling received twin patches
    async def receive_twin_patch_handler(twin_patch):
        global TEMPERATURE_THRESHOLD
        global TWIN_CALLBACKS
        print("Twin Patch received")
        print("     {}".format(twin_patch))
        if "TemperatureThreshold" in twin_patch:
            TEMPERATURE_THRESHOLD = twin_patch["TemperatureThreshold"]
        TWIN_CALLBACKS += 1
        print("Total calls confirmed: {}".format(TWIN_CALLBACKS))

    try:
        # Set handler on the client
        client.on_message_received = receive_message_handler
        client.on_twin_desired_properties_patch_received = receive_twin_patch_handler
        print("Set handler on the client")
    except:
        # Cleanup if failure occurs
        client.shutdown()
        raise

    return client


async def run_sample(client):
    # Customize this coroutine to do whatever tasks the module initiates
    # e.g. sending messages
    print("In run_sample")
    while True:
        await asyncio.sleep(5)


#def main():
async def main():
    if not sys.version >= "3.5.3":
        raise Exception( "The sample requires python 3.5.3+. Current version of Python: %s" % sys.version )
    print ( "IoT Hub Client for Python" )

    await asyncio.sleep(5)
    #time.sleep(5)
    # NOTE: Client is implicitly connected due to the handler being set on it
    client = create_client()
    print("Client create done")

    # Define a handler to cleanup when module is is terminated by Edge
    def module_termination_handler(signal, frame):
        print ("IoTHubClient sample stopped by Edge")
        stop_event.set()

    # Set the Edge termination handler
    signal.signal(signal.SIGTERM, module_termination_handler)

  
    # Run the sample
    
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(run_sample(client))
    except Exception as e:
        print("Unexpected error %s " % e)
        raise
    
    finally:
        print("Shutting down IoT Hub Client...")
        loop.run_until_complete(client.shutdown())
        loop.close()
    

  

if __name__ == "__main__":
    asyncio.run(main())
    #main()

@cartertinney
Copy link
Member

cartertinney commented May 31, 2023

@ramumr06

My apologies - I made a mistake. Seeing the better formatted logs and code make it clear to me. You're now running into an issue with the async event loop because of what I said to you - You should keep main as a synchronous function (def main) and use a time.sleep(5) instead of asyncio.sleep(5). The issue you're seeing right now is because you are calling loop.run_until_complete() inside a coroutine that has already been invoked by asyncio.run(). You should switch back to the commented code I see there, like this:

def main():
    if not sys.version >= "3.5.3":
        raise Exception( "The sample requires python 3.5.3+. Current version of Python: %s" % sys.version )
    print ( "IoT Hub Client for Python" )

    time.sleep(5)

    <....>

if __name__ == "__main__":
    main()

Additionally, when formatting, please use backticks (`) not apostrophes ('). I have edited your post accordingly for correct formatting.

@ramumr06
Copy link
Author

ramumr06 commented Jun 1, 2023

Thank you.. But still the issue continues. The connection between my Edge Module "gwmodule" and the edgeHub is getting dropped with in 3 to 4 sec. Saying "Protocol Gateway Exception"
As i Said earlier Edge Module SimulatedTemperatureSensor from azure market working fine.
There is route definition in the deployment file between these 3 i.e SimulatedTemperatureSensor, "gwmodule" and edgeHub.

<4> 2023-06-01 05:59:04.566 +00:00 [WRN] - "Closing connection for device: tcsgateway1/gwmodule, Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Channel closed., " <6> 2023-06-01 05:59:04.582 +00:00 [INF] - Disposing MessagingServiceClient for device Id tcsgateway1/gwmodule because of exception - Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Channel closed.

@BertKleewein
Copy link
Member

@ramumr06 - I recommend you change the password on your container repository. You accidentally leaked it in the deployment manifest above. If that password gives write access to the repository, I recommend destroying the repository and creating a new one from scratch.

@ramumr06
Copy link
Author

ramumr06 commented Jun 7, 2023

@BertKleewein - I will update the registry password. This is just to secure my registry right?
Any Idea / solution about my issue?
As i Said earlier Edge Module SimulatedTemperatureSensor from azure market working fine. Its able to connect and send the data to IoT hub (Edge runtime) on the gateway.

@BertKleewein
Copy link
Member

Hi @ramumr06 , yes, my comment was only about securing your registry.

Logs from edgehub don't really help. logs form your gwmodule are more useful, but I 'm not sure what you mean when you say "Still the issue continues". Do you mean you're still seeing the gaierror from your may 26 log, even with a 5 second sleep before it connects, or do you mean you're still seeing the event loop is already running error from your may 31 logs?

The gaierror indicates that the code running inside the module isn't able to resolve the hostname (from config.toml) so it can connect to iot edge gateway. If you've hand-edited config.toml to change the hostname to something other than the hostname of your iot edge machine, that would explain this problem. One way to debug this (gaierror) is to open a bash or sh shell inside the gwmodule container and try to ping your hostname from inside the container. If ping isn't able to resolve this name, then the python module won't be able to resolve it either.

The event loop is already running error indicates that you have a bug in your module code and it probably doesn't live long enough to try connecting.

@ramumr06
Copy link
Author

The "Event Loop" Issue resolved. Even the hostname is defined in the config.toml file. My other Edge Module SimulatedTemperatureSensor from azure market working fine.
My module "gwmodule" just displaying print statements only. Below print statement is getting executed as well..
client.on_message_received = receive_message_handler
client.on_twin_desired_properties_patch_received = receive_twin_patch_handler
print("Set handler on the client")

edgeHub log showing following error..

<4> 2023-05-30 09:49:27.932 +00:00 [WRN] - "Closing connection for device: tcsgateway1/gwmodule, Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Channel closed., "
<6> 2023-05-30 09:49:27.939 +00:00 [INF] - Disposing MessagingServiceClient for device Id tcsgateway1/gwmodule because of exception - Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Channel closed.
<6> 2023-05-30 09:49:27.962 +00:00 [INF] - Setting device proxy inactive for device Id tcsgateway1/gwmodule

@jeremielalanne
Copy link

Hello guys,

Having the same issue too, and I tried the time.sleep(5) trick. I have been having that same issue for a long time now, and only on python modules, whenever the module is hard shutdown, without executing the shutdown function, for example when the device is unplugged.

Main function:

##################################
# Main function / Env settler
##################################
def main():
    """ Main function.
    Is used to setup the whole module.
    """
    
    if not sys.version >= "3.5.3":
        raise Exception( "The module requires python 3.5.3+. Current version of Python: %s" % sys.version )
    log("IoT Hub Client for Python" )
    time.sleep(5)

    # NOTE: Client is implicitly connected due to the handler being set on it
    client = create_client()

    # Define a handler to cleanup when module is is terminated by Edge
    def module_termination_handler(signal, frame):
        log("IoTHubClient stopped by Edge")
        stop_event.set()

    # Set the Edge termination handler
    signal.signal(signal.SIGTERM, module_termination_handler)

    # Run
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(run_module(client))
    except Exception as e:
        log("Unexpected error %s " % e)
        raise
    finally:
        log("Shutting down IoT Hub Client...")
        loop.run_until_complete(client.shutdown())
        loop.stop()
        loop.close()


if __name__ == "__main__":
    main()

Module logs:

Before adding the time.sleep(5)

2024-10-15T06:14:46.631787      - IoT Hub Client for Python
Subscribe for input failed.  Not enabling feature
Traceback (most recent call last):
2024-10-15T06:14:47.166569      - Exception occured in create client: {ex}
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 396, in connect
    host=self._hostname, port=8883, keepalive=self._keep_alive
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 914, in connect
    return self.reconnect()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1044, in reconnect
    sock = self._create_socket_connection()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3685, in _create_socket_connection
    return socket.create_connection(addr, timeout=self._connect_timeout, source_address=source)
  File "/usr/local/lib/python3.7/socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/lib/python3.7/socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 33, in handle_result
    return await callback.completion()
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/async_adapter.py", line 94, in completion
    return await self.future
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 193, in _run_op
    self.transport.connect(password=password)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 418, in connect
    raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by gaierror(-2, 'Name or service not known')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./main.py", line 510, in <module>
    main()
  File "./main.py", line 482, in main
    client = create_client()
  File "./main.py", line 367, in create_client
    client.on_message_received = receive_message_handler
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 740, in on_message_received
    self._generic_receive_handler_setter("on_message_received", constant.INPUT_MSG, value)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 483, in _generic_receive_handler_setter
    fut.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 106, in _enable_feature
    await handle_result(callback)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 37, in handle_result
    raise exceptions.ConnectionFailedError(message="Could not connect to IoTHub", cause=e)
azure.iot.device.exceptions.ConnectionFailedError: ConnectionFailedError('Could not connect to IoTHub') caused by ConnectionFailedError(None)
Task was destroyed but it is pending!
task: <Task pending coro=<AsyncHandlerManager._receiver_handler_runner() running at /usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_handler_manager.py:43> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/lib/python3.7/asyncio/futures.py:351, <TaskWakeupMethWrapper object at 0xffffa9431d90>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
Task was destroyed but it is pending!
task: <Task pending coro=<_AsyncQueueProxy.get() running at /usr/local/lib/python3.7/site-packages/janus/__init__.py:451> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0xffffa9431f90>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>

After adding the time.sleep(5)

2024-10-15T06:43:55.443170      - IoT Hub Client for Python
Subscribe for input failed.  Not enabling feature
2024-10-15T06:44:02.129833      - Exception occured in create client: {ex}
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 396, in connect
    host=self._hostname, port=8883, keepalive=self._keep_alive
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 914, in connect
    return self.reconnect()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1044, in reconnect
    sock = self._create_socket_connection()
  File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 3685, in _create_socket_connection
    return socket.create_connection(addr, timeout=self._connect_timeout, source_address=source)
  File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
    raise err
  File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 33, in handle_result
    return await callback.completion()
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/async_adapter.py", line 94, in completion
    return await self.future
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/pipeline/pipeline_stages_mqtt.py", line 193, in _run_op
    self.transport.connect(password=password)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/common/mqtt_transport.py", line 418, in connect
    raise exceptions.ConnectionFailedError(cause=e)
azure.iot.device.common.transport_exceptions.ConnectionFailedError: ConnectionFailedError(None) caused by ConnectionRefusedError(111, 'Connection refused')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./main.py", line 511, in <module>
    main()
  File "./main.py", line 483, in main
    client = create_client()
  File "./main.py", line 367, in create_client
    client.on_message_received = receive_message_handler
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 740, in on_message_received
    self._generic_receive_handler_setter("on_message_received", constant.INPUT_MSG, value)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 483, in _generic_receive_handler_setter
    fut.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 106, in _enable_feature
    await handle_result(callback)
  File "/usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_clients.py", line 37, in handle_result
    raise exceptions.ConnectionFailedError(message="Could not connect to IoTHub", cause=e)
azure.iot.device.exceptions.ConnectionFailedError: ConnectionFailedError('Could not connect to IoTHub') caused by ConnectionFailedError(None)
Task was destroyed but it is pending!
task: <Task pending coro=<AsyncHandlerManager._receiver_handler_runner() running at /usr/local/lib/python3.7/site-packages/azure/iot/device/iothub/aio/async_handler_manager.py:43> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/local/lib/python3.7/asyncio/futures.py:351, <TaskWakeupMethWrapper object at 0xffff83500f50>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>
Task was destroyed but it is pending!
task: <Task pending coro=<_AsyncQueueProxy.get() running at /usr/local/lib/python3.7/site-packages/janus/__init__.py:451> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0xffff83500e90>()]> cb=[_chain_future.<locals>._call_set_state() at /usr/local/lib/python3.7/asyncio/futures.py:358]>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants