refactor: gdino and gsam as agents #501

Open: wants to merge 7 commits into development

rachwalk (Contributor) commented Apr 3, 2025

Purpose

Change GDino and GSam to run using the agent-based architecture.

Proposed Changes

Refactor GSam and GDino so that each runs as an agent.

Issues

Testing

Run the manipulation demo, using this script instead of launching the GDino and GSam nodes:

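from rai_open_set_vision.agents import GroundingDinoAgent, GroundedSamAgent
from rai.utils import wait_for_shutdown
import rclpy


def main():
    rclpy.init()
    # Both agents live in the same process; each exposes its own ROS 2 service
    # (grounding_dino_classify and grounded_sam_segment, per the startup log).
    agent1 = GroundingDinoAgent()
    agent2 = GroundedSamAgent()
    agent1.run()
    agent2.run()
    # Block until a shutdown signal (e.g. Ctrl+C) is received.
    wait_for_shutdown([agent1, agent2])
    rclpy.shutdown()


if __name__ == "__main__":
    main()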

@rachwalk rachwalk requested a review from maciejmajek April 3, 2025 13:48
maciejmajek (Member) commented Apr 7, 2025

This test takes about 10 seconds to run on my PC with agent2 commented out (only GroundingDinoAgent running).
When both agents run within the same file, it takes about 130 seconds.
Could you please run this yourself and confirm how long the test takes?

from rai.communication.ros2 import ROS2ARIConnector, ROS2ARIMessage
from rai.utils import ROS2Context
from rai_interfaces.srv import RAIGroundingDino
import cv2
import cv_bridge

from rai.tools.ros2.utils import ros2_message_to_dict


@ROS2Context()
def test_grounding_dino_agent():
    connector = ROS2ARIConnector()
    bridge = cv_bridge.CvBridge()

    # cv2.imread returns None instead of raising when the path is wrong
    frame = cv2.imread("docs/imgs/o3deSimulation.png")
    assert frame is not None, "test image not found"
    image = bridge.cv2_to_imgmsg(frame)

    # Ask Grounding DINO to detect chairs in the frame
    msg = RAIGroundingDino.Request()
    msg.source_img = image
    msg.classes = ["chair"]
    msg.box_threshold = 0.5
    msg.text_threshold = 0.5

    # Wrap the request as an ARI message and call the agent's service
    ari_msg = ROS2ARIMessage(payload=ros2_message_to_dict(msg))
    response = connector.service_call(
        target="/grounding_dino_classify",
        message=ari_msg,
        msg_type="rai_interfaces/srv/RAIGroundingDino",
    )
    print(response.payload)

Similarly, with only one agent running, this test takes about 10 seconds:

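from rai_interfaces.srv import RAIGroundedSam


@ROS2Context()
def test_grounded_sam_agent():
    connector = ROS2ARIConnector()
    bridge = cv_bridge.CvBridge()

    # cv2.imread returns BGR; convert so the pixel data matches the "rgb8" label
    frame = cv2.imread("docs/imgs/o3deSimulation.png")
    assert frame is not None, "test image not found"
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    image = bridge.cv2_to_imgmsg(rgb, encoding="rgb8")

    msg = RAIGroundedSam.Request()
    msg.source_img = image
    # msg.detections keeps its default (empty) message; assigning a plain dict
    # ({}) would be rejected by the generated message's type check.

    ari_msg = ROS2ARIMessage(payload=ros2_message_to_dict(msg))
    response = connector.service_call(
        target="/grounded_sam_segment",
        message=ari_msg,
        msg_type="rai_interfaces/srv/RAIGroundedSam",
        timeout_sec=10,
    )
    print(response.payload)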

With both agents running, it takes more than 200 seconds.
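To make the timings easier to compare across machines, the service call in either test could be wrapped in a small stopwatch. `timed` below is a hypothetical helper (not part of RAI) built on `time.perf_counter`; it returns the wrapped call's result together with the elapsed wall-clock seconds.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper for the timing comparison; not part of RAI.
    """
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in for connector.service_call: sleep briefly, then return a value.
    def fake_call(delay: float) -> str:
        time.sleep(delay)
        return "ok"

    value, seconds = timed(fake_call, 0.05)
    print(f"{value} after {seconds:.2f} s")
```

In the tests above this would read `response, seconds = timed(connector.service_call, target="/grounded_sam_segment", message=ari_msg, msg_type="rai_interfaces/srv/RAIGroundedSam", timeout_sec=10)`. If the call raises (for example on a timeout), `timed` lets the exception propagate rather than reporting a misleading duration.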

Edit: When I sent SIGINT to pytest, Grounded SAM immediately logged:

2025-04-07 14:27:44 robo-pc-054 root[1150302] INFO Image embeddings computed.

Not sure if this is a coincidence.

maciejmajek (Member) commented Apr 7, 2025

Also, please reintroduce the redownloading mechanism (#432, 560c409).
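The general idea of a redownloading mechanism can be sketched as: try to load the weights, and if the file is missing or fails to load (for example, after an interrupted download), delete it, fetch it again, and retry once. This is a generic sketch, not the code from #432 or 560c409; `load_weights_with_redownload`, `download`, and `load` are hypothetical names, and the caller supplies the real download and checkpoint-loading functions.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def load_weights_with_redownload(
    path: str,
    download: Callable[[str], None],
    load: Callable[[str], T],
) -> T:
    """Load weights from `path`, redownloading once if missing or unreadable.

    Hypothetical helper: `download(path)` must write a fresh copy of the
    weights to `path`, and `load(path)` must raise if the file is corrupt.
    """
    if not os.path.exists(path):
        download(path)
    try:
        return load(path)
    except Exception:
        # Treat any load failure as a corrupt file (e.g. a truncated download):
        # remove it, fetch it again, and retry exactly once.
        os.remove(path)
        download(path)
        return load(path)
```

Retrying only once keeps a persistently broken source (wrong URL, full disk) from looping forever; the second failure surfaces to the caller. For PyTorch checkpoints, `load` could be something like `lambda p: torch.load(p, map_location="cpu")`.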

maciejmajek (Member)

For some reason, loading takes much longer on my setup than with the previous implementation:

2025-04-07 14:32:18 robo-pc-054 rai_open_set_vision.agents.grounding_dino[1193274] INFO Initializing weight path for groundingdino_swint_ogc.pth
2025-04-07 14:32:18.065 [RTPS_TRANSPORT_SHM Error] Failed init_port fastrtps_port7003: open_and_lock_file failed -> Function open_port_internal
UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3587.)
final text_encoder_type: bert-base-uncased
2025-04-07 14:32:27 robo-pc-054 rai_open_set_vision.agents.grounded_sam[1193274] INFO Initializing weight path for sam2_hiera_large.pt
UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
UserWarning: Support for .yml files is deprecated. Use .yaml extension for Hydra config files
2025-04-07 14:33:39 robo-pc-054 root[1193274] INFO Loaded checkpoint sucessfully
2025-04-07 14:33:40 robo-pc-054 rai_open_set_vision.agents.grounding_dino[1193274] INFO Starting Grounding DINO agent
2025-04-07 14:33:40 robo-pc-054 rai_open_set_vision.agents.grounding_dino[1193274] INFO Creating service grounding_dino_classify
2025-04-07 14:33:40 robo-pc-054 rai_open_set_vision.agents.grounding_dino[1193274] INFO Grounding DINO service ready
2025-04-07 14:33:40 robo-pc-054 rai_open_set_vision.agents.grounding_dino[1193274] INFO Grounding DINO agent started
2025-04-07 14:33:40 robo-pc-054 rai_open_set_vision.agents.grounded_sam[1193274] INFO Starting Grounded SAM agent
2025-04-07 14:33:40 robo-pc-054 rai_open_set_vision.agents.grounded_sam[1193274] INFO Creating service grounded_sam_segment
2025-04-07 14:33:40 robo-pc-054 rai_open_set_vision.agents.grounded_sam[1193274] INFO Grounded SAM service ready
2025-04-07 14:33:40 robo-pc-054 rai_open_set_vision.agents.grounded_sam[1193274] INFO Grounded SAM agent started
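Reading the timestamps in this log: Grounding DINO starts initializing at 14:32:18, Grounded SAM starts at 14:32:27, "Loaded checkpoint" appears at 14:33:39 (presumably the SAM 2 checkpoint, since it follows the `sam2_hiera_large.pt` line), and both services are ready at 14:33:40. The snippet below only does that subtraction; the timestamps are copied from the log, while the event labels are our own.

```python
from datetime import datetime

# Timestamps copied from the startup log (one-second resolution).
events = [
    ("grounding_dino init", "2025-04-07 14:32:18"),
    ("grounded_sam init", "2025-04-07 14:32:27"),
    ("checkpoint loaded", "2025-04-07 14:33:39"),
    ("services ready", "2025-04-07 14:33:40"),
]

FMT = "%Y-%m-%d %H:%M:%S"
times = [datetime.strptime(ts, FMT) for _, ts in events]

# Seconds elapsed before each event, measured from the previous log line.
steps = {
    events[i + 1][0]: (times[i + 1] - times[i]).total_seconds()
    for i in range(len(events) - 1)
}
total = (times[-1] - times[0]).total_seconds()

for name, secs in steps.items():
    print(f"{name:>18}: {secs:5.0f} s")
print(f"{'total':>18}: {total:5.0f} s")
```

This gives 9 s, 72 s and 1 s for the three steps, 82 s in total, so most of the startup time is spent between the Grounded SAM weight-path message and the checkpoint finishing loading.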

maciejmajek (Member)

Well, it turns out that using MultiThreadedExecutor in ROS2ARIConnector slows things down 100 times, for both weight initialization and inference.

2 participants