cross-modal

Analyze unstructured data with Towhee in applications such as reverse image search, reverse video search, audio classification, question-answering systems, and molecular search.

nlp machine-learning embeddings image-classification cross-modal audio-classification video-tagging

Updated Feb 9, 2024
Jupyter Notebook

roboflow / multimodal-maestro

Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥

object-detection cross-modal multimodality instance-segmentation lmm gpt-4 visual-prompting prompt-engineering vision-language-model llava segment-anything gpt-4-vision

Updated Feb 13, 2024
Python

krantiparida / awesome-audio-visual

A curated list of different papers and datasets in various areas of audio-visual processing

awesome localization awesome-list cross-modal source-separation audio-visual mutli-modal

Updated Jan 30, 2024

caoyue10 / aaai17-cdq

The implementation of the AAAI-17 paper "Collective Deep Quantization for Efficient Cross-Modal Retrieval"

deep-learning cross-modal quantization similarity-search

Updated Mar 15, 2017
Python

yisun98 / SOLC

Remote Sensing SAR-Optical Land-use Classification in PyTorch: high-resolution remote sensing semantic segmentation / ground-object segmentation / ground-object classification

pytorch remote-sensing segmentation cross-modal multi-modal multi-source deeplabv3 land-use-classification oa-kappa sar-optical

Updated May 6, 2024
Python

Zengyi-Qin / Weakly-Supervised-3D-Object-Detection

Weakly Supervised 3D Object Detection from Point Clouds (VS3D), ACM MM 2020

tensorflow point-cloud lidar stereo transfer-learning cross-modal unsupervised-learning object-proposals kitti monocular 3d-object-detection weakly-supervised-detection ws3d vs3d acm-mm-2020 unsupervised-object-detection

Updated Mar 24, 2023
Jupyter Notebook

rohitrango / objects-that-sound

Unofficial implementation of Google DeepMind's paper `Objects that Sound`

machine-learning deep-neural-networks deep-learning embeddings deeplearning deepmind cross-modal audio-video audioset

Updated May 7, 2018
Python

JizhiziLi / RIM

[CVPR 2023] Referring Image Matting

image-segmentation cross-modal matting multimodal image-matting

Updated Apr 17, 2023

kywen1119 / DSRAN

Star

Code for journal paper "Learning Dual Semantic Relations with Graph Attention for Image-Text Matching", TCSVT, 2020.

computer-vision pytorch cross-modal tcsvt image-text-matching

Updated Oct 25, 2022
Python

DRSY / MoTIS

[NAACL 2022] Mobile text-to-image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)

naacl ai retrieval lsh ios-swift image-search k-means cross-modal clip knn semantic-search knowledge-distillation k-means-clustering random-projection vector-search

Updated May 11, 2023
Swift

GT-RIPL / Xmodal-Ctx

Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

image-captioning cross-modal clip vision-and-language

Updated Oct 21, 2022
Python

PetarV- / X-CNN

Cross-modal convolutional neural networks

python keras convolutional-neural-networks cross-modal

Updated Aug 29, 2017
Python

zjukg / DUET

[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

semantic pytorch transformer cross-modal zero-shot-learning knowledge-transfer grounding visual-grounding pretrained-language-model

Updated Feb 9, 2024
Python

Eaphan / UPIDet

Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [NeurIPS 2023]

cross-modal multi-modal 3d-object-detection

Updated Jun 4, 2024
Python

marslanm / Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the accompanying survey, recently accepted for publication: https://dl.acm.org/doi/abs/10.1145/3617833

cross-modal multimodal-deep-learning multimodal-datasets transformer-models multimodal-pre-trained-model vision-language-pretraining multimodal-applications multimodal-pretext

Updated Oct 19, 2023

smallflyingpig / speech-to-image-translation-without-text

Code for the paper "Direct Speech-to-Image Translation"

gan cross-modal speech-to-image

Updated Jun 8, 2020
Python

Here are 49 public repositories matching this topic...

shaoxiongji / knowledge-graphs

jina-ai / discoart

docarray / docarray

kuanghuei / SCAN

towhee-io / examples