
Crossing You in Style: Cross-modal Style Transfer from Music to Visual Arts

Python 3.5 Pytorch 0.4.1 torchvision 0.2.1 librosa 0.7.1

[Project page] [Arxiv paper] [Dataset]

The official PyTorch implementation of our ACM Multimedia 2020 paper. With our proposed framework, we can stylize a given image conditioned on a given music piece.

Abstract

Music-to-visual style transfer is a challenging yet important cross-modal learning problem in the practice of creativity. Its major difference from the traditional image style transfer problem is that the style information is provided by music rather than images. Assuming that musical features can be properly mapped to visual contents through semantic links between the two domains, we solve the music-to-visual style transfer problem in two steps: music visualization and style transfer. The music visualization network utilizes an encoder-generator architecture with a conditional generative adversarial network to generate image-based music representations from music data. This network is integrated with an image style transfer method to accomplish the style transfer process. Experiments are conducted on WikiArt-IMSLP, a newly compiled dataset including Western music recordings and paintings listed by decades. By utilizing such a label to learn the semantic connection between paintings and music, we demonstrate that the proposed framework can generate diverse image style representations from a music piece, and these representations can unveil certain art forms of the same era. Subjective testing results also emphasize the role of the era label in improving the perceptual quality on the compatibility between music and visual content.

Paper

Please cite our paper if you use our research or dataset in your research. * indicates equal contributions.

Cheng-Che Lee*, Wan-Yi Lin*, Yen-Ting Shih, Pei-Yi Patricia Kuo, and Li Su, "Crossing You in Style: Cross-modal Style Transfer from Music to Visual Arts", in ACM International Conference on Multimedia, 2020.

@inproceedings{lee2020crossing,
  title={Crossing You in Style: Cross-modal Style Transfer from Music to Visual Arts},
  author={Lee, Cheng-Che and Lin, Wan-Yi and Shih, Yen-Ting and Kuo, Pei-Yi and Su, Li},
  booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
  pages={3219--3227},
  year={2020}
}

Method

Prerequisite

  • torch 0.4.1
  • torchvision 0.2.1
  • librosa 0.7.1
  • python 3.5.2
  • cupy (for linear style transfer)
  • pynvrtc (for linear style transfer)
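
The pinned versions above can be captured in a requirements file. This is a sketch for convenience, not a file shipped with the repository; the exact cupy package name depends on your CUDA version:

```
torch==0.4.1
torchvision==0.2.1
librosa==0.7.1
cupy
pynvrtc
```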

Model Evaluation

Generate Music Style Representation

  1. Download the pretrained model; place the model in ./Source.
  2. Place the target .wav file to ./Source;
  3. Generate ./Source/clips.json, which contains:
    [
        {
            "third": <Start reading at this time>,
            "name": <The name of the audio>,
            "seg_idx": <The unique index of this segment. The music style representation of this segment will be <seg_idx>.jpg>,
            "path": <The path to the audio>
        },
        {...},
        ...
    ]
    
  4. bash evaluate.sh <base> <count>
  • Parameters:
    • base: Integer. The starting index of the output folders.
    • count: Integer. The number of inference runs; music style representations will be inferenced <count> times.
  • Output:
    • Results/<wav name>/Style_sample<base> through Results/<wav name>/Style_sample<base+count-1>
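A clips.json like the one specified in step 3 can be generated with a short script. This is a sketch, not part of the repository; the file name, segment start times, and output location here are placeholders following the example below:

```python
import json
import os

# Hypothetical segment start times (in seconds) within the audio file.
starts = [2.14, 5.72]

clips = [
    {
        "third": t,                     # start reading at this time
        "name": "Spring",               # the name of the audio
        "seg_idx": i + 1,               # unique segment index; its representation is saved as <seg_idx>.jpg
        "path": "./Source/Spring.wav",  # the path to the audio
    }
    for i, t in enumerate(starts)
]

os.makedirs("./Source", exist_ok=True)
with open("./Source/clips.json", "w") as f:
    json.dump(clips, f, indent=4)
```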
Example
> Folder structure:
    Source/Spring.wav
    Source/last2.pth
    Source/clips.json
        // The content of Source/clips.json
        [
            {
                "third": 2.14,
                "name": "Spring",
                "seg_idx": 1,
                "path": ./Source/spring.wav
            },
            {
                "third": 5.72,
                "name": "Spring",
                "seg_idx": 2,
                "path": ./Source/spring.wav
            },
            ...
        ]
> bash evaluate.sh 0 2
> Output 
    Results/Spring/Style_sample00
    Results/Spring/Style_sample01
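The output folders are named Style_sample<base+i> for each of the <count> runs. A minimal sketch of that naming scheme (the two-digit zero padding is an assumption based on the example above):

```python
def style_sample_dirs(wav_name, base, count):
    # One output folder per inference run:
    # Style_sample<base> .. Style_sample<base+count-1>, zero-padded to two digits.
    return [f"Results/{wav_name}/Style_sample{base + i:02d}" for i in range(count)]

print(style_sample_dirs("Spring", 0, 2))
# ['Results/Spring/Style_sample00', 'Results/Spring/Style_sample01']
```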

Style Transfer

Super Resolution

  1. We use ESRGAN to increase the resolution of the music style representations. Clone the repository and follow its instructions to download the pretrained model.
  2. Download the modified test.py and replace the original one.

Linear Style Transfer

  1. Clone the repository and follow its instructions to download the pretrained model and compile the pytorch_spn repository.
  2. Download the modified TestPhotoReal.py and replace the original one.
  3. Download the modified LoaderPhotoReal.py and replace the original one located in libs.

Evaluate

python batch_paint.py --content_image <path1> --style_images <path2>

  • Parameters:
    • --content_image: The path of the content image.
    • --style_images: The path to the folder where the music style representations stay.
  • Output:
    • <image name>/Content : The content image.
    • <image name>/LR : Music representations in low resolution.
    • <image name>/HR : Music representations in high resolution.
    • <image name>/Result : The result of photo-realistic style transfer.
    • <image name>/filtered : Copies of <image name>/Result/*_filtered.jpg.
    • <image name>/smooth : Copies of <image name>/Result/*_smooth.jpg.
    • <image name>/transfer : Copies of <image name>/Result/*_transfer.jpg.
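The filtered, smooth, and transfer folders hold copies of the Result files grouped by filename suffix. A sketch of that grouping step (a hypothetical helper, not the repository's code), assuming the Result folder layout above:

```python
import shutil
from pathlib import Path

def group_by_suffix(image_name, suffixes=("filtered", "smooth", "transfer")):
    # Copy Result/*_<suffix>.jpg into a sibling folder named after the suffix,
    # e.g. content/Result/1_filtered.jpg -> content/filtered/1_filtered.jpg.
    result_dir = Path(image_name) / "Result"
    for suffix in suffixes:
        out_dir = Path(image_name) / suffix
        out_dir.mkdir(parents=True, exist_ok=True)
        for src in result_dir.glob(f"*_{suffix}.jpg"):
            shutil.copy(src, out_dir / src.name)
```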
Example
> Folder structure:
    ./Source/
    ./ESRGAN/
    ./LinearStyleTransfer/
    ./Results/
    ./content.jpg
> python batch_paint.py --content_image content.jpg --style_images Results/Spring/Style_sample00
> Output 
    content/Content/
    content/LR/
    content/HR/
    content/Result/
    content/filtered/
    content/smooth/
    content/transfer/