
ImagiTune

Image-to-Music Synthesis

Video

Description

ImagiTune is an AI-powered system that converts images into music: colors, shapes, textures, and patterns shape the melody, rhythm, and harmony of the generated piece.

🔍 Key Features

Image Feature Extraction: Combines low-level features (brightness, edges, color histograms) with high-level features extracted using the pre-trained VGG16 model.
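A minimal sketch of the low-level half of this step, assuming the image arrives as an RGB NumPy array. In the full system the VGG16 embedding would be concatenated onto this vector; the function name here is illustrative, not the project's actual API.

```python
import numpy as np

def extract_low_level_features(img, bins=8):
    """Brightness, edge strength, and per-channel color histograms
    from an RGB image given as an (H, W, 3) uint8 array."""
    gray = img.mean(axis=2)                       # simple luminance proxy
    brightness = gray.mean() / 255.0              # overall brightness in [0, 1]
    # edge strength: mean gradient magnitude as a cheap edge measure
    gy, gx = np.gradient(gray)
    edge_strength = np.hypot(gx, gy).mean() / 255.0
    # per-channel color histograms, normalized to sum to 1
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 255))[0]
             for c in range(3)]
    hists = np.concatenate(hists).astype(float)
    hists /= hists.sum()
    return np.concatenate([[brightness, edge_strength], hists])

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
features = extract_low_level_features(img)
print(features.shape)  # (26,) = 2 scalars + 3 channels x 8 bins
```

The resulting vector is what conditions the music model, so keeping it fixed-length (here 26 dimensions) regardless of image size is the important property.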

MIDI Processing: Converts MAESTRO MIDI files into structured sequences of notes (pitch, velocity, duration, instrument) for model training.
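The windowing behind this step can be sketched as follows. pretty_midi would supply the real note lists; here notes are plain tuples to keep the sketch dependency-free, and the function name is illustrative.

```python
import numpy as np

def notes_to_sequences(notes, window=4):
    """Slide a fixed-length window over a note list to build
    (input sequence, next-note target) training pairs.
    Each note is (pitch, velocity, duration, instrument)."""
    X, y = [], []
    for i in range(len(notes) - window):
        X.append(notes[i:i + window])   # input: `window` consecutive notes
        y.append(notes[i + window])     # target: the note that follows
    return np.array(X, dtype=float), np.array(y, dtype=float)

# toy note list: (pitch, velocity, duration in seconds, instrument program)
notes = [(60, 90, 0.5, 0), (62, 85, 0.5, 0), (64, 88, 0.25, 0),
         (65, 80, 0.25, 0), (67, 95, 1.0, 0), (65, 70, 0.5, 0)]
X, y = notes_to_sequences(notes, window=4)
print(X.shape, y.shape)  # (2, 4, 4) (2, 4)
```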

LSTM-Based RNN Model: Learns to generate expressive musical note sequences conditioned on image features.

Music Generation: Uses a seed and the extracted image features to generate a unique MIDI sequence.
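The generation loop is autoregressive: each predicted note is appended to the sequence and fed back in, with the image feature vector held fixed. A sketch with a deterministic stand-in predictor in place of the trained LSTM (all names here are illustrative):

```python
import numpy as np

def generate_notes(predict_next, seed, image_features, n_notes=16):
    """Repeatedly predict the next note from the most recent window
    of notes plus the fixed image feature vector."""
    sequence = list(seed)
    for _ in range(n_notes):
        window = np.array(sequence[-len(seed):], dtype=float)
        sequence.append(predict_next(window, image_features))
    return sequence[len(seed):]   # return only the newly generated notes

# stand-in predictor: a toy rule instead of the trained LSTM,
# so the conditioning on image features is still visible
def toy_predict(window, img_feats):
    pitch = 48 + int((window[-1][0] + img_feats.sum()) % 24)
    return (pitch, 90, 0.5, 0)

seed = [(60, 90, 0.5, 0), (62, 85, 0.5, 0)]
img_feats = np.array([0.4, 0.1, 0.5])
notes = generate_notes(toy_predict, seed, img_feats, n_notes=8)
print(len(notes))  # 8
```

Because the image features enter every prediction step, two different images with the same seed produce different note sequences.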

Audio Conversion: Converts MIDI to WAV using FluidSynth and enhances audio with normalization and reverb using PyDub.

Output Formats: Saves output as .mid, .wav, and .mp3 files for flexibility and sharing.


📦 Requirements

Python 3.7+

Libraries: numpy, opencv-python (cv2), torch, torchvision, tensorflow, pretty_midi, pydub, matplotlib, scikit-learn, Pillow

FluidSynth with a valid .sf2 SoundFont file

🚀 How It Works

Extract Image Features: extract_image_features() captures a holistic representation of the input image.

Prepare Training Data: prepare_training_data() converts a subset of MAESTRO MIDI files into RNN-ready sequences.

Train the RNN Model: train_rnn() trains a multi-output LSTM model to predict the next note attributes.

Generate Music: generate_music() synthesizes new notes based on the image and a random seed.

Export Audio: The generated notes are saved as a MIDI, converted to WAV using FluidSynth, and finalized to MP3 with audio effects.

🎧 Example Output

After execution, you'll get:

A MIDI representation (generated_music.mid)

A WAV audio version (generated_music.wav)

An MP3 file with normalization and reverb effects (generated_music.mp3)

Tech Stack

Python

Track

AI