
Paper Session 7a: Signal Processing I

May 14 @ 9:00 am - 10:30 am
Session Chair: Guilherme Coehlo

 

Paper abstracts

Neal Anderson and Sanjay Majumder: “MBHD: A Modular Audio Playback and Manipulation System for Loop-Based Performance”

This paper outlines the development of MBHD (Modular Beat Handling Device), a real-time audio performance system built in Cycling ’74 Max that connects the reliability of DJing with the expressiveness of live electronic composition. While DAWs provide reliable synchronization, achieving harmonically coherent alignment of loops from different libraries typically requires significant manual editing of metadata. MBHD addresses this challenge with a new, lightweight naming convention that encodes musical attributes (tempo, root note, and instrument role) in filenames, providing automatic harmonic coherence. The system is organized around four independent layers of musical content (drums, bass, harmony, and melody) that can be reused and rerouted, and it uses real-time digital signal processing (DSP) to dynamically adjust the pitch and timing of each layer to match a global key and tempo. In addition to the system’s architecture, we describe its integration with external environments (via Ableton Link) and a user interface designed for minimal latency during performance. Lastly, we report the results of a technical benchmark and a user study, which demonstrate how transparent, filename-driven architectures can facilitate the use of loops for improvisation.
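
As a rough illustration of the filename-driven idea (not the authors' Max implementation), the Python sketch below parses a loop filename and derives the playback-rate ratio and semitone shift needed to match a global tempo and key. The `<role>_<bpm>_<root>.wav` pattern and the function names are assumptions made for this example; the abstract does not specify MBHD's actual naming scheme.

```python
import re

# Semitone offset of each note name from C.
NOTE_TO_SEMITONE = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}

# Hypothetical naming pattern, e.g. "bass_120_A.wav" (an assumption, not MBHD's format).
LOOP_NAME = re.compile(
    r"^(?P<role>drums|bass|harmony|melody)_(?P<bpm>\d+(?:\.\d+)?)_(?P<root>[A-G][#b]?)\.wav$"
)


def parse_loop_name(filename):
    """Extract role, tempo, and root note from a loop filename."""
    match = LOOP_NAME.match(filename)
    if match is None:
        raise ValueError(f"filename does not follow the assumed convention: {filename}")
    return {"role": match["role"], "bpm": float(match["bpm"]), "root": match["root"]}


def align_to_global(loop, global_bpm, global_root):
    """Return (playback-rate ratio, semitone shift) that maps the loop to the global tempo and key."""
    rate = global_bpm / loop["bpm"]
    # Wrap the interval into the range -5..+6 so the smallest shift is chosen.
    shift = (NOTE_TO_SEMITONE[global_root] - NOTE_TO_SEMITONE[loop["root"]]) % 12
    if shift > 6:
        shift -= 12
    return rate, shift


if __name__ == "__main__":
    loop = parse_loop_name("bass_120_A.wav")
    print(align_to_global(loop, global_bpm=126.0, global_root="C"))
```

A real system would pass these two values to a time-stretching and pitch-shifting stage; that stage is outside the scope of this sketch.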

The Max patches for this project can be accessed at: phewsh.com/mbhd/max/. Additionally, a browser-based companion application is available at: phewsh.com/mbhd/.

 

Sam Pluta and Ted Moore: “The MMMAudio Computer Music Environment”

We introduce MMMAudio, a new audio creative coding environment designed to close the gap between instrument building and low-level DSP development while reducing the maintenance burden typical of monolithic, compiled systems. Contemporary computer music languages such as Max, Pure Data, and SuperCollider excel at graph-based instrument design but impose steep barriers when custom DSP is required, pushing users into C/C++ plugin workflows with unfamiliar APIs, build systems, and cross-platform complexities. MMMAudio addresses these issues by centering its programming model on Mojo for high-performance DSP and seamless Python–Mojo interoperability for tooling, AI, and scientific libraries. In MMMAudio, unit generators (UGens) are simple Mojo structs, enabling users to write, test, and distribute new UGens without leaving their code editor or contending with external build pipelines. This design simultaneously encourages new DSP creation, leverages Python’s mature ecosystem for machine learning and data processing, and exploits Mojo’s performance features (e.g., SIMD) for fast, real-time audio processing. We present the system’s architecture, programming model, and extension mechanisms.

Tian Cheng, Tomoyasu Nakano and Masataka Goto: “Exploring Masked CE Losses to Enhance Word Offset Estimation in CTC-based Lyrics-to-Audio Alignment”

Lyrics-to-audio alignment is an important task for real-world applications such as karaoke systems. Although alignment performance has improved with the release of large datasets and the use of advanced deep learning models, accurate word offset estimation remains challenging. To address this problem, we extend our previously proposed masked cross-entropy (CE) loss with new masks that enforce model predictions at masked frames using frame-wise phoneme labels derived from word-level annotations. We train a Convolutional Recurrent Neural Network (CRNN) using both the masked CE loss and the Connectionist Temporal Classification (CTC) loss. Comparing the results obtained with different masks in the masked CE loss, we find that word offset estimation improves with masks that cover all silent frames. In addition, we find that masks on word onset frames are essential for improving word onset estimation performance. We achieve comparable word onset estimation results and provide benchmark word offset estimation results for future research.
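
To make the idea of a masked CE loss concrete, here is a minimal pure-Python sketch: frame-wise cross-entropy is averaged only over frames selected by a binary mask (for example, silent frames or word onset frames). The function name, the input layout, and the simple mean over masked frames are assumptions for illustration. The abstract does not give the authors' exact formulation or say how the masked CE term is combined with the CTC loss.

```python
import math


def masked_cross_entropy(frame_probs, frame_labels, mask):
    """Mean negative log-likelihood over the frames where mask is 1.

    frame_probs  : list of per-frame probability lists (one entry per phoneme class)
    frame_labels : list of target class indices, one per frame
    mask         : list of 0/1 flags; only frames flagged 1 contribute to the loss
    """
    total = 0.0
    count = 0
    for probs, label, keep in zip(frame_probs, frame_labels, mask):
        if keep:
            total += -math.log(probs[label])
            count += 1
    # With no masked frames there is nothing to supervise, so return zero.
    return total / count if count else 0.0


if __name__ == "__main__":
    # Three frames, two classes; the middle frame is excluded by the mask.
    probs = [[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]]
    labels = [0, 1, 0]
    mask = [1, 0, 1]
    print(masked_cross_entropy(probs, labels, mask))
```

In a training setup this term would be added to a CTC loss computed on the same network outputs; any weighting between the two is a design choice not stated in the abstract.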

 
