Paper Session 8: Signal Processing II

May 14 @ 11:00 am - 12:30 pm
Three papers will be presented and discussed:

Alexandre Francois: “Real-Time, Low-Latency, High Resolution Audio Spectral Analysis: Phase Matters”
This paper introduces an original approach to computing a spectral representation of audio signals, with high temporal and frequency resolution and high amplitude accuracy, in real-time and with low latency. Applying techniques from phase vocoders to make use of phase information, a new tracking resonator model extends the original Resonate model while retaining its iterative formulation and computational efficiency. A bank composed of frequency tracking resonators constantly self-tunes to the contents of the input signal, rendering the precise tuning of the resonators irrelevant, as long as the bank offers an appropriate coverage of the frequency range of interest for the target application. Self-tuning banks form the basis for an analysis technique that produces, in real-time, for each input sample, a list of uniquely identified and precisely tracked frequency components present in the input signal, together with their correct amplitudes. High temporal and frequency resolution spectrograms illustrate the spectral analysis of real musical signals in a familiar format. The detailed representations produced can potentially improve the quality and accuracy of any traditional application. They also offer promising prospects for real-time, low-latency applications such as accompaniment and improvisation systems. Encouraging initial synthesis experiments also motivate further investigation.
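
To make the idea of an iteratively updated resonator concrete, here is a minimal sketch. It assumes a resonator can be approximated as an exponentially smoothed complex demodulation of the input at a fixed frequency. That is a simplification chosen for illustration, not the paper's formulation, and it omits the phase-based frequency tracking the abstract describes. The function name `resonator_magnitude` and the smoothing parameter `alpha` are hypothetical.

```python
import cmath
import math


def resonator_magnitude(samples, freq_hz, sample_rate, alpha=0.01):
    """Run one fixed-frequency resonator over `samples`; return its final magnitude.

    Assumption (not from the paper): the state is an exponentially weighted
    moving average of the input multiplied by a complex phasor rotating at
    -freq_hz, so the cost is constant per sample.
    """
    # Per-sample rotation of the demodulating phasor.
    step = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    phasor = 1 + 0j
    state = 0j
    for x in samples:
        # Iterative update: blend the previous state with the demodulated sample.
        state = (1 - alpha) * state + alpha * x * phasor
        phasor *= step
    return abs(state)


def sine(freq_hz, sample_rate, n):
    """Unit-amplitude test tone, n samples long."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    tone = sine(440.0, 8000, 4000)
    print(resonator_magnitude(tone, 440.0, 8000))   # resonator tuned to the tone
    print(resonator_magnitude(tone, 1000.0, 8000))  # resonator far from the tone
```

For a real sinusoid of amplitude A, a resonator tuned to its frequency settles near A/2 in this sketch, while a resonator tuned far away stays close to zero. A bank of such resonators at different frequencies gives a per-sample spectral estimate.
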
Robert Esler: “Pd++: A C++ Library of Pure Data’s DSP Objects”

Pd++ is a real-time C++ audio synthesis library that implements Pure Data’s DSP (digital signal processing) objects as C++ classes, making it usable with object-oriented programming languages like C++, Java, or C#. The library has been designed to follow logic and naming conventions similar to those of Pure Data. It includes bindings for Java, which allow the library to work with the Processing development environment, and for C#, which provide a native code interface to the Unity game engine. Pd++ has also been extensively tested on all major operating systems including iOS and Android, on single-board computers like the Raspberry Pi, and with C++-based Application Programming Interfaces (APIs) such as Unreal Engine, Wwise, JUCE and FMOD. In this article the author presents how the library works in design, practice and philosophy, its perceived workflow as a design and educational tool, and future developments for Pd++.

Tian Cheng, Tomoyasu Nakano and Masataka Goto: “Exploring Masked CE Losses to Enhance Word Offset Estimation in CTC-based Lyrics-to-Audio Alignment”

Lyrics-to-audio alignment is an important task for real-world applications such as karaoke systems. Although alignment performance has improved with the release of large datasets and the use of advanced deep learning models, accurate word offset estimation remains challenging. To address this problem, we extend our previously proposed masked cross-entropy (CE) loss by proposing new masks to enforce model predictions at masked frames with frame-wise phoneme labels derived from word-level annotations. We train a Convolutional Recurrent Neural Network (CRNN) using both the masked CE loss and the Connectionist Temporal Classification (CTC) loss. By comparing the results obtained with different masks in the masked CE loss, we find that word offset estimation performance is improved by using masks that cover all silent frames. In addition, we find that masks on word onset frames are essential for improving word onset estimation performance. We achieve comparable word onset estimation results and provide benchmark word offset estimation results for future research.
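
As a rough illustration of what a masked CE loss computes, the sketch below averages the frame-wise cross-entropy only over frames selected by a binary mask. It is a generic formulation under stated assumptions: the paper's actual mask definitions, label derivation, and loss weighting are not given in the abstract, and the names `masked_cross_entropy`, `combined_loss`, and `weight` are hypothetical. The CTC term is taken as a precomputed number rather than implemented here.

```python
import math


def masked_cross_entropy(frame_probs, frame_labels, mask):
    """Mean cross-entropy over the frames where `mask` is 1.

    frame_probs:  per-frame lists of class probabilities (each sums to 1)
    frame_labels: target class index for each frame
    mask:         0/1 flag per frame; frames with 0 are ignored
    """
    total = 0.0
    count = 0
    for probs, label, keep in zip(frame_probs, frame_labels, mask):
        if keep:
            # Clamp to avoid log(0) when the model assigns zero probability.
            total += -math.log(max(probs[label], 1e-12))
            count += 1
    return total / count if count else 0.0


def combined_loss(ctc_loss_value, frame_probs, frame_labels, mask, weight=1.0):
    """Assumed combination: CTC loss plus a weighted masked CE term."""
    return ctc_loss_value + weight * masked_cross_entropy(
        frame_probs, frame_labels, mask
    )


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
    labels = [0, 0, 1]
    mask = [1, 0, 1]  # the middle frame does not contribute
    print(masked_cross_entropy(probs, labels, mask))
```

In this form, changing which frames the mask covers (for example silent frames or word onset frames) changes where frame-level supervision is applied, while the CTC term still constrains the overall label sequence.
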
