Sudarsana Reddy Kadiri
Department of Information and Communications Engineering
Paavo Alku
Peer Reviewed
In this study, formant tracking is investigated by refining the formants tracked by an existing data-driven tracker, DeepFormants, using the formants estimated in a model-driven manner by linear prediction (LP)-based methods. As LP-based formant estimation methods, conventional covariance analysis (LP-COV) and the recently proposed quasi-closed phase forward–backward (QCP-FB) analysis are used. In the proposed refinement approach, the contours of the three lowest formants are first predicted by the data-driven DeepFormants tracker, and the predicted formants are replaced frame-wise with local spectral peaks shown by the model-driven LP-based methods. The refinement procedure can be plugged into the DeepFormants tracker with no need for any new data learning. Two refined DeepFormants trackers were compared with the original DeepFormants and with five known traditional trackers using the popular vocal tract resonance (VTR) corpus. The results indicated that the data-driven DeepFormants trackers outperformed the conventional trackers and that the best performance was obtained by refining the formants predicted by DeepFormants using QCP-FB analysis. In addition, by tracking formants using VTR speech that was corrupted by additive noise, the study showed that the refined DeepFormants trackers were more resilient to noise than the reference trackers. In general, these results suggest that LP-based model-driven approaches, which have traditionally been used in formant estimation, can be combined with a modern data-driven tracker easily with no further training to improve the tracker’s performance.
Formant tracking is the process of estimating the resonant frequencies of the vocal tract (formants) from speech and following them over time. Formants are central to the study of speech production and perception, and formant tracking is widely used in speech processing, linguistics, and voice analysis.
DeepFormants is a data-driven formant tracking tool that uses deep learning to predict formant frequencies. Its predicted formant contours can be further refined using model-driven methods such as linear prediction (LP).
The hybrid approach combines the strengths of data-driven (DeepFormants) and model-driven (LP-based) methods. The contours of the three lowest formants are first predicted by DeepFormants, and the predicted values are then replaced frame by frame with local spectral peaks identified by an LP method. This improves accuracy without requiring additional training.
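The frame-wise replacement described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names, the coefficient convention `a = [1, a1, ..., ap]`, the FFT size, and the nearest-peak selection rule are all assumptions made for the example.

```python
import numpy as np

def lp_peak_frequencies(a, fs, n_fft=1024):
    """Frequencies (Hz) of local maxima of the all-pole LP spectrum 1/|A(e^jw)|."""
    # Assumed convention: a = [1, a1, ..., ap] is the LP inverse-filter polynomial.
    spectrum = 1.0 / np.abs(np.fft.rfft(a, n_fft))
    interior = spectrum[1:-1]
    # A bin is a local peak if it exceeds both of its neighbours.
    is_peak = (interior > spectrum[:-2]) & (interior > spectrum[2:])
    peak_bins = np.nonzero(is_peak)[0] + 1
    return peak_bins * fs / n_fft

def refine_frame(predicted, peaks):
    """Replace each predicted formant with the nearest LP peak (assumed rule)."""
    predicted = np.asarray(predicted, dtype=float)
    if len(peaks) == 0:
        # No spectral peak available: keep the data-driven prediction.
        return predicted.copy()
    nearest = np.abs(peaks[None, :] - predicted[:, None]).argmin(axis=1)
    return peaks[nearest]

# Hypothetical frame: an all-pole model with resonances near 500, 1500, 2500 Hz.
fs = 8000
centres = np.array([500.0, 1500.0, 2500.0])
radius = np.exp(-np.pi * 60.0 / fs)  # 60 Hz bandwidth, chosen for illustration
poles = radius * np.exp(2j * np.pi * centres / fs)
a = np.real(np.poly(np.concatenate([poles, poles.conj()])))

# Stand-in for the tracker output on this frame (made-up values).
predicted = [450.0, 1600.0, 2400.0]
refined = refine_frame(predicted, lp_peak_frequencies(a, fs))
```

In a full tracker the same step would run on every frame, with `a` coming from the LP analysis of that frame and `predicted` from the data-driven tracker.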
LP-COV (conventional linear prediction covariance analysis) and QCP-FB (quasi-closed phase forward-backward analysis) are model-driven methods for estimating formants. Both are used to refine DeepFormants' predictions; QCP-FB is the more recently proposed of the two, and it gave the best results in this study.
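For readers unfamiliar with covariance-method LP, the following sketch shows the general idea: the predictor is fitted by least squares over the frame without windowing, and candidate formant frequencies are read from the angles of the poles. It is a textbook-style illustration under assumed names (`lp_covariance`, `pole_frequencies`), not the configuration used in the study, and it does not include the QCP weighting or forward-backward extension.

```python
import numpy as np

def lp_covariance(x, order):
    """Covariance-method LP: least-squares prediction over n = order..N-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k-1 holds x[n-k], the k-th past sample used to predict x[n].
    past = np.column_stack([x[order - k : n - k] for k in range(1, order + 1)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    # Return the inverse filter A(z) = 1 - sum_k c_k z^-k as [1, a1, ..., ap].
    return np.concatenate(([1.0], -coeffs))

def pole_frequencies(a, fs):
    """Frequencies (Hz) of the poles of 1/A(z) in the upper half of the z-plane."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

# Synthetic check signal: impulse response of a known second-order resonator.
x = np.zeros(200)
x[0] = 1.0
x[1] = 1.5 * x[0]
for i in range(2, len(x)):
    x[i] = 1.5 * x[i - 1] - 0.8 * x[i - 2]

a_est = lp_covariance(x, order=2)
freqs = pole_frequencies(a_est, fs=8000)
```

On real speech the order would be higher and the analysis would be repeated per frame; pole angles are only one of several ways to pick formant candidates from the LP model.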
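On the VTR corpus, the data-driven DeepFormants trackers outperformed the five traditional trackers used for comparison, and the best performance was obtained when the DeepFormants predictions were refined using QCP-FB. The refined trackers were also more resilient to additive noise, which matters in real-world applications where speech quality varies.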
The VTR (Vocal Tract Resonance) corpus is a popular dataset for evaluating formant tracking algorithms. In this study it was used to measure tracking accuracy, and noise resilience was tested by corrupting the VTR speech with additive noise.
When formants were tracked from VTR speech corrupted by additive noise, the refined DeepFormants trackers were more resilient to noise than the reference trackers. Maintaining accurate formant tracking under noise is critical for real-world speech processing.
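A noise-corruption step of this kind can be sketched as follows. The noise type and level are not given here, so white Gaussian noise and a target signal-to-noise ratio (SNR) in dB are assumptions, as is the function name `add_noise_at_snr`.

```python
import numpy as np

def add_noise_at_snr(speech, snr_db, rng=None):
    """Add white Gaussian noise scaled so that the result has the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    speech = np.asarray(speech, dtype=float)
    noise = rng.standard_normal(len(speech))
    signal_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power = 10^(snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Stand-in signal (a sine wave) in place of a real speech utterance.
t = np.arange(8000) / 8000.0
clean = np.sin(2 * np.pi * 220.0 * t)
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=np.random.default_rng(0))
```

Running each tracker on `clean` and on `noisy` versions of the same utterance is one way to compare accuracy across noise levels.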
Combining data-driven and model-driven methods leverages the strengths of both: the data-driven tracker provides robust predictions of the formant contours, while the model-driven LP analysis offers precise local adjustments. This hybrid approach improves accuracy without requiring additional training or data.
The hybrid approach could potentially be adapted for tasks such as speech recognition, speaker identification, and voice analysis. Its resilience to noise makes it particularly promising for real-world applications.
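As its name indicates, QCP-FB combines quasi-closed phase analysis with forward-backward linear prediction, in which the signal is predicted in both the forward and backward directions. In this study, refining the DeepFormants predictions with QCP-FB gave the best tracking performance, and the refined trackers were more resilient to noise than the reference trackers.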
Researchers can integrate the refined DeepFormants tracker into their speech processing pipelines to improve formant tracking accuracy. The approach is easy to implement, as it requires no additional training or data.
Improved formant tracking can enhance applications like speech synthesis, voice pathology detection, and linguistic research. It is also valuable for developing robust speech recognition systems in noisy environments.
Show by month | Manuscript | Video Summary |
---|---|---|
2025 April | 8 | 8 |
2025 March | 79 | 79 |
2025 February | 51 | 51 |
2025 January | 38 | 38 |
2024 December | 47 | 47 |
2024 November | 55 | 55 |
2024 October | 43 | 43 |
2024 September | 41 | 41 |
2024 August | 30 | 30 |
2024 July | 31 | 31 |
2024 June | 23 | 23 |
2024 May | 28 | 28 |
2024 April | 25 | 25 |
2024 March | 6 | 6 |
Total | 505 | 505 |