bagustris@/home

Showing posts with label nkululeko. Show all posts

Thursday, August 29, 2024

Pathological Voice Detection with Nkululeko

I tried to add a recipe to Nkululeko for pathological voice detection using the Saarbrueken Voice Database (SVD, the dataset currently perhaps cannot be downloaded due to a server problem). Using Nkululeko is easy; I just need two working days (with a lot of play and others) to process the data and get the initial result. The evaluation was mainly measured with F1-Score (Macro). The initial result is an F1-score of 71% obtained using open smile ('os') and SVM without any customization.

On the third day, I made some modifications, but I am still using the 'os' feature (so I don't need to extract the feature again, using the same 'exp' directory to boost my experiments). The best F1 score I achieved was 76% (macro). This time, the modifications were: without feature scaling, feature balancing with smote, and using XGB as a classifier. My goal is to obtain an F1 score of 80% before the end of the fiscal year (March 2025).

Here is the configuration (INI) file and a sample of outputs (terminal).

Configuration file:

[EXP]
root = /tmp/results/
name = exp_os
[DATA]
databases = ['train', 'dev', 'test']
train = ./data/svd/svd_a_train.csv
train.type = csv
train.absolute_path = True
train.split_strategy = train
train.audio_path = /data/SaarbrueckenVoiceDatabase/export_16k
dev = ./data/svd/svd_a_dev.csv
dev.type = csv
dev.absolute_path = True
dev.split_strategy = train
dev.audio_path = /data/SaarbrueckenVoiceDatabase/export_16k
test = ./data/svd/svd_a_test.csv
test.type = csv
test.absolute_path = True
test.split_strategy = test
test.audio_path = /data/SaarbrueckenVoiceDatabase/export_16k
target = label
; no_reuse = True
; labels = ['angry', 'calm', 'sad']
; get the number of classes from the target column automatically
[FEATS]
; type = ['wav2vec2']
; type = ['hubert-large-ll60k']
; type = []
type = ['os']
; scale = standard
balancing = smote
; no_reuse = True
[MODEL]
type = xgb

Outputs:

$ python3 -m nkululeko.nkululeko --config exp_svd/exp_os.ini 
DEBUG: nkululeko: running exp_os from config exp_svd/exp_os.ini, nkululeko version 0.88.12
DEBUG: dataset: loading train
DEBUG: dataset: Loaded database train with 1650 samples: got targets: True, got speakers: False (0), got sexes: False, got age: False
DEBUG: dataset: converting to segmented index, this might take a while...
DEBUG: dataset: loading dev
DEBUG: dataset: Loaded database dev with 192 samples: got targets: True, got speakers: False (0), got sexes: False, got age: False
DEBUG: dataset: converting to segmented index, this might take a while...
DEBUG: dataset: loading test
DEBUG: dataset: Loaded database test with 190 samples: got targets: True, got speakers: False (0), got sexes: False, got age: False
DEBUG: dataset: converting to segmented index, this might take a while...
DEBUG: experiment: target: label
DEBUG: experiment: Labels (from database): ['n', 'p']
DEBUG: experiment: loaded databases train,dev,test
DEBUG: experiment: reusing previously stored /tmp/results/exp_os/./store/testdf.csv and /tmp/results/exp_os/./store/traindf.csv
DEBUG: experiment: value for type is not found, using default: dummy
DEBUG: experiment: Categories test (nd.array): ['n' 'p']
DEBUG: experiment: Categories train (nd.array): ['n' 'p']
DEBUG: nkululeko: train shape : (1842, 3), test shape:(190, 3)
DEBUG: featureset: value for n_jobs is not found, using default: 8
DEBUG: featureset: reusing extracted OS features: /tmp/results/exp_os/./store/train_dev_test_os_train.pkl.
DEBUG: featureset: value for n_jobs is not found, using default: 8
DEBUG: featureset: reusing extracted OS features: /tmp/results/exp_os/./store/train_dev_test_os_test.pkl.
DEBUG: experiment: All features: train shape : (1842, 88), test shape:(190, 88)
DEBUG: experiment: scaler: False
DEBUG: runmanager: value for runs is not found, using default: 1
DEBUG: runmanager: run 0 using model xgb
DEBUG: modelrunner: balancing the training features with: smote
DEBUG: modelrunner: balanced with: smote, new size: 2448 (was 1842)
DEBUG: modelrunner: {'n': 1224, 'p': 1224})
DEBUG: model: value for n_jobs is not found, using default: 8
DEBUG: modelrunner: value for epochs is not found, using default: 1
DEBUG: modelrunner: run: 0 epoch: 0: result: test: 0.771 UAR
DEBUG: modelrunner: plotting confusion matrix to train_dev_test_label_xgb_os_balancing-smote_0_000_cnf
DEBUG: reporter: Saved confusion plot to /tmp/results/exp_os/./images/run_0/train_dev_test_label_xgb_os_balancing-smote_0_000_cnf.png
DEBUG: reporter: Best score at epoch: 0, UAR: .77, (+-.704/.828), ACC: .773
DEBUG: reporter: 
               precision    recall  f1-score   support

           n       0.65      0.76      0.70        67
           p       0.86      0.78      0.82       123

    accuracy                           0.77       190
   macro avg       0.76      0.77      0.76       190
weighted avg       0.79      0.77      0.78       190

DEBUG: reporter: labels: ['n', 'p']
DEBUG: reporter: Saved ROC curve to /tmp/results/exp_os/./results/run_0/train_dev_test_label_xgb_os_balancing-smote_0_roc.png
DEBUG: reporter: auc: 0.771, pauc: 0.560 from epoch: 0
DEBUG: reporter: result per class (F1 score): [0.703, 0.817] from epoch: 0
DEBUG: experiment: Done, used 11.065 seconds
DONE

Update 2024/08/30:

Using praat + xgb (no scaling, balancing: smote) achieves higher F1-score, i.e., 78% (macro) and 79% (weighted)

Wednesday, July 31, 2024

New pre-print: Uncertainty-Based Ensemble Learning For Speech Classification

Last week, I submitted a new pre-print to Arxiv. The link is below.

https://arxiv.org/abs/2407.17009

This paper discusses the importance of accommodating uncertainty in addressing the issue of different models in speech classification. We used several variants of uncertainty-based ensemble learning (aka late fusions) to combine predictions of machine learning methods.

Every machine learning models has uncertainty, which, for instance, can be calculated from logits or probabilities. Entropy can be calculated from distribution. This distribution is multiplication of logits with softmax. Then, we can use several models to find the lowest uncertainty score and infer the label from that score. Or, we can weigh the probabilities by those uncertainty scores. We evaluated four variants of uncertainty-based ensemble learning.

Our work is highly beneficial to the community as it will help you improve the prediction of speech classification using ensemble learning.

Some areas that still need further development are stability and generalization. We see the trends that ensemble learning tends to improve the recognition rate of different models, but it is not always the case. We invite readers to collaborate in addressing the various unanswered questions.

We extend our thanks to AIST for their full support of our research.

Happy reading. We welcome your feedback.

Additional information:

Toolkit used in the research: Nkululeko https://github.com/felixbur/nkululeko

Configuration files: https://github.com/bagustris/nkululeko_ensemble_speech_classficcation

---

This post is based on template from [1].

Reference:

[1] https://medium.com/open-science-indonesia/template-memajang-makalah-di-medsos-190c228dd86a