PyTorch Korea User Group

# Run in a shell; this assumes that a suitable version of PyTorch is already installed
pip install -q torchaudio soundfile

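import torch
from pprint import pprint

# The models are optimized for a single CPU thread
torch.set_num_threads(1)

# Download an example German recording
torch.hub.download_url_to_file('https://models.silero.ai/vad_models/de.wav', 'de_example.wav')

# Load the language classifier and its helper utilities from the Silero VAD repository
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_lang_detector',
                              force_reload=True)

# The first two utilities are the classifier function and the audio loader
get_language, read_audio, *_ = utils

# Location of the repository's files directory in the hub cache (not used below)
files_dir = torch.hub.get_dir() + '/snakers4_silero-vad_master/files'

# Read the recording and predict its language
wav = read_audio('de_example.wav')
language = get_language(wav, model)

pprint(language)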
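
The same two calls extend naturally to several recordings. The following is a minimal sketch, assuming `read_audio` and `get_language` behave as in the snippet above (a file path in, a waveform out; a waveform and model in, a language prediction out). `classify_files` is a hypothetical helper, not part of the Silero utilities; the loader and classifier are passed in as arguments so the function can be tried with stand-ins before the real model is downloaded.

```python
from typing import Any, Callable, Dict, Iterable


def classify_files(
    paths: Iterable[str],
    read_audio: Callable[[str], Any],
    get_language: Callable[[Any, Any], Any],
    model: Any,
) -> Dict[str, Any]:
    """Return a mapping from each file path to the predicted language.

    Hypothetical helper: `read_audio`, `get_language` and `model` are assumed
    to be the objects obtained from torch.hub.load in the snippet above.
    """
    results: Dict[str, Any] = {}
    for path in paths:
        wav = read_audio(path)                    # load one recording
        results[path] = get_language(wav, model)  # classify it
    return results
```

With the objects from the snippet above, `pprint(classify_files(['de_example.wav'], read_audio, get_language, model))` would print a one-entry dictionary for the example file.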

Model Description

Silero VAD: a pre-trained, enterprise-grade Voice Activity Detector (VAD), Number Detector, and Language Classifier (95 languages). Enterprise-grade speech products made refreshingly simple (see our STT models). Each model is published separately.

Currently, there are hardly any high-quality, modern, free, public voice activity detectors other than the WebRTC Voice Activity Detector (link). WebRTC, however, is starting to show its age and suffers from many false positives.

(!!!) Important Notice (!!!) - the models are intended to run on CPU only and were optimized for performance on a single CPU thread. Note that the model is quantized.
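
Because the models target a single CPU thread, it can be useful to pin the thread count and time a call on your own hardware. The following is a minimal sketch that assumes nothing about the model beyond it being callable. `mean_latency` and `pin_single_thread` are hypothetical helpers, not part of the Silero utilities, and the numbers they produce reflect only the local machine.

```python
import time
from typing import Any, Callable


def mean_latency(fn: Callable[..., Any], *args: Any, repeats: int = 10) -> float:
    """Average wall-clock seconds per call of fn(*args) over `repeats` runs."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    fn(*args)  # warm-up call, not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats


def pin_single_thread() -> int:
    """Limit PyTorch intra-op parallelism to one thread and return the setting."""
    import torch  # imported lazily so mean_latency works without PyTorch

    torch.set_num_threads(1)
    return torch.get_num_threads()
```

For example, after calling `pin_single_thread()`, `mean_latency(get_language, wav, model)` gives the average time per classification of the example recording.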

Additional Examples and Benchmarks

For additional examples and other model formats, please visit this link, and refer to the extensive examples in Colab format (including the streaming examples).

References

The language classifier's model architecture is based on similar STT architectures.
