wav2vec Unsupervised: Speech recognition without supervision (facebook.com)
- 페이스북 AI팀이 만든 음성인식 프레임워크
- 전사(transcribed) 음성 데이터 없이 다양한 언어 인식을 지원
ㅤ→ 1000시간 정도 분량의 음성으로 훈련된 지도학습 모델과 비슷한 성능
ㅤ→ 전사 음성 데이터가 많지 않은 스와힐리어/타타르 언어등으로 테스트
- 레이블링 되지 않은 오디오의 구조를 학습하는 방식
ㅤ→ 음성 녹음을 각각의 사운드에 느슨하게 대응하는 음성 단위로 분할
ㅤ→ cat 은 “/K/”, “/AE/” “/T/“ 세개의 소리가 포함
ㅤ→ generator 와 discriminator 로 구성된 GAN 으로 훈련
- 코드와 논문 공개
Detect language Afrikaans Albanian Amharic Arabic Armenian Azerbaijani Basque Belarusian Bengali Bosnian Bulgarian Catalan Cebuano Chichewa Chinese (Simplified) Chinese (Traditional) Corsican Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Frisian Galician Georgian German Greek Gujarati Haitian Creole Hausa Hawaiian Hebrew Hindi Hmong Hungarian Icelandic Igbo Indonesian Irish Italian Japanese Javanese Kannada Kazakh Khmer Korean Kurdish Kyrgyz Lao Latin Latvian Lithuanian Luxembourgish Macedonian Malagasy Malay Malayalam Maltese Maori Marathi Mongolian Myanmar (Burmese) Nepali Norwegian Pashto Persian Polish Portuguese Punjabi Romanian Russian Samoan Scots Gaelic Serbian Sesotho Shona Sindhi Sinhala Slovak Slovenian Somali Spanish Sundanese Swahili Swedish Tajik Tamil Telugu Thai Turkish Ukrainian Urdu Uzbek Vietnamese Welsh Xhosa Yiddish Yoruba Zulu
Afrikaans Albanian Amharic Arabic Armenian Azerbaijani Basque Belarusian Bengali Bosnian Bulgarian Catalan Cebuano Chichewa Chinese (Simplified) Chinese (Traditional) Corsican Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Frisian Galician Georgian German Greek Gujarati Haitian Creole Hausa Hawaiian Hebrew Hindi Hmong Hungarian Icelandic Igbo Indonesian Irish Italian Japanese Javanese Kannada Kazakh Khmer Korean Kurdish Kyrgyz Lao Latin Latvian Lithuanian Luxembourgish Macedonian Malagasy Malay Malayalam Maltese Maori Marathi Mongolian Myanmar (Burmese) Nepali Norwegian Pashto Persian Polish Portuguese Punjabi Romanian Russian Samoan Scots Gaelic Serbian Sesotho Shona Sindhi Sinhala Slovak Slovenian Somali Spanish Sundanese Swahili Swedish Tajik Tamil Telugu Thai Turkish Ukrainian Urdu Uzbek Vietnamese Welsh Xhosa Yiddish Yoruba Zulu
Text-to-speech function is limited to 200 characters