Bibliographic Metadata

Title
Deep Neural Networks für Spracherkennung
Additional Titles
Deep Neural Networks for Speech Recognition
Author
Ensor, Alice
Thesis advisor
Schefer-Wenzl, Sigrid
Published
2018
Date of Submission
March 2018
Language
English
Document type
Bachelor Thesis
Keywords (DE)
Not available
Keywords (EN)
Artificial Neural Network / Automatic Speech Recognition / Acoustic Model / Activation Functions / Backpropagation / Convolutional Neural Network / Cost Functions / Deep Neural Networks / Feedforward Neural Networks / Gradient Descent / Hidden Markov Model / Language Model / Long Short-Term Memory / Markov Chains / Multilayer Perceptrons / Pooling / Recurrent Neural Networks / Rectified Linear Unit / Sigmoidal Non-Linear Activation Function / Word Error Rate
Restriction-Information
Classification
Abstract (German)

Not available

Abstract (English)

This bachelor thesis examines deep neural networks (DNNs) and their use in automatic speech recognition (ASR) systems, in order to evaluate the feasibility of implementing such ASR systems in educational applications that help children with reading difficulties. To aid these children, educators need interactive software that uses ASR to provide feedback on a child's reading attempts. However, most reading applications used in schools employ outdated ASR based on statistical models, which cannot reach the accuracy of the DNN-based speech recognition found in commercial applications.

Most ASR systems used with educational reading software employ a Gaussian Mixture Model (GMM) together with a Hidden Markov Model (HMM) to perform acoustic modeling. One of the problems with GMMs is that they have difficulty modeling data that lies on nonlinear manifolds, which is common in speech data. DNNs are more effective at modeling such data, so the accuracy of ASR systems can be substantially improved by replacing the GMMs with DNNs. Furthermore, because long short-term memory (LSTM) networks can represent temporal structure, and because large libraries of labelled speech data exist, it is even possible to have DNNs replace the entire ASR pipeline. Research has shown that both of the above configurations outperform GMM-HMM ASR with the same amount of training data, with many DNN-based ASR systems even approaching parity with human transcribers. The goal of this thesis is to examine published research and academic works to provide a theoretical background for implementing an educational reading application with DNN-based ASR. This application will be realized in the second part of the author's bachelor thesis.
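The hybrid configuration described above can be sketched as follows: a feedforward DNN maps one acoustic feature frame to posterior probabilities over HMM states, which an HMM decoder would then consume. This is a minimal illustrative sketch, not code from the thesis; the layer sizes, the number of HMM states, and the randomly initialised weights (standing in for trained parameters) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative placeholder dimensions, not values from the thesis.
N_FEATURES = 40   # e.g. 40 log mel filterbank coefficients per frame
N_HIDDEN = 128    # hidden layer width (arbitrary)
N_STATES = 300    # number of HMM states (arbitrary)

# Randomly initialised weights stand in for trained parameters.
W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_FEATURES))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0.0, 0.1, (N_STATES, N_HIDDEN))
b2 = np.zeros(N_STATES)

def relu(x):
    # Rectified linear unit activation
    return np.maximum(0.0, x)

def softmax(x):
    # Numerically stable softmax over HMM-state scores
    e = np.exp(x - x.max())
    return e / e.sum()

def state_posteriors(frame):
    """Forward pass: one feature frame -> P(HMM state | frame)."""
    h = relu(W1 @ frame + b1)
    return softmax(W2 @ h + b2)

frame = rng.normal(size=N_FEATURES)   # stand-in acoustic feature frame
p = state_posteriors(frame)           # a probability distribution over states
```

In a full hybrid system, these per-frame posteriors would be rescaled into state likelihoods and combined with the HMM's transition structure and a language model during decoding; this sketch covers only the acoustic-model forward pass.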