English as an expanding language - Spoken Language Corpora Activities

...s require large corpora of human-machine conversations to model interactive dialogue. In response to this need, major efforts are underway worldwide to collect, annotate and distribute speech corpora in many languages. These corpora allow scientists to study, understand and model the different sources of variability, and to develop, evaluate and compare speech technologies on a common basis.

Spoken Language Corpora Activities

Recent advances in speech and language recognition are due in part to the availability of large public-domain speech corpora, which have enabled comparative system evaluation using shared testing protocols. The use of common corpora for developing and evaluating speech recognition algorithms is a fairly recent development. One of the first corpora used for common evaluation, the TI-DIGITS corpus, recorded in 1984, has been and still is widely used as a test base for isolated and connected digit recognition.

The challenges in spoken language corpora are many. One basic challenge is design methodology: how to design compact corpora that can be used in a variety of applications; how to design comparable corpora in a variety of languages; how to select or sample speakers so as to obtain a representative population with regard to many factors, including accent, dialect and speaking style; how to create generic dialogue corpora so as to minimize the need for task- or application-specific data; and how to select statistically representative test data for system evaluation.

Another major challenge centers on developing standards for transcribing speech data at different levels and across languages: establishing symbol sets and alignment conventions; defining levels of transcription (acoustic, phonetic, phonemic, word and other levels); and setting conventions for prosody and tone, as well as for quality control, such as having independent labelers transcribe the same speech data to obtain reliability statistics.
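The text does not say which reliability statistic is computed when independent labelers transcribe the same speech. One common choice for two labelers is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The minimal sketch below assumes the two transcriptions have already been aligned segment by segment (one label per segment, same order); real phonetic transcriptions often differ in segmentation and would first need alignment. The function name `cohens_kappa` and the phone labels in the example are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two labelers who each assigned one label per segment.

    Assumption of this sketch: both sequences are already aligned,
    i.e. position i in each list refers to the same speech segment.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelers must label the same segments")
    n = len(labels_a)
    if n == 0:
        raise ValueError("no segments to compare")

    # Observed agreement: fraction of segments given the same label by both labelers.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each labeler's own label frequencies, summed over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_expected == 1.0:
        # Both labelers used one identical label everywhere; kappa is undefined.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical aligned phone labels from two independent labelers (10 segments).
labeler_1 = ["aa", "iy", "s", "s", "t", "aa", "iy", "t", "s", "aa"]
labeler_2 = ["aa", "iy", "s", "z", "t", "aa", "ih", "t", "s", "aa"]

if __name__ == "__main__":
    print(round(cohens_kappa(labeler_1, labeler_2), 3))  # prints 0.747
```

Here the labelers agree on 8 of 10 segments (0.80), but because some labels are frequent, about 0.21 agreement would be expected by chance, so kappa is noticeably lower than the raw agreement. Reporting a chance-corrected figure like this is what makes reliability numbers comparable across label sets of different sizes.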
Quality control of the speech data is also an important issue that needs to be addressed, as well as methods for dissemination. While CD-ROM has become the de facto standard for disseminating large corpora, other potential means also need to be considered, such as very-high-speed fiber-optic networks.