Apple granted patent for 'speech recognition using latent semantic adaptation'


The invention relates generally to pattern recognition and, more specifically, to speech recognition systems that use latent semantic analysis. Here's Apple's background on the invention: "As computer systems have evolved, the desire to use such systems for pattern recognition has grown. Typically, the goal of pattern recognition systems is to quickly provide accurate recognition of input patterns. One type of pattern recognition system is a voice recognition system, which attempts to accurately identify a user's speech. Another type of pattern recognition is a handwriting recognition system. A speech recognizer discriminates among acoustically similar segments of speech to recognize words, while a handwriting recognizer discriminates among strokes of a pen to recognize words.

"An important advancement in speech recognition technology is the use of semantic pattern recognition known as semantic language modeling. Semantic language modeling uses the context of the spoken words to decide which words are most likely to appear next, the context referring to the domain or subject matter of the words as well as the style. For example, a speech recognition application using semantic language modeling will favor the word sequence "recognize speech" over "wreck a nice beach" when the subject matter is speech processing, and vice versa when the subject matter has to do with vacations at the beach.

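To make the "recognize speech" example concrete, here is a minimal Python sketch. It is not the patent's method: it swaps LSA for a much simpler per-topic word-probability table, and the topic names, probabilities, and function names are all invented for illustration.

```python
import math

# Hypothetical per-topic word probabilities, invented for illustration only.
TOPIC_WORD_PROB = {
    "speech processing": {
        "recognize": 0.04, "speech": 0.05,
        "wreck": 0.001, "a": 0.03, "nice": 0.002, "beach": 0.001,
    },
    "beach vacation": {
        "recognize": 0.001, "speech": 0.001,
        "wreck": 0.005, "a": 0.03, "nice": 0.02, "beach": 0.05,
    },
}


def semantic_score(words, topic, floor=1e-4):
    """Average log-probability of the words under the given topic."""
    probs = TOPIC_WORD_PROB[topic]
    return sum(math.log(probs.get(w, floor)) for w in words) / len(words)


def pick_hypothesis(hypotheses, topic):
    """Return the candidate word sequence that best fits the topic."""
    return max(hypotheses, key=lambda h: semantic_score(h.split(), topic))


candidates = ["recognize speech", "wreck a nice beach"]
for topic in TOPIC_WORD_PROB:
    print(topic, "->", pick_hypothesis(candidates, topic))
```

Averaging per word keeps the shorter candidate from winning on length alone. With these made-up numbers, the preferred word sequence flips when the topic changes, which is the behavior the passage describes.
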
"In semantic language modeling, the domain and style of the spoken words is captured using latent semantic analysis (LSA). LSA is a modification of a paradigm that was first formulated in the context of information retrieval and reveals meaningful associations in language based on semantic patterns previously observed in a corpus of language representative of a particular domain and style, for example, a training corpus having to do with speech processing vs. vacations at the beach. The semantic patterns are word-document co-occurrences that appear in the training corpus, where the corpus is comprised of a collection of one or more documents that contain paragraphs and sentences or other collections of words representative of the domain and style.

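The word-document co-occurrences mentioned above can be pictured as a table with one row per word and one column per document. The following Python sketch builds such a table from raw counts. The function name and the two sample documents are assumptions made for illustration, and any weighting or normalization a real system might apply to the counts is left out.

```python
from collections import Counter


def cooccurrence_matrix(documents):
    """Count how often each word appears in each document.

    Returns (vocab, matrix), where matrix[i][j] is the number of times
    vocab[i] occurs in documents[j].
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[c[word] for c in counts] for word in vocab]
    return vocab, matrix


# Two tiny hypothetical "documents", one per domain.
docs = [
    "recognize speech with speech models",
    "a nice beach vacation",
]
vocab, matrix = cooccurrence_matrix(docs)
for word, row in zip(vocab, matrix):
    print(word, row)
```

Splitting on whitespace is a deliberate simplification; a real training corpus would need proper tokenization.
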
"The semantic knowledge represented by the semantic patterns is encapsulated in a continuous vector space, referred to as the LSA space, by mapping those word-document co-occurrences into corresponding word and document vectors that characterize the position of the words and documents in the LSA space. During speech recognition, any new words or documents are first mapped onto a point in the LSA space, and then compared to the existing word and document vectors in the space using a similarity measure, a process referred to as semantic inference. Those new words and documents that map most closely to the existing word and document vectors in the LSA space are recognized over those that do not.

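The comparison step described above can be sketched in a few lines of Python. The patent text quoted here says only "a similarity measure"; this sketch assumes cosine similarity, a common choice for vector spaces, and the two-dimensional vectors and labels are made up for illustration rather than derived from any real LSA space.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest(new_vector, known_vectors):
    """Return the label of the known vector most similar to new_vector."""
    return max(known_vectors, key=lambda name: cosine(new_vector, known_vectors[name]))


# Hypothetical document vectors already present in the space.
known = {
    "speech processing": [0.9, 0.1],
    "beach vacation": [0.1, 0.8],
}

# A hypothetical new document that has already been mapped into the space.
print(closest([0.7, 0.2], known))
```

The sketch skips the mapping step itself and starts from vectors that are assumed to exist already.
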
"A limitation in current implementations of speech recognition applications using semantic language modeling is that the LSA space is a fixed semantic space. This means that semantic patterns not observed in the training corpus cannot be captured and later exploited during speech recognition. As a result, changes in the domain of the speech, or even just changes in the style of the speech, may not be properly recognized. In the case of financial news, for example, this means that an LSA-based speech recognition application trained on a collection of documents, say, from the Wall Street Journal, will not perform optimally on new documents from the Associated Press, and vice versa. The use of a fixed semantic space is particularly deleterious in applications with many heterogeneous domains, such as an information retrieval system, since no database is big enough to contain a training corpus representative of all domains. It is also less than ideal for horizontal (i.e. non-specialized) dictation applications, because the same user typically adopts different styles in different contexts, for example the formal style of a business letter vs. the informal style of a personal letter.

"Distributed training seeks to overcome some of the limitations of a fixed semantic space by creating a distinct semantic space for each usage condition. Thus, using the financial news example, there would be one LSA space for the Wall Street Journal, and another LSA space for the Associated Press. However, it is often impossible to predict ahead of time which kind(s) of text the end user will want to process, and even when that can be done, for most narrowly defined contexts and styles it may be challenging to gather enough data to reliably train the speech recognition system.

"Explicit modeling also seeks to overcome some of the limitations of a fixed semantic space by including a task (i.e. domain) and/or style component into the LSA paradigm ... Another approach to the problem of a fixed semantic space is to re-compute the LSA space to account for the new words and documents as they become available. One way is simply to re-compute the LSA space from scratch, referred to as full re-computation. Another way is to re-compute the LSA space from scratch, but keeping the dimension of the LSA space constant, referred to as constant dimension re-computation. But full or constant dimension re-computation requires significant additional processing. The additional processing is undesirable since it consumes additional central processor unit (CPU) cycles and degrades responsiveness.

"Yet another approach to the problem of a fixed semantic space is to adapt the LSA space to account for the new documents and new words in the new documents as they become available by using traditional 'folding-in' to incorporate new variants in the existing LSA space, referred to as baseline adaptation. While less computationally intensive, baseline adaptation results in speech misclassification error rates of unacceptably high levels. What is needed, therefore, is an improved method and apparatus for using semantic language modeling in a speech recognition system to more accurately recognize speech."

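In the LSA literature, folding-in is conventionally described as projecting a new document's word counts onto the existing space, v = dᵀ U S⁻¹, where U holds the word vectors and S the singular values from the original decomposition. The Python sketch below assumes that standard formulation, which may differ in detail from the baseline the patent refers to. The tiny U and singular values are hand-picked so the arithmetic is easy to follow, and the function name is invented for illustration.

```python
def fold_in_document(doc_counts, U, singular_values):
    """Project a new document into an existing LSA space.

    doc_counts: word counts for the new document, one per row of U.
    U: word vectors from the original decomposition (one row per word).
    singular_values: the diagonal of S from the same decomposition.
    Computes v = d^T U S^{-1} without changing U or S.
    """
    n_words = len(doc_counts)
    return [
        sum(doc_counts[i] * U[i][k] for i in range(n_words)) / singular_values[k]
        for k in range(len(singular_values))
    ]


# Hand-picked decomposition of the 3-word x 2-document matrix
# [[2, 0], [0, 1], [0, 0]]: U has orthonormal columns, V is the identity.
U = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],  # this word never occurred in the training documents
]
S = [2.0, 1.0]

print(fold_in_document([2, 0, 0], U, S))  # the first training document
print(fold_in_document([2, 1, 5], U, S))  # a new document
```

In this toy setup, the third word has no weight in the existing space, so its count of 5 in the new document has no effect on the result: the space itself is left unchanged by folding-in.
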
The patent was originally filed on Sept. 28, 2001. The inventor is Jerome R. Bellegarda. FIG. 1 is a block diagram that illustrates the use of latent semantic adaptation in the context of a speech recognition system using semantic inference, in accordance with one embodiment of the present invention.


 