On 10 Mar 2005 09:54:16 -0800, jj.shine@verizon.net wrote:
>We're looking for software that will convert existing mp3 and wav files
>to a text/doc format.
>
>We've checked out the dictation type products but don't see exactly
>what we're looking to do. Any ideas?
>
>John
Dragon NaturallySpeaking (and other speech-to-text programs) needs to be
'trained' to the speaker's voice first.
MP3 encoding/playback would probably sound LOTS different from WAV file
playback, and would probably be regarded as a different speaker.
, _
, | \ MKA: Steve Urbach
, | )erek No JUNK in my email please
, ____|_/ragonsclaw dragonsclawJUNK@JUNKmindspring.com
, / / / Running United Devices "Cure For Cancer" Project 24/7 Have you helped? http://www.grid.org
> We're looking for software that will convert existing mp3 and wav files
> to a text/doc format.
>
> We've checked out the dictation type products but don't see exactly
> what we're looking to do. Any ideas?
I'm confused. How would you translate a wav file to a text/doc file?
Perhaps it's relevant to state the higher-level requirements of what it
is you're attempting to do.
--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124
"Randy Yates" wrote ...
> I'm confused. How would you translate a wav file to a text/doc file?
There are several products that do speech recognition, primarily
for live speech (e.g. voice commands). Technically, one could
substitute recorded speech (rather than a live microphone)
as the input to one of these applications.
However, speech-to-text seems pretty flaky even under ideal conditions.
Not sure you could do it reliably with speech compressed to MP3.
> Perhaps it's relevant to state the higher-level requirements of what it
> is you're attempting to do.
> Perhaps it's relevant to state the higher-level requirements of what it
> is you're attempting to do.
Here's an explanation -
We have an inventory of audio training presentations on file - wav and
mp3 formats - we are looking for a way to "convert" the narrative from
these files into a text format that we can then add to appropriate
PowerPoint slides - in the "notes" function.
> Thanks for all the responses...re:
>
> > Perhaps it's relevant to state the higher-level requirements of what it
> > is you're attempting to do.
>
> Here's an explanation -
>
> We have an inventory of audio training presentations on file - wav and
> mp3 formats - we are looking for a way to "convert" the narrative from
> these files into a text format that we can then add to appropriate
> PowerPoint slides - in the "notes" function.
OK, it became clear as soon as I posted. I don't know why the "send" button
always infuses me with a couple more dB of IQ.
Richard Crowley wrote:
> However, speech-to-text seems pretty flaky under ideal conditions.
> Not sure you could do it reliably with speech compressed to MP3.
>
>
I really wouldn't worry about using MP3s; have you any idea of the
quality of the hardware supplied for normal speech to text applications?
To call them microphones is _really_ stretching a point!!
"Andrew Chesters" wrote ...
> Richard Crowley wrote:
> > However, speech-to-text seems pretty flaky under ideal conditions.
> > Not sure you could do it reliably with speech compressed to MP3.
> >
> >
> I really wouldn't worry about using MP3s; have you any idea of the
> quality of the hardware supplied for normal speech to text applications?
> To call them microphones is _really_ stretching a point!!
But the weak point isn't really the quality of the hardware, is it?
A quite acceptable (i.e. reasonably low-distortion, high signal-to-noise
ratio) signal can be generated with very cheap hardware.
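For reference, the signal-to-noise ratio mentioned above is usually quoted in decibels as 10 * log10 of the ratio of signal power to noise power. A minimal Python sketch, assuming the samples are already available as lists of floats; the test tone and the alternating "hiss" below are made-up illustrations, not measurements from any real microphone:

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10*log10(P_signal / P_noise),
    where P is the mean power (mean of the squared samples)."""
    if len(signal) == 0 or len(noise) == 0:
        raise ValueError("need non-empty sample lists")
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        return math.inf  # no noise at all
    return 10 * math.log10(p_signal / p_noise)


# Made-up example: a 1 kHz tone sampled at 8 kHz plus a small alternating "hiss"
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 1000 * t / rate) for t in range(rate)]
hiss = [0.005 * (1 if t % 2 else -1) for t in range(rate)]
print(round(snr_db(tone, hiss), 1))  # -> 37.0
```

Higher is better: every additional 10 dB means the noise power is ten times smaller relative to the signal.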
The big issue seems to be decoding the phonemes and interpreting
them as words. Some languages (likely including English) are more problematic
than others because of irregular spelling, irregular pronunciation,
homonyms, etc., and because of the way we normally run syllables and words
together, not to mention speakers with accents.
It is the phoneme-decoding part that may not fare well with
compressed data (like MP3) because the compression may
remove or mask some of the cues that the recognition software
needs to decipher the sounds. Software can't necessarily use the
same algorithms our ears use (even if we knew what they were).
"Richard Crowley" <richard.7.crowley@intel.com> wrote in message
news0qgdk$hfi$1@news01.intel.com
> "Andrew Chesters" wrote ...
>> Richard Crowley wrote:
>>> However, speech-to-text seems pretty flaky under ideal conditions.
>>> Not sure you could do it reliably with speech compressed to MP3.
>>>
>>>
>> I really wouldn't worry about using MP3s; have you any idea of the
>> quality of the hardware supplied for normal speech to text
>> applications? To call them microphones is _really_ stretching a
>> point!!
>
> But the weak point isn't really the quality of the hardware, is it?
> A quite acceptable (i.e. reasonably low-distortion, high
> signal-to-noise ratio) signal can be generated with very cheap
> hardware.
But it often doesn't work out that way.
> The big issue seems to be decoding the phonemes and interpreting
> them into words. Some languages (likely English) are more problematic
> than others (because of irregular spelling, irregular pronunciation,
> homonyms, etc.) and the normal way we normally run syllables/words
> together, not to mention speakers with accents, etc.
> It is the decoding the phonemes part that may not fare well with
> compressed data (like MP3, etc.) because the compression may
> remove or mask some of the cues that the recognition software
> needs to decipher the sounds. Software can't necessarily use the
> same algorithms our ears use (even if we knew what they were.)
Interesting speculation - got any evidence to back it up?
I did a little searching and found no caveats about speech recognition from
MP3s.
The problem, as already stated, is that all the speech recognition programs
on the market (at least those I know of) are mostly sound-comparative,
meaning they compare what they "hear" to a dictionary of sounds trained with
the speaker's voice... Dragon does a very good job if trained properly, but,
like all other speech recognition programs, it has to be trained on the same
"voice" it will be listening to.
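The "sound-comparative" approach described above can be illustrated with classic template matching: each word the speaker trained is stored as a reference sequence of acoustic features, and new input is labeled with the closest reference. The sketch below uses dynamic time warping (DTW) so that faster or slower utterances still line up before being compared. It is a toy under stated assumptions (one made-up feature value per frame, hypothetical templates), not a description of how Dragon or any other product works internally:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences
    (here: lists of floats, one hypothetical feature value per frame).
    Lets a slow and a fast utterance of the same word line up."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])             # local frame mismatch
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a frame of b
                                 cost[i][j - 1],      # repeat a frame of a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(utterance, templates):
    """Return the word whose trained template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical templates "trained" from one speaker (made-up feature values)
templates = {
    "yes": [0.1, 0.8, 0.9, 0.3],
    "no": [0.2, 0.5, 0.5, 0.1],
}

# A slower "yes" from the same speaker: some frames repeated, same shape
print(recognize([0.1, 0.1, 0.8, 0.8, 0.9, 0.3], templates))  # -> yes
```

Because the templates come from one voice, input from a different speaker (or a recording that sounds different enough) yields feature sequences that may land closer to the wrong template, which is the speaker-dependence problem discussed in this thread.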
Now if all your recordings are narrated by the same person, and this person
uses a pretty neutral voice without too many intonations, what you would
need to do is whip out a good sound file editor, like Sonic Foundry Sound
Forge or something similar, and use words from your recordings to train the software.
For example, the software is going to ask you to record a series of words
when you first install it, and it will use those words as a reference for
phonetic translation. If, instead of recording those words in your own
voice, you create a sound file from your recordings for each word it asks
for (by cutting/pasting) and use these as the reference, the software should
properly recognize most of your recordings.
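Once the word boundaries have been found by ear, the cutting step could be scripted rather than done entirely by hand. A minimal sketch using Python's standard `wave` module, assuming the recordings are uncompressed PCM WAV (MP3 files would need converting first) and that the start/end times come from listening in an editor; whether a given dictation package will accept pre-recorded files in its training step is itself an assumption to check:

```python
import wave


def cut_segment(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (in seconds) from one
    PCM WAV file into a new WAV file with the same format.
    Returns the number of frames written."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        start = int(start_s * rate)
        end = min(int(end_s * rate), src.getnframes())
        if not 0 <= start < end:
            raise ValueError("empty or invalid time range")
        src.setpos(start)                 # jump to the first frame of the word
        frames = src.readframes(end - start)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)             # same channels, sample width, rate
        dst.writeframes(frames)           # header frame count is fixed up on write
    return end - start
```

For example, with hypothetical file names and times, `cut_segment("narration.wav", "train_potato.wav", 12.40, 12.95)` would write the stretch containing "potato" to its own file.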
But since this would be very time-consuming, and since you
would STILL have to go back and read/correct the text, it would probably be
easier to just hire a typist and have him/her type everything up as it plays....
I do not think MP3 compression is an issue there, as MP3 simply strips
off stuff you normally wouldn't hear anyway, or wouldn't need or care to
hear from a compressed audio file (harmonics, background noise, out-of-band
frequencies, etc.). Unless you have really poor-quality recordings with
lots of noise or a bad voice-to-noise ratio, I don't think you would get
better results from a WAV file than from an MP3... And yes, the key here
is good recording hardware...
While on the subject, does anybody know if non-comparative speech
recognition software exists? My idea is to develop a program that would
translate spoken words STRAIGHT to text, without comparing them to a
recorded reference. The program would listen to spoken syllables and
associate them with a written phonetic transcription (e.g. "potato" would
translate to something like pô - tae - toe). From there, a modified
syntax/vocabulary dictionary (with the phonetic pronunciations
included) would find the closest phonetic match and return it. It WOULD need
a lot more processing power than the average speech-to-text app, but the
advantages of such a system would be:
1- No dependence on who is speaking (accents, tone, etc.)
2- More "correct" translation
3- Better portability (recognizing Spanish would just be a matter of
switching dictionaries, not recording a whole new set of references)
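The "closest phonetic match" step of this idea can be sketched as an edit-distance (Levenshtein) search over a pronunciation dictionary. This is a minimal illustration under assumptions: the syllable spellings and the three-word dictionary are hypothetical, and the hard part (turning audio into syllables) is not shown:

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phonetic units."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # drop a unit from a
                           cur[j - 1] + 1,           # insert a unit from b
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def closest_word(heard, dictionary):
    """Return the dictionary word whose pronunciation is nearest to `heard`."""
    return min(dictionary, key=lambda w: edit_distance(heard, dictionary[w]))


# Hypothetical mini pronunciation dictionary, using ad-hoc syllable spellings
# in the style of "pô - tae - toe" (not a real phonetic alphabet)
PRONUNCIATIONS = {
    "potato": ["pô", "tae", "toe"],
    "tomato": ["tô", "mae", "toe"],
    "photo": ["foe", "toe"],
}

# A slightly misheard "potato" still lands on the right word
print(closest_word(["pô", "tay", "toe"], PRONUNCIATIONS))  # -> potato
```

In this sketch, switching languages as in point 3 would amount to passing a different dictionary to `closest_word`.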
Any feedback on that would be appreciated...
Hugo
"Arny Krueger" <arnyk@hotpop.com> wrote in message
news:WJudnaL9SZsnFazfRVn-3A@comcast.com...
> Interesting speculation - got any evidence to back it up?
>
> I did a little searching and found no caveats about speech recognition from
> MP3s.