
Audio files to text converter?

Anonymous
March 10, 2005 12:54:16 PM

Archived from groups: rec.audio.tech

We're looking for software that will convert existing mp3 and wav files
to a text/doc format.

We've checked out the dictation type products but don't see exactly
what we're looking to do. Any ideas?

John
March 10, 2005 3:37:27 PM

Thanks for all the responses...re:

> Perhaps it's relevant to state the higher-level requirements of what it
> is you're attempting to do.

Here's an explanation -

We have an inventory of audio training presentations on file - wav and
mp3 formats - we are looking for a way to "convert" the narrative from
these files into a text format that we can then add to appropriate
Powerpoint slides - in the "notes" function.

John
Anonymous
March 10, 2005 5:30:39 PM

jj.shine@verizon.net writes:

> We're looking for software that will convert existing mp3 and wav files
> to a text/doc format.
>
> We've checked out the dictation type products but don't see exactly
> what we're looking to do. Any ideas?

I'm confused. How would you translate a wav file to a text/doc file?

Perhaps it's relevant to state the higher-level requirements of what it
is you're attempting to do.
--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124
Anonymous
March 10, 2005 5:30:40 PM

"Randy Yates" wrote ...
> I'm confused. How would you translate a wav file to a text/doc file?

There are several products that do speech recognition. Primarily
for live speech (i.e. voice-command, etc.) Technically one could
substitute recorded speech (rather than from a live microphone)
as the input for one of these applications.
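
A minimal sketch of what "substituting recorded speech" amounts to, assuming a mono 16-bit WAV: the file is decoded to the same kind of PCM sample stream a microphone capture would deliver. The helper names (`make_test_wav`, `read_pcm`) are made up for illustration, and the generated tone only stands in for a real recording. An MP3 would first have to be decoded to PCM by an external decoder, which is not shown here.

```python
import io
import math
import struct
import wave

def make_test_wav(n_samples=1600, rate=16000, freq=440.0):
    """Build a small mono 16-bit WAV in memory (stand-in for a real recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        frames = b"".join(
            struct.pack("<h", int(10000 * math.sin(2 * math.pi * freq * i / rate)))
            for i in range(n_samples)
        )
        w.writeframes(frames)
    buf.seek(0)
    return buf

def read_pcm(fileobj):
    """Return (sample_rate, list of int samples) from a mono 16-bit WAV."""
    with wave.open(fileobj, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit WAV")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return rate, samples

# These samples are what a recognizer would receive in place of live input.
rate, samples = read_pcm(make_test_wav())
```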

However, speech-to-text seems pretty flaky under ideal conditions.
Not sure you could do it reliably with speech compressed to MP3.

> Perhaps it's relevant to state the higher-level requirements of what it
> is you're attempting to do.

Indeed.
Anonymous
March 10, 2005 6:44:04 PM

"John" <jj.shine@verizon.net> writes:

> Thanks for all the responses...re:
>
> > Perhaps it's relevant to state the higher-level requirements of what it
> > is you're attempting to do.
>
> Here's an explanation -
>
> We have an inventory of audio training presentations on file - wav and
> mp3 formats - we are looking for a way to "convert" the narrative from
> these files into a text format that we can then add to appropriate
> Powerpoint slides - in the "notes" function.

OK, it became clear as soon as I posted. I don't know why the "send" button
always infuses me with a couple more dB of IQ.

You might want to try Googling on the following

speech-to-text
speech recognition
speaker-independent recognition

--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124
Anonymous
March 10, 2005 10:05:59 PM

On 10 Mar 2005 09:54:16 -0800, jj.shine@verizon.net wrote:

>We're looking for software that will convert existing mp3 and wav files
>to a text/doc format.
>
>We've checked out the dictation type products but don't see exactly
>what we're looking to do. Any ideas?
>
>John
Dragon NaturallySpeaking (and other speech-to-text programs) needs to be
'trained' to the speaker's voice first.
MP3 encoding/playback would probably sound quite different from WAV file
playback and would probably be regarded as a different speaker.

--
Derek Dragonsclaw (MKA: Steve Urbach)
No JUNK in my email please: dragonsclawJUNK@JUNKmindspring.com
Running United Devices "Cure For Cancer" Project 24/7. Have you helped? http://www.grid.org
Anonymous
March 11, 2005 12:37:11 AM

Richard Crowley wrote:
> However, speech-to-text seems pretty flaky under ideal conditions.
> Not sure you could do it reliably with speech compressed to MP3.
>
>
I really wouldn't worry about using MP3s; have you any idea of the
quality of the hardware supplied for normal speech to text applications?
To call them microphones is _really_ stretching a point!!
Anonymous
March 11, 2005 12:37:12 AM

"Andrew Chesters" wrote ...
> Richard Crowley wrote:
> > However, speech-to-text seems pretty flaky under ideal conditions.
> > Not sure you could do it reliably with speech compressed to MP3.
> >
> >
> I really wouldn't worry about using MP3s; have you any idea of the
> quality of the hardware supplied for normal speech to text applications?
> To call them microphones is _really_ stretching a point!!

But the weak point isn't really the quality of the hardware, is it?
A quite acceptable (i.e. reasonably low-distortion, high signal-to-noise
ratio) signal can be generated with very cheap hardware.

The big issue seems to be decoding the phonemes and interpreting
them into words. Some languages (likely English) are more problematic
than others (because of irregular spelling, irregular pronunciation,
homonyms, etc.), as is the way we normally run syllables/words
together, not to mention speakers with accents, etc.

It is the decoding the phonemes part that may not fare well with
compressed data (like MP3, etc.) because the compression may
remove or mask some of the cues that the recognition software
needs to decipher the sounds. Software can't necessarily use the
same algorithms our ears use (even if we knew what they were.)
Anonymous
March 11, 2005 9:59:59 AM

"Richard Crowley" <richard.7.crowley@intel.com> wrote in message
news:D 0qgdk$hfi$1@news01.intel.com
> "Andrew Chesters" wrote ...
>> Richard Crowley wrote:
>>> However, speech-to-text seems pretty flaky under ideal conditions.
>>> Not sure you could do it reliably with speech compressed to MP3.
>>>
>>>
>> I really wouldn't worry about using MP3s; have you any idea of the
>> quality of the hardware supplied for normal speech to text
>> applications? To call them microphones is _really_ stretching a
>> point!!
>
> But the weak point isn't really the quality of the hardware, is it?
> A quite acceptable (i.e. reasonably low-distortion, high
> signal-to-noise ratio) signal can be generated with very cheap
> hardware.

But, it often doesn't work out that way.

> The big issue seems to be decoding the phonemes and interpreting
> them into words. Some languages (likely English) are more problematic
> than others (because of irregular spelling, irregular pronunciation,
> homonyms, etc.), as is the way we normally run syllables/words
> together, not to mention speakers with accents, etc.

> It is the decoding the phonemes part that may not fare well with
> compressed data (like MP3, etc.) because the compression may
> remove or mask some of the cues that the recognition software
> needs to decipher the sounds. Software can't necessarily use the
> same algorithms our ears use (even if we knew what they were.)

Interesting speculation - got any evidence to back it up?

I did a little searching and found no caveats about speech recognition from
MP3s.
Anonymous
March 11, 2005 8:09:36 PM

"Arny Krueger" wrote ...
> I did a little searching and found no caveats about speech
> recognition from MP3s.

I've never seen decent speech recognition from ANY source.
Anonymous
March 11, 2005 9:58:15 PM

The problem, as already stated, is that all of the speech recognition
programs on the market that I know of are mostly sound-comparative, meaning
they compare what they "hear" to a dictionary of sounds trained with the
speaker's voice. Dragon does a very good job if trained properly, but like
all other speech recognition programs, it has to be trained on the same
"voice" it will be listening to.

Now if all your recordings are narrated by the same person, and this person
uses a pretty neutral voice without too much intonation, what you would
need to do is whip out a good sound file editor, like Sonic Foundry Sound
Forge or something, and use words from your recordings to train the software.
For example, the software is going to ask you to record a series of words
when you first install it, and it will use those words as a reference for
phonetic translation. If, instead of recording those words with your own
voice, you create a sound file from your recordings for each word it asks
for (cutting/pasting) and use these as the reference, the software should
recognize most of your recordings properly.
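
The cutting step could also be scripted rather than done by hand in an editor. This is only a sketch, assuming uncompressed WAV input and that you already know the start and end time of each word; `cut_segment` is a made-up helper name, and the demo cuts a clip out of generated silence rather than a real narration.

```python
import io
import wave

def cut_segment(src, dst, start_s, end_s):
    """Copy the audio between start_s and end_s (seconds) from src to dst.

    src and dst may be file paths or seekable binary file objects.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        first = int(start_s * params.framerate)
        count = int(end_s * params.framerate) - first
        reader.setpos(first)
        frames = reader.readframes(count)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)   # frame count in the header is patched after writing
        writer.writeframes(frames)

# Demo on one second of generated silence (8 kHz, mono, 16-bit).
source = io.BytesIO()
with wave.open(source, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000)
source.seek(0)

# Pretend the word we want sits between 0.25 s and 0.75 s.
word_clip = io.BytesIO()
cut_segment(source, word_clip, 0.25, 0.75)
word_clip.seek(0)
with wave.open(word_clip, "rb") as clip:
    clip_frames = clip.getnframes()
```

Finding the word boundaries is still manual work, which is a large part of why this approach is so time-consuming.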

But since this would be very time-consuming, and since you
would STILL have to go back and read/correct the text, it would probably be
easier to just hire a typist and have him/her type everything as it goes...

I do not think MP3 compression is an issue here, as MP3 works by
stripping out material you normally wouldn't hear anyway, or
wouldn't need or care to hear from a compressed audio file (harmonics,
background noise, out-of-band frequencies, etc.). Unless you have really
poor-quality recordings with lots of noise, or a bad voice-to-noise volume
ratio, I don't think you would get better results from a WAV file than
from an MP3. And yes, the key here is good recording hardware.
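
To put a number on the voice-to-noise ratio, one rough approach is to compare the RMS level of a stretch of speech with the RMS level of a stretch where nobody is talking. This is a sketch under that assumption, not a standard measurement procedure; `rms` and `snr_db` are illustrative names and the sample values are invented.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of PCM sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(speech, noise):
    """Voice-to-noise ratio in dB from a speech stretch and a noise-only stretch."""
    return 20.0 * math.log10(rms(speech) / rms(noise))

# Invented sample values: speech swings 100 times wider than the noise floor.
speech = [1000, -1000] * 100
noise = [10, -10] * 100
ratio = snr_db(speech, noise)   # an amplitude ratio of 100 corresponds to 40 dB
```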

While on the subject, does anybody know if non-comparative speech
recognition software exists? My idea is to develop a program that would
translate spoken words STRAIGHT to text, without comparing spoken words to a
recorded reference. The program would listen to spoken syllables and
associate them with a written phonetic translation (i.e., "potato" would
translate to something like pô - tae - toe). From there, a modified
syntax/vocabulary dictionary (one that includes the phonetic pronunciations)
would find the closest phonetic match and return it. It WOULD need
a lot more processing power than the average speech-to-text app, but the
advantages of such a system would be: 1- No dependency on who speaks
(accents, tone, etc.) 2- More "correct" translation 3- Better portability
(recognizing Spanish would just be a matter of switching dictionaries, not
recording a whole new set of references).
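
The "closest phonetic match" step might look something like the sketch below. It assumes the hard part (turning audio into a phoneme sequence) has already happened, and it uses plain edit distance as the similarity measure, which is my substitution and not something described above. The tiny dictionary and its phoneme spellings are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # drop a phoneme from a
                           cur[j - 1] + 1,            # add a phoneme from b
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

# Hypothetical pronunciation dictionary: word -> phoneme sequence.
PHONETIC_DICT = {
    "potato": ["P", "AH", "T", "EY", "T", "OW"],
    "tomato": ["T", "AH", "M", "EY", "T", "OW"],
    "photo": ["F", "OW", "T", "OW"],
}

def closest_word(heard, dictionary=PHONETIC_DICT):
    """Return the dictionary word whose phonemes are nearest to what was heard."""
    return min(dictionary, key=lambda word: edit_distance(heard, dictionary[word]))

# A slightly "mis-heard" potato: the second T came out as a D.
heard = ["P", "AH", "T", "EY", "D", "OW"]
best = closest_word(heard)
```

Switching languages would then mean swapping `PHONETIC_DICT`, though the phoneme recognizer in front of it would likely need to change as well.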

Any feedback on that would be appreciated...

Hugo
Anonymous
March 13, 2005 4:54:45 AM

>I've never seen decent speech recognition from ANY source.

I've had it working quite well on a trained system with a trained
speaker.

Much less successful on input NOT carefully tailored to the system.
August 6, 2011 10:10:40 AM

You can transcribe files such as .wav, .wma, .wmv, etc. accurately with software made for them. I also use a transcription service company; they are good at this. Visit Transcription Services or Audio to Text Transcriptions.
September 20, 2011 2:25:24 PM

I know this was a long time ago, but I am trying to do the exact same thing right now and am finding it really time-consuming doing it all by hand. If you ended up finding a really good technique, I would be so grateful if you would fill me in. My contact information is ktlesueur@yahoo.com

Thanks so much!



