Speech for Java v0.70

Overview

Speech for Java is a Java programming interface for speech that gives Java application developers access to the IBM ViaVoice speech technology. It supports voice command recognition, dictation, and text-to-speech synthesis.

Speech for Java is an alpha implementation of a core subset of the beta Java Speech API. The Java Speech API is a cross-platform Speech API that was developed by Sun Microsystems Inc. in collaboration with IBM and other industry speech technology companies. More information on the Java Speech API can be found at the Java Speech API home page.

Changes from version 0.6

There have been some relatively minor changes to the API. The only change needed for the Hello program is that the RuleResult.getFinalRuleResult method has been removed; the FinalRuleResult object is now obtained from the RuleResult object by casting.
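The cast described above can be sketched as follows. The two interfaces are empty or near-empty stand-ins declared only so that the example compiles on its own; the real RuleResult and FinalRuleResult types are defined by the Java Speech API (see the reference documentation in ref\), and their members are not reproduced here. The getSpokenText method and the DemoResult and CastSketch classes are hypothetical.

```java
// Stand-in interfaces so this sketch compiles without the Speech for Java jar.
// The real types come from the Java Speech API; members here are hypothetical.
interface RuleResult {
}

interface FinalRuleResult {
    String getSpokenText(); // hypothetical accessor, for illustration only
}

// Hypothetical result object standing in for what a recognizer would deliver.
class DemoResult implements RuleResult, FinalRuleResult {
    private final String text;

    DemoResult(String text) {
        this.text = text;
    }

    public String getSpokenText() {
        return text;
    }
}

class CastSketch {
    // Version 0.6 style (method since removed): result.getFinalRuleResult()
    // Version 0.7 style: cast the RuleResult object to FinalRuleResult.
    static String describe(RuleResult result) {
        if (result instanceof FinalRuleResult) {
            FinalRuleResult finalResult = (FinalRuleResult) result;
            return finalResult.getSpokenText();
        }
        return null; // defensive: the object does not implement FinalRuleResult
    }

    public static void main(String[] args) {
        System.out.println(describe(new DemoResult("my name is John Doe")));
    }
}
```

The instanceof guard is a defensive choice in this sketch; whether it is needed in a real application depends on what the engine delivers.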

In addition, a tentative method for persistently registering vendor engines (such as the IBM ViaVoice synthesis and recognition engine) with the system has been defined. In version 0.6, because there was no such mechanism, we instead adopted the temporary expedient of hard-coding the IBM engine into the system classes. Because of this change in version 0.7, it is now necessary to run an install script, found in install.bat, before running any Java Speech applications, in order to register the IBM engines with the system.

Requirements

In much the same way that Java implementations on Windows are built on top of the native Windows GUI capabilities, Speech for Java is built on top of the native speech recognition and synthesis capabilities in IBM ViaVoice. Thus Speech for Java requires installation of IBM ViaVoice Gold on your computer. ViaVoice is not provided as part of this package. You may find more information about ViaVoice at the VoiceType / ViaVoice Home Page.

Your computer should meet the minimum requirements for running IBM ViaVoice (166MHz Pentium or 150MHz Pentium with MMX, running Windows 95 with 32MB of memory or Windows NT with 48MB). The Speech for Java implementation will take advantage of any enrollment, dictation macros, and added words in your ViaVoice installation.

We have tested Speech for Java only on the JavaSoft JDK 1.1.5 version of Java. We recommend that you use this version if possible.

Installation

After unpacking the installation package to its own directory you should have the following files:

README.html   This file
install.bat   Installation script
lib\ibmjs.jar   The Java portion of the Speech for Java implementation
lib\ibm*.dll   The native portion of the Speech for Java implementation
hello\   The Hello sample application
ref\   Java Speech API reference documentation. See index.

Suppose you have unpacked the installation package to c:\ibmjs. You should

Alternatively, you may set these environment variables just before running the Java virtual machine, as illustrated by the hello.bat file in the hello directory. If you neglect to run install.bat, hello.bat will also register the IBM engines with the system.
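A missing class path entry is an easy mistake to make, so a small check like the one below can confirm that the Java virtual machine sees ibmjs.jar before you run a speech application. This is a sketch, not part of the Speech for Java package: the ClasspathCheck class and its hasEntry method are hypothetical, and the check only inspects the standard java.class.path system property.

```java
import java.io.File;
import java.util.StringTokenizer;

class ClasspathCheck {
    // Returns true if any entry of the path list ends with the given file name.
    // The comparison ignores case, since Windows file names are case-insensitive.
    static boolean hasEntry(String pathList, String separator, String fileName) {
        if (pathList == null) {
            return false;
        }
        StringTokenizer entries = new StringTokenizer(pathList, separator);
        while (entries.hasMoreTokens()) {
            String entry = entries.nextToken().trim().toLowerCase();
            if (entry.endsWith(fileName.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String classPath = System.getProperty("java.class.path");
        boolean found = hasEntry(classPath, File.pathSeparator, "ibmjs.jar");
        System.out.println(found
            ? "ibmjs.jar is on the class path"
            : "ibmjs.jar is NOT on the class path: " + classPath);
    }
}
```

The check says nothing about the native DLLs or about engine registration; it only rules out one common cause of a class-not-found error.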

Using the Hello sample

Note: the following description assumes that your language is English. The Hello program now also supports French and German. Look at the corresponding hello*.gram and res*.properties files to find the equivalent instructions for other languages.

Note: the Hello program requires a full-duplex sound card and driver. Some older sound cards and older drivers don't support simultaneous audio output (synthesis) and audio input (recognition). For some notes on working around this problem, please see the Community eXchange section of the alphaWorks site.

To test your installation, open a command prompt, cd to the hello directory, and run the hello batch file. This will set the PATH and CLASSPATH variables, and start the Hello program. The program will start out by saying "Hello human, my name is Computer, what is your name?" At this point the program is in command mode, and you may say one of the following:

You say   Program does
"My name is first last"   Says "Hello first last"
"Goodbye"   Says "Bye now" and exits
"Repeat after me"   Enters dictation mode
"That's all"   Leaves dictation mode and repeats what you said

Note that since this is just a simple example, the Hello application has a very limited range of first and last names that it understands. You can look at the appropriate hello*.gram file to see which names it knows.

You may add your own name by modifying hello.gram and then restarting the Hello program. To add your own name, follow the pattern that you see in hello.gram. For example, if your name is John Doe, try adding a line that says "| John {John}" to the rule for <first>, and similarly for <last>. You should now be able to say "My name is John Doe" and get an appropriate response.

The part in curly brackets in hello.gram is a tag. Tags are a general-purpose mechanism of the Java Speech Grammar Format (JSGF); in this sample the tag determines how the computer says your name. Try modifying the tag for your name and restarting the Hello program, and see what the computer says when you say "My name is ...".
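To make the split between spoken words and tag concrete, here is a small sketch that separates the two parts of an alternative such as "| John {John}". It is a hypothetical helper written for illustration, not code from the Hello program or the Java Speech API, and it handles only the simple one-tag form shown above.

```java
class TagSketch {
    // Returns {spoken words, tag} for an alternative like "| John {John}".
    // If the alternative has no tag, the second element is null.
    static String[] split(String alternative) {
        String text = alternative.trim();
        if (text.startsWith("|")) {
            text = text.substring(1).trim(); // drop the alternative separator
        }
        int open = text.indexOf('{');
        int close = text.lastIndexOf('}');
        if (open < 0 || close < open) {
            return new String[] { text, null };
        }
        String spoken = text.substring(0, open).trim();
        String tag = text.substring(open + 1, close).trim();
        return new String[] { spoken, tag };
    }

    public static void main(String[] args) {
        // The words before the braces are what you say; in the Hello sample
        // the tag determines how the computer says the name back.
        String[] parts = split("| John {Johnny}");
        System.out.println("spoken: " + parts[0] + ", tag: " + parts[1]);
        // prints "spoken: John, tag: Johnny"
    }
}
```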

While in command mode, you can say "Repeat after me", at which point the computer will enter dictation mode and say "I'm listening". You may now dictate any text you like and finish by saying "That's all", at which point the program will repeat what you said and go back into command mode.
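The switch between command mode and dictation mode can be pictured as a two-state machine. The sketch below models only that control flow with plain strings. It does not call the recognizer or the synthesizer, and the ModeSketch class and its methods are hypothetical rather than taken from the Hello source.

```java
class ModeSketch {
    private boolean dictating = false;
    private final StringBuffer dictated = new StringBuffer();

    // Takes one recognized utterance and returns what the program would say,
    // or null if it says nothing in response.
    String hear(String utterance) {
        if (dictating) {
            if (utterance.equals("That's all")) {
                dictating = false;            // back to command mode
                String spoken = dictated.toString();
                dictated.setLength(0);
                return spoken;                // repeat the dictated text
            }
            if (dictated.length() > 0) {
                dictated.append(' ');
            }
            dictated.append(utterance);
            return null;                      // keep listening
        }
        if (utterance.equals("Repeat after me")) {
            dictating = true;                 // enter dictation mode
            return "I'm listening";
        }
        if (utterance.equals("Goodbye")) {
            return "Bye now";                 // the real program also exits here
        }
        if (utterance.startsWith("My name is ")) {
            return "Hello " + utterance.substring("My name is ".length());
        }
        return null;                          // not a known command
    }

    boolean isDictating() {
        return dictating;
    }

    public static void main(String[] args) {
        ModeSketch hello = new ModeSketch();
        String[] heard = { "Repeat after me", "testing one two", "That's all", "Goodbye" };
        for (int i = 0; i < heard.length; i++) {
            System.out.println(heard[i] + " -> " + hello.hear(heard[i]));
        }
    }
}
```

In the real application the grammar restricts what can be recognized in command mode, so the final "not a known command" branch stands in for utterances the recognizer would reject.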

For more information

The source of the Hello program is included to help you understand the basics of writing a speech application using Speech for Java. The following reference documentation is also provided:

The Programmer's Guide is probably the best starting point. These documents represent the state of the Java Speech API specification at the time this version of Speech for Java was released. For more information and news about the Java Speech API you may refer to the Java Speech API Home Page.

Notes on the Speech for Java implementation

Speech for Java is an alpha implementation of a core subset of the beta Java Speech API. A number of methods and interfaces are not yet implemented, including the following:

DictationGrammar.removeWord
DictationGrammar.listAddedWords
DictationGrammar.listRemovedWords
DictationGrammar.setContext
DictationResult.correctResult
DictationResult.isCorrectionInfoAvailable
DictationResult.releaseCorrectionInfo
Grammar.getFinalizePauseForCommit
Grammar.setFinalizePauseForCommit
Recognizer.getAudioManager
Recognizer.readVendorGrammar
Recognizer.writeVendorGrammar
Recognizer.getEngineAttributes
Recognizer.readVendorResult
Recognizer.writeVendorResult
Recognizer.forceFinalize
Result.isAudioAvailable
Result.releaseAudio
ResultToken.getStartTime
ResultToken.getEndTime
RuleGrammar.ruleForJSGF
RuleGrammar.removeImport
RuleGrammar.listImports
Synthesizer.deallocate
Synthesizer.reset
Synthesizer.cancel
Synthesizer.resume
Synthesizer.pause
Synthesizer.isPaused
Synthesizer.addEngineListener
Synthesizer.removeEngineListener
Synthesizer.getAudioManager
Synthesizer.getVocabManager
Synthesizer.getEngineAttributes