
FILE:  $HMMS/app/test/README

	NOTICE: This is copyrighted software.  You should read the
COPYRIGHT file in the root directory of this distribution for
conditions of use.


        This distribution contains a portion of a software system for
training phonetic Hidden Markov models using the Segmental K-means
algorithm.  This distribution only includes that portion of the system
written in the functional language Haskell.  This file explains many
of the steps involved in doing an HMM training run.  However, we have
provided enough test files to test only some of the programs.  If you
want to test all of the programs, you'll need to get your hands on a
large speech database.


	Throughout these notes, we'll assume that the environmental
variable HMMS contains the root directory for this distribution.


	By the way, all of the commands are listed in a shell script
called ``demo,'' so you don't have to type the commands in to get
started.  If you type ``demo'', you should see something like this on
your screen:


        Running ConvertLinearDic

        Output written to dictionary.0.dgs

        Running BatchTranscribe

        data/dmg_01.txt         -> data/dmg_01.ppm
        data/dmg_02.txt         -> data/dmg_02.ppm
        data/dmg_03.txt         -> data/dmg_03.ppm
        data/dmg_04.txt         -> data/dmg_04.ppm
        data/dmg_05.txt         -> data/dmg_05.ppm
        data/dmg_06.txt         -> data/dmg_06.ppm

        Running BatchAlign

           1  11.08  11.08  data/dmg_01
           2  11.68  11.38  data/dmg_02
           3   9.98  10.92  data/dmg_03
           4  11.02  10.94  data/dmg_04
           5   9.86  10.72  data/dmg_05
           6  10.64  10.71  data/dmg_06
        70.1u 1.0s 1:15 94% 0+8996k 158+6io 158pf+0w

        Finished running the demo script


~~~~~~~~~~~~~~~~~~~~~
Detailed Instructions
~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~
        1
~~~~~~~~~~

        This directory originally contains three files and one
subdirectory.  The three files are

        sentences               a list of the root names of the
                                sentence data files, with paths
                                relative to this directory
        
        vocabulary              a list of the words in the sentence
                                texts
        
        dictionary.0            the words plus their linear
                                pronunciation models

The subdirectory ``data'' originally contains three files for each of
six test utterances.  A file with no extension is the original sampled
speech waveform file (16000 samples/sec, 14 bit A/D, stored as short
ints, i.e., 2 bytes/sample).  A file with the same root name and a
``.txt'' extension contains the text of what was said; a file with the
same root name and a ``.fea'' extension contains the feature vectors
as computed by a signal processing program.  (The feature vectors have
18 coordinates, and there are 100 of them for each second of
speech. They were generated using a C program not included in this
distribution.)


~~~~~~~~~~
        2
~~~~~~~~~~

        The first step is to convert the linear pronunciation
dictionary into a dictionary with the entries in a directed-graph
representation.  This is done using the program ``ConvertLinearDic''.

        % $HMMS/bin/ConvertLinearDic dictionary.0

This creates the file dictionary.0.dgs.  This file could be edited
further if necessary, e.g., to change linear pronunciation models to
models with alternate pronunciation paths (which was the motivation
for using a directed graph representation).

	This step only needs to be repeated if there is a change in
the linear dictionary.


~~~~~~~~~~
        3
~~~~~~~~~~


	Next we must transcribe each sentence.  The names of each text
file must be in a master file.  Suppose that this file's name is
"sentences".  Here is an example of how the first few lines of
this file might look:

	data/dmg_01
	data/dmg_02
	data/dmg_03
             :
             :

These identify the .txt files, except for the .txt extension.  It says
they lie in a subdirectory ``data'' of the current directory.

	The program BatchTranscribe is used to transcribe these files,
via the command

        % $HMMS/bin/BatchTranscribe dictionary.0.dgs sentences

This program produces a .ppm file for each .txt file listed in the
training sentences file, in the same directory as the .ppm file.

	This step only needs to be repeated if there is a change in
the pronunciation-network dictionary.


~~~~~~~~~~
	4
~~~~~~~~~~

	You're now ready to align each file using the Viterbi
algorithm using the program ``BatchAlign''. This is the slowest
program in the suite.

	% $HMMS/bin/BatchAlign $HMMS/hmms/h9 $HMMS/hmms/h9.ties \
		$HMMS/hmms/h9.dgs sentences

	When I run this program on the example data files, I get the
following output:

   1  11.08  11.08  data/dmg_01
   2  11.68  11.38  data/dmg_02
   3   9.98  10.92  data/dmg_03
   4  11.02  10.94  data/dmg_04
   5   9.86  10.72  data/dmg_05
   6  10.64  10.71  data/dmg_06

        The alignment files produced by the alignment program have the
extension .algn

	When I run this program on an unloaded SPARC 10, I get
timing statistics like the following (hbc 0.999.4):

	70.0u 1.2s 1:15 94% 0+9036k 12+21io 120pf+0w

	You can change the timing performance somewhat by changing the
amount of heap the program has available.

	NOTE: if you use ghc, you need to allocate more heap than the
default amount; 24M is enough.  Also, you will have to modify the
makefiles and edit two of the program files in src/haskell; see the
makefiles in lib/haskell and src/haskell for details.


~~~~~~~~~~
        5
~~~~~~~~~~

        The other programs won't be useful until you have a large
speech database.  However, the program BatchAlign is computationally
the most expensive, so should be the most interesting to Haskell
compiler developers.

	Of course, you can test the program CountAlignmentDiffs by
just copying the .algn files into .algn.old files and then running the
program.

	% $HMMS/bin/CountAlignmentDiffs sentences

	If you keep these old alignment files, you can work on the
code and periodically make sure that you haven't changed the answers
produced by the program.

