What is so complex about speech production?

Introduction

Speech processing is one of the fastest-growing research areas in signal processing. Each year billions of pounds are spent supporting research in speech processing. The ultimate aim of this research is to provide interactive man-machine communication. Speech is a special communication medium: it conveys not only meaning but also the emotions of the speaker and information about the speaker as an individual.

During the past few years, the vast amount of research and development in speech processing has brought changes to our everyday life. There are commercially available products based on automatic speech recognition, speaker verification, speaker identification and speech synthesis. For example, the Compaq personal computer has a built-in speech processor that executes a restricted number of spoken voice commands. This technology is based on the mechanisms involved in human speech production and perception. In this article, particular emphasis is given to speech production.

Speech

The human apparatus concerned with speech production and perception is complex and uses many important organs: the lungs, mouth, nose and ears, the muscles that control them, and the brain. It is remarkable that this apparatus has developed not only to produce speech but also to serve other purposes, such as breathing and eating. Several specific areas of the brain have been found to be of prime importance for speech and language. These are called the speech centers; damage to any of these areas disrupts speech.

The vocal tract and vocal cords play a major role in speech production. The vocal tract consists of several organs and muscles that are constantly monitored and carefully controlled by the speech centers. This precise control is achieved through feedback to the brain; for example, auditory feedback helps us ensure that we are producing the correct speech sounds at an intensity suited to the environment. Speech sounds are produced when air exhaled from the lungs either makes the vocal cords vibrate or becomes turbulent at a constriction somewhere in the vocal tract. The shape of the vocal tract determines which harmonics of the sound are emphasised. By varying the way the vocal cords vibrate and the shape of the vocal tract, we produce the range of speech sounds with which we are familiar.
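
The mechanism described above, a sound source shaped by the vocal tract, is often modelled as a source followed by a filter. The Python sketch below illustrates that idea under deliberately crude assumptions, not as a model of real speech: an impulse train stands in for vocal cord vibration, white noise stands in for turbulence at a constriction, and a single two-pole resonator stands in for one resonance of the vocal tract shape. The function names `excitation` and `resonator`, the 8 kHz sampling rate and the 100 Hz, 500 Hz and 80 Hz values are hypothetical choices made only for this example.

```python
import math
import random


def excitation(kind, n_samples, fs, f0=100.0, seed=0):
    """Return a crude source signal of n_samples at sampling rate fs (Hz).

    'voiced'   -> impulse train at f0 Hz, standing in for vocal cord vibration.
    'unvoiced' -> uniform white noise, standing in for turbulence at a constriction.
    """
    if kind == "voiced":
        period = int(round(fs / f0))  # samples between glottal pulses
        return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]
    if kind == "unvoiced":
        rng = random.Random(seed)  # seeded so the noise is reproducible
        return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    raise ValueError("kind must be 'voiced' or 'unvoiced'")


def resonator(x, fs, freq, bandwidth):
    """Filter x with a two-pole resonator centred on freq Hz.

    A single resonance is a hypothetical stand-in for one vocal tract shape;
    bandwidth (Hz) sets how sharply the resonance is tuned.
    """
    r = math.exp(-math.pi * bandwidth / fs)  # pole radius
    theta = 2.0 * math.pi * freq / fs        # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2  # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        y.append(out)
        y2, y1 = y1, out
    return y


fs = 8000  # hypothetical sampling rate (Hz)
voiced = resonator(excitation("voiced", 800, fs, f0=100.0), fs, freq=500.0, bandwidth=80.0)
unvoiced = resonator(excitation("unvoiced", 800, fs), fs, freq=500.0, bandwidth=80.0)
```

Both signals pass through the same resonator, so they share the same emphasised frequency region; only the source differs, which mirrors the distinction between sounds produced by vocal cord vibration and sounds produced by turbulence.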

Speech and Vocal Cords

The vocal cords are situated in the larynx, whose front forms the Adam's apple, as shown in fig 1. They are the main sound source for speech production in humans. Speech sounds are of two kinds, voiced and unvoiced. Voiced sounds are produced by vibration of the vocal cords, a process called voicing; unvoiced sounds are produced by turbulent airflow at a constriction, which can occur at many possible sites in the vocal tract. The frequency at which the cords vibrate is determined by several factors: the tension exerted by the muscles, and the mass and length of the cords. These factors differ between the sexes and change with age. The vibration of the vocal cords produces harmonics whose amplitudes decrease with increasing frequency.
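
To make the harmonic structure concrete, the sketch below lists the harmonics of a voiced source. They lie at integer multiples of the vibration frequency, usually called the fundamental frequency f0. The 12 dB per octave fall-off is an assumed, commonly quoted approximation for the vocal cord source rather than a figure given in this article, so it is left as a parameter; `glottal_harmonics` and the 120 Hz fundamental are hypothetical choices for illustration.

```python
import math


def glottal_harmonics(f0, max_freq, rolloff_db_per_octave=12.0):
    """Return (frequency_hz, relative_amplitude) pairs for a voiced source.

    Harmonics lie at integer multiples of the fundamental f0. Their level is
    assumed to fall by rolloff_db_per_octave for every doubling of frequency,
    so harmonic n sits rolloff * log2(n) dB below the fundamental.
    """
    if f0 <= 0:
        raise ValueError("f0 must be positive")
    harmonics = []
    n = 1
    while n * f0 <= max_freq:
        level_db = -rolloff_db_per_octave * math.log2(n)
        harmonics.append((n * f0, 10 ** (level_db / 20.0)))  # dB -> amplitude ratio
        n += 1
    return harmonics


# A hypothetical 120 Hz fundamental, listed up to 600 Hz.
for freq, amp in glottal_harmonics(120.0, 600.0):
    print(f"{freq:6.0f} Hz  relative amplitude {amp:.3f}")
```

Because the harmonics are spaced exactly f0 apart, a voice with a higher fundamental has fewer, more widely spaced harmonics within the same frequency range.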

Speech and Vocal Tract

The vocal tract is divided into two parts. The first is the oral tract, which is highly mobile and consists of the pharynx, tongue, palate, lips and jaw (shown in fig 1). The positions of these organs are varied to produce different speech sounds, which we hear as the radiation from the lips or nostrils. The second is the nasal tract, which is immobile but is coupled to the oral tract by changing the position of the velum. Depending on its shape, the vocal tract responds more strongly to some of the frequencies produced by the vocal cords than to others; this is the essential mechanism for producing different speech sounds. The lowest resonance frequency for a particular shape of the vocal tract is called the first formant (F1), the next the second formant (F2), and so on.
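
A common textbook simplification treats the vocal tract in a neutral position as a uniform tube, closed at the vocal cords and open at the lips. Its resonances, the formants, are then odd multiples of the quarter-wavelength frequency, F_n = (2n - 1)c / 4L, where c is the speed of sound and L is the length of the tract. The sketch below simply evaluates that formula; the 17.5 cm and 14.5 cm lengths and the 350 m/s speed of sound are assumed illustrative values, and `uniform_tube_formants` is a hypothetical helper. A real vocal tract is not uniform, which is exactly why changing its shape moves the formants.

```python
def uniform_tube_formants(length_m, count=3, speed_of_sound=350.0):
    """Formants of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L): odd multiples of the quarter-wavelength
    resonance. length_m is L in metres, speed_of_sound is c in m/s.
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m) for n in range(1, count + 1)]


# Assumed illustrative tract lengths in metres.
print(uniform_tube_formants(0.175))  # about 500, 1500 and 2500 Hz
print(uniform_tube_formants(0.145))  # a shorter tube: every formant is higher
```

The inverse dependence on L is one reason formant frequencies differ between speakers with vocal tracts of different lengths.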

Language and Speech

The purpose of speaking is to convey meaningful ideas to a listener. To do this, the listener must be able to interpret the meaning of the spoken sounds. This is made possible by a coding mechanism with a set of rules that enables the listener to interpret the meaning of the speech. Human beings use language as the tool for coding this information. The coding is not straightforward: new ideas must be converted into a linguistic structure, which requires selecting appropriate words and phrases and ordering them in sequence according to grammatical rules.

Sounds and Speech

From the linguistic point of view, the smallest unit of speech that signals a difference in meaning is the phoneme. Phonemes are normally written between slashes, for example /m/ in hum. The sounds actually produced for an individual phoneme vary depending on where it appears in a word, and phoneme sets differ between languages; for example, about 40 phonemes are sufficient to distinguish between all the sounds used in British English.

Phonemes are classified into six groups: vowels, diphthongs, semivowels, stop consonants, fricatives and affricates. The grouping is based on the way these sounds are produced. Vowels in particular are characterised mainly by their first three formant frequencies, which are resonances of the vocal tract as it shapes the sound produced by the vibrating vocal cords. However, formant frequencies vary considerably from speaker to speaker.
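
Since vowels can be told apart largely by their first three formants, a very simple recogniser can pick the reference vowel whose (F1, F2, F3) lies closest to the measured values. The sketch below does this with a nearest-neighbour search using Euclidean distance. The reference table holds hypothetical illustrative values for three vowels, not measured data, and `classify_vowel` and `REFERENCE_VOWELS` are hypothetical names introduced for this example.

```python
import math

# Hypothetical (F1, F2, F3) reference values in Hz, for illustration only.
REFERENCE_VOWELS = {
    "/i/": (270.0, 2290.0, 3010.0),
    "/a/": (730.0, 1090.0, 2440.0),
    "/u/": (300.0, 870.0, 2240.0),
}


def classify_vowel(formants, references=REFERENCE_VOWELS):
    """Return the label of the reference vowel nearest to formants = (F1, F2, F3)."""
    def distance(a, b):
        # Euclidean distance between two formant triples, in Hz
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(references, key=lambda label: distance(formants, references[label]))


print(classify_vowel((280.0, 2250.0, 2950.0)))  # -> /i/
```

Matching raw frequencies is deliberately naive: because formants vary with the speaker, the same vowel spoken by someone with a different vocal tract length can land closer to the wrong reference, so a practical system would need per-speaker references or some form of normalisation.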

Lack of Research

Scientists and engineers understand the basic anatomy and physiology of speech production and perception. However, a lack of understanding of how the brain interacts with the vocal tract and the auditory apparatus prevents engineers from designing machines that can understand and speak like ordinary human beings.
