In 1867 Melville Bell, an authority in philology and phonetics, developed a phonetic alphabet which he called "Visible Speech". Through these handwritten symbols he succeeded in conveying to the eye information concerning the fine detail of speech sounds. The system brought Mr. Bell wide recognition, and it was used by him and later by his son, Alexander Graham Bell, in teaching the deaf to speak correctly.

Alexander Graham Bell was not satisfied with handwritten symbols; he wanted to produce visual speech by direct action of the voice. In the midst of his search for such a scheme he discovered the principles of telephony. He succeeded eminently in the translation of speech to electrical currents and back to speech, but he did not find a way to translate the electrical currents to visual patterns.

Numerous ways for making visual records of speech sounds have been devised in the past, but these records have not proved to be readable. Both the oscillograph trace and sound tracks on motion picture films are unreadable to the eye, although it is possible to obtain the original sound by means of a suitable reproducing device.

A spectrograph development project was started at the Bell Telephone Laboratories in 1941. This device analyzed samples of speech and rendered them as a visual record portraying frequency, intensity, and time relationships. Because of American participation in World War II, major interest was centered upon military applications of the spectrograph. During this period acoustic scientists suggested that enemy radio voices might be identified by spectrogram in order to detect the movement of units from place to place. Less was known about voices then, and because present techniques were not yet available, the type of identification proposed at the time was very tedious and ineffective.

Speech scientists have long been aware of the wide variation in voices. This variation is one of the factors that has thwarted many efforts to build a machine that will understand a large variety of spoken words when uttered by many different voices. The problem of recognition of speech by machine is a very formidable one. Primitive automatic recognizers have been built that can identify a small vocabulary of words – the digits, zero through nine, for example. If reliability improves, it will be possible to "dial" your telephone by speaking the desired number. Another possibility is a voice-operated typewriter.

These areas of study have pointed up the truly remarkable uniqueness of an individual human voice.

Several years ago a law enforcement group inquired about what help they could get in combating telephoned bomb scares to airlines and public buildings. Their particular interest was in being able to identify the voice of the perpetrator of such crimes. The problem was referred to the Acoustic and Speech Research Dept. of the Bell Telephone Laboratories.

The author, one of four research experimentalists working in that area, was optimistic about the possibilities of voice identification on the basis of the unique naturalness factors which have so long confounded efforts to design machines capable of reliably recognizing human speech. He requested permission to study the area for two years, on the understanding that the study would be discontinued if encouraging results were not obtained by the end of six months.

At the conclusion of this time period, the evidence obtained appeared encouraging and the study was continued. The study was largely dependent on the use of the improved sound spectrograph, which acted as an automatic wave analyzer recording the acoustic patterns of speech in the dimensions of time, frequency, and intensity. The acoustic patterns, called voiceprints, permitted side-by-side visual comparison of speech sounds, instead of requiring that an investigator listen to the sounds one after another with uncertain dependence on memory. The eye is more sensitive to minute changes in a complex pattern than is the ear to small changes in a complex sound. The ear is an averaging instrument and is sometimes tricked in discrimination trials.
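The time, frequency, and intensity dimensions described above can be illustrated with a small short-time Fourier analysis. This is only a sketch of the general idea, not a description of the original spectrograph, which was an analog instrument. The function name `spectrogram`, the frame and hop sizes, and the 1000 Hz test tone are all assumptions chosen for illustration.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of magnitudes (intensity per frequency bin) per time frame."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # A Hann window reduces leakage between neighbouring frequency bins.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, x in enumerate(frame)]
        mags = []
        # Naive discrete Fourier transform over the non-negative frequencies.
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(windowed))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Hypothetical test signal: a steady 1000 Hz tone sampled at 8000 Hz.
SAMPLE_RATE = 8000
FRAME_LEN = 64
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]

spec = spectrogram(tone, frame_len=FRAME_LEN, hop=32)

# Strongest frequency bin in the first time frame, converted to hertz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

Each row of `spec` is one moment in time, each column one frequency band, and each value an intensity, which is the same three-way relationship a voiceprint displays as a picture. Real speech would show several moving bands rather than the single steady one produced by this test tone.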

The foundation of voiceprint identification is the premise that every individual voice is uniquely characteristic enough to distinguish it from all others. The theoretical basis for this premise lies in the fundamental processes of human speech. There are two general factors involved. The first factor in determining voice uniqueness lies in the sizes of the vocal cavities.

The four major cavities affecting speech are the throat cavity, the nasal cavity, and two oral cavities formed in the mouth by positioning the tongue. The vocal cavities are resonators, much like organ pipes, reinforcing some of the overtones produced by the vocal cords, and producing formants or voiceprint bars. These cavities have the property of selective transmission and radiation dependent upon their sizes and the manner in which they are coupled. The likelihood of two people having all vocal cavities the same size and configuration and coupled identically appears rather remote.
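The organ-pipe comparison can be made concrete with the textbook model of a uniform tube closed at one end and open at the other, whose resonances fall at odd multiples of the quarter-wavelength frequency. The sketch below is a drastic simplification: a real vocal tract is neither uniform nor a single tube, and the function name, the tube lengths, and the rounded speed of sound are assumptions for illustration only.

```python
# Approximate speed of sound in warm, moist air, in centimetres per second.
# This rounded value is an assumption chosen to keep the arithmetic simple.
SPEED_OF_SOUND_CM_PER_S = 35000.0

def tube_resonances(length_cm, count=3):
    """Resonant frequencies (Hz) of a uniform tube closed at one end.

    Uses the quarter-wavelength relation F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * length_cm)
            for n in range(1, count + 1)]

# Two hypothetical tube lengths standing in for two different speakers.
longer_tube = tube_resonances(17.5)
shorter_tube = tube_resonances(15.0)
```

For the 17.5 cm tube the first three resonances come out at 500, 1500, and 2500 Hz, and the shorter tube shifts every one of them upward. Even in this idealized model a modest difference in cavity size moves all the resonant bars, which is the sense in which cavity dimensions contribute to the individuality of a voice.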

The second and stronger factor in determining voice uniqueness lies in the manner in which the articulators or muscles of speech are manipulated during speech. The articulators include the lips, teeth, tongue, soft palate and jaw muscles, whose controlled dynamic interplay results in intelligible speech. Infants acquire intelligible speech by a random learning process, literally a trial and error method of imitation of those around them who are successfully communicating. The likelihood that two people would develop identical dynamic use patterns for their articulators seems remote indeed.

Therefore, the chance that two speakers would have both identical vocal cavity dimensions and configurations coupled with identical articulator use patterns appears very remote.

The proof of individual voice uniqueness lies in acquiring a sufficiently large population sample to support this premise.

Thus far we have discussed a person’s normal speech. A second basic premise of voiceprint identification is that any intelligible speech is satisfactory for voiceprint identification purposes, provided there are voice sounds or phonemes common to both the known and unknown samples to be compared. There is limited empirical evidence to indicate that various forms of disguised speech intended to confound a trained observer only make identification more difficult, not impossible.

It might be expected that a file of voiceprints would accumulate to a point where it would not be feasible to examine every file card against an unknown sample. A manual classification, such as that used in fingerprinting to permit a more efficient search, is the logical answer. Speech is dynamic, however, and the same sound produced by the same speaker on two different occasions is never quite identical. Further, portions of the voiceprint pattern may be altered by the context in which the cue word is used. Therefore, a classification system must accommodate these variations while taking into account the invariant factors to produce an indexing code.

A solution to this problem is automated classification by a computer system. A correlation coefficient matching method is used to select a limited, more workable population for visual identification. This method is still being explored experimentally and, with improved computer techniques, should make the voiceprint identification task simpler.
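The article does not say how the correlation coefficient matching is carried out, so the following is only a minimal sketch of one plausible reading: score each filed pattern against the unknown with a Pearson correlation coefficient and keep the best few for visual comparison. The function names, the five-value feature vectors, and the speaker labels are hypothetical and are not taken from the original system.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat pattern carries no shape to correlate against
    return cov / math.sqrt(var_a * var_b)

def shortlist(unknown, file_cards, keep=2):
    """Return the names of the `keep` filed patterns that best match `unknown`."""
    scored = [(pearson(unknown, pattern), name)
              for name, pattern in file_cards.items()]
    scored.sort(reverse=True)  # highest correlation first
    return [name for _, name in scored[:keep]]

# Hypothetical file of patterns, each reduced to five measurements.
file_cards = {
    "speaker_a": [1.0, 3.0, 2.0, 5.0, 4.0],
    "speaker_b": [5.0, 4.0, 3.0, 2.0, 1.0],
    "speaker_c": [1.1, 2.9, 2.2, 4.8, 4.1],
}
unknown = [1.0, 3.1, 2.1, 5.0, 3.9]

candidates = shortlist(unknown, file_cards, keep=2)
```

Here `speaker_b` is dropped because its pattern runs opposite to the unknown, leaving two candidates for a trained examiner to compare by eye. The computer only narrows the population; the final identification in this scheme remains visual.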

This technique is presently available to the investigative field and has been successfully applied. The voiceprint identification approach is empirical rather than theoretical, and the merit of the technique will be measured in time by practical application. This course is an orientation in the application, method and experience of voiceprint identification to date.

K&R's All Media Productions Inc.
28533 Greenfield, Southfield, MI 48076
(248) 557-8276 * Toll Free (888) 802-0420
Email:
recordav@knr.net