In 1867 Melville Bell, an authority in philology and phonetics, developed a phonetic alphabet which he called "Visible Speech". Through these handwritten symbols he succeeded in conveying to the eye information concerning the fine detail of speech sounds. The system brought Mr. Bell wide recognition, and it was used by him and later by his son, Alexander Graham Bell, in teaching the deaf to speak correctly.
Alexander Graham Bell was not satisfied with handwritten symbols; he wanted to produce visual speech by direct action of the voice. In the midst of his search for such a scheme he discovered the principles of telephony. He succeeded eminently in the translation of speech to electrical currents and back to speech, but he did not find a way to translate the electrical currents to visual patterns.
Numerous ways for making visual records of speech sounds have been devised in the past, but these records have not proved to be readable. Both the oscillograph trace and sound tracks on motion picture films are unreadable to the eye, although it is possible to obtain the original sound by means of a suitable reproducing device.
A spectrograph development project was started at the Bell Telephone Laboratories in 1941. This device analyzed samples of speech into a visual record portraying frequency, intensity, and time relationships. Because of American participation in World War II, major interest was centered upon military requirements regarding spectrograph applications. During this period acoustic scientists suggested that enemy radio voices might be identified by spectrogram in order to detect the movement of units from place to place. Less was known about voices at that time, and because present techniques were not yet available, the type of identification proposed was very tedious and ineffective.
Speech scientists have long been aware of the wide variation in voices. This wide variation is one of the factors that has thwarted many efforts to build a machine that will understand a large variety of spoken words when uttered by many different voices. The problem of recognition of speech by machine is a very formidable one. Primitive automatic recognizers have been built that can identify a small vocabulary of words: the digits zero through nine, for example. If reliability improves, it will be possible to "dial" your telephone by speaking the desired number. Another possibility is a voice-operated typewriter.
These areas of study have pointed up the truly remarkable uniqueness of an individual human voice.
Several years ago a law enforcement group inquired about what help they could get in combating telephoned bomb scares to airlines and public buildings. Their particular interest was in being able to identify the voice of the perpetrator of such crimes. The problem was referred to the Acoustic and Speech Research Dept. of the Bell Telephone Laboratories.
The author, one of four research experimentalists working in that area, was optimistic about the possibilities of voice identification on the basis of the unique naturalness factors which have so long confounded efforts to design machines capable of reliably recognizing human speech. He requested permission to study the area for two years, with the understanding that if encouraging results were not obtained at the end of six months, the study would be discontinued.
At the conclusion of this time period, the evidence obtained appeared encouraging and the study was continued. The study was largely dependent on the use of the improved sound spectrograph, which acted as an automatic wave analyzer recording the acoustic patterns of speech in the dimensions of time, frequency, and intensity. The acoustic patterns, called voiceprints, permitted side-by-side visual comparison of speech sounds, instead of requiring that an investigator listen to the sounds one after another with uncertain dependence on memory. The eye is more sensitive to minute changes in a complex pattern than is the ear to small changes in a complex sound. The ear is an averaging instrument and is sometimes tricked in discrimination trials.
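The three dimensions recorded by the spectrograph (time, frequency, and intensity) can be illustrated with a short digital sketch. This is not the method of the original instrument; it swaps in a short-time Fourier transform, and the function name `spectrogram`, the frame length, the hop size, and the synthetic 1000 Hz tone are all assumptions made for illustration.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a time-by-frequency grid of magnitudes (hypothetical helper).

    Each row is one time frame; each column is one frequency bin;
    each value is the intensity (magnitude) at that time and frequency.
    """
    # Hann window to soften the edges of each frame
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        # Naive discrete Fourier transform over the non-negative frequencies
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Assumed test signal: a pure 1000 Hz tone sampled at 8000 Hz
rate = 8000
tone_hz = 1000
samples = [math.sin(2 * math.pi * tone_hz * t / rate) for t in range(512)]

grid = spectrogram(samples)
# Strongest frequency bin in the first time frame
peak_bin = max(range(len(grid[0])), key=lambda k: grid[0][k])
peak_hz = peak_bin * rate / 64  # bin spacing is rate / frame_len
print(len(grid), "frames,", len(grid[0]), "bins, peak near", peak_hz, "Hz")
```

Plotting such a grid with time on one axis, frequency on the other, and intensity as darkness gives a picture comparable in layout to a voiceprint, although real speech produces far more complex patterns than a single tone.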
The foundation of voiceprint identification is the premise that every individual voice is characteristic enough to distinguish it from all others. The theory behind the premise lies in the fundamental processes of human speech. There are two general factors involved. The first factor in determining voice uniqueness lies in the sizes of the vocal cavities.
The four major cavities affecting speech are the throat, nasal, and two oral cavities formed in the mouth by positioning the tongue. The vocal cavities are resonators, much like organ pipes, reinforcing some of the overtones produced by the vocal cords, and producing formants or voiceprint bars. These cavities have the property of selective transmission and radiation dependent upon their sizes and the manner in which they are coupled. The likelihood of two people having all vocal cavities the same size and configuration and coupled identically appears rather remote.
The second, and strongest, factor in determining voice uniqueness lies in the manner in which the articulators or muscles of speech are manipulated during speech. The articulators include the lips, teeth, tongue, soft palate and jaw muscles, whose controlled dynamic interplay results in intelligible speech. Infants acquire intelligible speech by a random learning process, literally a trial and error method of imitation of those around them who are successfully communicating. The likelihood that two people would develop identical dynamic use patterns for their articulators seems remote indeed.
Therefore, the chance that two speakers would have both identical vocal cavity dimensions and configurations coupled with identical articulator use patterns appears very remote.
The proof of individual voice uniqueness lies in acquiring a sufficiently large population sample to support this premise.
Thus far we have discussed a person's normal speech. A second basic premise of voiceprint identification is that any intelligible speech is satisfactory for voiceprint identification purposes, provided there are voice sounds or phonemes common to both the known and unknown samples to be compared. There is limited empirical evidence to indicate that various conditions of disguised speech intended to confound a trained observer only make identification more difficult, not impossible.
It might be expected that a file of voiceprints would accumulate to a point where it would not be feasible to examine every file card against an unknown sample. A manual classification, such as that used in fingerprinting to permit a more efficient search, is the logical answer. Speech is dynamic, however, and the same sound produced by the same speaker on two different occasions is never quite identical. Further, portions of the voiceprint pattern may be altered by the context in which the cue word is used. Therefore, a classification system must accommodate these variations while taking into account the invariant factors to produce an indexing code.
A solution to this problem is automated classification by a computer system. A correlation coefficient matching method is used to select a limited, more workable population for visual identification. This method is still being explored experimentally, and improved computer arts should make the voiceprint identification task simpler.
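The idea of narrowing a large file to a workable population by correlation can be sketched as follows. The text does not specify which patterns are correlated or how, so this is only a minimal illustration that assumes each voiceprint has already been reduced to a fixed-length list of numbers; the names `pearson`, `shortlist`, `file_cards`, and all of the data values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # a flat pattern carries no shape information
    return sum(a * b for a, b in zip(dx, dy)) / denom


def shortlist(unknown, file_cards, keep=2):
    """Return the `keep` labels whose stored pattern correlates best with `unknown`."""
    scored = [(pearson(unknown, pattern), label)
              for label, pattern in file_cards.items()]
    scored.sort(reverse=True)  # highest correlation first
    return [label for _, label in scored[:keep]]


# Hypothetical file of stored patterns (made-up numbers, not real voice data)
file_cards = {
    "speaker_a": [0.1, 0.8, 0.3, 0.9, 0.2],
    "speaker_b": [0.9, 0.2, 0.7, 0.1, 0.8],
    "speaker_c": [0.2, 0.7, 0.4, 0.8, 0.3],
}
unknown = [0.15, 0.75, 0.35, 0.85, 0.25]

candidates = shortlist(unknown, file_cards)
print(candidates)
```

The shortlist is not an identification in itself. It only reduces the number of file cards that a trained examiner must then compare visually against the unknown sample.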
This technique is presently available to the investigative field and has been successfully applied. The voiceprint identification approach is empirical rather than theoretical, and the merit of the technique will be measured in time by practical application. This course is an orientation in the application, method and experience of voiceprint identification to date.
K&R's All Media Productions Inc.
28533 Greenfield, Southfield, MI 48076
(248) 557-8276 * Toll Free (888) 802-0420
Email: recordav@knr.net