
Forensic Phonetics






Forensic Phonetics: Issues in speaker identification evidence

Andrew Butcher
Centre for Human Communication Research
Flinders Medical Research Institute
Flinders University, Adelaide, Australia

Abstract

The field of forensic phonetics has developed over the last 20 years or so and embraces a number of areas involving analysis of the recorded human voice. The area in which expert opinion is most frequently sought is that of speaker identification – the question of whether two or more recordings of speech (from suspect and perpetrator) are from the same speaker. Automated analysis (in which Australia is a world leader) is only possible where recording conditions are identical. In the most frequently encountered real-world forensic situation, comparison is required between a police interview recording and recordings made via telephone intercepts or listening devices. This necessitates a complex procedure, involving auditory and acoustic comparison of both linguistic and non-linguistic features of the speech samples in order to build up a profile of the speaker. The most commonly used measures are average fundamental frequency and the first and second formant frequencies of vowels. Much work is still needed to develop appropriate statistical procedures for the evaluation of phonetic evidence. This means estimating the probability of finding the observed differences between samples from the same speaker and the probability of finding those same differences between samples from two different speakers. Thus there needs to be an acceptance that the outcome will not be an absolute identification or exclusion of the suspect. By itself, your voice is not a complete giveaway.

1. The field of forensic phonetics

The use of phonetics as a forensic tool has developed over the past 20 years or so (Hollien 1990; Baldwin & French 1991), but with the rapid expansion in the number of cases depending on the evidence of covert audio and video recordings in recent years, forensic phonetics now plays a crucial role in an increasing number of criminal trials. A forensic phonetician may be asked to prepare reports in a number of areas, of which the following four are the most frequently encountered:

1.1 Speaker identification. This is by far the most commonly required task and the subject of the remainder of this paper.

1.2 Disputed utterances. In view of the usually very poor quality of covert police recordings (especially those made via a listening device), there is often ample scope for a defendant to challenge the prosecution’s version of what was actually said in the course of a recorded conversation. Forensic phoneticians may be asked to prepare a report on the quality of the recording and the intelligibility of the speech. They may also be asked to prepare an ‘objective’ transcript of the recording.

1.3 Tape authentication. Occasionally a defendant (or a civil litigant) may have cause to question whether an audio recording has been tampered with in some way. Usually the claim is that certain sections have been excised or perhaps transposed. It is not generally within the competence of a phonetician to give an opinion as to the physical condition of a tape, but there may be evidence within the acoustic signal (‘pops’ or abrupt changes in either the signal itself or the background noise) which would be indicative of electronic editing. However, currently available software makes ‘seamless’ editing comparatively easy, and a phonetician may be needed to give an opinion on the only remaining evidence of any tampering – linguistic evidence in the form of unnatural changes in rhythm, tempo or intonation.

1.4 Voice line-ups. The practice of confronting witnesses of a crime with a tape recorded ‘voice line-up’, where the voice of a suspect is included amongst a series of ‘foils’, may be used to obtain evidence of identification in cases where, in the course of committing a crime, an unseen or masked perpetrator has spoken in the presence of the witnesses. This recording is played to the witness(es) and they are asked to state whether they can identify any of the voices as that of the perpetrator. In order to be entirely fair to the suspect, there are a number of criteria which need to be observed (Broeders & Rietveld 1995; Hollien, Huntley, Künzel & Hollien P 1995). As with visual identification parades, it is a general principle of fairness in the conducting of voice line-ups that there should be no feature of any of the voices or the recordings which would cause non-witnesses to pick out a particular speaker (whether suspect or foil) as being different from the rest. A phonetician may be consulted on aspects of the construction of the tape and the administration of the confrontation.

2. Speaker Identification: analysis and measurement

I would estimate that at least 90% of my work as a forensic phonetician is concerned with the identity of speakers in audio recordings. There is a good deal of misunderstanding surrounding the capabilities of speech technology in this area. Some of this misunderstanding dates from the 1960s, when the “Voiceprint” technique became a favourite tool of certain police forces, most notably in the USA. This methodology, which involved the visual inspection and impressionistic comparison of sound spectrograms, was regarded sceptically by the scientific community at the time, and has since been entirely discredited (Hollien 1990, 2002; Gruba & Poza 1995). The term “Voiceprint” suggests that the technique is analogous to forensic techniques such as fingerprinting or DNA analysis.
There are a number of reasons why this is an inappropriate analogy. Firstly, there is no single feature of the voice which is unique to every speaker. Unlike the vanishingly small possibility in the case of fingerprints or DNA molecules, it is quite possible for two speakers to be, for all practical purposes, identical in some respect. Secondly, most (if not all) of the features of the voice which are measurable in recordings of the quality typically encountered in the forensic context are capable of being consciously changed by the speaker. These include voice pitch, aspects of voice quality, consonantal articulation, and vowel quality. At present it is not impossible for a skilled mimic to defeat the forensic voice identification procedure. Thirdly, for most of the voice features, we do not have sufficient data on the normal population to know what the chances are of two speakers being similar or identical with respect to that feature. Finally, acoustic parameters vary as a consequence of differences in recording conditions as well as of differences in the voice itself.

Australia leads the world in the technology of automatic speaker recognition (in 2001 a team from the RCSAVT Speech Research Lab at Queensland University of Technology won two of the categories for single speaker detection tasks in the National Institute of Standards & Technology’s benchmark tests on speaker recognition), but automatic speaker recognition is not yet able to separate out variation due to speaker differences from variation due to recording conditions (and it is doubtful whether it will ever be able to). Thus automatic speaker recognition techniques are of limited use in the typical forensic situation, where a voice recorded over the telephone or via a listening device is to be compared with a voice recorded in a police interview room. The intervention of a phonetically and linguistically qualified human operator is required.
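
The third objection above, the lack of population data, can be made concrete with a small numerical sketch. It is an illustration only, not a procedure taken from this paper. It assumes a single measure (say, the difference in average fundamental frequency between two recordings) and assumes that both within-speaker and between-speaker variation are normally distributed. The function names and the standard deviations used in the usage note are hypothetical. The sketch compares the probability density of the observed difference under the same-speaker hypothesis with its density under the different-speaker hypothesis.

```python
import math


def normal_pdf(x, sd):
    """Density at x of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(diff_hz, within_sd, between_sd):
    """Ratio p(difference | same speaker) / p(difference | different speakers).

    Illustrative assumptions (not from the paper):
      - each recording's mean F0 is the speaker's long-term mean plus
        normally distributed within-speaker variation (sd = within_sd);
      - long-term means vary normally across the population (sd = between_sd).
    """
    # Same speaker: the difference of two samples carries only within-speaker variance.
    same_sd = math.sqrt(2.0) * within_sd
    # Different speakers: between-speaker variance is added for each speaker.
    diff_sd = math.sqrt(2.0 * (within_sd ** 2 + between_sd ** 2))
    return normal_pdf(diff_hz, same_sd) / normal_pdf(diff_hz, diff_sd)
```

With hypothetical values such as `likelihood_ratio(2.0, 5.0, 20.0)` the ratio is greater than 1 (the small difference is more probable if the samples come from one speaker), whereas `likelihood_ratio(40.0, 5.0, 20.0)` gives a ratio well below 1. The point of the sketch is that `between_sd` can only be estimated from data on the normal population, and without it neither value can be computed.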
The main components of the procedure are an auditory analysis and an acoustic analysis, each of which in turn has a number of component parts. Voice ID is therefore more appropriately compared with a ‘photo-fit’ type of procedure, where a number of features are considered as part of an overall profile.

2.1. Auditory analysis

This part of the analysis involves careful and repeated listening by the expert, noting features of the voices in question under four basic headings. Firstly, voice quality features are ascertained. This means describing ‘voice’ in the technical sense – i.e. the sound made by the vibration of the vocal folds – and ignoring for the moment any variations contributed by the resonances of the throat, mouth and nasal passages above. It can be done using one of a number of descriptive frameworks (e.g. Isshiki & Takeuchi 1970; Laver 1980; Wendler, Rauhut & Krüger 1986; Oates & Russell 1998), whereby aspects of the voice can be quantified according to parameters such as ‘roughness’, ‘strain’, ‘creakiness’, ‘breathiness’ and so on – terms which are meaningful to other phoneticians and speech scientists and which describe in as accurate and objective a way as possible the auditory impressions of the listener. Secondly, the investigator attends to the non-linguistic characteristics of the speech which are not produced by the larynx. This means listening to the effects of the long-term setting of the throat, the tongue and lips and the resonances of the nasal passages and sinuses. This is known as the articulatory setting, and here too, established descriptive frameworks are available (Laver 1980; Esling 1994) which rate the voice according to such parameters as ‘hypernasality’, ‘pharyngealisation’, ‘labialisation’, as well as vertical position of the larynx. The third set of parameters relates to aspects of (mainly vowel) articulation which provide clues to the speaker’s geographical and social background.
In long-established linguistic communities such as in the United Kingdom and Europe, this part of the analysis can provide very useful information. In a recently established community such as (non-Aboriginal) Australia, the information which can be gleaned is usually quite scanty. Australian English accents are traditionally classified on a three-point scale as being ‘Broad’, ‘General’ or ‘Cultivated’ (Mitchell & Delbridge 1965), but there are very few features which enable us to pinpoint the speaker’s geographical origins with any accuracy. One or two pronunciations are peculiar to Queensland and another one or two distinguish speakers with a South Australian background. A more recent phenomenon is the “pan-ethnic” accent (sometimes known as “wogspeak”) which has developed among second- and subsequent-generation Australians of non-English-speaking background (Warren 1999). The final component of the auditory analysis is the identification of any idiosyncratic pronunciation features which may be present. The more commonly occurring idiosyncrasies involve the articulation of consonants, and include various types of ‘lisp’, the labialising of ‘r’ (‘rabbit’ becomes something likes