In the last 45 seconds, there is a faint voice, a distant yell, and the urgent dialogue between a woman and a 911 operator.
“There’s just someone screaming outside,” the caller begins on the recorded line.
There is more distant yelling obscured by the operator — “Male or female?” — and the caller — “I think they’re yelling ‘help,’ but I don’t know.” There is a high-pitched scream, a kind of cry, and then the clearest sound of all.
“There’s gunshots. . . . Just one,” the woman says on the only 911 call to record what was happening in the dark at the Retreat at Twin Lakes, a gated townhouse community in Sanford, Fla., on the night of Feb. 26.
Those 45 seconds turned out to be a recording of the end of Trayvon Martin’s life.
And amid the conflicting, hazy and at times emotional reports from neighbors who heard and glimpsed only fragments of what was happening during those crucial seconds, the audio recording of them — from the start of the call at 7:16:11 p.m. until the gunshot at 7:16:56 p.m. — is perhaps the closest prosecutors and defense attorneys may come to an objective witness to the events that night.
It remains unclear exactly how the recording might be used in the court case, now underway, in which neighborhood watch volunteer George Zimmerman, who said he shot the unarmed 17-year-old in self-defense, is charged with second-degree murder.
Zimmerman defense attorney Mark O’Mara said Friday on CBS that the recording would require “a lot of forensic work-up.” And last week, Florida special prosecutor Angela B. Corey released a trove of documents that included an FBI analysis stating that the recording is inconclusive and a witness list that includes two audio experts who have said the opposite.
Two weeks before charging Zimmerman, who has pleaded not guilty, Corey hinted that the recording could be crucial.
“The exact words and whose voice is whose will be the critical issues,” she said in an interview with The Washington Post.
Legal experts say the recording could be enormously important or disastrous for either side, depending on what a jury determines it can hear.
But what happens when a potentially crucial piece of evidence in one of the most explosive court cases in recent memory is a poor-quality recording of overlapping voices and unintelligible yells, essentially a wilderness of sound?
If you can’t hear the 45 seconds, how do you hear the 45 seconds?
The answer may come down to which expert you ask.
One of those experts is Alan R. Reich, and his answer is that he is certain he can hear a young man he concludes is Martin pleading for his life, from the start of the 45-second recording until the end.
“I’m begging you,” he hears the younger of the two men yell as the recording begins.
Twenty-six seconds later: “Help me.”
In the last second before the gunshot: a high-pitched “Stop!”
In an effort to find out what might be discerned from the crucial 911 call, The Washington Post retained Reich, 67, a former University of Washington professor with a doctorate in speech science who has worked for prosecutors and defense attorneys in hundreds of criminal and civil cases over a period of more than 35 years.
Where many people have heard only vague yells on the recording, Reich said that he has found language. Reich also identified two distinct male voices outside, in the background of the recording — one younger, one older — that he concludes are those of Martin, 17, and Zimmerman, 28.
To familiarize himself with Zimmerman’s voice, Reich also listened many times to a recorded call that Zimmerman placed to police minutes earlier that night and that has established much of what is known about the moments leading up to those last 45 seconds:
At 7:09:34 p.m., Zimmerman was driving out on an errand, armed with a 9mm Kel-Tec semiautomatic pistol, when he called Sanford police to report “a real suspicious guy.” That person was Martin, who was walking back to the townhouse where he was staying with his father and his father’s girlfriend inside the gated community.
Cursing under his breath, Zimmerman got out of his truck and began to follow him. The dispatcher told him to stop, and at 7:13:38 p.m. the call ended.
From that point until the gunshot at 7:16:56, there are different versions of what happened.
Prosecutors have said that Zimmerman ignored the dispatcher and confronted Martin, and that a struggle ensued.
A friend of Martin’s who was on the phone with him at the time said he told her that a man who looked “crazy and creepy” was following him, according to the friend’s interview with a prosecutor, released Thursday. In the interview, the friend said she heard the man say, “What are you doing around here?” And then, she said, just before the call cut off, she heard Martin say, “Get off. Get off.”
Zimmerman’s family has said that he was walking back to his truck when Martin attacked him, punching his nose, knocking him down and beating his head into the sidewalk until he managed to fire his gun.
Injuries to Zimmerman’s nose and head are detailed in medical records.
However it happened, what is certain is that soon after Zimmerman hung up with the dispatcher, he and Martin came face to face, between two long rows of peach-colored townhouses with windows facing the scene, and neighbors began to hear yelling. It was dark and raining lightly.
And at 7:16:11, a woman’s 911 call began recording the sounds.
Several weeks later, at The Post’s request, working with a copy of the call downloaded from the Sanford city Web site, Reich began listening to the screams.
Using Sony Sound Forge Pro and KayPentax Multi-Speech software, he identified certain sound segments he wanted to examine more closely, such as the distant yell in the first second of the recording just as the 911 operator starts to speak.
He generated visual images of those segments, both sound wave forms and ones called spectrographs — a widely accepted tool that breaks down complex sounds into their component frequencies in much the way the ear and the brain do — so that the distant yell became bands of color indicating characteristics of speech.
It was there in the waves and colors that Reich began to see patterns indicating vowels, words and bits of dialogue. He amplified the segments. He listened again and studied the graphs.
The distant yell in the first second of the recording, Reich concluded, was actually a four-syllable phrase: “I’m begging you.” The yell in the very last second before the gunshot was a word that the spectrograph indicated began with an “st” sound, followed by an “ah” sound: “stop.”
Reich measured a particular frequency of the “ah” sound, which he said corresponds to certain anatomical factors in the speaker, such as the length and diameter of the vocal tract, as well as speaking style. This frequency, he said, was “highly appropriate for a 17-year-old male” who was still growing.
“The word was produced by the younger of the two male speakers, Trayvon Martin,” Reich said.
He listened to the recording over and over again.
Throughout the 45 seconds — with the exception of eight seconds of silence where police redacted the caller giving her address — the voice Reich believes to be Martin’s was “extraordinarily stressed, frightened and desperate,” he concluded.
As he continued to listen, Reich discerned a second voice in the background, one that was much more difficult to tease out. He amplified those segments, analyzed them and compared the patterns with Zimmerman’s vocal patterns on his earlier call.
Reich concluded this voice was the older of the two speakers, Zimmerman.
Reich is almost certain he hears this voice shout “What the f---” at 7:16:48. At different points in the recording, Reich said, the voice he believes is Zimmerman’s comes in short, assertive bursts of language in which the words are not clear but the tone and rhythm are.
The voice he concluded was Zimmerman’s is a “firmer, more consciously controlled voice,” he said.
What also struck Reich as he played and replayed the recording was what he did not hear: no sound of the older voice screaming, no obvious sounds of a physical struggle.
“Acoustical evidence of slapping, punching, shoving, wrestling, falling, throwing objects, was noticeably absent,” Reich said.
The analysis does not discount the possibility that there was a physical struggle between Martin, who had an abrasion on his left ring finger, and Zimmerman, who had a one-inch laceration in the back of his head, an abrasion on his forehead and a bloody, fractured nose, according to newly released photos and police and medical reports.
Rather, Reich’s analysis suggests that whatever physical struggle occurred was over by the time the recorded 911 call began.
From that point until the gunshot 45 seconds later, Reich said, it is Zimmerman who seems to have the upper hand, not Martin.
“It is Trayvon who felt threatened,” Reich said. “The help cries are all Trayvon.”
Another way to consider the 45-second recording is the way James J. Ryan considers it.
Ryan is the retired head of the FBI forensic audio, video and image analysis unit. He said even the best audio forensic expert in the world using the most sophisticated equipment available would have a difficult time determining much at all from a recording of such degraded quality.
“I think it’s hard to scientifically say anything definitive with audio like this,” Ryan said. “. . . One person will come up with one scenario, one speech, one sentence, and some other well-meaning person, trying hard, unbiased in a controlled environment with headphones, will come up with another one.”
Ryan, who has testified against other audio recording experts in trials, was asked to point out what he considers to be the vulnerabilities in any expert analysis of the 45-second recording. He had not heard Reich’s enhanced segments and was not specifically criticizing Reich’s work.
Listening to the 45 seconds, though, what Ryan hears are problems.
For one, the recording is poor, he said, a problem that Reich also acknowledged. Voices overlap, and there are multiple speakers and banging noises. The caller’s phone might have been unable to pick up all of the sounds outside. Also, the signal from the caller’s phone might have been degraded in quality and might have lost sounds as it was transmitted to the police department’s recording system.
Basic facts such as how far the caller’s phone was from the scene outside are also unclear, so it is difficult to know how distance or reverberations might have affected the recording.
Those problems are compounded when the science of acoustics is applied to degraded recordings typical of crime scenes.
The science is useful as an investigative tool, Ryan said, but limited in its usefulness in this type of audio recording. In this context, its reliability ranks below DNA and below fingerprints and has about the same technical challenges as trying to recognize a face on a degraded surveillance video, he said.
Tools such as spectrographs can bring some measure of clarity to fuzzy sound, and Ryan has used them. But they are vulnerable when it comes to deciphering actual words or phrases, he said.
“People look at those sometimes to determine if it was a ‘k’ or ‘p’ or ‘v’ depending on the restriction and the vocal path,” Ryan said. “But those waters are going to be so murky for this recording.”
“If someone wants to argue that’s the word ‘bun,’ ” he added, by example, “they could find someone to argue it’s the word ‘fun.’ ”
Ryan also questioned the basic idea that the age of the person or persons screaming during the 45 seconds — and thus whether it was 17-year-old Martin or 28-year-old Zimmerman or both — can be determined by measuring frequency, or pitch.
“To my knowledge, there are no scientific studies of pitch as an indicator or anything else in a scream that would give someone confidence to say how old somebody was,” Ryan said.
When it comes to emotionally charged situations, especially a life-or-death situation, the range of the human voice is simply too wide and varied to correlate it accurately to age, Ryan said.
A 28-year-old might scream like a 17-year-old. A 17-year-old might yell like a 28-year-old.
“The science doesn’t help with a recording like this,” Ryan said. “There isn’t anything to hang your hat on.”
After the gunshot, the recording continued as other neighbors began to call 911, look out windows or go outside into the dark to see what was happening. And the vagueness and discrepancies in the accounts they gave only underscored how important an objective recording could be.
At least five neighbors told 911 operators that they had heard “someone” or “a male” screaming.
Another neighbor said he had heard voices “arguing.”
Another said that he had seen the person he thought was Martin on top of the person he thought was Zimmerman, punching him, and that Zimmerman was the one yelling for help.
At least two others said that they did not hear any fighting at all, only moans and whines that they were certain came from “a boy,” and that when they looked outside just after the gunshot, they saw Zimmerman straddling Martin, who was face down.
They and others watched as police began to arrive, and Zimmerman stood up and started pacing, his hand on his forehead.
There was also the reaction of Martin’s and Zimmerman’s parents. Zimmerman’s father told investigators that he was “absolutely, positively” certain his son was the one yelling for help on the recording. Martin’s mother said she had no doubt it was her son screaming. In an interview with police, Martin’s father listened to the recording and, distraught, said it was not his son, but soon after that he said that he was certain it was.
All of which leaves the question of how you hear the 45 seconds when you can’t hear the 45 seconds, when the context is conflicting and subjective and emotionally charged, when there is an expert such as Reich arguing with certainty about what can be discerned and another such as Ryan claiming that little can be discerned at all.
Ultimately, the only answer to that question that will really matter, if the case goes to trial, is the jury’s. How it hears the 45 seconds, said Stephen A. Saltzburg, a law professor at George Washington University, will depend partly on the machinations and strategies deployed during the trial.
If the prosecution can convince jurors that Martin is screaming for help on the recording, it could become “enormously powerful,” Saltzburg said.
If the defense can convince them that the recording is ambiguous, it will help Zimmerman.
In either case, Saltzburg said, what the jurors hear will depend on human nature.
“Most trial lawyers believe, and most psychologists who study the way that people process information believe, that once people interpret something in a certain way, once they begin to believe something, they become committed to that,” he said. “And if they become committed, it’s hard to change their minds.”
Staff researcher Julie Tate contributed to this report.