FROM THE resonant tones of an opera star to the shrill cries of a baby, the human voice has an extraordinary range. In a split second, it can swing from a soft whisper to a piercing scream. It can express pleasure and pain in countless different accents and expressions. Such remarkable variability, however, makes it exceedingly difficult to analyze.
The sounds of speech travel to a listener's ear as rapid variations in air pressure. Scientists can record these pulsations by measuring the pressure as it changes over time. The recording appears as a complex waveform, showing the smooth ripples of vowel sounds and the jagged peaks of harsh consonants. Momentary silences, where the waveform appears flat, punctuate the signal.
Sensitive as it is, the ear has trouble comparing and characterizing sounds when a great deal of precision is needed. Consequently, researchers -- particularly those working on computerized systems that can speak or recognize spoken words -- have been developing methods for seeking out the patterns hidden within speech waveforms. Such techniques help a computer learn the differences between an "s" and a "sh," a "p" and a "b," or the "a" in "father" and the "o" in "mom."
One proven tool for comparing and analyzing sounds is the acoustic spectrogram -- a sort of snapshot of a sound's loudness at different frequencies over a certain time period. In one form, the spectrogram looks like a three-dimensional map of a mountain landscape, with peaks showing which frequencies are present, and how strongly, at each instant. Sometimes, however, both trained and untrained users have difficulty detecting differences between two spectrograms, even when the corresponding sounds are clearly different to the ear.
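The basic recipe behind a spectrogram can be sketched in a few lines: chop the sampled pressure signal into short, overlapping frames, taper each frame, and measure how strongly each frequency is present in it. This is a minimal sketch, assuming a plain discrete Fourier transform and a Hann window; the frame length, hop size and test tone are illustrative choices, not the settings of any particular instrument.

```python
import cmath
import math

def hann(n):
    # Hann window: tapers each frame's edges to reduce spectral leakage.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitudes(frame):
    # Magnitude of each frequency bin from 0 Hz up to the Nyquist frequency.
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(samples, frame_len=64, hop=32):
    # Each row is one time slice; each column is one frequency bin.
    window = hann(frame_len)
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        rows.append(dft_magnitudes(frame))
    return rows

if __name__ == "__main__":
    rate = 8000  # samples per second (illustrative)
    # A 1000 Hz tone for the first 512 samples, then a 2000 Hz tone.
    tone = [math.sin(2 * math.pi * (1000 if t < 512 else 2000) * t / rate)
            for t in range(1024)]
    bin_hz = rate / 64  # width of one frequency bin: 125 Hz
    for i, row in enumerate(spectrogram(tone)):
        peak = max(range(len(row)), key=row.__getitem__)
        print(f"frame {i:2d}: strongest frequency ~ {peak * bin_hz:.0f} Hz")
```

Reading down the printed frames, the strongest frequency jumps from about 1000 Hz to about 2000 Hz partway through, which is exactly the kind of change a spectrogram is meant to expose.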
Such difficulties motivated Clifford A. Pickover of IBM's Thomas J. Watson Research Center in Yorktown Heights, N.Y., to explore some novel methods for representing speech sounds. His innovations mix old ideas used in new contexts with new ideas brought to bear on longstanding data-analysis problems, and some have applications beyond speech, such as detecting differences between the genetic material in healthy cells and in cancerous cells.
Pickover's research is one instance of a growing interest in finding ways to cope with the vast quantities of data generated by modern instruments and other computer-aided techniques. In particular, researchers are trying to identify patterns and trends, a task that human beings can still do much better than computers, provided the data are presented in a suitable form.
Unfortunately, researchers often have to sift through millions of pieces of information -- such as stock-market prices or sound-wave frequencies -- to find meaningful patterns. This problem has spawned a wide variety of computer-based techniques for analyzing data. Analysts can now plot data in many ways, using geometry, animation and color. Some are even exploring the idea of translating numbers into sounds and musical notes so that patterns are easier to identify.
The Art of the Dot
One of Pickover's most striking and colorful data-display techniques produces figures with the sixfold symmetry of a snowflake that reveal differences among certain types of sounds. Formally termed "symmetrized dot patterns," these figures are what Pickover playfully calls "speech-flakes." Generated in full color on a computer screen, they have the brilliance and beauty of patterns in a kaleidoscope.
Pickover's trick is to convert sound waves into a collection of dots, a technique that is particularly sensitive to frequency differences. Even a slight change in frequency alters the curvature and shape of the flake's six arms and produces a strikingly different pattern. The method can easily distinguish the sound of the "a" in "father" from the sound of the "o" in "mom."
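The conversion from waveform to flake can be sketched as follows. In the form usually described for symmetrized dot patterns, each sample sets a dot's distance from the center, a sample a few steps later sets the dot's angle, and every dot is then mirrored about six planes spaced 60 degrees apart. This is a minimal sketch under that assumption; the function name, the lag of one sample and the 30-degree angular gain are illustrative choices, not Pickover's published settings.

```python
import math

def symmetrized_dot_pattern(x, lag=1, gain_deg=30.0, arms=6):
    """Map a sampled waveform x to (px, py) dots with `arms`-fold mirror symmetry.

    lag and gain_deg are illustrative assumptions, not published parameters.
    """
    lo, hi = min(x), max(x)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a flat signal
    dots = []
    for i in range(len(x) - lag):
        r = (x[i] - lo) / span                        # radius from the current sample
        dtheta = (x[i + lag] - lo) / span * gain_deg  # angle offset from a later sample
        for k in range(arms):
            mirror = 360.0 * k / arms                 # mirror planes at 0, 60, ..., 300 degrees
            for angle in (mirror + dtheta, mirror - dtheta):  # a dot and its reflection
                a = math.radians(angle)
                dots.append((r * math.cos(a), r * math.sin(a)))
    return dots

if __name__ == "__main__":
    # Two sine waves with slightly different frequencies (illustrative "vowels").
    a = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
    b = [math.sin(2 * math.pi * 6 * t / 200) for t in range(200)]
    print(len(symmetrized_dot_pattern(a)), len(symmetrized_dot_pattern(b)))
```

Because the angle depends on how fast the waveform changes between neighboring samples, a small shift in frequency bends the arms differently, which is why the flakes are so sensitive to frequency. Plotting the returned points with any scatter-plot tool shows the six-armed figure.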
Says Pickover: "Intriguing as an art form, these dot patterns may be a way of visually fingerprinting natural and synthetic speech sounds and allowing researchers to detect patterns in data not easily captured with traditional analyses." For example, Pickover's speech-flakes clearly show the difference between a human-made "ee" sound and a synthesized version of the same sound, even though the matching waveforms look similar.
As message and warning systems increasingly rely on synthesized voices to pass on information, research on the quality of synthetic speech becomes particularly important. Computer-generated verbal messages may someday play a vital role in airplane cockpits and power-plant or factory control rooms.
The speech-flake procedure has also been tested on animal sounds, from the croaking of frogs to the shrill whistles of dolphins, and may soon be employed on bird songs. And the patterns may be useful for detecting and characterizing heart abnormalities. (See illustration.) Physicians already can diagnose some heart ailments based on sounds heard through a stethoscope or electrical signals seen in an electrocardiogram. Pickover says his method of plotting heart sounds would allow doctors to detect significant patterns more readily.
Making Faces at Data
Like speech-flakes, a cartoon face can convey a wide range of expressions -- and a surprising amount of information. Such faces can be used to represent complicated data, from psychological test scores to the characteristics of various sounds. (See illustration.)
Introduced in 1973 by statistician Herman Chernoff of Harvard University, the technique has piqued the interest of many data analysts. By varying features such as the shape of the nose or the curvature of the mouth, a single face can convey the values of up to 10 different variables at the same time. If each of those 10 facial characteristics has 10 possible settings, then 10 billion (10^10) different faces can be generated.
For example, a psychologist studying personality characteristics may have to array the scores of 20 people, each of whom has taken 10 different tests. The results of each test can be assigned to a particular facial feature: the score on one test might correspond to the amount of mouth curvature, the score on another to eyebrow angle, and so forth. Researchers can then group the resulting faces, a procedure that sometimes allows them to isolate "strangers" who don't appear to fit in. Finding such exceptional cases would be vastly more difficult if the researchers worked with raw numbers.
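The bookkeeping in the psychologist's example can be sketched in code: rescale each test's scores across the group, quantize them into 10 settings, and attach each test to one facial feature. This is a minimal sketch; the feature names and their assignment to tests are hypothetical, and drawing the actual face is left out. Since a computer cannot "glance" at faces, the `strangers` helper uses distance to the nearest other face as a simple stand-in for the visual grouping described above.

```python
import math

# Hypothetical assignment of ten test scores to ten facial features.
FEATURES = ["face_width", "eye_size", "eye_slant", "pupil_position", "eyebrow_angle",
            "nose_length", "mouth_curvature", "mouth_width", "ear_size", "hair_length"]
LEVELS = 10  # each feature takes one of 10 settings, 0..9

def to_faces(scores):
    """scores[p][t] is person p's score on test t; returns one feature dict per person."""
    ranges = [(min(col), max(col)) for col in zip(*scores)]  # per-test min and max
    faces = []
    for row in scores:
        face = {}
        for name, value, (lo, hi) in zip(FEATURES, row, ranges):
            frac = (value - lo) / (hi - lo) if hi > lo else 0.5
            face[name] = min(int(frac * LEVELS), LEVELS - 1)  # quantize to a setting
        faces.append(face)
    return faces

def strangers(faces, threshold):
    """Indices of faces whose nearest neighbor is farther away than `threshold`."""
    vecs = [[f[name] for name in FEATURES] for f in faces]
    flagged = []
    for i, v in enumerate(vecs):
        nearest = min(math.dist(v, w) for j, w in enumerate(vecs) if j != i)
        if nearest > threshold:
            flagged.append(i)
    return flagged

if __name__ == "__main__":
    print(LEVELS ** len(FEATURES))  # 10000000000 possible faces
```

The printed count confirms the arithmetic in the text: 10 features with 10 settings each give 10^10 distinct faces.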
Chernoff's scheme depends on the ability of the human visual system to take in an entire cartoon face as a single chunk of information that condenses a vast amount of data. "Such faces," Pickover observes, "have been shown to be more reliable and more memorable than other tested icons and allow the human analyst to grasp many of the essential regularities and irregularities in the data."
Pickover, too, has used cartoon faces to characterize broad classes of sounds. Different types of sound, when analyzed in this way, appear to generate distinctive types of faces. The "speech-faces" may be useful in teaching deaf or near-deaf children how to modify the sounds they make. IBM researchers have also proposed the use of cartoon faces on control panels. Pilots in military aircraft, for example, are sometimes overwhelmed by the number of dials and indicators they must monitor. Displaying several key pieces of information in one place could reduce the overload. By learning to recognize certain types of faces as danger signals, pilots might react more quickly. And the faces could also bring together several signals that by themselves may not indicate a threatening situation but taken together reveal a potential hazard.
Listening to Chemicals
If the human ear has difficulty comparing sounds with absolute precision, it is very good at searching out meaningful patterns: recognizing a familiar voice, picking out a single word in a cacophony of cocktail chatter, hearing a flute in the midst of an orchestral romp. The ear can integrate disparate sounds into a harmonious whole or detect subtle nuances buried in noise.
Recently, researchers have started to explore ways of using sound to sort out statistical relationships: projected economic depressions are programmed to moan, seismic data suggesting the presence of oil reservoirs are made to rumble, and various sound effects help visually impaired students and researchers perform routine analyses.
For example, chemists can learn a great deal from a single glance at a chemical's infrared spectrum. Seen as a wiggly trace across a sheet of paper, the spectrum shows how much infrared light the substance absorbs at each wavelength. It serves as a kind of graphical fingerprint for identifying chemical compounds.
A few years ago, a group of chemists at East Carolina University in Greenville, N.C., interpreted the characteristic peaks and valleys of an infrared spectrum as a sequence of musical notes. A note's pitch depends on where a spectral peak falls, while its duration depends on the peak's height. The result is a set of musical notes that can be played sequentially or all together as a chord. A chemist can identify an unknown compound by listening to the sound of its infrared spectrum and then comparing it with the sounds of the spectra of known samples. Often, if the compounds are simple enough, hearing only the chord is sufficient to allow a positive identification.
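The mapping described above, with peak position setting pitch and peak height setting duration, can be sketched as follows. The text does not give the East Carolina group's actual scales, so the ranges here are assumptions: the 400 to 4000 cm^-1 window is a typical mid-infrared range, and it is stretched linearly over three octaves of MIDI note numbers. Matching an unknown by counting shared chord notes is likewise a simple illustrative stand-in for listening, and the peak lists in the usage example are made up.

```python
# Hypothetical scales; the original work's mapping is not specified in the text.
WAVENUMBER_MIN, WAVENUMBER_MAX = 400.0, 4000.0  # cm^-1, a typical mid-infrared range
NOTE_LOW, NOTE_HIGH = 48, 84                    # MIDI note numbers, C3..C6
MAX_BEATS = 4.0                                 # duration of a full-height peak

def peak_to_note(wavenumber, height):
    """Peak position sets the pitch; peak height (0..1, fraction absorbed) sets duration."""
    frac = (wavenumber - WAVENUMBER_MIN) / (WAVENUMBER_MAX - WAVENUMBER_MIN)
    pitch = round(NOTE_LOW + frac * (NOTE_HIGH - NOTE_LOW))
    beats = max(0.25, round(height * MAX_BEATS * 4) / 4)  # quantize to quarter beats
    return pitch, beats

def spectrum_to_melody(peaks):
    # Play peaks from high to low wavenumber, the usual left-to-right order of an IR chart.
    return [peak_to_note(w, h) for w, h in sorted(peaks, reverse=True)]

def chord(peaks):
    # The same notes sounded together: only the set of pitches matters.
    return frozenset(pitch for pitch, _ in spectrum_to_melody(peaks))

def identify(unknown, library):
    """Name of the reference spectrum whose chord shares the most notes with the unknown."""
    target = chord(unknown)
    return max(library, key=lambda name: len(target & chord(library[name])))

if __name__ == "__main__":
    library = {"A": [(3300, 0.9), (1040, 0.8)], "B": [(1715, 0.95), (2960, 0.5)]}
    print(identify([(3310, 0.7), (1045, 0.6)], library))
```

Slightly shifted peaks still round to the same notes, so the made-up unknown is matched to reference "A" by its chord alone, mirroring the claim that a chord can suffice for simple compounds.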
Recently, two biochemists at Michigan State University in East Lansing applied the same idea to the charts produced when urine and other chemical mixtures are analyzed. In this case, the charts come from analytical instruments such as gas chromatographs, which trace a set of spikes across a roll of paper to record which compounds are present in a sample and how much of each. A computer coupled with a Moog synthesizer translates spike heights into musical notes: the higher the spike, the higher the pitch.
The researchers, Charles C. Sweeley and John F. Holland, say their method may be useful for monitoring the quality of industrial or laboratory processes. Instead of examining every chart that comes off an instrument, technicians could just listen for "sour" notes that may indicate a problem. Sweeley himself has set analyses of urine samples to music, and from the results he can detect differences among samples from people with certain genetic diseases.
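The "higher spike, higher pitch" rule and the idea of listening for sour notes can be sketched together. The text gives only the direction of the mapping, so the linear scale, the MIDI range, the two-semitone tolerance and the function names are all assumptions. Spikes are taken as hypothetical (retention time, height) pairs, and comparing notes position by position assumes both runs show the same compounds in the same order.

```python
def height_to_pitch(height, max_height, low=48, high=84):
    """Map a spike height to a MIDI note: a higher spike gives a higher pitch (linear, assumed)."""
    frac = min(max(height / max_height, 0.0), 1.0)
    return round(low + frac * (high - low))

def sonify(spikes, max_height):
    # One note per spike, in the order the compounds come off the column.
    return [height_to_pitch(h, max_height) for _, h in sorted(spikes)]

def sour_notes(run, reference, max_height, tolerance=2):
    """Positions where a run's note strays more than `tolerance` semitones from the reference."""
    a, b = sonify(run, max_height), sonify(reference, max_height)
    return [i for i, (x, y) in enumerate(zip(a, b)) if abs(x - y) > tolerance]

if __name__ == "__main__":
    reference = [(1.2, 40.0), (3.5, 80.0), (5.1, 20.0)]  # made-up spikes
    run = [(1.2, 41.0), (3.5, 30.0), (5.1, 19.0)]        # second compound has dropped
    print(sonify(reference, 100.0), sour_notes(run, reference, 100.0))
```

A technician would hear the second note fall well below its usual pitch; the tolerance keeps small run-to-run wobble from counting as a sour note.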
Sound, like Chernoff faces, can also be used to represent data involving several variables. This was tried more than five years ago, when Joseph J. Mezrich and two of his colleagues at the Exxon Research and Engineering Co. in Annandale, N.J., combined computer animation and computer music to identify economic trends. In a sense, their approach was like a movie with an orchestral soundtrack, with animated figures on the screen and various musical instruments taking on the roles of variables.
Mezrich experimented with plotting economic data to look for signs of boom or bust. For example, an economist looking for indications of an impending depression may track car sales as one variable, unemployment figures as another, housing starts as a third, the value of the dollar overseas as a fourth and so on. As the values of these variables rise and fall from month to month, certain combinations and patterns in the data could foreshadow serious economic problems. All of these data can be orchestrated into an animated feature that attempts to foretell the future.
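Giving each economic variable its own instrument can be sketched as follows: rescale every series onto the same pitch range and, for each month, sound one note per instrument. This is a minimal sketch of the soundtrack half only; the instrument assignment, the two-octave MIDI range and the figures in the usage example are hypothetical, and nothing here reproduces Mezrich's actual mapping or animation.

```python
# Hypothetical instrument assignment; the original choices are not described in the text.
VOICES = {
    "car_sales": "cello",
    "unemployment": "tuba",
    "housing_starts": "clarinet",
    "dollar_value": "flute",
}

def scale(series, low=48, high=72):
    """Rescale one variable's monthly values onto a two-octave MIDI range."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [(low + high) // 2] * len(series)  # a flat series holds one middle note
    return [round(low + (v - lo) / (hi - lo) * (high - low)) for v in series]

def score(data):
    """data maps variable name -> monthly values (equal lengths); returns one chord per month."""
    scaled = {name: scale(values) for name, values in data.items()}
    months = len(next(iter(data.values())))
    return [{VOICES[name]: scaled[name][m] for name in data} for m in range(months)]

if __name__ == "__main__":
    data = {  # made-up figures for three months
        "car_sales": [10, 12, 14],
        "unemployment": [5, 5, 5],
        "housing_starts": [3, 2, 1],
        "dollar_value": [100, 100, 90],
    }
    for month, chord in enumerate(score(data)):
        print(month, chord)
```

Because each variable keeps its own timbre, a listener can follow the cello climbing while the clarinet and flute sink, and it is combinations of this kind that the text suggests might foreshadow trouble.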
Mezrich's results were encouraging enough for Exxon to consider his technique of combining sound and animation as a way of analyzing seismic data collected in the search for oil. Making sense of the maze of recorded measurements from underground explosions that shake the earth and send sound waves over long distances is a major task consuming vast quantities of computer time. Converting the data to music may uncover the subtle hints that add up to a significant oil reservoir deep underground.
Whether the goal is finding economic trends, categorizing chemical compounds, diagnosing heart ailments or studying the quality of synthetic speech, the challenge is to make sense of masses of data -- to find the diamond in a statistical mountain. Human beings have trouble coping with pages and pages of numbers. By turning those numbers into a graph, a picture or musical sounds, analysts find it easier to pick out a useful pattern or trend. The tough problem is choosing the right representation.
Researchers sitting at their computer work stations will eventually be able to call on a broad range of data-analysis tools to guide their work. With a few keystrokes, analysts will be able to transform numbers into shapes dancing across a screen to the tune of a computer-synthesized ballad. Searching for insight, they will be able to shift easily from one representation to another, altering colors, adding or subtracting sounds, trying various shapes.
But the story could end with an ironic twist: The availability of so many different ways of representing data might itself contribute to the data-overload problem.