# Hearing with the Fourier transform

This being my first post, I think it should be about something beautiful and interesting that isn’t confined to just one aspect of science. This post is about how we hear. More specifically, about how our ears realize sound from mechanical vibrations in the air, and the truly fascinating physics and engineering behind this. The inner ear is a sophisticated instrument perfected by hundreds of millions of years of evolution. Although the concepts in the rest of this post apply to the auditory organs of all mammals, when I refer to the ear henceforth I will be talking about the human ear specifically.

I’m an electrical engineer, so I’m afraid I’m going to have to begin with a little math. Don’t worry though, the math is probably new to you but it isn’t difficult, and if you grasp it you’ll find it fascinating how a fundamental mathematical tool we use in engineering has an analog in nature.

An important part of electrical engineering is concerned with signal analysis. Any exchange of information or data, whether through computers, televisions, phones, or any modern device you can think of, requires knowledge of signal analysis. Without an understanding of how signals can carry information, we’d literally be back in medieval times. An essential mathematical tool we use in signal analysis is called the Fourier transform. The Fourier transform takes a signal and shows us all the frequencies that make it up. This is a very useful ability to have when we’re trying to send information through a signal, since we encode information in the frequencies of a signal. The frequency of something is just how often that something oscillates per unit of time. The standard unit of frequency is the hertz (Hz), which is just the number of cycles or oscillations per second. Any wave has a frequency, which describes how often the quantity goes from its maximum value to its minimum and back to its maximum value again. For example, visible light waves have frequencies between about 400 (red) and 790 (violet) terahertz. A terahertz is 10^12 hertz, or 1 with 12 zeroes after it. This means that, for example, the electromagnetic field of violet light goes from its maximum value to its minimum value and back to its maximum value again 790 × 10^12 times every second.

We can also talk about angular frequencies. Angular frequencies are given the symbol ω (omega), and they are useful when we are describing something that is rotating. For example, we could say an engine that rotates once every second has a frequency of ξ = 1 Hz. We could also describe it as having an angular frequency of ω = 2π radians per second. A radian is just a measure of angle, and 2π radians is the total angle in a circle. This is exactly the same as saying that the engine rotates at 360° per second, but we use radians because they are a lot more convenient mathematically. We can convert between regular frequency and angular frequency easily using the equation ω = 2πξ.
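The conversion is simple enough to check in a few lines. This is only a minimal Python sketch; the function names `to_angular` and `to_hertz` are my own inventions for illustration, not part of any library.

```python
import math

def to_angular(freq_hz):
    """Convert ordinary frequency (Hz) to angular frequency (rad/s)."""
    return 2 * math.pi * freq_hz

def to_hertz(omega):
    """Convert angular frequency (rad/s) back to ordinary frequency (Hz)."""
    return omega / (2 * math.pi)

# The engine from the example: one rotation per second.
engine_omega = to_angular(1.0)
print(engine_omega)                 # 2*pi, about 6.283 rad/s
print(math.degrees(engine_omega))   # the same rate in degrees per second, about 360
```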

If we were to model the rotation of the engine graphically in one dimension, we could choose a point on the engine and watch it as it rotated. The motion that point traced out over time would look like a sinusoid. The horizontal axis would be time, and the vertical axis would be the vertical position of the point as a function of time. The easiest way to visualize this is through an animation. Don’t worry about the rest of the stuff on there; just notice that when you look at a point on a rotating object in one direction, it looks like a sinusoid.
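If you’d rather see numbers than an animation, here is a small Python sketch of the same idea. It follows a point on a wheel of radius 1 turning once per second and prints only its height. The function name `point_on_wheel` and the choice of eight samples per turn are assumptions made for this example.

```python
import math

def point_on_wheel(t, freq_hz=1.0, radius=1.0):
    """Position (x, y) at time t of a point on a wheel turning at freq_hz."""
    angle = 2 * math.pi * freq_hz * t   # angle swept so far, in radians
    return radius * math.cos(angle), radius * math.sin(angle)

# Sample one full rotation and keep only the vertical coordinate.
for step in range(9):
    t = step / 8                        # 0, 1/8, ..., 1 second
    _, y = point_on_wheel(t)
    print(f"t = {t:.3f} s   height = {y:+.3f}")
```

The height climbs to +1 a quarter of the way through the turn, falls to −1 at three quarters, and returns to where it started after one second, which is exactly one cycle of a sine wave.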

Now that we understand frequencies, we can see what the Fourier transform looks like mathematically. In the transform equation, the left-hand side is the output function of the Fourier transform, which is a function of frequency. It tells us how much of each frequency is present. The input function, which is a function of time, is the one inside the integral sign on the right-hand side. Notice that the exponent of e in the transform equation contains 2π times the frequency, which is just the angular frequency ω. The input and output functions are continuous, which means that we can talk about the value of the input signal at x = 5.385 seconds, just like we can talk about the component of the frequency at ξ = 2.971 Hz. We don’t need to confine ourselves to either 5 or 6 seconds, and we don’t need to only talk about frequencies at either 2 or 3 Hz. The i in the equation is the imaginary unit, which is equal to the square root of −1.
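Written out with the symbols used in this post (x for time, ξ for frequency), the transform takes the following form. I’m assuming the common convention where the 2π sits in the exponent and the sign is negative; other texts place these slightly differently.

$$
\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx
$$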

So what would happen if we took the Fourier transform of a simple sinusoid? Let’s say we take the transform of cos(2π·5·t). This is a sinusoid with frequency 5 Hz, or 10π rad/s. The transform would just be an arrow at ξ = 5 (with a mirror-image arrow at ξ = −5, since the transform gives every cosine a negative-frequency partner), and it would look like this. It’s an arrow as opposed to a regular point for some mathematical reasons that we don’t need to go into, but it has to do with the Dirac delta function. The height that the arrow reaches is not important either, because the transform actually takes a value of infinity at that point, for the same mathematical reasons.

A nice property of the Fourier transform is that the Fourier transform of a sum of functions is just the sum of the Fourier transforms of each of those functions. This is useful because if we have a sum of 3 cosines, each at a different frequency, it might look something like this. But the Fourier transform would just be 2 arrows for each frequency, one positive and one negative.

Disregarding the exact numbers on the axes, a function like this might be describing the vibrations of 3 musical notes from a piano. The vertical axis would be the pressure of the wave and the horizontal axis would be time. If you were to walk up to a piano and play the right notes, a wave very similar to this would enter your ear. This specific wave would actually be very unpleasant, since the different frequencies are not mathematically related to each other in a way that is musically pleasing to the human brain. The mathematics of music is another fascinating subject I could write a whole series of posts about, but that is not our subject at the moment. Perhaps in another post.
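The claim that a sum of cosines turns into one pair of arrows per frequency can be checked numerically. The sketch below uses the discrete Fourier transform, the sampled cousin of the continuous transform described above, written out naively in plain Python. The sample rate and the three frequencies (5, 9 and 14 Hz) are arbitrary values I picked for the example; they are not real piano notes.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: one complex value per frequency bin."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

SAMPLE_RATE = 64                 # samples per second (assumed for this example)
TONES_HZ = [5, 9, 14]            # three arbitrary frequencies

# One second of signal, so bin k corresponds to exactly k Hz.
signal = [
    sum(math.cos(2 * math.pi * f * t / SAMPLE_RATE) for f in TONES_HZ)
    for t in range(SAMPLE_RATE)
]

spectrum = dft(signal)
peaks = [k for k, value in enumerate(spectrum) if abs(value) > 1.0]

# Bins above SAMPLE_RATE / 2 stand in for negative frequencies.
signed = sorted(k if k <= SAMPLE_RATE // 2 else k - SAMPLE_RATE for k in peaks)
print(signed)                    # [-14, -9, -5, 5, 9, 14]
```

Every bin other than those six comes out as essentially zero. In the sampled version the arrows have a finite height (here half the number of samples) rather than being infinite.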

But how does the ear hear the different notes in that squiggly vibration entering your ear? The waveform above is just a description of the pressure at the tympanic membrane (eardrum) over time. The eardrum is a stretched membrane, like the skin of a drum. What we refer to as sound waves are actually pressure waves that are converted to sound in the ear. The vibrations of the tympanic membrane are transmitted through 3 small bones called the hammer, the anvil, and the stirrup. These bones are the smallest bones in the human body. The stirrup touches the oval window, which vibrates and transmits the energy of the vibrations into a fluid contained within the cochlea.

This is where it gets really interesting. The cochlea is where the magic happens. The cochlea is filled with a fluid that vibrates in response to the motion of the oval window. As it vibrates, it moves the basilar membrane. The basilar membrane is tapered: it’s narrow and very stiff at one end, and wide and compliant at the other. High-frequency vibrations mostly move the narrow, stiff end, while low-frequency vibrations mostly move the wide, compliant end. In this way, the ear can respond to a wide range of frequencies, from about 20 Hz to 20,000 Hz. Other mammals can detect even higher frequencies. Bats can detect frequencies as high as 100,000 Hz!
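To put rough numbers on this position-to-frequency mapping, here is a short Python sketch of an empirical fit usually credited to Greenwood. Neither the formula nor its constants come from this post; they are the commonly quoted values for the human ear, so treat them as an assumption. The function name `greenwood_frequency` is also just my label.

```python
def greenwood_frequency(position):
    """Approximate best frequency (Hz) at a point along the basilar membrane.

    position is a fraction of the membrane's length: 0.0 is the wide,
    compliant end and 1.0 is the narrow, stiff end. The constants are
    commonly quoted human values (an assumption, not from this post).
    """
    return 165.4 * (10 ** (2.1 * position) - 0.88)

for position in (0.0, 0.25, 0.5, 0.75, 1.0):
    hz = greenwood_frequency(position)
    print(f"{position:.2f} of the way along: about {hz:,.0f} Hz")
```

The two ends come out close to 20 Hz and 20,000 Hz, matching the hearing range quoted above. The mapping is far from linear: half of the membrane’s length is devoted to frequencies below roughly 2,000 Hz.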

The astute readers have probably already figured out what I’m getting at concerning the cochlea. Imagine we could somehow unwrap the cochlea and observe it in action. If we plotted, above each point along the unrolled cochlea, a dot whose height showed how strongly the membrane was vibrating there, and then connected those dots, we would essentially see the Fourier transform of the incoming signal! The cochlea actually performs a Fourier transform! This is absolutely stunning to me as an electrical engineer, since I learned the Fourier transform exclusively as a mathematical tool. This is like saying our inner ear can do complex calculus! Actually it’s a lot more impressive than that, because our inner ear has been doing this for hundreds of millions of years, while the Fourier transform wasn’t developed until the 19th century! If this doesn’t seem particularly wonderful to you, read it again.
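Here is a toy version of that thought experiment in Python. It is emphatically not a physiological model: I’m pretending the unrolled membrane is a row of ten independent damped springs, each tuned to its own frequency, and measuring how hard each one shakes when a two-tone sound arrives. The resonator equation, the damping value, the time step and the low frequencies (3 to 12 Hz instead of audio frequencies, to keep the simulation short) are all assumptions made for illustration.

```python
import math

def resonator_response(signal, dt, tuned_hz, damping=0.05):
    """RMS displacement of one damped resonator driven by the signal."""
    omega0 = 2 * math.pi * tuned_hz
    x, v = 0.0, 0.0
    settle = len(signal) // 2            # ignore the start-up transient
    total = 0.0
    for n, force in enumerate(signal):
        # Semi-implicit Euler step of
        # x'' + 2*damping*omega0*x' + omega0^2 * x = omega0^2 * force
        accel = omega0 ** 2 * (force - x) - 2 * damping * omega0 * v
        v += accel * dt
        x += v * dt
        if n >= settle:
            total += x * x
    return math.sqrt(total / (len(signal) - settle))

DT = 0.001                               # time step in seconds (assumed)
N_STEPS = 8000                           # 8 seconds of simulated sound
times = [n * DT for n in range(N_STEPS)]
sound = [math.cos(2 * math.pi * 5 * t) + math.cos(2 * math.pi * 9 * t) for t in times]

bank = range(3, 13)                      # resonators tuned to 3, 4, ..., 12 Hz
response = {hz: resonator_response(sound, DT, hz) for hz in bank}

# The two resonators that shake the most are the ones tuned to the two tones.
loudest = sorted(sorted(response, key=response.get, reverse=True)[:2])
print(loudest)                           # [5, 9]
```

Plotting `response` against the tuning frequency gives two bumps, one at 5 Hz and one at 9 Hz. They are smeared-out bumps rather than infinitely thin arrows, but the picture is recognizably the spectrum of the sound.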

Throughout the continuous process of discovery and refinement of knowledge that is science, we must apply our intelligence to discover, and we must treasure and appreciate marvelous discoveries such as these. They teach us as much about ourselves as they do about the world around us. One of my favorite quotations from Carl Sagan really embodies this spirit: “the use of our intelligence quite properly gives us pleasure. In this respect the brain is like a muscle. When we think well, we feel good. Understanding is a kind of ecstasy.”