The Principles of Impulse Response Effects

Other sites on impulse response:
The Scientist and Engineer's Guide to Digital Signal Processing

This article has been written especially for customers of IpsoLogic's Impulse Responder, which is a generic tool for the low-level design and hybridisation of impulse response effects. This family of effects includes filters, echoes, reverberation and resonance. A basic beginner's overview is included in the application's help files, but this article goes into greater depth and provides a more thorough presentation. We will start with first principles and work up to a point where a broad technical understanding of the principles of impulse response should have been acquired.

Impulse response uses the simple mathematical technique of creating a weighted moving average of the sample values in a digital wave. The impulses, as they are called, or coefficients, act upon individual sample values and the formula generates a single average value, which becomes an individual sample in the output. The system then steps along by one sample and the calculation is performed over and over again, until every coefficient has interacted with every input sample. Any such collection of impulses is known as a kernel. Each impulse has a fixed position relative to the others, which is vital to the nature of the process.
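
For readers who find code easier to follow than prose, here is a minimal sketch of that stepping calculation in Python. The function name apply_kernel and the sample values are made up for illustration; this is not Impulse Responder's own code, and input positions before the start of the wave are simply assumed to be silent.

```python
def apply_kernel(samples, kernel):
    """Weighted moving sum of `samples` using the coefficients in `kernel`.

    kernel[k] is the impulse at position +k: it reads the input k samples
    behind the sample that is currently being written to the output.
    """
    output = []
    for n in range(len(samples)):
        total = 0.0
        for k, coefficient in enumerate(kernel):
            if n - k >= 0:  # anything before the start of the wave counts as silence
                total += coefficient * samples[n - k]
        output.append(total)  # one calculation, one output sample
    return output


wave = [0.0, 1.0, 0.0, 0.0, 2.0]
smoothed = apply_kernel(wave, [0.5, 0.5])  # two equal impulses: a plain average
print(smoothed)  # [0.0, 0.5, 0.5, 0.0, 1.0]
```

Note that the coefficients here sum to 1, which is what makes the result an average rather than a boost or cut in overall level.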

The concept of an impulse is an abstract mathematical one: it has a specific value (or graphically a height), but in theory it has no duration. It is one-dimensional. Impulses are situated discretely in time so as to act upon just one sample each. While duration is a meaningful concept for audio samples, for the coefficients that act upon them, it is not. If you are of the view that this is a pointless fiddling with irrelevant minutiae, then you are not alone. But that's where this subject goes when you get into the mathematical nitty-gritty of it. But relax. The Music Page is committed to avoiding that where possible.

The Delta Function
The simplest case of an impulse response effect is what is called a delta function. Delta functions are kernels with only one impulse. This is the best doorway into the subject for a real beginner. Take a delta function with an impulse of value 1.0, situated at time zero. (By time zero, I mean it is reading from the input at the same location as that at which it is writing to the output.) The output of this system is going to be identical to the input in every respect. Now if you move the impulse backwards in time, so that the read position is now behind the write position, the output signal will be delayed. If the impulse is placed at position +441, a CD quality signal (44,100 samples per second) will be delayed by 10ms. If this is done with only one channel in a stereo signal, the kernel for the other channel remaining untouched, then you will have a one channel delay or left-right bounce effect. If it is moved forwards to a negative position then the delay will be exchanged for an advance. In relative terms between two audio channels, this is the same effect.

Another effect can be acquired by changing the value of the coefficient. If the value is greater than 1 then the output is an amplified version of the input. If it is less than 1 then de-amplification occurs. A coefficient of zero gives an output of silence. A negative coefficient inverts the input signal. (Inverting one channel and not the other, by the way, creates a strangely "compressed" stereo effect on complex signals with lots of high midrange or lower treble that are otherwise identical in both channels. With headphones it is like the sound is coming from a 2-dimensional plane cutting vertically through your head.)
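
The two controls of a delta function, its position and its value, can be sketched as follows. The names delay_in_ms and delta_function are hypothetical, and for simplicity the sketch only handles zero or positive positions (delays, not advances). The delay is just the position divided by the sampling rate, so at 44,100 samples per second, position +441 gives 10ms and position +4410 gives 100ms.

```python
SAMPLE_RATE = 44100  # CD quality, in samples per second


def delay_in_ms(position, sample_rate=SAMPLE_RATE):
    """Delay produced by a single impulse at a positive position."""
    return 1000.0 * position / sample_rate


def delta_function(samples, position=0, value=1.0):
    """Apply a one-impulse kernel; `position` must be zero or positive here."""
    shifted = [0.0] * position + [value * s for s in samples]
    return shifted[:len(samples)]  # keep the output the same length as the input


wave = [0.2, -0.4, 0.6, 0.1]
identical = delta_function(wave)             # value 1.0 at time zero: no change
delayed = delta_function(wave, position=2)   # [0.0, 0.0, 0.2, -0.4]
inverted = delta_function(wave, value=-1.0)  # every sample changes sign
silent = delta_function(wave, value=0.0)     # a coefficient of zero gives silence

print(delay_in_ms(441), delay_in_ms(4410))   # 10.0 100.0
```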

Finite Impulse Response (FIR) and Simple Filtering
The subject of filtering is without doubt the most boring out of the entire field of DSP. But it's the bread and butter of signal engineering and a necessary step to take in coming to an understanding of impulse response. So read on.

The simplest of filters are those that create an evenly weighted average of adjacent sample values. This is much the same as the effect of a single capacitor in an analog circuit. Imagine a kernel containing two impulses of equal value. It may seem reasonable for a beginner to think that such a kernel will totally filter out all frequencies above half the Nyquist frequency (the Nyquist frequency being half the sampling rate - the maximum frequency possible in the given audio stream). But it doesn't. Think about it graphically: Any movement in the wave which is shorter than the kernel will at times, as the kernel moves along the input stream, be only partially acted upon. Particularly if the tail ends of the kernel are of a significant value, higher frequencies will create some disturbances in the output frequency response. But more importantly, some original treble will remain in the output, attenuated but not eradicated. Another problem arises from the fact that all frequencies involve movement from one sample to the next. Therefore, the process of averaging will also retard lower frequencies, leaving none untouched except the theoretical minimum of 0 Hz. The greater the movement within the kernel's domain, the greater the attenuation.
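
If you would rather see numbers than take this on trust, the sketch below measures how strongly a kernel passes a pure tone of a given frequency. It uses the standard frequency response sum for a set of coefficients; the function name magnitude_response and the choice of test frequencies are ours, for illustration only.

```python
import cmath
import math


def magnitude_response(kernel, frequency_hz, sample_rate=44100):
    """Gain applied by `kernel` to a sine wave of the given frequency."""
    omega = 2.0 * math.pi * frequency_hz / sample_rate  # radians per sample
    total = sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(kernel))
    return abs(total)


two_point_average = [0.5, 0.5]
for frequency in (0, 5512.5, 11025, 16537.5, 22050):
    gain = magnitude_response(two_point_average, frequency)
    print(frequency, round(gain, 3))
```

For the two-impulse average the gain falls smoothly from 1.0 at 0 Hz to about 0.707 at 11,025 Hz (half the Nyquist frequency) and only reaches zero at 22,050 Hz itself. Nothing below that is removed outright, and nothing above 0 Hz is left entirely untouched.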

The frequency response of a classic filter kernel (ie: low pass, high pass or band pass) can be divided into 3 sections. There is the pass band - the continuum of frequencies that are maintained at or close to their original level. This is the part of the sound that is basically left alone. Then there is the stop band - the continuum of frequencies that are attenuated significantly. This band is also characterised by a significant amount of rippling where small frequency bands are raised above others. Finally there is the transition band, which is the area in between the two, where the frequency response slopes from one to the other. The transition band begins at the cutoff frequency, which is typically considered to be the frequency where attenuation is 10% of the full stop band attenuation. The transition band stops at the first frequency that is attenuated equally to the average attenuation of the stop band.

The object in many applications of filtering is to obtain as narrow a transition band as possible, with as great an attenuation as possible and as little rippling as possible in the stop band. The general rule is to increase the filter order - the number of coefficients used to create the moving average. A higher order gives you better quality all round, as well as a lower cutoff in a low pass filter (effectively so, by steepening the transition and reducing the stop band), but the trade off is with computation time. Another way to improve the situation is to employ a different windowing function. The simplest window is the rectangular window, in which all coefficients are the same. This gives the poorest quality result of all standard windows and requires a very large number of coefficients to yield a result equivalent to a higher quality window. Next best is the triangular, or Bartlett window. This, as the name suggests, follows a pattern where coefficients step upwards from either end to the middle. Like the others that follow, it must have an odd order. The next few are formulated from cosines and the like and follow sinusoidal curves. They are the Hann (often called Hanning), Hamming, Blackman, Gaussian and Blackman-Harris, ranked in order of relative quality. Each of these has other minor characteristics that distinguish it from the others. Finally, there is the Kaiser window, which is calculated by an even more complex equation, and through one of its variables (alternately written as beta or alpha), can be adjusted to closely mimic any of the foregoing, and to render even higher quality still. For beginners again, you may find it strange that the rectangular window is the worst, given its equal weighting of all impulses, as opposed to the narrower shape of the others, but the reasons for this are mathematically esoteric and beyond the scope of the article.
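
As a rough illustration of the first few windows, the sketch below builds kernels from the usual textbook formulas for the rectangular, Bartlett, Hann, Hamming and Blackman shapes and scales each to sum to 1. The function names, the order of 21 and the "leak" measurement (the largest gain found anywhere from 0.15 to 0.5 of the sampling rate) are our own assumptions for the sake of a comparison, not part of Impulse Responder. The Gaussian, Blackman-Harris and Kaiser windows are left out to keep it short.

```python
import cmath
import math


def window_kernel(name, order):
    """Return `order` coefficients following the named window, scaled to sum to 1."""
    if order < 3 or order % 2 == 0:
        raise ValueError("order must be an odd number of at least 3")
    last = order - 1
    shapes = {
        "rectangular": lambda n: 1.0,
        "bartlett": lambda n: 1.0 - abs(2.0 * n / last - 1.0),
        "hann": lambda n: 0.5 - 0.5 * math.cos(2.0 * math.pi * n / last),
        "hamming": lambda n: 0.54 - 0.46 * math.cos(2.0 * math.pi * n / last),
        "blackman": lambda n: (0.42 - 0.5 * math.cos(2.0 * math.pi * n / last)
                               + 0.08 * math.cos(4.0 * math.pi * n / last)),
    }
    raw = [shapes[name](n) for n in range(order)]
    total = sum(raw)
    return [value / total for value in raw]


def magnitude(kernel, fraction_of_sample_rate):
    """Gain of `kernel` at a frequency given as a fraction of the sampling rate."""
    omega = 2.0 * math.pi * fraction_of_sample_rate
    return abs(sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(kernel)))


def worst_leak(kernel, start=0.15, stop=0.5, steps=200):
    """Largest gain found between `start` and `stop` (fractions of the sampling rate)."""
    return max(magnitude(kernel, start + (stop - start) * i / steps)
               for i in range(steps + 1))


for name in ("rectangular", "bartlett", "hann", "hamming", "blackman"):
    print(name, round(worst_leak(window_kernel(name, 21)), 4))
```

Run at the same order, the rectangular kernel lets noticeably more of the upper spectrum through than the cosine-based shapes, which is the practical meaning of "poorest quality" above.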

Another point to be aware of is that of symmetry. All samples leaving the kernel must be treated the same way (in reverse) as when they entered it. Otherwise changes from one sample to another will have an undesirable effect on the output, creating undue rippling and forcing changes in the phase characteristics of higher frequencies. This is called phase distortion and is a problem common in cheap analog equipment. An asymmetrical kernel will render non-linear phase in the output. Higher frequencies are advanced in time because they have a more sudden effect on the output, while lower frequencies have a less sudden effect, and are therefore left behind. We cannot hear phase as we hear frequency and amplitude, but when this pattern is prevalent in a sound, we detect it as an overall dull character. Nothing you do such as raising the treble, applying ambient and 3d effects of any sort, will rectify this problem until the phase is made linear again, or the pattern is reversed.

This section has presented the technical basis for impulse response processing and the simplest forms of filter. For simplicity, we have restricted this discussion to finite impulse response kernels with classic shapes. Another article, to be published before long, will present more advanced issues inherent in filtering, as well as other methods such as infinite impulse response (IIR) and frequency sampling.

Simple Delay Based Effects
We will skip ahead now to a more intuitive aspect of impulse response. The simpler delay-based effects are quite easy to grasp, and may be considered together in one section.

A basic one-tap delay is very easy to create. All that is needed is a single impulse set at a certain time delay from the zero impulse. The delay impulse is not adjacent to the zero impulse, as in a filter, but its position is calculated by multiplying the desired time delay (in seconds) by the sampling rate of the input audio data. The zero impulse represents the original sound and causes it to be written to the output, and the delay impulse represents the single regeneration you have programmed into the system. The value of that impulse controls the volume of the regeneration. If you want to control the tonal quality of that regeneration you can do this by creating a FIR filter kernel around that impulse, leaving impulse zero as is. This effect is especially useful in two channels if you introduce a delay of 10 to 40 ms in one channel and delete impulse zero. This delays the sound in one channel and creates a lush, spacious effect.
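
A one-tap delay can be sketched as a kernel that is mostly empty, holding just impulse zero and the delay impulse. Below, the kernel is stored as a Python dictionary of position and value pairs so the empty positions cost nothing. The names delay_position and apply_taps and the quarter-second delay are illustrative assumptions only, and positions are assumed to be zero or positive.

```python
def delay_position(delay_seconds, sample_rate=44100):
    """Impulse position for a time delay: seconds multiplied by the sampling rate."""
    return round(delay_seconds * sample_rate)


def apply_taps(samples, taps):
    """Apply a sparse kernel given as {position: value}.

    The output is lengthened so the last regeneration is not cut off.
    """
    output = [0.0] * (len(samples) + max(taps))
    for position, value in taps.items():
        for n, sample in enumerate(samples):
            output[n + position] += value * sample
    return output


# Impulse zero passes the original sound; the second impulse is one
# regeneration a quarter of a second later at half the volume.
one_tap = {0: 1.0, delay_position(0.25): 0.5}

# The same idea on a tiny scale, so the result is easy to read.
print(apply_taps([1.0, 0.0, 0.0], {0: 1.0, 2: 0.5}))  # [1.0, 0.0, 0.5, 0.0, 0.0]
```

Deleting impulse zero, as suggested for the two channel effect, is simply a matter of leaving position 0 out of the dictionary.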

Multiple echo-style regenerations can be produced either by placing more impulses at the same distance from each other, to make a repetitive effect, or by sourcing the data for processing from the output, rather than the input. Output sourcing has effect because the data that is read from the output is delayed and written back to the output ahead of where it was read from, to the position where everything else is being written as a weighted average. When the impulse reads from the output, it is reading what it has previously written. (But this only works if an input-sourced impulse sits at a position prior to the output-sourced impulse. Otherwise the audio output at the end of processing will be completely silent. For this reason Impulse Responder does not allow impulses earlier than +1 to be output-sourced.) Output sourcing of data is thus called recursion, whereas input sourcing is called convolution. A recursive echo has a natural sounding decay, where the volume of a recurring sound is reduced by the same factor with each repetition. This is a very efficient way of creating long, long echoes, especially if you wish to filter the regenerations. (Also note that filtering a recursive echo results in an intensifying effect, since each regeneration is filtered again, rather than the unchanging timbre you would get if the filter were applied afterwards.) Convolution echoes take longer to calculate because an impulse (or an entire filter kernel) must be placed exactly where every regeneration is required. But the great advantage is that enveloped echoes, or echoes with staggered regeneration levels, are possible, as are changing or alternating delay times and delays with offset beginning times.
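
The difference between the two kinds of sourcing is perhaps easiest to see side by side. The following is a minimal sketch under our own assumptions (hypothetical function names, a single click as input, a delay of only three samples); it is not how Impulse Responder is written internally.

```python
def recursive_echo(samples, position, feedback, repeats):
    """Output-sourced echo: each output sample adds a scaled copy of the
    output written `position` samples earlier (position must be +1 or later)."""
    length = len(samples) + position * repeats
    output = [0.0] * length
    for n in range(length):
        dry = samples[n] if n < len(samples) else 0.0
        wet = feedback * output[n - position] if n >= position else 0.0
        output[n] = dry + wet
    return output


def convolution_echo(samples, position, levels):
    """Input-sourced echo: one impulse per regeneration, each with its own level."""
    output = [0.0] * (len(samples) + position * len(levels))
    for n, sample in enumerate(samples):
        output[n] += sample  # impulse zero passes the original sound
        for repeat, level in enumerate(levels, start=1):
            output[n + repeat * position] += level * sample
    return output


click = [1.0]
natural = recursive_echo(click, position=3, feedback=0.5, repeats=3)
matched = convolution_echo(click, position=3, levels=[0.5, 0.25, 0.125])
swelling = convolution_echo(click, position=3, levels=[0.125, 0.25, 0.5])

print(natural)   # [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0, 0.0, 0.125]
print(swelling)  # [1.0, 0.0, 0.0, 0.125, 0.0, 0.0, 0.25, 0.0, 0.0, 0.5]
```

The recursive version needs only one impulse however long the echo runs, but its levels can only fall by a constant factor. The convolution version needs an impulse for every regeneration, and in return the levels can follow any envelope you like, including one that swells.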

A resonator is an echo with a delay time so quick that it produces a sonic artefact - a ringing sound - of its own. Technically all echoes resonate (literal meaning: "sound again"), but as humans, we recognise a sound as resonant when its regeneration rate is within the human sonic frequency spectrum. The delay must be no longer than 50ms (20Hz) and can be shortened to reach any frequency up to the top of the listener's range. But for practical intents and purposes, it seems best to limit this range to something like 40Hz - 2000Hz. Resonant effects are sadly overlooked in a lot of effects software and equipment, except in the form of two highly specific applications. These are flangers and phasers. Both involve an echo, typically a recursive style (or in some cases perhaps an output sourced impulse in parallel with an input sourced impulse - which Impulse Responder is not made to do), with a resonant delay time. They are both basically the same effect, except their different tonal quality derives from the fact that a phaser has a shorter delay time, and thus can cause phase cancellation between the dry and delayed signals. This divides the frequency spectrum into tiny pass bands and stop bands in what is called a comb filtering effect. In a modulating phaser, the frequency response constantly transforms its shape. The resonant frequency, being high pitched, is shrill, and is not so easy to hear as it dies away rapidly. Flangers should technically be capable of comb filtering low frequencies, but no such effect is noticeable. Instead it is the resonant frequency that is the dominant component as slow regeneration times make for long, lingering tones. A bass to melody-pitched ringing sound is produced, and the entire output sound takes on a metallic tone.
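
The relationship between delay time and the pitch of the ringing is a simple reciprocal: one regeneration every so many samples gives a frequency of the sampling rate divided by that number. A small sketch, with hypothetical function names and a CD quality sampling rate assumed:

```python
def resonant_frequency_hz(delay_samples, sample_rate=44100):
    """Pitch of the ringing produced by a regeneration every `delay_samples`."""
    return sample_rate / delay_samples


def delay_for_frequency(frequency_hz, sample_rate=44100):
    """Nearest whole impulse position that resonates at `frequency_hz`."""
    return round(sample_rate / frequency_hz)


print(resonant_frequency_hz(2205))  # a 50ms delay rings at 20.0 Hz
print(delay_for_frequency(441))     # 441 Hz needs the impulse at position 100
print(delay_for_frequency(2000))    # 2000 Hz needs a delay of only 22 samples
```

Because positions are whole samples, high resonant frequencies can only be hit approximately: position 22 actually rings at a little over 2004 Hz.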

As with echoes, the tonal quality of a resonant effect can be altered by applying a filter kernel at the point where the delay impulse is located. But this is the most overlooked aspect of straight resonation, in that what is changed is not just the tone of the input, but the tone of the resonant artefact. A digital simulation of the popular synthesizer effect of a resonant LPF works similarly to this. The original signal is low pass filtered and a recursive delay is introduced at a frequency equal to the cutoff frequency of the filter. Creating a filter kernel with a complex response, such as with frequency sampling, should enable generation of artefacts with completely original tonal qualities, which sound like a natural part of the sound, rather than an artificial addition.

Reverberation
Reverberation is a very sophisticated form of echo. An efficient digital system does not even remotely attempt to model a real acoustic environment (although there are programs that do this). Instead, while this is probably not the only plausible solution, the effect is commonly split into two sections - one is known as the early reflections (ER), which simulates the sound rebounding directly off the nearby surroundings and bouncing around in a relatively simple manner at the very beginning. Then the other part of the effect is the general reverberation (GR), which simulates the effect of wide dispersion and multiple regenerations after a significant number of rebounds have occurred, and a perceptually "fused" sound emerges rapidly to replace the first echoes.

In terms of design, one generally conceives of the ER as a set of 4 to 8 recursive echoes operating in parallel - working with the original input only and not with each other's output. But Impulse Responder achieves the same effect by generating a set of convolution echoes and writing them all together at the beginning of the kernel. ER's are typically designed to colour the sound and give the impression that it is being produced in a room of a certain shape. This effect is created simply by geometrically calculating a number of regeneration times with respect to the position of the source, listener and dimensions of the room. Differing patterns of reflection suggest to the subconscious mind the type of room that they are originating in or being heard in.
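
To give a feel for the geometry involved, here is one common way of estimating reflection times, offered as a sketch under stated assumptions rather than as the method Impulse Responder uses. It treats the room as a rectangle seen in plan view, considers only the first bounce off each of the four walls (by mirroring the source in each wall), and assumes sound travels at 343 metres per second. The function name and room measurements are invented for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, an assumed room-temperature value


def early_reflection_delays(width, depth, source, listener):
    """Delays in seconds, relative to the direct sound, of the four
    single-bounce wall reflections in a rectangular plan (walls at x=0,
    x=width, y=0 and y=depth)."""
    sx, sy = source
    lx, ly = listener
    direct = math.hypot(lx - sx, ly - sy)
    # Mirror the source in each wall; the straight line from a mirror image
    # to the listener has the same length as the bounced path.
    images = [(-sx, sy), (2 * width - sx, sy), (sx, -sy), (sx, 2 * depth - sy)]
    delays = [(math.hypot(lx - ix, ly - iy) - direct) / SPEED_OF_SOUND
              for ix, iy in images]
    return sorted(delays)


delays = early_reflection_delays(8.0, 6.0, source=(2.0, 3.0), listener=(6.0, 3.0))
positions = [round(d * 44100) for d in delays]  # impulse positions at CD quality
print([round(1000 * d, 2) for d in delays], positions)
```

Each delay becomes the position of one convolution echo at the beginning of the kernel. Moving the source or listener, or changing the room's proportions, changes the pattern of positions.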

The output of the ER is then fed into the GR algorithm. This makes for a less "present" and more natural sound in the final effect. In Impulse Responder, the use of an ER stage enables some freedom to adjust the wet:dry mix without affecting the overall volume because a signal for the out-sourced GR is produced by the in-sourced ER. But what does happen is a change in the overall impression of "presence" as the amount of the original sound reaching the GR changes. The GR consists of a massive series of regenerators, each spaced at very brief random intervals (randomised within a certain range), and each having a very low regeneration level. The closer the regenerators are, the less power each needs, as they are able to carry on each other's output before it is gone completely. Having too few in a given period requires levels that are too high; some really nasty resonant artefacts are created this way. Impulse Responder's Reverb Designer uses anything from 25 to 290 regenerators in the GR component, depending on the pattern in use and the size of the acoustic space being simulated. A greater number of regenerators placed within a given time period also makes for a thicker, richer sound, whereas a smaller number is more open-sounding, and in fact more musically useful in most cases. Impulse Responder uses a data reduction technique that incidentally achieves a good balance between these two goals. By spacing out the regenerators in the last half or last third of the kernel and raising their power levels, a long, smooth reverb can be produced with reduced processing time, and a result that is pleasing to the ear in a mix.
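
The general idea of many low-level, randomly spaced, output-sourced regenerators can be sketched as below. Every number here (25 regenerators, gaps of 40 to 120 samples, a level of 0.03, the fixed random seed) is an arbitrary assumption chosen to keep the example small and repeatable; none of them is taken from Impulse Responder's Reverb Designer. The one deliberate choice is that the levels add up to less than 1, so the tail dies away instead of building up.

```python
import random


def general_reverb_taps(count, min_gap, max_gap, level, seed=0):
    """Place `count` regenerators at random gaps (in samples) after position 0."""
    rng = random.Random(seed)  # fixed seed so the pattern is repeatable
    taps = {}
    position = 0
    for _ in range(count):
        position += rng.randint(min_gap, max_gap)  # min_gap >= 1 keeps positions at +1 or later
        taps[position] = level
    return taps


def apply_output_sourced(samples, taps, length):
    """Each output sample is the input plus scaled copies of earlier output."""
    output = [0.0] * length
    for n in range(length):
        dry = samples[n] if n < len(samples) else 0.0
        wet = sum(level * output[n - position]
                  for position, level in taps.items() if n >= position)
        output[n] = dry + wet
    return output


taps = general_reverb_taps(count=25, min_gap=40, max_gap=120, level=0.03)
tail = apply_output_sourced([1.0], taps, length=4000)
print(len(taps), max(taps), round(sum(taps.values()), 2))
```

Feeding in a single click, as here, shows the regenerators carrying on each other's output: after the first pass through the pattern, the tail fills in between the original positions.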

An interesting aspect of GR design is that the overall sound of an acoustic space of different dimensions can be created very simply, by calculating regeneration times with the use of the hypotenuse of the acoustic space plan view, rather than, for instance, the sum of the dimensions. That is because in the real world, when a sound hits a wall at any angle other than 90 degrees, it must bounce from wall to wall down to one end of the hall, tunnel, etc., and return by a similar route. If the acoustic space were more squarish, the route would be more round-about and less likely to take a long time. So regeneration times are greatly affected by dimensions. The hypotenuse of a plan view is just such a variable, and in fact it is twice the average distance travelled from one reflection point to another. The psychoacoustic effect of this factor is quite noticeable.
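
As a small worked example of why the hypotenuse is a more telling figure than the sum of the dimensions, compare a long hall with a square room whose dimensions add up to the same total. The sketch below assumes a speed of sound of 343 metres per second and takes the statement above, that the hypotenuse is twice the average distance between reflection points, at face value. The function names and room sizes are invented for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, an assumed room-temperature value


def plan_hypotenuse(width, depth):
    """Diagonal of the acoustic space as seen in plan view, in metres."""
    return math.hypot(width, depth)


def average_hop_ms(width, depth):
    """Time to cover half the hypotenuse, in milliseconds."""
    return 1000.0 * (plan_hypotenuse(width, depth) / 2.0) / SPEED_OF_SOUND


for name, width, depth in (("long hall", 30.0, 10.0), ("square room", 20.0, 20.0)):
    print(name, round(plan_hypotenuse(width, depth), 2),
          round(average_hop_ms(width, depth), 1))
```

Both spaces have dimensions summing to 40 metres, yet the hall's hypotenuse is about 31.6 metres against about 28.3 for the square room, so regeneration times worked out from the hypotenuse come out longer for the hall.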

Reverberation is often done as a convolution effect, or as a recursion effect with a final wet:dry mix which is not part of the effect itself. Convolution reverbs can be manipulated in terms of enveloping much more easily than recursive reverbs. The wet:dry mix can be adjusted at will within the calculation without any effect at all on the overall sound, and they can (as also with echoes and resonators) be enveloped or reversed and placed before impulse zero so that they build up to the dry input. But the computation cost is phenomenal - and that is the main reason why Impulse Responder is limited to recursion reverbs in the GR component. The difference between the two forms also manifests where filtering is required. Filtering is a necessary aspect of any reverb that is supposed to sound natural, as virtually all physical materials absorb high frequencies better than lows. Placing a small low pass filter kernel over each regenerator achieves this. But it multiplies the computation time by a factor equal to the order of each filter. In a convolution reverb this can result in outrageous calculation times even on a respectably fast computer, as there must be a separate regenerator for each return of the original sound.

Conclusion
Impulse response is a very powerful technique. If used judiciously, it can be manipulated to perform a very wide variety of functions and shaped to achieve some interesting results. What should be abundantly clear is that it is in no way limited to the set of effects supplied as standard features in studio equipment, software packages and effects pedals. This article has attempted to make plain the principles behind this technique with a view to helping the reader get more out of IpsoLogic's Impulse Responder, and generally, to better understand the workings of much of their studio equipment and software. At the time of writing, Impulse Responder does not perform modulation effects, but this restriction will be lifted in a future version. Another potential of impulse response is the routing of data from an individual impulse, or group of impulses, through other processes such as a wave shaping synthesizer (ie: distortion/overdrive). Stay tuned for that.

Australia-wide Located in Geelong
