Tonal and Dynamic Masking in the Mix

Every musical mix is a unique balance, in which the hierarchy of elements and the management of tonal and dynamic masking play crucial roles. This article will guide you in the art of prioritizing, optimizing sonic integration, and enhancing each track to create a clear, cohesive, and musically effective mix.

Hierarchy of mix elements

The elements of the mix take on a hierarchical importance, which varies from mix to mix.

This hierarchy depends on many factors, including the nature of the production being mixed: in hip-hop, for example, the “beat” and the vocals are generally the most important elements. 

In jazz, the ride is “more important” than the kick, while “spatial effects” are an important element in ambient music and absolutely accessory in other genres.

The bass drum is a central element in dance music, but it is somewhat less important than the snare in pop music in general. 

And so many other examples could be cited. 

To reason about this more clearly, we should consider the nature of each element and its role in the overall musical context, imagining a specific hierarchy for each piece of music.

Lead vocals, for example, are always of primary importance, but the lyrics can matter a great deal too: a voice that "narrates" an "important" text (in a singer-songwriter song, for example) will have to emerge more clearly than in other contexts, to ensure its maximum "readability", while in other cases the lead vocal can remain more immersed in the mix.

The relative importance of individual elements will influence how we mix them, whether it's levels, frequencies, effects, panning, or depth. 

Clarifying this hierarchy can also improve your workflow by cutting down on less important work, such as spending too much time fine-tuning the sound of a synth chord pad that will be used at very low volume in a single verse.

This priority takes on even greater importance in "closed budget" productions, characterised by a pre-established number of hours to be dedicated to the mix. In such cases we may need to plan quite restrictively the time dedicated to each element, according to this hierarchy and to the difficulty of the process: for example, 1 hour for the treatment of the drums, 15 minutes for the bass, an hour and a half for the lead vocal, and so on. (Obviously, any plan with a binding, pre-established time budget must leave ample margin for unforeseen events and second thoughts.)

This is why it can be crucial to be able to define, each time, when carrying out a specific operation, how important it is in relation to the overall economy of the piece.

By hierarchy, I certainly don't mean to say that the trumpet is more important than the guitar or anything like that; rather, I mean to say that every arranged and orchestrated piece of music contains musical parts of greater or lesser importance, corresponding to their role, regardless of the instruments or voices performing them.

There are specific pieces and musical genres, for example, in which the harmonic support is more important than the rhythmic one and vice versa, but in general the criteria that define the hierarchy of the musical parts are the same for all.

Besides the practical reasons mentioned above, why establish a hierarchy?

In the orchestration of a piece of music, there are numerous sound sources which, from a tonal and dynamic point of view, compete with each other to "conquer a space of audibility" within the piece, to the partial detriment of the other sources.

Consequently, while avoiding distorting the individual sources, a more or less profound intervention on the dynamics and on the EQ will be necessary, in order to favor a good interplay rather than mere superposition.

To this end, two different operational paths open up:

  1. tweak all the sources a little, proportionally
  2. give the highest sound quality to the most important elements, subjecting the less important ones to deeper interventions so that they adapt to the first

Most of the time, I prefer the second approach, which allows you to respect and maximize the “essential” elements of the mix, leaving the other ingredients with the less essential task of “dressing it up.”

It is obvious that between the two criteria there will be infinite intermediate gradations, more or less adoptable according to the sonic content of the specific piece being worked on.

Below is an example hierarchical criterion, which can serve as inspiration for some areas of pop music and related genres:

Primary elements

To be considered in order of importance:

  1. main melodic elements (any soloist: voice, sax, lead guitar, piano, etc.)
  2. the timekeeping and supporting elements at the bottom end: bass drum, snare drum, hi-hat, bass
  3. the main element in the rhythmic-harmonic structural field (only one, for example: the piano, the acoustic guitar, etc.)
  4. the remaining pieces of the drum set or other main percussion elements (e.g. congas and bongos)

Secondary elements

To be considered in order of importance:

  1. secondary melodic phrases (backing vocals, "obbligato" phrases from wind instruments, strings, etc.)
  2. harmonic elements (e.g. string or wind chords, or keyboards, etc.)
  3. secondary rhythmic elements (percussion, rhythm guitars, etc.)
  4. other musical elements
  5. non-musical effects

The above is by no means a rigid criterion, since in reality each song has its own “recipe” which requires a specific preponderance of ingredients, just as each musical genre outside of rock-pop may require very different criteria of “hierarchy”.

It is important to note that these criteria will only be fully applicable when the recording has been overdubbed or in any case when the sonic independence between the audio tracks is sufficiently high, as in the case of cable recordings or acoustic recordings made in well-insulated boxes or rooms.

When a significant sonic influence is found between the tracks, it will not be possible to proceed in a strictly hierarchical manner, but rather it will be necessary to seek an overall balance through an almost obligatory path that will probably lead the mix to a dimension closer to the original recording proportions, in order to safeguard the tonal integrity of all the elements.

The hierarchical criterion will be especially applicable in the pop field, with recordings often made by overdubbing or using isolated rooms or recording via cable.

In such cases it is advisable to first build an essential mix, made up of a few primary elements with which an almost complete and convincing mix can be created, before proceeding towards the definitive mix.

Only later can the new ingredients be added, proceeding with the necessary care so as not to compromise the balance previously achieved.

Masking in the Mix

Masking is the ability of one sound element to partially cover another.

Louder sounds mask softer ones, so the higher the volume of one element in the mix, the more clearly it will tend to be perceived, but this will happen at the expense of the others.

Tonal masking

This occurs when the masking elements are expressed mainly in the same tonal range as the masked ones.

Competition for the same tonal space is therefore the basis of mutual masking.

The problem can be mitigated by adjusting the elements of the mix in a complementary manner, that is: 

  • bringing out in each of them specific tonal bands that are different from each other
  • attenuating the other tonal bands in order to free up space for the other elements of the mix.

This way each element will appear more defined and clear.

Dynamic Masking

Percussive instruments come and go, and their peaks are short-lived; a kick, for example, will typically have little or no sonic content between the various "hits". It is therefore unlikely that a "short" percussive sound, however loud, can mask longer-lasting sounds: it makes its way with each "hit" for a very short time, and during that time it manifests a dynamic preponderance (volume) that allows it to emerge when necessary.

Percussion instruments compete for tonal space at various limited instants in time, while other instruments sustain the sound for much longer periods and therefore constantly fight to gain tonal space.

A synth pad and the harmonies of wind instruments, strings and a choir, but also the solo phrasing of voices, strings, wind instruments, and any other sound source at a high volume, will all require more attention than percussive ones, because each of their level, panning or EQ settings will have a greater impact on the whole due to their persistence over time.

Raising the volume of a pad will certainly cause greater masking problems than raising the volume of a snare drum. Even if the snare drum were to mask the pad, it would only do so for very brief, negligible durations that would not destroy the continuity of the pad's musical perception. However, if a high-volume pad were to mask a low-volume snare drum, it would do so consistently, causing a serious problem.

In this sense, plucked string instruments, due to their decay (generally less rapid than percussion), fall somewhere in between.

Thanks to their characteristics, the piano and sometimes acoustic guitars (both used as accompaniment) can be subjected to a process of dynamic expansion that makes the parts with the most incisive performance stand out more while attenuating the others.

In this way, their masking power would be reduced in many moments, in order to allow the other elements of the mix to emerge more easily.

Tonal fit and definition

A good tonal fit will therefore allow you to determine the maximum definition of the musical parts of the arrangement and the overall sound of the mix.

Let's be clear: the masking of an element can be solved simply by raising the volume of the element we want to emphasize; proceeding only in this way, however, risks masking the other elements even more. Consequently, tonal masking must be solved partly by managing the volumes and partly by using the tone controls.

You can also try adding a hint of harmonic saturation to the element you want to enhance, creating harmonics in an otherwise lacking tonal range. This tends to work best with low and low-medium textured sources (bass, electric guitar chords, low-textured synths).

How can tonal matching be improved?

Masking Analysis

As we have seen, we must first distinguish between impulsive and short sounds (such as percussion) and soft and long sounds (such as voices and strings).

The snare drum, for example, might have an essential tonal range very close to that of the lead vocal, but its short duration will not allow it to mask the latter to any appreciable extent.

A guitar, piano, and keyboard pad playing together continuously throughout the entire song with sustained parts and using roughly the same octave range, however, would certainly compete with each other to create a defined tonal space in the mix.

In general, in the higher tonal ranges, tonal overlap would create less masking and confusion than it would in the lower tonal ranges.

Let's analyze what happens in the various bands.

In the lower bands (between 20 and 80 Hz), fortunately, only a few sonic elements are expressed: in the pop-rock context, for example, we essentially find the bass (a long sound with a soft initial peak) and the bass drum (a short sound with an impulsive peak), as well as some sporadic incursions from the timpani. Consequently, in this potentially critical range, in pop-rock and related contexts it will be sufficient to obtain a good tonal fit between the bass drum and the bass.

The tonal range between 80 and 500 Hz is perhaps the one most subject to tonal masking problems, as it retains much of the criticality of the low end but is full of "competing" sound sources.

Even without considering impulsive sounds, but only the long, sustained ones, consider that in this range we find: 

  • the fundamentals of some bass notes and their most important natural harmonics
  • the low and mid-low notes of instruments such as guitars (electric and acoustic), the piano and the “pads” of keyboards and strings
  • the fundamentals and first harmonics of soloists such as the voice, the sax, the lead guitar

The tonal range between 500 Hz and 5000 Hz is also subject to the same problem of tonal overlap, although to a lesser extent.

Finally, the even higher range suffers a little less, partly because few elements express themselves massively between 5,000 and 20,000 Hz. Although still crowded with the natural harmonics and various overtones of all the sound elements (which in certain tracks we can even eliminate completely), this area will be substantially occupied only by very bright and subtle elements, such as the cymbals, the triangle, and similar.

Tonal interlocking

The methods for optimizing tonal matching are all those aimed at freeing up tonal space useful for other sound sources.

Here is a practical decalogue:

  1. Eliminate the tonal bands below the lowest fundamental played by the element

This is achieved by means of a high-pass filter (HPF) or its functional counterpart, a low-shelving EQ.

It must be understood that, sometimes, below the fundamental there will also be noise components that are functional to the texture of the sound; these can be attenuated with a standard slope of 6-12 dB/oct in a sparse mix, or even eliminated with a drastic slope of 24-60 dB/oct in a mix that is dense in the low and/or mid-low range.
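For readers who like to verify these slopes numerically, here is a minimal sketch in Python with NumPy and SciPy (the 80 Hz cutoff and the sample rate are illustrative choices, not recommendations): a Butterworth filter's slope is 6 dB/oct per order, so order 2 approximates the gentle setting and order 8 the drastic one.

```python
import numpy as np
from scipy.signal import butter, sosfreqz

fs = 48_000    # session sample rate (Hz)
cutoff = 80.0  # illustrative: the lowest fundamental of the part

def highpass(order):
    # Butterworth slope is 6 dB/oct per order:
    # order 2 -> ~12 dB/oct (gentle, for a sparse mix)
    # order 8 -> ~48 dB/oct (drastic, for a dense low end)
    return butter(order, cutoff, btype="highpass", fs=fs, output="sos")

def gain_db(sos, freq):
    # magnitude response of the filter at a single frequency, in dB
    _, h = sosfreqz(sos, worN=[freq], fs=fs)
    return 20 * np.log10(abs(h[0]))

gentle = highpass(2)
steep = highpass(8)

# one octave below the cutoff, the steep filter attenuates far more
print(round(gain_db(gentle, 40.0), 1))  # roughly -12 dB
print(round(gain_db(steep, 40.0), 1))   # roughly -48 dB
```

Above the cutoff both filters are essentially transparent; the choice of order only decides how hard the content below it is pushed down.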

  2. Eliminate the higher tonal ranges

This practice is risky as it will cut away some of the natural harmonics and other overtones of the sources, therefore this expedient should be used only for the “dark” elements that are not functionally expressed in these ranges, such as: the bass drum, the bass, the tomtoms and a few others.

The cut-off frequency should be chosen on a case-by-case basis, also taking into account the tonal crowding in the super-high range, by applying a low-pass filter with a moderate or medium slope (from 6 to 18 dB/oct) and a cut-off frequency between 5 and 12 kHz, as appropriate. In any case, it will be good practice to cut frequencies above 20,000 Hz in each track and in each bus, with a drastic slope of 48 dB/oct, for example.

This should help drastically reduce the risk of aliasing, i.e. the generation of unwanted harmonics in the low and mid range as a result of exceeding a frequency equal to half the session's sampling rate (for example, for 48 kHz sampling the cut should occur drastically below 24 kHz, so a cut at 20 kHz will be fine).

By using higher sampling frequencies (for example 192 kHz), the aliasing problem will obviously be much less relevant, and cutting the ultrasonic range becomes a negligible practice.

Note that eliminating the harmonic infiltration that aliasing produces in the low range (which causes a slight deformation of the timbre and a degree of harshness) helps maintain greater clarity and sound definition, thus helping to limit the causes of masking.
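The fold-back arithmetic behind aliasing is easy to simulate. The sketch below (Python with NumPy; the 30 kHz test tone is an illustrative stand-in for ultrasonic content that escaped filtering) samples a tone above Nyquist at 48 kHz and locates where its energy actually lands:

```python
import numpy as np

fs = 48_000      # session sample rate; Nyquist = fs / 2 = 24 kHz
f_tone = 30_000  # illustrative ultrasonic tone, above Nyquist

n = fs           # one second of audio
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f_tone * t)  # "recorded" without an anti-alias cut

# content between fs/2 and fs folds back to fs - f
spectrum = np.abs(np.fft.rfft(x))
f_alias = np.argmax(spectrum) * fs / n
print(f_alias)   # 18000.0 -> the 30 kHz tone lands at 18 kHz
```

The 30 kHz tone folds back to 48 − 30 = 18 kHz, squarely inside the audible range: exactly the kind of infiltration the 20 kHz cut prevents.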

  3. Attenuate the low end of the polyphonic accompanying instruments

We're talking about the instruments that accompany a soloist, such as guitars, piano, a keyboard pad, or string harmonies.

Often their range reaches the frequencies in which the fundamentals of the bass are expressed (which operate mostly between 30 and 170 Hz and occasionally reach 200-240 Hz).

To avoid excessive overlapping in the bass range, it will often be appropriate to perform a slight but progressive attenuation of the overlapping frequencies, by means of a low-shelving EQ set between 150 and 300 Hz, with an attenuation slope of 6 dB/oct or even more.

The slope and frequency can be determined by ear, but it will largely depend on the actual extension of the bass part detected in the specific piece you are working on (for example, if the bass line is between D 74 Hz and B 124 Hz, it will be appropriate to choose an appropriately low cutoff frequency, so as not to leave a frequency range uncovered, creating a tonal "hole").
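As an illustration, a low-shelving attenuation of this kind can be implemented as a biquad using the widely published Audio EQ Cookbook (RBJ) formulas. The sketch below (Python with NumPy; the 200 Hz corner and -6 dB gain are illustrative values, not a recommendation) builds the filter and checks its response at the extremes:

```python
import numpy as np

def low_shelf(fs, f0, gain_db, S=1.0):
    """Biquad low-shelf coefficients (RBJ Audio EQ Cookbook).
    Negative gain_db progressively attenuates content below ~f0."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / 2 * np.sqrt((A + 1 / A) * (1 / S - 1) + 2)
    cosw = np.cos(w0)
    sqA2a = 2 * np.sqrt(A) * alpha
    b = np.array([A * ((A + 1) - (A - 1) * cosw + sqA2a),
                  2 * A * ((A - 1) - (A + 1) * cosw),
                  A * ((A + 1) - (A - 1) * cosw - sqA2a)])
    a = np.array([(A + 1) + (A - 1) * cosw + sqA2a,
                  -2 * ((A - 1) + (A + 1) * cosw),
                  (A + 1) + (A - 1) * cosw - sqA2a])
    return b / a[0], a / a[0]

def gain_at(b, a, f, fs=48_000):
    # magnitude of the biquad's response at frequency f, in dB
    z = np.exp(-1j * 2 * np.pi * f / fs)
    h = (b[0] + b[1] * z + b[2] * z ** 2) / (a[0] + a[1] * z + a[2] * z ** 2)
    return 20 * np.log10(abs(h))

# illustrative: a -6 dB shelf at 200 Hz thins an accompaniment's low end
b, a = low_shelf(fs=48_000, f0=200.0, gain_db=-6.0)
print(round(gain_at(b, a, 20.0), 1))    # deep in the shelf: close to -6 dB
print(round(gain_at(b, a, 5000.0), 1))  # midrange: close to 0 dB
```

Unlike a high-pass filter, the shelf flattens out to a fixed attenuation below the corner, which is why it suits this "slight but progressive" thinning rather than outright elimination.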

Using static EQ to reduce mutual masking between a piano (left) and a guitar (right). Both instruments were previously optimized using preliminary equalization (performed "upstream" with other equalizers), so only the subsequent demasking operations performed during the mix equalization are visible here. The musical parts of the two instruments were played simultaneously, and both in the medium range, so they tended to partially mask each other. For both, frequencies below 80 Hz and above 20 kHz were first cut, and frequencies below 300 Hz were slightly and progressively attenuated. Subsequently, the 1200 Hz range was boosted in the piano and attenuated in the guitar; similarly, the 4400 Hz range was boosted in the guitar and attenuated in the piano; the latter finally received a bright boost around 7 kHz. To complete the de-masking, it was decided to split the piano and guitar symmetrically to opposite stereo channels (40% L and 40% R).

  4. Distribute sources of similar tonal range differently across the stereo front

For example, if we have a stereo synth pad, a piano, and an acoustic guitar accompanying a soloist playing together on the same frequency band, a typical solution is to assign each of these elements an opposite position on the stereo front.

For example, we could: 

  • place the pad in the central position, opposing the pan-pots of the two L and R channels, with maximum opening at 100% L and 100% R;
  • place the guitar at 50% on the left channel and balance it with the piano at 50% on the right channel.

Such angular positions of the piano and guitar could be made more drastic, up to 85-90% towards L or R (with further improvements in tonal fit), but in that case such sources would have to be generously reverberated in stereo, to distribute the ambience of the "cornered" element over the entire stereo front.
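The percentages above map onto channel gains through the pan law. A minimal sketch of a constant-power (sin/cos) law, which is what most DAW pan-pots approximate (the position-to-angle mapping here is one common convention, assumed for illustration):

```python
import numpy as np

def constant_power_pan(position):
    """position: -1.0 = hard left, 0.0 = center, +1.0 = hard right.
    Returns (left_gain, right_gain) under a constant-power (sin/cos)
    law, so perceived loudness stays the same at every position."""
    theta = (position + 1) * np.pi / 4  # map [-1, 1] onto [0, pi/2]
    return np.cos(theta), np.sin(theta)

# the placements suggested above: pad centered, guitar 50% L, piano 50% R
for name, pos in [("pad", 0.0), ("guitar", -0.5), ("piano", +0.5)]:
    l, r = constant_power_pan(pos)
    # l**2 + r**2 is always 1.0: total power is independent of position
    print(name, round(l, 3), round(r, 3))
```

Because total power is held constant, moving a source towards 85-90% changes its angular position without changing its overall loudness in the mix.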

  5. Tone down and enhance tonal bands in a complementary manner

Let's take as an example a piano and a guitar playing an accompaniment line in parallel: in one of the aforementioned instruments it could be useful to enhance (for example) the medium-high tonal range and attenuate the mid-range, and then perform an exactly opposite operation with the other source, enhancing the mid-range and attenuating the medium-high one.

  6. Use a multiband compressor

In place of or in addition to the action of the static EQ mentioned above, it is often preferable to operate with a dynamic EQ (in the form of a multiband compressor) to obtain a more effective result but without distorting the original sounds.

It is sufficient to identify the critical tonal range shared by multiple elements, so as to limit it when it exceeds a certain threshold in both elements, independently.

By using a multiband compressor, the tonal range can be attenuated only at the moments of its maximum expression and only to the extent necessary, without altering the perfect tonal balance obtained during the preliminary equalization of each element.
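A bare-bones version of this idea can be sketched in a few lines (Python with NumPy and SciPy; the band edges, threshold, and ratio are illustrative, and a real dynamic EQ would use better crossovers): isolate the critical band, follow its envelope, and attenuate the band only while it exceeds the threshold.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 48_000

def dynamic_band_cut(x, lo=250.0, hi=500.0, threshold=0.2, ratio=4.0):
    """Crude one-band dynamic EQ: attenuate the lo-hi band only while
    its envelope exceeds `threshold`; below it, the signal is untouched."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfilt(sos, x)
    rest = x - band  # everything outside the band

    # one-pole peak envelope follower: instant attack, ~10 ms release
    coef = np.exp(-1.0 / (0.010 * fs))
    env = np.empty_like(band)
    level = 0.0
    for i, s in enumerate(band):
        level = max(abs(s), coef * level)
        env[i] = level

    # standard downward-compression gain applied to the band only
    gain = (threshold / np.maximum(env, threshold)) ** (1 - 1 / ratio)
    return rest + gain * band
```

When the band stays below the threshold the gain is exactly 1, so the tonal balance from the preliminary equalization passes through unchanged, which is the whole point compared with a static cut.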

  7. Carve out a tonal range for the soloist

This arrangement reserves more tonal space for the soloist, keeping it more immersed in the mix without compromising its definition, which in turn frees up a lot of space for other secondary elements.

Specifically, it involves attenuating, in the elements that disturb the soloist, the area that corresponds to his fundamentals and his first harmonics (200-1000 Hz, depending on the type of voice and individual cases), or to the "vocal formant" (around 2500 Hz).

When you hollow out an element on a mid-range to make room for another element, in many cases you will feel the need to compensate by boosting an adjacent range of the attenuated element.


De-masking treatment for a quartet (female vocals accompanied by acoustic guitar, electric bass, and drums). Before this treatment, the two tracks were subjected to an initial balancing process with the usual preliminary equalization process. On the guitar (left EQ), the bass below 75 Hz and the highs above 20 kHz were cut. The midrange with a wide Q (0.50) was also attenuated to free up space for the vocals (right EQ), which were boosted in the same range. On the vocals, the 350 Hz range, which was a bit muddy, was attenuated, also to make room for the guitar's natural, very pleasant bass, which emerged after the midrange was dug out. After this reduction, the guitar sounded a bit muffled, but this created a "special" magic together with the vocals. The vocals were also given brightness by boosting the 10 kHz range but cutting above 15 kHz, which is also the tonal range where the guitar received a boost that was needed to compensate for the attenuation of the mid-high range. This achieved an excellent tonal balance that allowed the vocals to sit well within the guitar without being overwhelmed even in the softest passages, facilitating dynamic control during the subsequent mastering.

  8. Contain secondary sources with a side-chain compressor

This trick is particularly effective when the lead track of the side-chain compressor is the soloist of the song; it allows you to decrease the volume of the secondary elements that disturb the soloist, to create greater dynamic space for him only in the moments in which he is active.

This expedient will allow the soloist's average volume to be kept at a lower level, in turn freeing up tonal space for other elements.

To avoid the onset of a pumping effect on the attenuated element (that is, an excessively rapid rise in volume after compression), it will be necessary to contain the compressor's action within a range of about 2 dB (3 at most) and to tune the attack and release speeds for maximum effectiveness without highlighting the artificiality. To begin with, you can try an attack of 50 ms and a release of 100 ms, then vary these values until you obtain the most natural result.

Using the side-chain trick with a multi-band compressor could further improve the effectiveness and limit the artifacts generated by compression, by setting the plugin to obtain, for example, a maximum attenuation of 3 dB in the specific tonal range involved (generally around 3000 Hz) and a smaller reduction (or no reduction) in the other tonal ranges.
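A broadband version of this ducking scheme can be sketched as follows (Python with NumPy; the threshold is an illustrative assumption, while the 50 ms attack, 100 ms release, and 3 dB ceiling follow the values suggested above):

```python
import numpy as np

fs = 48_000

def duck(pad, vocal, threshold=0.1, max_cut_db=3.0,
         attack_ms=50.0, release_ms=100.0):
    """Attenuate `pad` by at most `max_cut_db` while `vocal` exceeds
    `threshold`, with smoothed attack/release to avoid audible pumping."""
    a_coef = np.exp(-1.0 / (attack_ms / 1000 * fs))
    r_coef = np.exp(-1.0 / (release_ms / 1000 * fs))
    floor = 10 ** (-max_cut_db / 20)  # -3 dB -> ~0.708 linear

    gain = np.ones_like(pad)
    g = 1.0
    for i in range(len(pad)):
        target = floor if abs(vocal[i]) > threshold else 1.0
        coef = a_coef if target < g else r_coef  # attack while ducking
        g = target + coef * (g - target)
        gain[i] = g
    return pad * gain, gain
```

The returned gain curve never cuts more than 3 dB, and the 50/100 ms smoothing keeps its movement slow enough not to be heard as pumping.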

  9. Diversify the elements in depth

When a solo element is partially masked by a secondary element, you can also try to "remove" the latter from the up-front presence zone, using all the parameters already suggested for this result: decrease the direct volume, increase the volume of echoes and reverbs (sometimes even exaggerating the stereo opening of the latter a little), slightly attenuate the high and low frequencies, sweeten the transients with a compressor, and so on.

In this way, the element that has been penalized by the volume to make the other emerge will be able to stand out in "diffusion", thanks to the stereo reverb, leaving a little space of "presence" free.

  10. Use interlocking arrangements

This is not a mixing gimmick, but I wanted to mention it to highlight an important concept: a well-written arrangement would have the musical parts written to "interlock", that is, appropriately interspersing the rhythm of the phrases and accents and using different ranges for the overlapping elements (for example, different octaves, according to the dictates of good orchestral spacing).

By proceeding in this way, the definition of the parts would be obtained from the very beginning using only the elements of "musical writing": the parts would always be clearly distinct, without any need for contrived fixes in the mix.

Unfortunately, these techniques are known and mastered only by composers, orchestrators, and arrangers with a high level of musical culture, and are therefore often lacking in pop music, where, alongside some very capable musicians, too many producers with little or no quality musical background try their hand.

However, it must be recognized that even in genres of popular origin (in well-made rock, for example), concepts, customs, and conventions of "orchestration" have gradually become established, so that in the best and most mature productions the sound elements have achieved satisfactory "fitting" criteria, functional to the specific expression of that musical genre, and have allowed its expressive style to mature.

Dynamic interlocking

To achieve a lively and interesting mix, the dynamic expressions of the performances should be preserved to the maximum extent.

It must be said that dynamic expression should arise, at its core, from dynamically coherent performances rich in expressive accents, if possible conceived with an interlocking criterion.

For example, if the simultaneous piano and guitar parts, while both insisting on the middle register, were conceived with the aim of alternating (rather than overlapping) the performance accents, the dynamic accents of the two instruments would manifest themselves in different places, contributing significantly to the definition of the parts without using up too much space in the mix.

Unfortunately, in performances recorded with overdubbing, the expressive dynamics are often compromised due to the lack of interaction between the musicians, unless they are particularly experienced or guided by a good artistic director.

When we start mixing, we may find ourselves faced with tracks performed with good expressive dynamics or with dynamically “flat” tracks, to which it will be very difficult (if not impossible) to restore a hint of dynamic liveliness.

Respect for dynamics

The days of the "loudness war" are over, when people rushed to compress tracks, buses, and masters in a "perverse" attempt to impose the volume of their songs on CD compilations and on the radio.

Nowadays, in the streaming era, this extreme trend no longer has much significance and, in any case, it will be appropriate to leave to the mastering, in addition to its other tasks, that of finalizing the loudness of the master in an appropriate manner.

However, the practice of compressing master files as much as possible persists among many operators, with results that I personally consider almost always deplorable.

The task of compression

Nowadays, compression is no longer required to "give volume" to the song within the audio medium, but it must respond to a few but much more important needs:

Modeling purpose

Shape the intensity ratio between the transient and the sustain of sounds at will, strengthening or weakening one with respect to the other. This practice is by no means essential, but it can respond to specific creative needs, making some performances softer or, on the contrary, more aggressive. Another modeling use of the compressor, as we will see, is parallel compression: compressing a clone of a track in an exaggerated manner and then blending it with the original, to dose in a harsher, more jagged sound, rich in the ambient colors of the recording that this type of compression brings up to an "in your face" level.

Leveling purpose

Better define the underlying musical elements, especially in very dense mixes; compression should assist the leveling of volumes already achieved through the management of the volume faders; this leveling can be achieved both by containing excessive peaks and by reinforcing weak sound emissions; however, in order to maintain maximum dynamic expressiveness, it is essential that these adjustments be kept to a minimum.

For best results, it is often preferable to do detailed work by carefully managing the volumes with the faders rather than overdoing compression.

A careful use of the anti-masking techniques mentioned above will often allow you to avoid or limit the use of leveling compression, while safeguarding the original dynamic expressiveness.

Adhesive purpose

Create a tonal and dynamic bond between the various sources, determining a greater "sonic compactness", to be dosed according to the musical genre being worked on.

In fact, each musical genre will require a greater or lesser degree of sonic bonding: this requirement is greatest in dance and hip-hop, quite high in rock and pop in general, moderately high in modern expressive genres such as fusion, modern folk, and jazz, and minimal or even non-existent in purist genres such as classical music and traditional jazz.

This gluing should be managed above all in the mastering phase but in certain cases it can be arranged in some groups of correlated tracks, in order to pre-determine in them a specific sound identity, functional to the "sound" of the song.

It is therefore not a question of "giving volume" to the song by means of compression, but rather of measuring its action in the various phases, in order to obtain a specific sound, which is functional to the context of the musical genre.

Any excess compression, in fact, will diminish the emotional impact of the performances, to the point of creating a flat and "boring" mix; this is why mastering must also be conducted in a way that ensures the loudness target required by the music industry is achieved without destroying dynamic expressiveness.

In certain cases (for example in the presence of performances that are too flat or recordings made with heavy compression at the input stage), as we will see, it will even be necessary to attempt an opposite process of expansion, in an attempt to create or recover a dimension of greater dynamic liveliness.

As you will see, this process will be almost impossible when applied to a mix, while it can often be successful in revitalizing a single track or a not too dense percussive ensemble. 


An expander, adjusted to restore some liveliness to a less-than-thrilling conga and bongo performance. The dynamics of the performance were slightly dampened in the unaccented parts, just enough to create a more incisive delivery without revealing audible artifacts. With the threshold at 0 dB and an immediate attack, the entire dynamic range of the track was expanded; the attenuation was kept within 4 dB thanks to the Range control.
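For the curious, the behavior described in the caption can be sketched as a downward expander with a Range control (Python with NumPy; the 2:1 ratio and release time are illustrative assumptions, while the 0 dB threshold, immediate attack, and 4 dB range echo the settings above):

```python
import numpy as np

fs = 48_000

def expand(x, threshold_db=0.0, ratio=2.0, range_db=4.0, release_ms=80.0):
    """Downward expander: below `threshold_db` the signal is pushed
    further down by `ratio`, but never by more than `range_db`
    (the Range control). Attack is immediate; release is smoothed."""
    thr = 10 ** (threshold_db / 20)  # 0 dBFS -> 1.0 linear
    r_coef = np.exp(-1.0 / (release_ms / 1000 * fs))

    env = 0.0
    out = np.empty_like(x)
    for i, s in enumerate(x):
        env = max(abs(s), r_coef * env)  # instant attack, smoothed release
        if 0 < env < thr:
            cut_db = 20 * np.log10(thr / env) * (ratio - 1)
            cut_db = min(cut_db, range_db)  # the Range control
            g = 10 ** (-cut_db / 20)
        else:
            g = 1.0
        out[i] = s * g
    return out
```

With the threshold at 0 dBFS everything below full scale is expanded, but the Range clamp guarantees that no passage is pushed down by more than 4 dB, which is what keeps the treatment free of audible artifacts.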
