AllYouNeedIsSound 2: A Guide to Audio Visualization and STFT

An abstract digital sound spectrum with smooth, flowing waves in gray and blue tones, featuring a clean and stylized equalizer on a white background. — A modern visualization of digital sound frequencies created with DALL·E.

In my last post, I showed how to load and visualize audio waveforms using Python. Now, let’s dive deeper into spectral analysis, a powerful technique for understanding the frequency content of audio signals. With it, we can uncover patterns and features that are essential for tasks like sound classification, speech recognition, and music analysis.

What is Spectral Analysis?

Spectral analysis breaks an audio signal down into its individual frequencies, making it easier to understand its components. While a waveform shows amplitude over time, spectral analysis reveals the frequency components hidden within the sound.
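
To make this concrete, here is a minimal sketch using only NumPy. It assumes a synthetic one-second signal (a 440 Hz tone mixed with a quieter 1000 Hz tone, both values chosen purely for illustration) and shows how the Fourier Transform recovers the two frequencies that are hard to spot in the waveform.

```python
import numpy as np

sr = 8000                      # assumed sample rate in Hz
t = np.arange(sr) / sr         # one second of sample times

# Mix a 440 Hz tone with a quieter 1000 Hz tone (illustrative values)
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.abs(np.fft.rfft(signal))          # magnitude per frequency bin
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)  # bin frequencies in Hz

# The two strongest bins sit at the frequencies we mixed in
top_two = np.sort(freqs[np.argsort(spectrum)[-2:]])
print(top_two)
```

Real recordings are far messier than this, but the idea is the same: peaks in the spectrum point to the frequencies that make up the sound.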

Why is Spectral Analysis Important?

Frequencies are the building blocks of sound. Therefore, analysing them allows us to distinguish between different types of audio, such as a guitar note versus a drum beat. Additionally, this technique is crucial for tasks like music genre classification and speech recognition.

Key Concepts

Spectrogram

A spectrogram is a visual representation of how the frequencies in an audio signal change over time. It’s like a “heatmap” of sound, where:

The x-axis represents time.
The y-axis represents frequency.
The colour intensity represents amplitude (e.g., brighter colours mean louder frequencies).
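
The colour scale of a spectrogram is usually logarithmic (decibels), because raw power values span a huge range. The sketch below is a rough NumPy approximation of that conversion, not Librosa's actual implementation; `to_db` is a hypothetical helper, and the floor value and the 80 dB clipping range are assumptions chosen for illustration.

```python
import numpy as np

def to_db(power, top_db=80.0):
    """Convert power values to decibels relative to the loudest value."""
    power = np.maximum(power, 1e-10)              # avoid taking log of zero
    db = 10.0 * np.log10(power / power.max())     # loudest value maps to 0 dB
    return np.maximum(db, -top_db)                # clip very quiet values

example = np.array([1.0, 0.1, 0.01, 1e-12])
print(to_db(example))
```

Each factor of ten in power becomes a step of 10 dB, so quiet details stay visible next to loud ones.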

Short-Time Fourier Transform (STFT)

The Short-Time Fourier Transform (STFT) is a mathematical tool used to create spectrograms. Unlike the standard Fourier Transform, which analyses the entire signal at once, the STFT breaks the audio into short, overlapping segments and applies the Fourier Transform to each segment. This allows us to see how frequencies evolve over time, making it ideal for analysing real-world audio, which is rarely steady like a pure tone.
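
The segment-by-segment idea can be sketched in a few lines of NumPy. This is a simplified illustration, not how Librosa implements it (there is no padding or centring here), and `simple_stft`, the window length, and the hop length are all assumptions of mine. The test signal jumps from 500 Hz to 1500 Hz halfway through, which a single Fourier Transform over the whole signal could not localise in time.

```python
import numpy as np

def simple_stft(y, n_fft=1024, hop_length=256):
    """Slice the signal into overlapping frames, window each, and take its FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack([
        y[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # Transpose so rows are frequency bins and columns are time frames
    return np.fft.rfft(frames, axis=1).T

sr = 8000
t = np.arange(2 * sr) / sr
# A tone that jumps from 500 Hz to 1500 Hz after one second
y = np.where(t < 1.0, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

S = np.abs(simple_stft(y))
freqs = np.fft.rfftfreq(1024, d=1 / sr)
first_peak = freqs[S[:, 0].argmax()]    # dominant frequency in the first frame
last_peak = freqs[S[:, -1].argmax()]    # dominant frequency in the last frame
print(S.shape, first_peak, last_peak)
```

Each column of `S` is the spectrum of one short segment, so reading the columns left to right shows how the frequency content changes over time.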

A Teaser for Future Posts

While STFT-based spectrograms are powerful, they’re just the beginning. In future posts, we’ll explore advanced features like Mel spectrograms and MFCCs (Mel-Frequency Cepstral Coefficients), which are widely used in machine learning for audio classification.

Practical Example: Computing and Visualizing a Spectrogram with Python

Let’s put theory into practice. First, we’ll load an audio file using Librosa. Then, we’ll compute an STFT-based spectrogram (here, a Log-Mel spectrogram) and visualize it. Finally, we’ll interpret the results to understand the audio’s frequency content. Here’s a step-by-step guide:

0 Mount Google Drive

You can skip this step if you followed my previous post.

```python
from google.colab import drive
drive.mount('/content/drive')
```

1 Load the Audio File

```python
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Load an audio file (by default, librosa converts to mono and resamples to 22,050 Hz)
y, sr = librosa.load('/content/drive/path/to/your/audio.wav')
```

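2 Compute the Mel Spectrogram and Convert to Decibels

```python
# Compute a Mel spectrogram (librosa takes the STFT internally and maps its power onto 128 Mel bands)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
# Convert power to decibels relative to the loudest bin
S_db = librosa.power_to_db(S, ref=np.max)  # Log-Mel Spectrogram
```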

3 Plot the Spectrogram

```python
# Plot the spectrogram
plt.figure(figsize=(14, 5))
librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Song Log-Mel Spectrogram')
plt.show()
```

Here’s what the output looks like after running the code with a sample audio file:

Log-Mel Spectrogram generated with Python for spectral analysis. **Figure 1:** Example Log-Mel Spectrogram generated from an audio file using the code above. The x-axis represents time, the y-axis shows frequency on the Mel scale, and the colour shows the sound’s intensity in decibels.

Understanding the Output

Time (x-axis): Shows how the audio evolves over time.
Frequency (y-axis): Shows the range of frequencies present in the audio.
Colour Intensity: Represents amplitude (louder frequencies appear brighter).

For example:

Violin frequency analysis using Python and spectrograms. **Figure 2:** Example Log-Mel Spectrogram of a sustained violin note, generated using the code above. The note appears as a horizontal line at a specific frequency.

Drum hit visualization with Python and spectral analysis. **Figure 3:** Example Log-Mel Spectrogram of a drum hit, generated using the code above. The hit appears as a vertical spike across multiple frequencies.

What Does the Spectrogram Tell Us?

Spectrograms provide a wealth of information that waveforms cannot. For instance:

Horizontal Lines: Indicate sustained tones, such as a violin note or a humming sound.
Vertical Spikes: Represent short, sharp sounds, like a drum hit or a clap.
Patterns: Repeated patterns in the spectrogram might correspond to musical rhythms or speech phonemes.

These features provide valuable insights into the structure and content of audio signals, making spectrograms invaluable for tasks like sound classification, speech recognition, and music analysis.
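
The difference between horizontal lines and vertical spikes can be checked numerically. The sketch below assumes two synthetic signals (a sustained 1 kHz tone and a single-sample click) and two hypothetical helpers, `magnitude_stft` and `active_fraction`, which measure how much of each axis of the spectrogram carries notable energy. The 1% threshold is an arbitrary choice for illustration.

```python
import numpy as np

sr, n_fft, hop = 8000, 512, 128
t = np.arange(sr) / sr

tone = np.sin(2 * np.pi * 1000 * t)    # sustained 1 kHz tone
click = np.zeros(sr)
click[sr // 2] = 1.0                    # single-sample impulse at 0.5 s

def magnitude_stft(y):
    """Magnitude STFT with shape (frequency bins, time frames)."""
    window = np.hanning(n_fft)
    starts = range(0, len(y) - n_fft + 1, hop)
    frames = np.stack([y[s : s + n_fft] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def active_fraction(S, axis):
    """Fraction of rows (axis=1) or columns (axis=0) holding notable energy."""
    energy = (S ** 2).sum(axis=axis)
    return np.mean(energy > 0.01 * energy.max())

S_tone, S_click = magnitude_stft(tone), magnitude_stft(click)

# Tone: energy in very few frequency rows, but present in every time frame
print(active_fraction(S_tone, axis=1), active_fraction(S_tone, axis=0))
# Click: energy in every frequency row, but confined to a few time frames
print(active_fraction(S_click, axis=1), active_fraction(S_click, axis=0))
```

The tone is narrow in frequency and wide in time (a horizontal line), while the click is wide in frequency and narrow in time (a vertical spike).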

Reflection

Learning spectral analysis has been a transformative experience for me. It opened my eyes to the complexity of audio signals and deepened my appreciation for the mathematical tools that make audio processing possible. One of the challenges I faced was understanding how to choose the right window size for the STFT. Too short, and the frequency resolution suffers; too long, and the time resolution becomes blurry. Through experimentation and research, I learned to balance these trade-offs.
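
A quick calculation helped me see this trade-off in numbers. The sketch below assumes a sample rate of 22,050 Hz (the rate Librosa resamples to by default) and a hypothetical helper, `stft_resolution`, that returns the spacing between frequency bins and the duration of one analysis window.

```python
def stft_resolution(sr, n_fft):
    """Return (frequency bin spacing in Hz, window duration in ms)."""
    return sr / n_fft, 1000.0 * n_fft / sr

sr = 22050  # assumed sample rate in Hz

for n_fft in (256, 1024, 4096):
    bin_hz, window_ms = stft_resolution(sr, n_fft)
    print(f"n_fft={n_fft:5d}  bin spacing={bin_hz:6.2f} Hz  window={window_ms:6.1f} ms")
```

Quadrupling the window length makes the frequency bins four times finer, but each frame then averages over four times as much audio.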

This journey has reinforced my belief that spectral analysis is not just a technical skill but a gateway to understanding the rich, hidden world of sound. As I continue to explore advanced techniques like CQT and HCQT, I’m excited to share my discoveries and challenges in future posts.

Conclusion

Spectral analysis is a powerful tool for unlocking the frequency content of audio signals. By moving beyond waveforms and exploring spectrograms, we can uncover patterns and features that are essential for tasks like sound classification, speech recognition, and music analysis. In this post, we’ve covered the basics of spectral analysis, introduced the Short-Time Fourier Transform (STFT), and demonstrated how to compute and visualize spectrograms using Python.

Additional Resources

Librosa Documentation: A comprehensive guide to the Librosa library.
Google Colab: A free, cloud-based environment for running Python code.
Freesound.org: A repository of free audio samples for experimentation.
Deep Learning 101 for Audio-based MIR, ISMIR 2024 Tutorial by Geoffroy Peeters et al. (2024).
Kinsler, L. E., Frey, A. R., Coppens, A. B., & Sanders, J. V. (2000). Fundamentos de Acústica (4ª ed.). Wiley.