
It’s been a while since my last post, as I’ve been deeply immersed in my work and the development of my latest project, PureWaveShaper. Recently, I’ve been exploring how researchers and innovators use Python to analyse and understand audio data. Whether you’re a musician, audio engineer, data scientist, or simply curious about sound, this post will introduce you to the fascinating world of audio data analysis.
What is Audio Digital Analysis?
Audio digital analysis involves the study of sound signals to extract meaningful information. Sound, at its core, is a wave of pressure variations travelling through air. In the analog domain, these variations are represented as changes in voltage. When digitized, sound becomes a discrete sequence of amplitude values over time. Analysing properties such as frequency, amplitude, and variation allows us to create new visual and data representations. This is invaluable for tasks like sound classification, identifying the genre of a song, detecting pitch, recognizing speech patterns, or monitoring environmental sounds.
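To make the idea of a "discrete sequence of amplitude values" concrete, here is a minimal sketch using NumPy. The 440 Hz tone (concert A) and the 22,050 Hz sampling rate are assumptions chosen for illustration; 22,050 Hz happens to be Librosa's default when loading files.

```python
import numpy as np

# Digitized sound: a discrete sequence of amplitude values over time.
# Assumptions for illustration: a 440 Hz sine tone sampled at 22,050 Hz for 10 ms.
sr = 22050                                # sampling rate (samples per second)
duration = 0.01                           # 10 milliseconds
t = np.arange(int(sr * duration)) / sr    # discrete sample times in seconds
y = 0.5 * np.sin(2 * np.pi * 440 * t)     # amplitude value at each sample

print(len(y))   # → 220 samples: 22,050 samples/s × 0.01 s
```

Every digital recording, no matter how complex, is at heart an array like `y`: one number per sample, evenly spaced in time.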
Why Python?
Python is favoured for audio analysis due to its:
- Ease of Use: Clean syntax and readability make it accessible for beginners and efficient for rapid prototyping.
- Rich Ecosystem: Libraries like Librosa, NumPy, and Matplotlib simplify loading, processing, and visualizing audio data.
- Community Support: A large, active community ensures ample resources and troubleshooting help.
- Integration with AI: Leading language for machine learning and AI, ideal for advanced audio analysis pipelines.
- Cost: Free and open-source, unlike proprietary tools like MATLAB.
Why Not Other Languages?

| Language | Strengths | Weaknesses |
| --- | --- | --- |
| MATLAB | Academic research, prototyping | Proprietary, expensive |
| C++ | Real-time processing, performance | Steep learning curve |
| R | Statistical analysis | Limited audio tooling |
| JavaScript | Web-based applications | Mostly limited to browser environments |
| Julia | High-performance computing | Still gaining traction |
| Rust | Modern, safe, performant; real-time, low-latency applications | Smaller audio ecosystem, steep learning curve |
Getting Started: Loading and Visualizing Audio
Setting Up Your Environment
Before diving into the code, you'll need a place to run Python. Here are a few options:
- Google Colab: a free, browser-based notebook environment with the main libraries preinstalled. The code below assumes Colab with Google Drive mounted.
- A local installation: install Python on your machine, add the libraries with `pip install librosa matplotlib`, and run the code in a script or Jupyter notebook.
Loading and Visualizing Audio
Let’s start with the basics—loading an audio file and visualizing its waveform. A waveform shows how the amplitude of the sound changes over time, giving us our first glimpse into the sound’s structure.
```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
from google.colab import drive

# Mount Google Drive to access your files
drive.mount('/content/drive')

# Load the audio file from Google Drive
y, sr = librosa.load('/content/drive/path/to/your/audio.wav')

# Display the waveform
plt.figure(figsize=(14, 5))
librosa.display.waveshow(y, sr=sr)
plt.title('Waveform of the Audio File')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()
```
In this code:
- `from google.colab import drive` and `drive.mount('/content/drive')` connect your Google Drive to Colab. When you run this, Colab will prompt you to authenticate by providing a link and asking for an authorization code. Follow the steps to grant access.
- `librosa.load()` loads the audio file. `y` is the audio time series, and `sr` is the sampling rate. Replace `/content/drive/path/to/your/audio.wav` with the actual path to your audio file in Google Drive (e.g., `/content/drive/My Drive/audio_files/sample.wav`).
- `librosa.display.waveshow()` creates a visual representation of the waveform.
- `matplotlib.pyplot` helps us display the plot.
If you don’t have an audio file in Drive yet, you can upload one to your Google Drive (e.g., a .wav or .mp3 file) or download free samples from sites like Freesound.org and then upload them. Alternatively, if you’re not using Drive, you can manually upload a file to Colab by clicking the folder icon on the left and selecting "Upload", then adjust the file path to `/content/your_audio.wav`.
If you’re using a local installation instead of Colab, simply replace the file path with the location of your audio file on your computer (e.g., `C:/Users/YourName/audio.wav`).
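Once `y` and `sr` are loaded, a few quick NumPy checks help confirm the signal came in as expected. This is a sketch only: instead of a real file, it synthesizes one second of a 440 Hz sine (an assumption for illustration) so it runs without any audio file; with a real recording, `y` and `sr` would come from `librosa.load()`.

```python
import numpy as np

# Quick sanity checks on an audio time series. Here we synthesize one
# second of a 440 Hz sine in place of a loaded file.
sr = 22050
t = np.arange(sr) / sr
y = 0.8 * np.sin(2 * np.pi * 440 * t)

duration = len(y) / sr            # length in seconds
peak = np.max(np.abs(y))          # peak amplitude (should stay below 1.0 to avoid clipping)
rms = np.sqrt(np.mean(y ** 2))    # average level (root mean square)

print(f"duration: {duration:.2f} s, peak: {peak:.2f}, rms: {rms:.3f}")
```

Checks like these catch common loading mistakes early, such as a wrong path returning silence or a signal that clips at full scale.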
Here’s what the output looks like after running the code with a sample audio file:

Reflection
Writing this post has been a good exercise in putting complex ideas into simple words without losing their essence. At first, I underestimated the challenge of choosing the right starting point: Should I dive straight into spectrograms or build from the ground up with waveforms? I chose the latter, understanding that grasping the raw signal would be fundamental for any research as well as for learning new tools. Using Librosa was a deliberate choice after exploring other tools like scipy.signal; its focus on music and audio analysis aligns with my goal of classifying audio features. This process also reminded me of the accessibility challenges in research—connecting Google Drive to Colab, for instance, took trial and error to streamline for others. As I move forward, I hope to see how these basic techniques evolve in the machine learning models I’m developing, which I may not be able to share immediately but hope to do so in the near future. I also hope this blog documents some of my progress and challenges and, above all, helps me organize my ideas better.
Conclusion
This is just the beginning of a journey into digital audio analysis with Python. Today, we learned what audio analysis is, why Python is a great tool for it, and how to visualize a basic waveform. In future posts, I will dive deeper into visualizing waveform features such as the spectrum.
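As a small preview of spectrum analysis, the dominant frequency of a signal can be found with NumPy's FFT. The sketch below assumes a synthetic one-second 440 Hz sine for illustration; with a real recording, `y` and `sr` would come from `librosa.load()`.

```python
import numpy as np

# Preview of spectrum analysis: find the dominant frequency with the FFT.
# Assumption for illustration: one second of a pure 440 Hz sine.
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)

spectrum = np.abs(np.fft.rfft(y))           # magnitude of each frequency bin
freqs = np.fft.rfftfreq(len(y), d=1 / sr)   # bin centres in Hz
peak_freq = freqs[np.argmax(spectrum)]

print(peak_freq)   # → 440.0
```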
Additional Resources
- Librosa Documentation
- Freesound.org
- Google Colab
- Deep Learning 101 for Audio-based MIR, ISMIR 2024 Tutorial by Geoffroy Peeters et al. (2024).