Thomas Ortega - Portfolio


Audio Programming For Motion Tracking Game

C# Unity

Hello there! I wanted to share some experiences from my three-month internship at Turku Game Lab in Finland. I had the chance to work on a project that I found quite interesting: it combined motion tracking with audio programming. As someone who's passionate about both programming and music, this was a great opportunity to dive into audio programming, something I'd been wanting to try for a while.

What Was the Project About?

The game we were working on was something that could have come straight out of the WarioWare series. In a circus-themed scene, players acted as a clown, striking poses in time with the music. We used Kinect-style cameras for motion tracking.

[Gameplay capture: the player poses in sync with the on-screen avatar while a cut-out approaches]

The basic idea was this: as the music played, cut-out shapes would move toward the player's on-screen avatar. The player had to match the pose of the approaching cut-out and hold it for about half a second. On a successful match, the cut-out would speed up and clear out of the way, making room for the next one. The goal was a continuous, rhythmic flow of movement.
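As a rough illustration of that loop, here's a minimal sketch of how a cut-out could handle the match-and-clear behavior. Everything here (the CutOutMatcher name, the thresholds, the PoseMatchesCutOut placeholder) is my own assumption, not the project's code:

using UnityEngine;

// Hedged sketch of the cut-out behavior described above.
public class CutOutMatcher : MonoBehaviour
{
    [SerializeField] private float _holdTime = 0.5f;    // pose must be held this long
    [SerializeField] private float _approachSpeed = 2f; // normal approach speed
    [SerializeField] private float _clearSpeed = 10f;   // speed-up once matched

    private float _heldFor;
    private bool _matched;

    private void Update()
    {
        // Matched cut-outs rush past the avatar to clear the way.
        float speed = _matched ? _clearSpeed : _approachSpeed;
        transform.Translate(Vector3.back * speed * Time.deltaTime);

        if (_matched) return;

        // Stand-in for the motion-tracking comparison, which isn't shown here.
        if (PoseMatchesCutOut())
        {
            _heldFor += Time.deltaTime;
            if (_heldFor >= _holdTime) _matched = true;
        }
        else
        {
            _heldFor = 0f; // the pose must be held continuously
        }
    }

    private bool PoseMatchesCutOut() => false; // placeholder
}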

How I Got Involved with Audio Programming

Initially, the game lacked a rhythmic element. The team had successfully implemented motion tracking and collision detection for the cut-outs. During a project review, our tutor came to test the game, discuss technical challenges, and brainstorm new ideas.

At the time, I had recently completed a Unity course to familiarize myself with game programming and the Unity Editor. The course was key in introducing me to common game development concepts, and it encouraged experimentation by having you add features to the mini-games completed in each section.

While my colleagues were brainstorming ideas just a few steps away from me, I was already tinkering with audio programming and following some tutorials that I will mention later. When I overheard them discussing the need for music to enhance the game, I realized I could contribute. I saw an opportunity to apply what I knew about audio and explore this area further before my final vocational training project. I approached the team and explained that I had been working on some scripts that, with some refinement, could provide the rhythmic element they were looking for.

I ended up assigned to the project and created two main scripts:

1. A beat synchronization script.

The goal of this script was to time events with the music. Given a BPM (beats per minute) and an audio source, it could generate events synced to specific beats. Here's how it worked:

  • The script took the BPM of the song as input.
  • It calculated the time between beats (60 seconds / BPM).
  • Using Unity's AudioSource.timeSamples property, it tracked the current playback position.
  • By dividing the current playback position by the samples per beat, it could determine when a beat occurred.
  • The script allowed for different subdivisions of the beat (quarter notes, eighth notes, etc.) by using multipliers.
  • Events could be triggered on specific beats or subdivisions, allowing for precise timing of game elements.
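Putting those pieces together, a minimal sketch of the idea could look like this (my reconstruction for illustration; the field names and the UnityEvent are assumptions, not the original script):

using UnityEngine;
using UnityEngine.Events;

// Minimal sketch of the beat synchronization approach described above.
public class BeatSynchronizer : MonoBehaviour
{
    [SerializeField] private AudioSource _audioSource;
    [SerializeField] private float _bpm = 120f;
    [SerializeField] private float _subdivision = 1f; // 1 = quarter notes, 2 = eighth notes...
    [SerializeField] private UnityEvent _onBeat;

    private int _lastPulse = -1;

    private void Update()
    {
        if (!_audioSource.isPlaying) return;

        // Samples per pulse = sample rate * seconds per beat (60 / BPM),
        // shortened by the subdivision multiplier.
        float samplesPerPulse = _audioSource.clip.frequency * (60f / _bpm) / _subdivision;

        // Integer pulse index at the current playback position.
        int currentPulse = Mathf.FloorToInt(_audioSource.timeSamples / samplesPerPulse);

        if (currentPulse != _lastPulse)
        {
            _lastPulse = currentPulse;
            _onBeat.Invoke(); // subscribers react on the beat
        }
    }
}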

In our game, we used this to release the cut-out shapes into the scene at specific musical intervals. It added that rhythmic element the team was looking for, making the gameplay feel more connected to the music.

Improvements to the beat tracking script

As I worked with the script, I realized two key improvements were needed:

a) Start Delay: Many recordings don't begin precisely on the first beat. To account for this, I implemented a start delay feature using a coroutine:

[SerializeField] private float _startDelay = 0f;
private bool _isReadyToTrackBeats = false;

private void Start()
{
    StartCoroutine(StartAfterDelay(_startDelay));
}

private IEnumerator StartAfterDelay(float delay)
{
    yield return new WaitForSeconds(delay);
    // Beat tracking elsewhere in the script only runs once this flag is set.
    _isReadyToTrackBeats = true;
}

This allows us to precisely synchronize the beat tracking with the actual start of the rhythmic elements in the music, even if there's an intro or lead-in.

b) Phase Offset: The original script triggered beats at the end of each rhythmic division. To provide more flexibility and accurate synchronization, I implemented a phase offset:

[SerializeField] private float _phaseOffset = 0f;

public void CheckForNewPulse(float pulse)
{
    int currentPulse = Mathf.FloorToInt(pulse + _phaseOffset);
    if (currentPulse != _lastPulse) 
    {
        _lastPulse = currentPulse;
        _trigger.Invoke();
    }
}

This offset allows precise control over when the beat triggers occur. For example, with the pulse set to whole measures in 4/4, an offset of 0.25 shifts the trigger a quarter of a measure earlier, while 0.5 shifts it half a measure earlier. This feature opened up creative possibilities, such as emphasizing beats 2 and 4, as is common in popular music genres.
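For context, here's how such a method might be driven each frame. This is a hypothetical sketch: _samplesPerMeasure and the other fields are assumptions of mine, not the project's exact code.

// Hypothetical caller inside the same script. _samplesPerMeasure is assumed
// to be precomputed as sampleRate * (60 / BPM) * beatsPerMeasure.
private void Update()
{
    if (!_isReadyToTrackBeats) return; // honors the start delay from (a)

    float pulse = _audioSource.timeSamples / _samplesPerMeasure;
    CheckForNewPulse(pulse);
}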

2. An audio visualization script.

This script analyzes the audio stream in real time and uses that data to drive visual elements. Here's the breakdown:

  • The script used Unity's AudioSource.GetSpectrumData() method to get frequency data from the audio.
  • This raw data was then processed into more manageable frequency bands (I used 8 bands for our purposes).
  • Each band represented a range of frequencies, from low (bass) to high (treble).
  • The script smoothed out the data to prevent jarring visual changes.
  • This processed data could then be used to drive various visual elements in the scene.
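A condensed sketch of that pipeline, under the same assumptions as before (the class and field names are mine, not the original code), might look like this:

using UnityEngine;

// Minimal sketch of the audio visualization pipeline described above.
public class AudioVisualizer : MonoBehaviour
{
    [SerializeField] private AudioSource _audioSource;
    [SerializeField, Range(0f, 1f)] private float _smoothing = 0.5f;

    private readonly float[] _samples = new float[512]; // must be a power of two
    private readonly float[] _freqBands = new float[8];

    private void Update()
    {
        // Pull the current FFT frame from the playing audio.
        _audioSource.GetSpectrumData(_samples, 0, FFTWindow.Blackman);

        int sample = 0;
        for (int band = 0; band < 8; band++)
        {
            // Exponentially wider bands: 2, 4, 8, ... 256 samples per band.
            int sampleCount = (int)Mathf.Pow(2, band + 1);
            float average = 0f;

            for (int j = 0; j < sampleCount && sample < _samples.Length; j++, sample++)
                average += _samples[sample];

            average /= sampleCount;

            // Exponential smoothing to avoid jarring frame-to-frame jumps.
            _freqBands[band] = Mathf.Lerp(average, _freqBands[band], _smoothing);
        }
    }

    // Exposes the smoothed band values so other scripts can drive visuals.
    public float GetBand(int index) => _freqBands[index];
}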

For testing, I set up a scene with 8 cubes. The height of each cube would change in response to its corresponding frequency band - the leftmost cube reacted to the lowest frequencies, while the rightmost responded to the highest frequencies.
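Hooking the cubes up to the bands is then a one-liner per cube. Here's a hypothetical test-scene script that builds on the AudioVisualizer sketch above:

using UnityEngine;

// Hypothetical test-scene script: scales 8 cubes by their band values.
public class BandCubes : MonoBehaviour
{
    [SerializeField] private AudioVisualizer _visualizer; // from the sketch above
    [SerializeField] private Transform[] _cubes = new Transform[8];
    [SerializeField] private float _maxHeight = 10f;

    private void Update()
    {
        for (int i = 0; i < _cubes.Length; i++)
        {
            // Stretch each cube vertically by its smoothed band value.
            Vector3 scale = _cubes[i].localScale;
            scale.y = 1f + _visualizer.GetBand(i) * _maxHeight;
            _cubes[i].localScale = scale;
        }
    }
}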

Improvements to the visualization script

While working on this script, I realized that the way the audio bands reacted to different frequencies didn't match what I was used to seeing in music production equalizers. After some research and experimentation, I discovered that two key factors needed to be addressed: the logarithmic nature of our frequency perception and the tendency for higher frequencies to have lower amplitudes in typical audio signals.

To account for this, I modified the MakeFrequencyBands() method:

int count = 0;

for (int i = 0; i < 8; i++)
{
    float average = 0;
    // Exponentially wider bands: 2, 4, 8, ... 256 samples.
    int sampleCount = (int)Mathf.Pow(2, i) * 2;

    // The last band absorbs the two remaining samples.
    if (i == 7)
        sampleCount += 2;

    for (int j = 0; j < sampleCount; j++)
    {
        // Stereo variant of the channel selection; weighting by (count + 1)
        // boosts the higher frequencies.
        float sample = _samplesLeft[count] + _samplesRight[count];
        average += sample * (count + 1);
        count++;
    }

    average /= count;
    _freqBand[i] = average * 10;
}

This modification addresses two key aspects:

  1. Logarithmic Frequency Distribution: The line int sampleCount = (int)Mathf.Pow(2, i) * 2; calculates the number of samples for each frequency band. This results in exponentially wider bands as the frequency increases, better matching our perception of sound according to the Weber-Fechner law.
  2. Amplitude Normalization: The line average += (_samplesLeft[count] + _samplesRight[count]) * (count + 1); (and its variants for different channels) applies a weighting factor to each sample. This factor increases with frequency, effectively boosting the amplitude of higher frequencies. This "normalization" compensates for the natural tendency of higher frequencies to have lower amplitudes in most audio signals, ensuring a more balanced visual representation across the frequency spectrum.
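To make the band layout concrete: the sample counts per band are 2, 4, 8, 16, 32, 64, 128, and 256 (plus the 2 extra samples in the last band), which together account for all 512 spectrum samples. Assuming a 44.1 kHz output sample rate (an assumption on my part; the project's actual rate isn't stated here), each of the 512 bins spans about 43 Hz, so band 0 covers roughly 0-86 Hz of bass while band 7 covers roughly 11-22 kHz of treble.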

In the final game, we used this script to animate speaker cabinets in the background of the circus scene. The improved frequency analysis and normalization resulted in a much more engaging visual response, especially for higher frequencies. Since the music had prominent drum beats and a full frequency range, you could really see the speakers pulsing with the rhythm across all frequency bands, making the scene more dynamic and responsive to the audio.

Challenges and Learning Experiences

While working on these scripts, I encountered a few challenges:

  1. Precision in Beat Detection: Ensuring that the beat detection stayed accurate and synchronized was tricky. Relying on Unity's main update loop may not give consistent sync when synchronization has to be maintained over a long time.
  2. Smooth Audio Visualization: Audio systems often come with equalizers you can tune, usually shown as bars for different frequency ranges. If the raw audio data isn't post-processed, the visualization looks too jarring to be useful.

Through overcoming these challenges, I gained a deeper understanding of how audio is handled in game environments, the importance of optimization in real-time applications, and the intricacies of creating responsive, music-driven gameplay elements.

Conclusions

The process of improving these scripts helped me solidify what I was learning. What could have ended up as forgotten scripts after a project instead became a step in my endeavor to become a professional with insight into audio programming, or programming oriented toward music applications.

To keep this write-up brief, I avoided topics that came up while testing the scripts and during my research. The approach of these scripts is quite naive, especially the beat synchronization. In general, audio is managed on its own thread, and this is true for Unity as well. To build a robust synchronization that holds over time, one solution is to create a clock capable of keeping the audio and gameplay in sync. That wasn't feasible to address within the scope of this project, but if you're curious, there are a couple of open-source rhythm games that apply this technique, and they're also super fun: StepMania and the Osu-framework.
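For a taste of that approach, here's a minimal sketch (my own illustration, not code from the project or those games) that derives beat positions from Unity's audio clock, AudioSettings.dspTime, instead of the frame loop:

using UnityEngine;

// AudioSettings.dspTime advances on the audio thread, so beat positions
// derived from it stay stable even when the frame rate fluctuates.
public class DspBeatClock : MonoBehaviour
{
    [SerializeField] private AudioSource _audioSource;
    [SerializeField] private float _bpm = 120f;

    private double _startDspTime;

    private void Start()
    {
        // Schedule playback slightly in the future so the start time is exact.
        _startDspTime = AudioSettings.dspTime + 0.1;
        _audioSource.PlayScheduled(_startDspTime);
    }

    // Current position in beats, measured on the audio clock.
    public double CurrentBeat =>
        (AudioSettings.dspTime - _startDspTime) / (60.0 / _bpm);
}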

Citations

  • You can check Peer Play's playlist for the basic implementation of the visualization script.
  • To understand how to track beats you can check b3agz' video.
  • You can read about the Weber-Fechner Law on Wikipedia; that's what I used as a reference.

Lastly, none of this would have been possible without my colleagues Mika, Mario, and Lucas. Working with you guys was a pleasure, and hopefully we'll get to work together again. Thanks as well to our tutors and seniors back at Turku Game Lab; for me this experience was an eye-opener.


PS: Repo is currently empty, but I plan to upload it soon.