January 1, 2000

HandMusic - Contactless BGM Control Based on Hand Tracking

Introduction:

HandMusic uses a webcam to track hand gestures, enabling touch-free music switching and mixing.

This is a simple demo built with handtrack.js and Tone.js. By capturing the position of the hand, it switches the background music and changes a simple filter on the canvas; a basic drum loop created with Tone.js also lets you mix different sounds together.

You can try my online demo:

CodeSandbox: https://ggddh.csb.app

You can also watch the demo video online:

YouTube: https://youtu.be/8Pg5hY85aWA

You can find this project in my code repository:

Github: https://github.com/ShuSQ/CCI_ACC_FP2_handMusic

0. Why did I do this project? 🧐

When I was a high school student, I tried audio production software like FruityLoops, and I became very interested in audio production methods. At that time, I wondered whether users could control audio through bodily interaction. I came into contact with TensorFlow this semester, and I decided to explore and experiment in this direction.

[Image: instrument models]

Image from: 34.4.magnusson.pdf

1. Handtrack 🖐🏼

First, I needed a way to recognize the hand from the video captured by the webcam. Among the available tools there are many choices, such as TensorFlow, OpenCV, and MediaPipe. In the end, I chose Handtrack.js as the technical solution for detecting hands.

What is Handtrack.js?

Handtrack.js is a library for prototyping realtime hand detection (bounding box), directly in the browser. It frames handtracking as an object detection problem, and uses a trained convolutional neural network to predict bounding boxes for the location of hands in an image.

Why did I choose Handtrack.js?

Because I am more familiar with JavaScript, and it is easier to find Handtrack.js case studies online. Of course, in terms of speed and accuracy it is not as good as a Python + MediaPipe solution, but considering the scale of this project, the performance is acceptable.

Alternatively, we can analyze the images captured by the webcam directly in JavaScript, comparing successive frames to detect motion. This is a clever method, and it can also produce video with its own distinctive style; a minimal sketch of the idea follows the repository link below.

You can learn more from this code repository here:

https://github.com/jasonmayes/JS-Motion-Detection
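
To illustrate the frame-comparison idea, here is a minimal sketch of the general technique (not code from the repository above; the element selectors and thresholds are assumptions):

// Draw the webcam to a canvas and measure how much each frame
// differs from the previous one to detect motion.
const video = document.querySelector("video");
const canvas = document.querySelector("canvas");
const ctx = canvas.getContext("2d");
let previousFrame = null;

function detectMotion() {
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
  if (previousFrame) {
    let changedPixels = 0;
    for (let i = 0; i < frame.data.length; i += 4) {
      // Compare the red channel only, as a cheap approximation.
      if (Math.abs(frame.data[i] - previousFrame.data[i]) > 32) changedPixels++;
    }
    if (changedPixels > 500) console.log("Motion detected:", changedPixels);
  }
  previousFrame = frame;
  requestAnimationFrame(detectMotion);
}
detectMotion();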

[Image: test screenshot]

Testing the tensorflow.js and fingerpose technical solution went well, but we don't need it at the moment.

Handtrack.js is very convenient to use; we can quickly configure its parameters:

const modelParams = { // Handtrack's default model parameters
  flipHorizontal: true, // flip, e.g. for video
  imageScaleFactor: 0.7, // reduce input image size for gains in speed
  maxNumBoxes: 1, // maximum number of boxes to detect
  iouThreshold: 0.5, // IoU threshold for non-max suppression
  scoreThreshold: 0.9, // confidence threshold for predictions
};
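
With the parameters set, the model is loaded and then run against the webcam feed. The sketch below follows the standard handTrack.load() / model.detect() pattern from the Handtrack.js documentation; the video element ID is an assumption:

const video = document.getElementById("video");

// Load the model with our parameters, start the webcam,
// then poll for hand predictions on each animation frame.
handTrack.load(modelParams).then((model) => {
  handTrack.startVideo(video).then((status) => {
    if (!status) return; // webcam not available
    function runDetection() {
      model.detect(video).then((predictions) => {
        // Each prediction carries a bbox: [x, y, width, height].
        if (predictions.length > 0) console.log(predictions[0].bbox);
        requestAnimationFrame(runDetection);
      });
    }
    runDetection();
  });
});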

I learned a lot from these materials:

2. Binding to the audio player

Create an empty audio tag in the .html file:

<audio></audio>

We then need to get this element in handplay.js:

const audio = document.getElementsByTagName("audio")[0];

predictions.length tells us whether a hand was found, and bbox stores the coordinates of the detected bounding box. We can read the x and y coordinates of the hand through bbox[0] and bbox[1], and then divide the frame into three recognition areas:

if (predictions.length > 0) {
  let x = predictions[0].bbox[0];
  let y = predictions[0].bbox[1];

  // Match the hand position and play the corresponding audio source
  if (y <= 100) {
    if (x <= 150) {
      // Left
      audio.src = genres.left.source;
      applyFilter(genres.left.filter);
    } else if (x >= 250 && x <= 350) {
      // Middle
      audio.src = genres.middle.source;
      applyFilter(genres.middle.filter);
    } else if (x >= 450) {
      // Right
      audio.src = genres.right.source;
      applyFilter(genres.right.filter);
    }
    // Play the sound
    audio.volume = 1;
    audio.play();
  }
}
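
The snippet above assumes a genres lookup object that maps each screen region to an audio source and a filter, plus an applyFilter() helper. The exact structure isn't shown here, but a minimal version might look like this (the file paths and filter strings are placeholders):

// Hypothetical genres map: one audio source and one CSS filter per region.
const genres = {
  left: { source: "src/ambient.mp3", filter: "blur(2px)" },
  middle: { source: "src/lofi.mp3", filter: "brightness(1.3)" },
  right: { source: "src/electro.mp3", filter: "contrast(1.5)" },
};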

The following articles were very helpful to me:

3. Add simple filter effects

In addition to switching music with gestures, I also wanted to add simple filters to change the look of the video, mainly through the CSS filter property, using the three effects blur, brightness, and contrast. I initially considered implementing something like TensorFlow's NST (Neural Style Transfer) effect, but given browser performance and implementation cost, I chose the more convenient CSS filter solution.
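
The applyFilter() helper used in section 2 isn't spelled out here; with the CSS approach described above, it could be as simple as setting the filter property on the video element (a sketch, with the element ID assumed):

// Hypothetical helper: applies a CSS filter string to the video element.
function applyFilter(filterValue) {
  const video = document.getElementById("video");
  video.style.filter = filterValue; // e.g. "blur(2px)" or "contrast(1.5)"
}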

4. Create loops with Tone.js

Tone.js is a Web Audio framework for creating interactive music in the browser. Earlier I compared Tone.js with magenta.js. There are many similarities between the two, and magenta.js can drive machine learning through MusicRNN and MusicVAE; I really wanted to do this and let ambient sound, background sound, and samples mix! However, there are not enough materials online about making music with magenta.js, and after some study I found I could not achieve it in a short time, so I chose Tone.js. Here, Tone.js loads the audio samples, scheduleRepeat() drives the step-sequencer loop, and each playback step is set through an <input type="checkbox"> tag.

<!-- .html -->
...
<div class="kick">
  <input type="checkbox">
  <input type="checkbox">
  <input type="checkbox">
  <input type="checkbox">
  <input type="checkbox">
  <input type="checkbox">
  <input type="checkbox">
  <input type="checkbox">
</div>
...
function sequencer() { // load the audio samples
  ...
  const kick = new Tone.Player('src/kick-electro01.wav').toMaster();
  ...
}
function repeat() { // play a single drum for the current step
  ...
  let kickInputs = document.querySelector(
    `.kick input:nth-child(${step + 1})`
  );
  ...
  if (kickInputs.checked) {
    kick.start();
  }
  ...
}
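
The post elides how repeat() gets scheduled; in Tone.js this is typically done with Transport.scheduleRepeat(), advancing a step counter on each tick. A sketch, where step and the step count of 8 are assumptions matching the checkboxes above:

// Drive the sequencer: fire repeat() every eighth note
// and advance through the 8 checkbox steps.
let step = 0;

Tone.Transport.scheduleRepeat((time) => {
  repeat();
  step = (step + 1) % 8; // wrap around after the last step
}, "8n");

Tone.Transport.start();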

In addition, because browsers prohibit audio from playing automatically, we need to call resume() on a user gesture to get around this:

// Resume the audio context on the first user interaction,
// since browsers block audio until a user gesture occurs.
document.documentElement.addEventListener(
  "mousedown", function () {
    mouse_IsDown = true;
    if (Tone.context.state !== 'running') {
      Tone.context.resume();
    }
  });

I learned a lot from these materials:

5. What more could be done

After finishing the whole demo, I have a better understanding of Web Audio and the webcam, but there is also more that could be enriched, such as:

  1. Enrich the music-machine GUI elements, so you can control the steps and volume and choose more drum types;
  2. Consider using a tracking library with better performance to achieve more detailed interaction, such as using fingers to trigger audio sources and click the checkboxes;
  3. Experiment more with the filter effects, and link their parameters to the mixed audio;
  4. Use MusicRNN, MusicVAE, etc. to transform the audio, so it can be mixed in more modes;
  5. Consider tracking targets other than hands, such as a face mesh (facemesh), etc.

About this Post

This post is written by Siqi Shu, licensed under CC BY-NC 4.0.