I. Introduction
Human motion analysis has diverse applications in medicine, healthcare, rehabilitation, game engineering, surveillance, search and rescue, and defense. It can be used for the diagnosis of motion-related diseases, such as cumulative trauma disorders, psychosomatic disorders, and autism spectrum disorders [1]. Energy expenditure can be estimated from the class of human motion [2]. In addition, human gait analysis is essential for evaluating the degree of rehabilitation. Radar offers a unique opportunity for monitoring human motion remotely. In particular, micro-Doppler signatures produced by human limb motion contain information pertaining to such motion. Because micro-Doppler signatures are represented as spectrograms, which take the form of images, human motions can be recognized through analysis of spectrogram images.
Owing to advances in deep learning, the image recognition/classification problem can be effectively addressed with deep convolutional neural networks (DCNNs). Training a DCNN to high classification accuracy requires a large amount of image data. In the case of radar data, this is challenging because of a lack of historical records and the high cost of collecting a large data set. Therefore, it is necessary to augment the radar data set to fully exploit the capability of a DCNN. Recently, generative adversarial networks (GANs) have been successfully used to address the radar data augmentation problem [3].
A GAN is a machine learning algorithm designed to produce large amounts of synthesized data whose distribution is similar to that of the original data. Owing to this capability, GANs have many applications, such as image synthesis, image de-noising, and image-to-image translation. A GAN consists of two networks, a generative network and a discriminative network, that compete against each other during the training process. The generative network generates synthesized images, and the discriminative network evaluates the generated images. During training, the cost function is defined such that the generative network tries to decrease the classification accuracy of the discriminative network, while the discriminative network is trained to increase it. Over the course of training, each of the networks contributes to improving the appearance of the generated images [4].
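The competing objectives described above can be illustrated with a minimal sketch. It assumes the standard binary cross-entropy formulation of the GAN loss and the commonly used non-saturating generator loss; the exact loss used in the code accompanying this tutorial is not reproduced here, and the function names are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy the discriminator tries to minimize.

    d_real: discriminator outputs (probability of "real") on real images
    d_fake: discriminator outputs on generated images
    """
    real_term = sum(-math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: small when the discriminator is fooled."""
    return sum(-math.log(p + eps) for p in d_fake) / len(d_fake)


# A discriminator that separates real from generated images well ...
confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])
# ... versus one that the generator has largely fooled.
fooled = discriminator_loss([0.9, 0.95], [0.8, 0.9])
```

In this sketch, a better generator raises the discriminator loss and lowers its own loss, which is the competition that drives both networks during training.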
This tutorial will describe the process of setting up environments for GANs through the installation of the GPU driver, cuDNN library, CUDA library, and Anaconda along with the training of GANs using a measured data set. Finally, we will apply this approach to augment a human motion data set measured by Doppler radar to investigate whether the augmented data are effective in the training of a DCNN.
II. Methods
In this study, the data set included 7 activities, each performed by 12 human subjects for 12 iterations, giving a total of 1,008 data points (7 × 12 × 12). Figure 1 shows the measurement setup. The 7 activities were boxing while moving forward, boxing while standing in place, crawling, running, sitting still, walking, and walking low while holding a stick. The data were organized in a MATLAB .mat file named Seven_activity. The .mat file contains a structure named activity, which has three fields. The first field, name, is a string containing the activity name; the second field, human_number, is a number of type double; and the third field, data, is a matrix of size 600 × 140.
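As a rough illustration of this layout, the following sketch loads the structure and checks it against the sizes stated above. It assumes that activity is stored as a structure array with one element per recording and that SciPy is available; the helper names load_records and check_records are hypothetical and are not part of the published code.

```python
EXPECTED_SHAPE = (600, 140)
N_ACTIVITIES, N_SUBJECTS, N_ITERATIONS = 7, 12, 12


def expected_record_count():
    """7 activities x 12 subjects x 12 iterations."""
    return N_ACTIVITIES * N_SUBJECTS * N_ITERATIONS


def load_records(path="Seven_activity.mat"):
    """Read the 'activity' structure into a list of plain dictionaries."""
    from scipy.io import loadmat  # imported lazily; SciPy is only needed here

    mat = loadmat(path, squeeze_me=True, struct_as_record=False)
    records = []
    for entry in mat["activity"]:  # assumes a structure array
        records.append({
            "name": str(entry.name),
            "human_number": int(entry.human_number),
            "data": entry.data,  # expected to be a 600 x 140 matrix
        })
    return records


def check_records(records):
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    if len(records) != expected_record_count():
        problems.append("expected %d records, found %d"
                        % (expected_record_count(), len(records)))
    for i, rec in enumerate(records):
        shape = tuple(rec["data"].shape)
        if shape != EXPECTED_SHAPE:
            problems.append("record %d has shape %s" % (i, shape))
    return problems
```

Calling check_records(load_records()) before training is a cheap way to catch a wrong file path or an unexpected matrix size early.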
The GAN we designed consists of two neural networks. The generative network takes a noise vector as input and tries to produce a synthesized image, while the discriminative network tries to classify its input correctly as synthesized or real data. To train the networks, the Adam optimizer is used to minimize the loss function, which is defined using the error from the discriminative network. A sigmoid function is used for the activation. Xavier initialization is used to initialize the weights and biases of the neural networks. The steps and processes are described below.
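The three building blocks named above can be sketched in a few lines of plain Python. This is an illustrative sketch only: it assumes the uniform variant of Xavier initialization and the default Adam hyperparameters from the original Adam paper, neither of which is confirmed by the text, and the function names are hypothetical.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng=random):
    """Xavier (Glorot) uniform init: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# Toy usage: minimize f(w) = (w - 3)^2 with Adam, starting from w = 0.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
```

In a deep learning framework these pieces are provided as ready-made initializers, activations, and optimizers; the sketch is meant only to show what each one computes.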
This tutorial was completed on Windows 7 with an i7 CPU and an NVIDIA GTX 770 GPU. Because Anaconda and the GPU drivers are also available for macOS and Ubuntu, the procedure should be straightforward to replicate on other operating systems with any NVIDIA graphics card that supports cuDNN, provided the same software is used. Training a GAN requires significant memory and computation, so a GPU is preferred. To use the GPU, the drivers must be installed in the correct order: first the graphics card driver, then the CUDA library, and finally the cuDNN library. To download the CUDA library, go to the website (https://developer.nvidia.com/cuda-toolkit), click on ‘Download now’, then choose the appropriate operating system and follow the installation instructions [5]. To download the cuDNN library, go to the website (https://developer.nvidia.com/rdp/form/cudnn-download-survey); a membership must be created to download the file. Download the latest library version that matches the installed CUDA version and then follow the installation process [6]. In a Windows environment, the following path variable must be added:
The procedure to set the path variable is shown in Figure 2. To use TensorFlow, Anaconda version 4.3 with Python 2 must be installed. This Anaconda version can be found at https://repo.continuum.io/archive/. Download ‘Anaconda2-4.3.1-Windows-x86.exe’ for a 32-bit system or ‘Anaconda2-4.3.1-Windows-x86_64.exe’ for a 64-bit system. Then follow the installation process. After installation of Anaconda, the following path variable must be added:
After installing Anaconda and setting the path variable, open the Anaconda Prompt and create an environment using the following command:
The name of the environment in this case is env_name. The Python version must be set to 3.5 because this is what TensorFlow uses. The environment can be activated using the following command line:
Figure 3A shows a screenshot of an example of creating and activating an environment. The following packages must then be installed in the env_name environment: Keras-GPU, SciPy, Pillow, OpenCV, Matplotlib, and Git [7-12]. These packages can easily be installed by typing the following command lines:
Each line must be executed in sequence. See Figure 3B for an example of a package installation using Anaconda Prompt.
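After the installation, it can be helpful to confirm that everything is visible from inside the activated environment. The following sketch is one possible check, not part of the published code. The import names (keras, tensorflow, scipy, PIL, cv2, matplotlib) are the usual ones for these packages, and looking for nvcc on the path is an assumed proxy for a working CUDA toolkit installation.

```python
import importlib.util
import shutil

# Display name -> Python import name (assumed to be the usual import names).
PYTHON_MODULES = {
    "Keras": "keras",
    "TensorFlow": "tensorflow",
    "SciPy": "scipy",
    "Pillow": "PIL",
    "OpenCV": "cv2",
    "Matplotlib": "matplotlib",
}
# Executables expected on the system path (nvcc ships with the CUDA toolkit).
COMMAND_LINE_TOOLS = ["git", "nvcc"]


def check_environment():
    """Return a dict mapping each component to True (found) or False (missing)."""
    report = {}
    for label, module in PYTHON_MODULES.items():
        try:
            report[label] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            report[label] = False
    for tool in COMMAND_LINE_TOOLS:
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, found in sorted(check_environment().items()):
        print("%-12s %s" % (name, "found" if found else "MISSING"))
```

Any component reported as missing points to an installation step or path variable that should be revisited before training is started.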
Once everything is installed, the code must be downloaded to start the training process. The recommendation is to create a folder into which the code can be saved. Next, the folder should be set as the current directory by using the command cd. The code can be downloaded by using the following command [13]:
Open read.py to edit the path to the data set in line:
The size of the data used in this study was 600 × 140, but if the size of data used is different, this can be modified in read.py as in the following lines:
The number of data points in the data set must be set in read.py in the following line:
The script read.py has three purposes: reading the data, preparing the data for GAN training, and visualizing the data. It reads a .mat file and resizes each image to 64 × 64 for input to the GAN, so the images stored in the .mat file can be of any size; in our case, the original data size was 600 × 140. Once the code is saved, run it with the following command:
Running the above command produces GAN images only for boxing while moving forward. To change the activity, open GAN_train.py in a text editor such as Notepad, find the line activity = 'boxingmoving', and change the name between the single quotation marks. The code in GAN_train.py starts the GAN training process and sets the directories for the input data and the output data of the GAN. Figure 3C shows an example of running the code and the output lines.
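The resizing step attributed to read.py above, from 600 × 140 to 64 × 64, can be sketched without any dependencies as follows. This is a simplified illustration using nearest-neighbor sampling; the published read.py may use a different interpolation method and a different value scaling, and the rescaling to [0, 1] shown here is an assumption.

```python
def resize_nearest(image, out_rows=64, out_cols=64):
    """Resize a 2-D list of numbers by nearest-neighbor sampling."""
    in_rows, in_cols = len(image), len(image[0])
    return [
        [image[r * in_rows // out_rows][c * in_cols // out_cols]
         for c in range(out_cols)]
        for r in range(out_rows)
    ]


def scale_to_unit(image):
    """Linearly rescale all values to the range [0, 1]."""
    flat = [x for row in image for x in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for constant images
    return [[(x - lo) / span for x in row] for row in image]


# Synthetic 600 x 140 stand-in for one measured spectrogram.
spectrogram = [[float(r + c) for c in range(140)] for r in range(600)]
small = scale_to_unit(resize_nearest(spectrogram))
```

Because the output size is fixed, the same function accepts input matrices of any size, which is why the size of the images in the .mat file is not restricted.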
III. Results
Based on visual inspection of the outputs, the augmented images were taken from the GAN at 2,700 epochs. Images of the original data are shown in Figure 4, and Figure 5 presents the augmented images. As seen in Figures 4 and 5, the synthesized images from the GANs show a distribution similar to that of the original images. A DCNN was then designed and trained on the combination of original and synthesized data. The number of layers in the DCNN was increased heuristically until the classification accuracy saturated. The DCNN we designed has 6 layers: 3 convolutional layers and 3 fully connected layers. The numbers of filters in the convolutional layers are 16, 32, and 64, while the numbers of nodes in the fully connected layers are 124, 124, and 7. Each convolutional layer uses batch normalization, a rectified linear unit, and max pooling. The convolutional filter size is 2 × 2. Classification accuracy was evaluated on the original data only; the synthesized data were used in the training process, but classifying them is not meaningful. The results show that the use of GANs improved the classification accuracy of human motion from 90% to 94% when the same DCNN structure was used.
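To make the layer sizes above concrete, the following sketch traces the feature-map shapes and counts the trainable weights of such a network. Several details are assumptions that the text does not state: a single-channel 64 × 64 input, unpadded convolutions with stride 1, and 2 × 2 max pooling with stride 2. Batch normalization parameters are not counted. The actual network may differ in any of these respects.

```python
def conv_out(size, kernel=2, stride=1):
    """Output length of an unpadded convolution along one axis."""
    return (size - kernel) // stride + 1


def pool_out(size, window=2):
    """Output length of non-overlapping max pooling along one axis."""
    return size // window


def describe_dcnn(input_size=64, channels=1, filters=(16, 32, 64),
                  dense=(124, 124, 7), kernel=2):
    """Return (feature-map shapes after each conv block, parameter count)."""
    size, depth, params, shapes = input_size, channels, 0, []
    for n_filters in filters:
        params += (kernel * kernel * depth + 1) * n_filters  # weights + biases
        size = pool_out(conv_out(size, kernel))
        depth = n_filters
        shapes.append((size, size, depth))
    features = size * size * depth  # flattened input to the first dense layer
    for n_nodes in dense:
        params += (features + 1) * n_nodes
        features = n_nodes
    return shapes, params


shapes, params = describe_dcnn()
# shapes -> [(31, 31, 16), (15, 15, 32), (7, 7, 64)]
# params -> 415779
```

Under these assumptions, most of the weights sit in the first fully connected layer, which is typical for small convolutional classifiers.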
IV. Discussion
This paper presented the overall process of preparing an environment for GANs and training them. In particular, we have presented an example of augmenting micro-Doppler signatures of human motion measured by Doppler radar. Owing to the augmented data set, deeper neural networks can be constructed and effectively trained, resulting in better classification accuracy. This preliminary research on the automatic recognition of human motion has the potential to contribute to diverse applications in healthcare and rehabilitation, such as human gait analysis or energy expenditure estimation.
It should be noted that the current use of GANs presents challenges because it is an emerging and advancing technology. First, no standard currently exists for evaluating the quality of GAN outputs. The number of epochs must therefore be determined by visual inspection, which can be subjective, and it is not easy to quantify the success of GAN training. Second, GANs occasionally suffer from mode collapse, which limits the production of outputs with diverse characteristics. In addition, improperly trained GANs produce only very similar images. These issues should be addressed in the future to enable the wider use of GANs in radar image processing.
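As a rough aid to visual inspection, the diversity of a batch of generated images can be summarized by the mean pairwise distance between them. The sketch below is a simple heuristic under that assumption, not an established quality metric and not part of the published code; a value close to zero suggests that the generator is producing nearly identical images.

```python
import math
from itertools import combinations


def mean_pairwise_distance(images):
    """Mean Euclidean distance over all pairs of flattened images."""
    pairs = list(combinations(images, 2))
    total = sum(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
        for x, y in pairs
    )
    return total / len(pairs)


# Toy "images" with two pixels each.
diverse = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
collapsed = [[0.5, 0.5], [0.5, 0.5], [0.51, 0.5]]
```

Comparing this value for generated images with the value for the original images of the same activity gives a reference point, although it does not replace visual inspection.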