IMPLEMENTATION OF IMAGE PROCESSING ALGORITHMS AND GENERALIZED LEARNING VECTOR QUANTIZATION TO TRACK AN OBJECT USING AR.DRONE CAMERA

In this research, a Parrot AR.Drone, an Unmanned Aerial Vehicle (UAV), was used to track an object from above. The system was developed using functions from the OpenCV library and the Robot Operating System (ROS). The techniques implemented in the system are an image processing algorithm (Centroid-Contour Distance (CCD)), a feature extraction algorithm (Principal Component Analysis (PCA)), and an artificial neural network algorithm (Generalized Learning Vector Quantization (GLVQ)). The final result of this research is a program that enables the AR.Drone to track a moving object on the floor with a fast response time of under 1 second.


Introduction
Unmanned Aerial Vehicle (UAV) applications have emerged in scientific research. The problems addressed by this research are very broad, ranging from hardware design, aerodynamic calculation, object tracking, and localization to coordination between UAVs. A race among researchers to build autonomous UAV systems has now emerged [1].
Image processing techniques have been used in many UAV applications to give them the capability to recognize objects. Object recognition algorithms have been developed in many studies to solve various problems. In [2], a system was designed to track and land on a moving ground target, while another of its tasks was to classify the environment. In [3], the authors conducted research to classify vegetation species on farmland in Northern Australia. In [4], the authors used four classification techniques and two feature extraction techniques for automatic vehicle detection. The purpose of the system they developed is to replace a patrol service that relies on a pilot's visual inspection. The weakness of that conventional system is that the inspection is done at low altitude, which increases the risk. Moreover, such a system is sometimes restricted by weather conditions.
The techniques used vary depending on the purpose of the research. In [5], FAST corner detection and least median square estimation were used. Besides that, the target's states were estimated using a Kalman filter. The authors also used BRIEF descriptors to make target localization more robust. In [6], Han Yu and his colleagues developed a novel video tracking algorithm that integrates a differential evolution particle filter with color histogram characteristics. In [7], Shafique and Sheikh considered elliptical object detection as their main concern. Their proposed algorithm, based on the Line Following Technique (LFT), is used to recognize circular, elliptical, and nearly elliptical objects. They also implemented the Hough Transform to compare the results.
We conducted research on the implementation of several computer vision and machine learning algorithms to make the UAV perform a mission (task): following a specific object (a cap) on the floor. We utilized several algorithms, including gray-level slicing, Centroid-Contour Distance (CCD), Principal Component Analysis (PCA), Generalized Learning Vector Quantization (GLVQ) [8], and a standard proportional controller [9]. We describe these techniques in more detail in Section 2.

Methods
In this research, an AR.Drone quadcopter and a computer with a standard Wi-Fi interface were used. The drone consists of two cameras, several other sensors, four propellers, and a control board. It has a carbon fiber support structure, so it is quite lightweight. The battery provides enough energy to keep the AR.Drone flying continuously for up to ±10 minutes. When the drone is switched on, an ad-hoc Wi-Fi network appears and a computer can connect to it. After the connection has been established, the user can control the drone and obtain information such as altitude, battery status, and other sensor data. The AR.Drone has two body covers (called hulls): an outdoor hull and an indoor hull. The AR.Drone with the outdoor hull can be seen in Figure 1.
Because the AR.Drone has limited hardware performance (insufficient to run many processes quickly), all computation runs on the computer. The drone therefore only knows where to move from the information sent by the computer via Wi-Fi, while continuously capturing images with its camera. The AR.Drone has two cameras: the first is on the front side and the other is on the bottom. The bottom camera was used to implement the object detection and tracking system in this research.
The system was built and run in the Robot Operating System (ROS) environment and developed using the C++ programming language. ROS is an open source meta-operating system designed for building robot systems [10]. With ROS, we do not have to "reinvent the wheel" because it already provides basic modules for developing such systems. Besides, ROS uses the concept of a peer-to-peer communication network between programs, so several programs can run simultaneously to perform the task.
In this research, for example, we ran several programs to perform the tracking process: the AR.Drone driver, the object detection module, and the position control program. The driver we used was the AR.Drone Brown package for ROS; the object detection and position control modules we developed ourselves. The overall system at one point in time can be seen in Figure 2. The system runs continuously, so at each time step it processes the images captured by the AR.Drone and outputs a coordinate message.
For basic image handling and processing, we used the OpenCV (Open Computer Vision) library with ROS. The library is provided in the vision_opencv stack, which consists of several packages [11]. One of the most important packages in that stack is cv_bridge, whose task is to bridge between ROS image messages and OpenCV data structures.
We now give a brief description of the methods we used. First of all, we applied standard grayscaling to the raw images and then produced a binary image from the grayscale image with a fixed threshold. After that, we processed the binary image with the Centroid-Contour Distance (CCD) algorithm. This algorithm extracts features from an image based on the object's contour. The features are obtained by measuring the distances from a fixed point to the surrounding points on the object's contour. Wang et al. used this technique to extract features from leaf images, each a single object (for an image detection/recognition system) [12]. We used this technique because recognizing our target object (the cap) has characteristics similar to leaf recognition. Figure 3 illustrates the CCD algorithm. We describe the implementation of this algorithm in more detail in Section 4.
After we obtained the CCD features, we used the popular Principal Component Analysis (PCA) algorithm to further reduce the features. PCA is a technique that combines statistics and linear algebra concepts, especially those related to dominant eigenvectors [13]. Let us describe this algorithm briefly. First, we find the averages (means) of the matrix data and then calculate their covariance. After we obtain the covariance matrix, we decompose it into its eigenvalues and corresponding eigenvectors. The decomposition is given by equation (1):

C v_i = λ_i v_i (1)
In that equation, C is the covariance matrix, λ_i is an eigenvalue, and v_i is its corresponding eigenvector. In this process, we adapted methods from JAMA: A Java Matrix Package [17] into C/C++ functions. We then sorted the eigenvalues from largest to smallest, with their corresponding eigenvectors following them, and discarded a number of the smallest eigenvalues and their eigenvectors. Finally, we multiplied the original data matrix by the reduced eigenvector matrix, obtaining the final reduced data matrix. The flow of the PCA process can be seen in Figure 4.
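To make the reduction step concrete, here is a minimal sketch in plain C++ (not the JAMA-derived routines the paper used): it computes the mean and covariance of two-feature data, solves the 2 × 2 eigenproblem in closed form, and projects the centered data onto the dominant eigenvector. The function name and the two-feature restriction are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Reduce 2-D samples to 1-D by projecting onto the dominant eigenvector
// of the covariance matrix (closed-form for the 2x2 case).
std::vector<double> pcaReduce2Dto1D(const std::vector<std::array<double, 2>>& X) {
    const size_t n = X.size();
    double mx = 0, my = 0;
    for (auto& p : X) { mx += p[0]; my += p[1]; }
    mx /= n; my /= n;
    // Sample covariance matrix entries.
    double cxx = 0, cxy = 0, cyy = 0;
    for (auto& p : X) {
        double dx = p[0] - mx, dy = p[1] - my;
        cxx += dx * dx; cxy += dx * dy; cyy += dy * dy;
    }
    cxx /= n - 1; cxy /= n - 1; cyy /= n - 1;
    // Largest eigenvalue of [[cxx, cxy], [cxy, cyy]].
    double tr = cxx + cyy, det = cxx * cyy - cxy * cxy;
    double lam = tr / 2 + std::sqrt(tr * tr / 4 - det);
    // Corresponding eigenvector (handle the already-diagonal case).
    double vx, vy;
    if (std::fabs(cxy) > 1e-12) { vx = lam - cyy; vy = cxy; }
    else { vx = 1; vy = 0; }
    double norm = std::sqrt(vx * vx + vy * vy);
    vx /= norm; vy /= norm;
    // Project the centered data onto the dominant eigenvector.
    std::vector<double> out;
    for (auto& p : X) out.push_back((p[0] - mx) * vx + (p[1] - my) * vy);
    return out;
}
```

The same idea scales to the 360 CCD features: keep only the eigenvectors with the largest eigenvalues and multiply the centered data matrix by them.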
The data reduced by the PCA process is used in the neural network training/testing process with the Generalized Learning Vector Quantization (GLVQ) algorithm. GLVQ was developed by Sato and Yamada in 1995 [8]. This algorithm is a variation of Learning Vector Quantization (LVQ) with assured convergence even when the data is not sorted. This is because the method is based on cost function minimization, i.e., minimizing the misclassification error with gradient descent optimization [15].
The process starts by calculating the misclassification error using equation (2):

μ = (d1 − d2) / (d1 + d2) (2)
In this formula, d1 and d2 are the distances to the winner and runner-up class, respectively. After we get the error, we use a sigmoid function as the cost function. The function uses that error as one of its parameters; the other parameter is the iteration t. The function is given by equation (3):

f(μ, t) = 1 / (1 + e^(−μ t)) (3)
Then, the sigmoid function is differentiated to obtain the "gain factor" for the weight update process, as in equation (4):

∂f/∂μ = f(μ, t) · (1 − f(μ, t)) · t (4)
From equation (4), the weights of the winner and runner-up neurons in the neural network can be updated using equations (5) and (6), where constant factors are absorbed into the learning rate:

w1 ← w1 + α · (∂f/∂μ) · d2 / (d1 + d2)² · (x − w1) (5)
w2 ← w2 − α · (∂f/∂μ) · d1 / (d1 + d2)² · (x − w2) (6)
In these formulas, x is the data vector and α is the learning rate. Equations (5) and (6) are used in the training process. In the testing process, algorithms in the LVQ family only use the Euclidean distance calculation to find the nearest prototype.
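As a concrete illustration, one GLVQ training step can be sketched in C++ as follows. This is a minimal sketch of the Sato-Yamada update, assuming squared Euclidean distances and with constant factors absorbed into the learning rate; the function name and calling convention are ours, not the paper's.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One GLVQ training step.
// w1: nearest prototype of the correct class (winner),
// w2: nearest prototype of any other class (runner-up),
// alpha: learning rate, t: iteration number used in the sigmoid.
void glvqUpdate(std::vector<double>& w1, std::vector<double>& w2,
                const std::vector<double>& x, double alpha, double t) {
    auto sqDist = [](const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0;
        for (size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    };
    double d1 = sqDist(x, w1), d2 = sqDist(x, w2);
    // Equation (2): relative misclassification error in [-1, 1].
    double mu = (d1 - d2) / (d1 + d2);
    // Equations (3)-(4): sigmoid cost and its derivative ("gain factor").
    double f = 1.0 / (1.0 + std::exp(-mu * t));
    double gain = f * (1.0 - f) * t;
    double denom = (d1 + d2) * (d1 + d2);
    // Equations (5)-(6): pull the winner toward x, push the runner-up away.
    for (size_t i = 0; i < x.size(); ++i) {
        w1[i] += alpha * gain * (d2 / denom) * (x[i] - w1[i]);
        w2[i] -= alpha * gain * (d1 / denom) * (x[i] - w2[i]);
    }
}
```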
In the tracking process, we used a proportional controller to control the velocity. This controller is used because drastic changes from a fixed motor command sometimes do not produce smooth control behavior [9]. The formula of the proportional controller is given by equation (7):

u(t) = K_p · (v_d(t) − v_a(t)) (7)

In this formula (which runs continuously over time t), u is the motor output, v_d is the desired velocity, and v_a is the actual velocity. The most important component of this system is K_p, a constant value called the "controller gain". This value decides the controller's behavior. The schematic of the P controller can be seen in Figure 5.
The difference between the desired and actual velocity is the error to be controlled. That error is then multiplied by K_p to produce the output. The bigger the value of K_p, the faster the response. However, the value must still be restricted, because if it is too big it will cause oscillation in the system [9].
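A minimal C++ sketch of equation (7) (the struct name is ours):

```cpp
#include <cassert>

// Proportional controller of equation (7): output = Kp * (desired - actual).
struct PController {
    double Kp;  // controller gain: larger means faster response,
                // but too large a value causes oscillation
    double output(double desired, double actual) const {
        return Kp * (desired - actual);
    }
};
```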

Data Preparation
After the literature study, data preparation consisted of collecting images representing the cap and non-cap classes. We captured the images manually with the AR.Drone bottom camera. Several criteria must be considered when capturing the images because they determine whether the data is good or not. These criteria are: 1) the cap (object) should contrast with the background; 2) combinations of cap orientation must be considered; 3) the distance between the cap and the AR.Drone camera must be 1 meter, because in the experiment the drone was set to fly only 1 meter above the object.
One of the raw images we obtained, with a size of 176 × 144 pixels, can be seen in Figure 6. Each image was then cropped to 70 × 70 pixels because, at that size and at a distance of 1 meter, the cap is fully shown in good proportion. Images of this size were therefore used in the experiment. Some cropped images can be seen in Figure 7.

System Development Process
After we obtained the data, we developed the system to perform object detection and tracking. The big picture of the development process can be seen in Figure 8. We divided the process into two big steps: without ROS (which we call Stage 1) and with ROS (which we call Stage 2). We describe the process step by step.
The first thing we did before writing the code was preparing the environment and libraries (ROS and OpenCV). After the installation was successfully completed, we developed the training program, which consists of a pre-processing module and a neural network module. In the pre-processing module, we used the image processing techniques and feature extraction algorithm (PCA) already described in Section 2. In the neural network module, we implemented the GLVQ training formulas, also described in Section 2.
In fact, the goal of the training program is to obtain the sorted eigenvectors and the weights of the neural network neurons, so we describe it from the first step to the last. First of all, the program reads the data as folders of 70 × 70 pixel images already divided into classes (cap and others). The images in each folder are loaded one by one by the OpenCV function cvLoadImage. Using this function, we chose the data format we wanted, either BGR (blue-green-red) or grayscale [16]. In this implementation, we used grayscale images because the technique applied later (CCD) considers shape rather than color. After the grayscale image is obtained, the next step is to convert it into two values only, 0 and 255. To do that, a technique named gray-level slicing (a simple image thresholding technique) is used. With this technique, one range of grayscale values becomes the low value and the rest become the high value [14]. We set the low value to 0 (totally black) and the high value to 255 (white).
After the image has only two values, it is passed to the Centroid-Contour Distance (CCD) process. As described in Section 2, this method calculates the distances from points on the object's contour to a pre-decided fixed point. In this process, we set the fixed point at pixel coordinate (35, 35) of the image. We then took the distance at every 1 degree, so we obtain 360 features over the full circle. An illustration of the features can be seen in Figure 3. The process from image reading to CCD runs repeatedly until all images are processed. At each image processing step, the data with its label (class) is stored in a data matrix.
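A minimal sketch of this CCD step in C++: starting from the fixed center pixel, it marches outward along each of the 360 directions and records the distance at which the object (value 255) ends. The ray-marching formulation is our own reading of the method; the paper's exact contour-tracing details may differ.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Centroid-Contour Distance sketch: from a fixed center pixel (cx, cy),
// step outward along each of 360 directions (1-degree increments) and
// record the distance at which the binary object (value 255) ends.
std::vector<double> ccdFeatures(const std::vector<std::vector<int>>& bin,
                                int cx, int cy) {
    const double PI = 3.14159265358979323846;
    int h = int(bin.size()), w = int(bin[0].size());
    std::vector<double> feat(360, 0.0);
    for (int deg = 0; deg < 360; ++deg) {
        double a = deg * PI / 180.0;
        double r = 0.0;
        // March outward while we stay inside the image and the object.
        while (true) {
            int x = cx + int(std::lround(r * std::cos(a)));
            int y = cy + int(std::lround(r * std::sin(a)));
            if (x < 0 || x >= w || y < 0 || y >= h || bin[y][x] != 255) break;
            r += 1.0;
        }
        feat[deg] = r;  // distance from the center to the contour
    }
    return feat;
}
```

In the paper's setting the fixed point is (35, 35), the center of the 70 × 70 crop.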
After we obtained labeled data with features from the CCD process, we further reduced the features with PCA. We did this because we want the run time of one processing cycle to be as short as possible during the real-time run using the AR.Drone. The eigenvectors that reduce the features in this process are also stored in a file named eig_vec.csv, to be used to reduce the features during testing and the later real-time run.
After the PCA process, the data is processed with the GLVQ algorithm. This is the last step in the training program. The neuron architecture for this process can be seen in Figure 10. From this process, we obtain the neuron weights and store them in a file named bobot.csv, which will be used in testing and the real-time run.
As seen in Figure 8, after we finished the training program, we moved on to program validation. The validation program produces its output as a percentage of accuracy. The process from reading the data to the CCD step is the same as in the training program. In the PCA step, the data matrix is directly multiplied by the eigenvectors from eig_vec.csv. In the GLVQ step, the features are directly compared with the weights from bobot.csv using the Euclidean distance. For every correct classification, a "hit" counter is incremented by 1. After the data has been processed by the neural network testing step, the hits are divided by the number of data items, which produces the accuracy. If the accuracy is not satisfactory, we modify some parameters in the training program and repeat the process. Once we were satisfied with the accuracy, we built/modified the object detection program. This program reads images of 176 × 144 pixels instead of folders of 70 × 70 pixel images. The output is the object's coordinate in that pixel area if an object has been detected. The most important requirement of this program is that it must run in a reasonable time, which is under 1 second, because it will later be modified into the real-time object detection program using ROS. A real-time robot application must respond quickly, so we should not move to the next development step if the running time is not yet satisfactory.
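The validation loop described above amounts to nearest-prototype classification plus a hit counter. A minimal C++ sketch (the function names are ours, and in the real program the prototypes would be loaded from bobot.csv):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Classify a feature vector by the nearest prototype (squared Euclidean
// distance to the trained GLVQ weights); returns the prototype index.
int nearestPrototype(const std::vector<double>& x,
                     const std::vector<std::vector<double>>& weights) {
    int best = -1;
    double bestD = 1e300;
    for (size_t i = 0; i < weights.size(); ++i) {
        double d = 0;
        for (size_t j = 0; j < x.size(); ++j)
            d += (x[j] - weights[i][j]) * (x[j] - weights[i][j]);
        if (d < bestD) { bestD = d; best = int(i); }
    }
    return best;
}

// Percentage of samples whose nearest prototype matches their label.
double accuracy(const std::vector<std::vector<double>>& X,
                const std::vector<int>& labels,
                const std::vector<std::vector<double>>& weights) {
    int hit = 0;
    for (size_t i = 0; i < X.size(); ++i)
        if (nearestPrototype(X[i], weights) == labels[i]) ++hit;
    return 100.0 * hit / X.size();
}
```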
To fulfill the running time requirement, we propose checking only some 70 × 70 pixel grids within the full 176 × 144 pixel frame. Because of that, we named this step "Grid Check". The rule is that we only check 19 chosen grids. An illustration of Grid Check can be seen in Figure 12. We divided the grids into levels 0 to 2, checked from the center of the image outward to its surrounding area. In the experiment using the AR.Drone, this showed great performance, allowing the drone to respond quickly with reasonable movements. The overall flow of the program without ROS can be seen in Figure 9.
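A sketch of the Grid Check idea in C++: instead of scanning every position in the 176 × 144 frame, generate a small set of 70 × 70 candidate windows and test them from the image center outward. The 19 positions actually used follow Figure 12; the level offsets below are illustrative assumptions, not the paper's exact layout.

```cpp
#include <algorithm>
#include <initializer_list>
#include <vector>

struct Window { int x, y; };  // top-left corner of a 70x70 candidate window

// Generate candidate windows inside a 176x144 frame, ordered from the
// image center outward (levels 0..2). Offsets are illustrative only.
std::vector<Window> gridCheckWindows() {
    const int W = 176, H = 144, S = 70;
    const int cx = (W - S) / 2, cy = (H - S) / 2;  // centered window origin
    const int offs[] = {0, 35, 53};                // hypothetical level offsets
    std::vector<Window> out;
    for (int dy : offs)
        for (int dx : offs)
            for (int sy : {1, -1})
                for (int sx : {1, -1}) {
                    // Clamp so the window stays fully inside the frame.
                    int x = std::max(0, std::min(W - S, cx + sx * dx));
                    int y = std::max(0, std::min(H - S, cy + sy * dy));
                    bool dup = false;  // keep each position only once
                    for (auto& w : out)
                        if (w.x == x && w.y == y) dup = true;
                    if (!dup) out.push_back({x, y});
                }
    // Check windows closest to the center first.
    std::sort(out.begin(), out.end(), [&](const Window& a, const Window& b) {
        auto d = [&](const Window& w) {
            double ddx = w.x - cx, ddy = w.y - cy;
            return ddx * ddx + ddy * ddy;
        };
        return d(a) < d(b);
    });
    return out;
}
```

Each window would then be classified by the CCD-PCA-GLVQ pipeline, stopping at the first window that contains the cap.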
For the implementation using ROS, the first thing we must do is modify the previous object detection module. The module must receive the image message from the AR.Drone, namely ardrone/image_raw. This is slightly different from the previous program, which read offline images from local storage. Besides that, some functions and variables must be changed or modified. This task is not very hard if the programmer is already familiar with ROS.
The last task in the development process was modifying the AR.Drone control program. This program must receive the object coordinate message from the real-time object detection program. Since the object coordinate is in pixel coordinates, a coordinate conversion must be done first in the control program. The conversion uses equations (8) and (9), where c is a conversion gain:

v_x = c · (x_obj − x_c) (8)
v_y = c · (y_obj − y_c) (9)
In these equations, x_c and y_c are the center of the 176 × 144 pixel frame, i.e., 88 and 72. To see this more clearly, refer to Figures 13 and 14. After we obtain the velocity, we use the proportional controller to produce smooth movement. At this point, the development process was complete. The overall flow of the program with ROS can be seen in Figure 11.
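Concretely, the conversion can be sketched as follows; the gain k and the sign/axis mapping onto the drone's body frame are illustrative assumptions, not the paper's exact constants:

```cpp
#include <cassert>
#include <cmath>

struct Velocity { double vx, vy; };

// Map a detection at pixel (px, py) to velocity commands proportional
// to its offset from the center (88, 72) of the 176x144 frame.
Velocity pixelToVelocity(int px, int py, double k) {
    const int CX = 88, CY = 72;  // image center in pixel coordinates
    return { k * (px - CX), k * (py - CY) };
}
```

An object detected at the image center yields zero velocity, so the drone hovers; any offset commands motion that re-centers the object.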

Results and Analysis
In this section, we show our experimental results. In the experiments, we used sets of training and testing data (as image folders). The training data images can be seen in Figure 15 and the testing data in Figure 16.
The experimental results can be seen in Table 1. From that table, we conclude that the experiments produced good results, except when PCA reduces the data to only 1 feature. Because the PCA configuration with 200 features is accurate and runs quickly, we chose it for real-time object detection. For the results of real-time detection using ROS, the experiment and screenshots can be seen in Figures 17 and 18.

Conclusion
The experimental results show that the implementation of image processing and the GLVQ neural network algorithm for object detection and tracking was successful. The running time is short and the accuracy in GLVQ validation is good, more than 90 percent on average. The per-frame running time shown in Table 1, which is under 1 millisecond, is very good for a real-time application that needs a quick response. This short time is largely due to the feature extraction with the CCD technique after the image processing steps of grayscaling and thresholding. The feature reduction using PCA and the Grid Check technique also contribute to it. In future work, the real-time run must be tested at various heights and outdoors. Besides that, the object recognition must be developed to track or detect more objects. The development of swarm robots is also one of the great challenges for the future. Hopefully this work can be done in the near future.