MODIFICATION OF ALEXNET ARCHITECTURE FOR DETECTION OF CAR PARKING AVAILABILITY IN VIDEO CCTV

Finding a parking space in public places, especially during peak hours, is a common problem for drivers. To help drivers find an available space, a system that monitors parking availability is needed. One approach is to detect parking availability from CCTV footage. However, existing research on CCTV data has several problems: parking slots are marked manually, which is inefficient when the method is applied to different parking lots, and Convolutional Neural Network (CNN) methods based on existing architectures have many parameters. Therefore, this study proposes a system that detects car parking availability using You Only Look Once (YOLO) V3 to mark the parking spaces and a new CNN architecture called Lite AlexNet, which has fewer parameters than other methods, to speed up the detection of parking space availability. The best accuracy of the marking stage using YOLO V3 is 92.31%, obtained in cloudy weather. The proposed Lite AlexNet has the best average training time, 7 seconds, compared with other existing methods, and its average accuracy over all conditions is 92.33%, better than the other methods.


Introduction
Finding a parking space at a mall or office is difficult, especially during busy hours. For malls, the busy hours are on weekends around 6-8 pm, and for companies they are on weekdays from around 8 am to 5 pm. This problem costs time, money, and fuel, and it also causes pollution and traffic congestion [1]. To solve this problem, research on detecting the availability of parking spaces has been conducted. When a parking space becomes available, drivers fight over it [2]-[4]. Ultrasonic sensors are widely used by both companies and malls to detect parking space availability. However, one sensor can detect only one parking space, so the number of sensors must equal the total number of parking spaces in the lot, which is costly. Sensors can therefore be replaced by CCTV cameras placed in several parts of the parking lot. The advantage of CCTV is that one camera can capture several parking spaces, whereas a sensor covers only one. By utilizing CCTV, researchers can apply deep-learning object recognition to detect cars in the several parking spaces captured by a camera.
Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information), volume 13, issue 2, June 2020
Many studies have addressed the detection of car parking space availability from images. The study in [5] detects parking space availability with a modified AlexNet in which the number of convolution layers is reduced; the resulting network, named mini AlexNet, achieves higher accuracy in detecting parking space availability. However, that study does not explain how the positions of the parking spaces are marked. The study in [6] proposes a method for detecting parking spaces using moving drones. It uses features of the captured image such as vehicle colors, gradients, and Harris corners, and then performs the classification with a Support Vector Machine (SVM). The study in [7] proposes a binary CNN classifier that determines parking availability. It pre-processes the input into binary images, which are then classified by the CNN.
These studies show that parking space availability can be detected in many ways. However, research on parking availability using CCTV has several problems. Marking the parking spaces manually is inefficient when the method is applied to different parking lots. Detecting availability with a Convolutional Neural Network (CNN) based on an existing architecture is also inefficient, because such architectures have many parameters and layers. Therefore, this research proposes a system that automatically detects the availability of car parking spaces using YOLO V3 [8] and Lite AlexNet, a modification of the AlexNet architecture, so that it can detect objects quickly and accurately in images or video from CCTV located in parking lots. The system uses the bounding boxes that YOLO V3 produces for parked cars and takes their properties to mark all parking locations.
All bounding box properties obtained from YOLO V3 are then used as markers for detecting whether each parking space holds a car. This detection uses an AlexNet modified by reducing its number of parameters and layers, which we name Lite AlexNet. AlexNet is a CNN architecture with fewer layers and parameters than other architectures such as VGG. AlexNet can classify 1000 classes of objects contained in images, such as clothes, rulers, sandals, and others. Detecting the availability of a parking space involves only two classes, available or occupied. Therefore, the researchers take the AlexNet architecture and modify it so that the network that originally classified 1000 classes becomes a binary classifier, trained from scratch [5]. This method is expected to speed up the detection of parking space availability so that drivers can find a parking space efficiently.
For this research, the researchers use only one CCTV camera and focus on building and testing the model for detecting the availability of parking spaces. This paper is organized as follows. Section 2 presents the related works. Section 3 discusses the data and methods. Section 4 shows the results of the experiment and their analysis. The conclusion is given in Section 5.

Related works
Research on detecting the availability of parking spaces has been carried out with approaches ranging from sensors to cameras. This section reviews several previous studies on detecting empty parking spaces from image or video sources.
The study in [7] proposed a binary CNN classifier that determines whether a parking space is empty or not. First, the researchers collected data from a camera. The resulting RGB image is converted to HSV to take the V channel, which is the value of the color. A logarithmic transformation is then applied to widen the range of grey values, and a second-derivative algorithm is applied to sharpen the edges in the image. The result is converted into a binary image and fed into the CNN, which classifies with 100% accuracy. However, the study explains neither the CNN architecture nor how the system detects whether a parking space is empty.
Figure 1. System architecture for detecting the availability of parking spaces.
The next study on car detection is [6], which proposes a method for detecting parked cars using cameras mounted on drones. First, each parking space is marked with four points. The image is then processed with lens correction to straighten the lines, and the coordinates of the lines are stored. To classify a car, data from available and occupied parking spaces were used to train an SVM classifier with three features: the Histogram of Oriented Gradients with eight orientations, the density of corners in the slot, and the color of the vehicle. To determine whether a space is free, a line model is used: parked cars form a straight line, and an empty parking space breaks the line. The broken segment is marked as an empty parking space. The accuracy obtained in this study was 97.6%. However, the method is effective only for drones; applied to a fixed camera, it cannot form the line needed to detect available parking spaces.
Another study is [5], which proposes a CNN with a smaller architecture and fewer parameters than AlexNet, called mini AlexNet, to detect empty parking spaces. Using mini AlexNet and a dataset built by the researchers, the study detected parking spaces with an accuracy above 90%. However, it does not explain how each parking space is marked, so it remains unclear whether mini AlexNet finds the parking spots automatically from the whole picture or whether the researchers carried out a marking process to locate the parking spaces.

The proposed system for Detection of Car Parking Availability
This section discusses the system developed for detecting the availability of parking spaces. The research builds on [2], [5], and [9], and improves on them by marking the parking spaces automatically and by using a smaller AlexNet architecture. The system, illustrated in Figure 1, has two main stages. The first is the marking stage, in which the system is given an image of a parking lot full of parked cars and marks the parking spaces. The second is the classification stage, in which each parking space marked in the first stage is classified as available or occupied, using images or video from CCTV as input. Both stages are described in the following sub-sections.

Dataset Description
In this study, two kinds of data are needed: an image of a parking lot fully occupied by cars for the marking stage, and a CCTV video recording for detecting whether parking spaces are available or occupied. We use images and videos from CCTV located outdoors. The training data are obtained from [5], namely the CNR Park data, a collection of labeled images of outdoor parking lots. The data cover various weather conditions, from sunny to rainy, and each image has a size of 150 × 150, as shown in Figure 2. For testing, we use CNR Park images of size 1000 × 750 that capture the entire parking lot, as in Figure 3.

Marking stage
The marking stage finds the positions of all existing parking spaces, so it only needs to be performed once. The process is described in Figure 4. A parking lot filled with parked cars is photographed, and the photo is used as input for the marking stage. The photo is processed with the pre-trained You Only Look Once (YOLO) V3 method [8], [10]-[12] to detect all the cars in it. We choose one image from the CCTV, which must show the parking lot fully occupied by parked cars. Using this image, we detect every car and generate a bounding box for each one. Assuming that the position of each car coincides with a parking space, we save the properties of each bounding box, namely its x and y position and its width and height. The result of the marking process is shown in Figure 5.
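The marking step can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the detection format (a list of dictionaries with `label`, `confidence`, and `box`), the function name `extract_markers`, and the confidence threshold of 0.5 are all assumptions made for the example, and the sample detections are invented.

```python
import json

def extract_markers(detections, target_label="car", min_confidence=0.5):
    """Keep only confident car detections and return their bounding-box properties."""
    markers = []
    for det in detections:
        if det["label"] != target_label or det["confidence"] < min_confidence:
            continue
        x, y, w, h = det["box"]
        markers.append({"x": x, "y": y, "width": w, "height": h})
    # Sort top-to-bottom, then left-to-right, so that slot ids are stable.
    markers.sort(key=lambda m: (m["y"], m["x"]))
    for slot_id, marker in enumerate(markers):
        marker["slot_id"] = slot_id
    return markers

# Hypothetical detector output for one image of a full parking lot.
sample_detections = [
    {"label": "car", "confidence": 0.91, "box": (120, 300, 80, 60)},
    {"label": "person", "confidence": 0.88, "box": (400, 310, 20, 50)},
    {"label": "car", "confidence": 0.35, "box": (600, 290, 75, 58)},
    {"label": "car", "confidence": 0.97, "box": (30, 300, 82, 61)},
]

markers = extract_markers(sample_detections)
markers_json = json.dumps(markers)  # could be stored and reused for every video frame
```

Because the markers are stored once, the classification stage can reuse them for every frame as long as the camera does not move.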

Classification stage
The classification stage determines the state of each parking space marked in the marking stage, that is, whether the space is available or not. The process is illustrated in Figure 6. The video from the CCTV camera is used as input, and the parking locations are marked on it with the bounding boxes obtained in the marking stage. Once the bounding boxes are applied to the CCTV video, the content of each bounding box is processed by Lite AlexNet to detect whether there is a car in the parking space. Lite AlexNet is used in the classification stage so that the video can be processed quickly and accurately.
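The per-frame loop described above can be sketched as follows. This is an illustrative outline under stated assumptions: frames are represented as nested lists of grayscale values to keep the example self-contained, the helper names `crop` and `classify_slots` are hypothetical, and `placeholder_classifier` is a brightness threshold that merely stands in for the trained Lite AlexNet model.

```python
def crop(frame, box):
    """Cut the region (x, y, width, height) out of a frame stored as rows of pixels."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]

def classify_slots(frame, boxes, classifier):
    """Return one status per marked slot: 1 = occupied, 0 = available."""
    return [classifier(crop(frame, box)) for box in boxes]

def placeholder_classifier(patch):
    """Stand-in for the trained network: calls a slot occupied if it is mostly dark."""
    pixels = [value for row in patch for value in row]
    return 1 if sum(pixels) / len(pixels) < 128 else 0

# Toy 4 x 8 grayscale frame: left half dark (a "car"), right half bright (empty space).
frame = [[20, 20, 20, 20, 200, 200, 200, 200] for _ in range(4)]
boxes = [(0, 0, 4, 4), (4, 0, 4, 4)]  # markers saved by the marking stage
statuses = classify_slots(frame, boxes, placeholder_classifier)
colors = ["red" if status == 1 else "green" for status in statuses]
```

In a real pipeline, each cropped patch would also be resized to the network's input size before being passed to the classifier.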
Lite AlexNet was derived from the AlexNet model in Table 1, resulting in the architecture in Table 2. The researchers chose AlexNet because its predictions surpassed the state of the art at the time and because it has few layers and parameters compared with current state-of-the-art architectures. Lite AlexNet detects the car in each bounding box and determines whether the parking space is available or not. Its architecture is based on AlexNet with the number of layers and parameters reduced, which aims to speed up the computing time of the classification process. Another reason for proposing Lite AlexNet is that only 2 classes need to be predicted, whereas AlexNet predicts 1000 classes, which makes AlexNet inefficient for this task. AlexNet has 8 layers, consisting of 5 convolution layers and 3 fully connected layers, while Lite AlexNet has 4 layers, consisting of 3 convolution layers and 1 fully connected layer. The kernel sizes of the convolution layers in Lite AlexNet are the same as those used in AlexNet, namely 11 × 11, 5 × 5, and 3 × 3. Lite AlexNet uses only one fully connected layer, and its output layer produces a single value that represents the two classes, available or occupied. The researchers only modified the architecture and trained it from scratch without using a pre-trained model. The Lite AlexNet architecture is thus the result of modifying the architecture of AlexNet and reducing its number of parameters [4], [5], [13]-[20].
Lite AlexNet accepts RGB input images of size 151 × 151, so the image inside each bounding box is resized to match the input layer before it is classified. Each convolution layer has its own filter size and stride. The activation function used is the Rectified Linear Unit (ReLU), which replaces negative values with 0 according to equation (1).

$$ f(x) = \max(0, x) \tag{1} $$
Each convolution layer is followed by a max pooling layer, as in AlexNet, where max pooling takes the highest value within the specified filter size. Each convolution layer extracts object features in the image such as the angles, edges, and textures of the target [21]. The modifications reduce the number of parameters [21] so that the availability of parking spaces can be detected more efficiently. The reduction is intended to make the classification fast while keeping the accuracy close to that of AlexNet.
In the first layer, the convolution has a filter size of 11 × 11 with 20 filters and stride 2, followed by max pooling with filter size 3 × 3 and stride 2. The second layer receives the output of the first max pooling layer and applies a convolution of size 5 × 5 with 25 kernels and stride 3, followed by max pooling with filter size 3 × 3 and stride 1. The third layer has a convolution of size 3 × 3 with 30 filters and stride 2, followed by max pooling with size 3 × 3 and stride 1. After the convolutions have extracted features from the input image, the resulting 2 × 2 × 30 feature map is flattened into a 1D vector and passed to a fully connected layer of size 30 with the ReLU activation function and a dropout of 0.4. The output layer of Lite AlexNet generates one value using the sigmoid activation function in equation (2), where x is the result of the multi-layer perceptron, and the model uses the binary cross-entropy loss function, so that the prediction is a value of 0 or 1.

$$ \sigma(x) = \frac{1}{1 + e^{-x}} \tag{2} $$

If there is a car in the bounding box, the bounding box turns red in the next process. Conversely, if there is no car in the bounding box, the bounding box is green, as in Figure 7.
The result of this process is a CCTV video with bounding boxes that indicate whether each space is empty or occupied.
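The layer sizes stated above can be checked with a short calculation. The sketch below assumes that no padding is used in the convolution and pooling layers and that every layer has bias terms; neither detail is stated in the text, so the parameter total it produces is only what the arithmetic gives under those assumptions, not a figure reported by the authors. The helper names are hypothetical.

```python
def out_size(size, kernel, stride):
    """Spatial output size of a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Number of weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

# (kernel, filters, stride) of each convolution, then (kernel, stride) of its max pooling.
layers = [
    ((11, 20, 2), (3, 2)),
    ((5, 25, 3), (3, 1)),
    ((3, 30, 2), (3, 1)),
]

size, channels, total_params = 151, 3, 0  # 151 x 151 RGB input
for (k, filters, s), (pool_k, pool_s) in layers:
    total_params += conv_params(k, channels, filters)
    size = out_size(size, k, s)            # after the convolution
    size = out_size(size, pool_k, pool_s)  # after the max pooling
    channels = filters

flattened = size * size * channels   # length of the flattened feature vector
total_params += flattened * 30 + 30  # fully connected layer with 30 units
total_params += 30 * 1 + 1           # single sigmoid output unit
```

Under these assumptions the spatial size shrinks from 151 to 2, which agrees with the 2 × 2 × 30 feature map mentioned in the text.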

Experiment and analysis
In this section, we discuss the experimental results and the analysis of the proposed system. The system was built in the Python programming language, and we used Keras to implement the CNN. We wrote the Python code in Google Colaboratory and used the GPU that it provides.

Result of the marking stage
The marking stage uses the YOLO V3 method to detect the positions of the cars in the parking lot. YOLO V3 can detect many classes of objects in an image (80 classes for the commonly used COCO-trained weights), not only cars. Therefore, the results of YOLO V3 are filtered to keep car objects only, and the bounding box of each car position is then obtained. The data used for the marking process are 6 RGB images of a parking lot fully occupied by cars, each 1000 × 750 pixels, taken under different weather and time conditions. The detection results using YOLO V3 can be seen in Table 3. The detection obtains good results when the weather is cloudy. YOLO V3 detects objects with a high degree of accuracy in cloudy conditions because the scene is neither too bright nor too dark, so the cars are detected properly. When it is sunny, the accuracy decreases, because white cars become blurry under high lighting levels. The researchers then used an image of the parking lot at night. At night, the accuracy of YOLO V3 drops to 45.16% due to low lighting levels, and only cars with bright colors such as red or white are detected. The researchers also used an image of the parking lot in the rain, for which the accuracy was 57.14%. This value is higher than at night because there is still a moderate level of lighting, although splashes of rainwater introduce salt-and-pepper noise into the image.

Result of classification stage
The classification stage uses Lite AlexNet to classify each marked parking space as available or occupied. This stage produces the value 1 for occupied and the value 0 for available. Lite AlexNet is tested under the two weather conditions found in Indonesia, sunny and rainy, each divided into morning, afternoon, and evening, to determine under which conditions the model classifies cars accurately when it is deployed in a real mall or office parking lot with varying levels of brightness. Each situation uses 10 frames from the CCTV video. In addition, the Lite AlexNet results are compared with other architectures, namely AlexNet, Mini AlexNet, and VGG16. To measure the success of each method, we use the False Positive Rate (FPR), the False Negative Rate (FNR), and the accuracy [20].
The testing results show that the proposed Lite AlexNet has a high degree of accuracy in several cases, namely sunny morning, sunny afternoon, and rainy morning. This is because under these conditions the lighting level in the parking lot is not too high, so the spaces can be classified. In dark conditions, Lite AlexNet is inferior to AlexNet and VGG16, which have more layers and parameters and can therefore extract more features from the marked parking locations. The trial shows that Lite AlexNet can detect the parking spaces accurately, as seen from the accuracy, FPR, and FNR values of 100%, 0, and 0, respectively. When it is raining, Lite AlexNet detects empty spaces well, so it obtains a very low FPR. For the rainy afternoon condition, the accuracy of the tested methods decreases because the rain blurs the CCTV input and causes salt-and-pepper noise in the image. The results of the tested methods can be seen in Table 5 and Table 6.
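The three measures can be computed from the per-slot labels as in the sketch below. It uses the standard definitions, with occupied (1) treated as the positive class as in the text; the exact formulation used in [20] is not reproduced here, and the label lists are made-up values for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with 1 = occupied as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def evaluate(y_true, y_pred):
    """Return accuracy, false positive rate, and false negative rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,  # empty slots called occupied
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,  # occupied slots called empty
    }

# Made-up ground truth and predictions for eight marked slots in one frame.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
```

With this convention, a low FPR means that few empty spaces are mistaken for occupied ones, which matches the reading of the rainy-weather results above.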

Conclusion
In this research, we proposed a system that detects the availability of parking spaces using YOLO V3 to mark the parking spaces and a new architecture called Lite AlexNet to classify each parking space as available or occupied. In the marking stage, the positions of the parking spaces are obtained from the positions of the cars in an image of the parking lot fully occupied by cars. The best result obtained is 96.67%, from the image taken in cloudy conditions with an intermediate lighting level. In the classification stage, the proposed Lite AlexNet classifies the marked parking spaces. The best results of the proposed method are obtained in the sunny morning, sunny afternoon, and rainy morning conditions. The proposed method classifies successfully at intermediate levels of brightness while using a smaller CNN architecture with fewer parameters. In future work, detection at night and in heavy rain needs to be improved so that the system can be applied to many kinds of parking lots.