Keywords: traffic sign; intelligent vehicle





Figure 4. Spatial pyramid pooling.
Inspired by SPP and YOLOv3-spp [37], this study improved the traditional SPP module; the improved structure is shown in Figure 5. The structure consisted of five branches. The first branch connected the input directly to the output, while the second, third, fourth, and fifth branches passed the input through max-pooling layers of size 3 × 3, 5 × 5, 7 × 7, and 9 × 9, respectively. Because the stride of each pooling layer was 1 and padding was applied before pooling, the feature maps output by the five branches had the same height, width, and depth as the input. Finally, the five feature maps were concatenated along the channel dimension. This dense SPP network was added after the backbone network because YOLOv4-tiny disregards the fusion of multiscale local region features on the same convolutional layer. It converted the 13 × 13 × 512 feature maps generated by the 15th convolutional layer into 13 × 13 × 2560 feature maps (five branches of 512 channels each). This structure fused local and global features, and the multiscale fusion enhanced the representational ability of the feature maps so that more features were passed to the next layer of the network. A 1 × 1 convolution then reduced the number of feature maps from 2560 to 256 to extract useful features from the large number of relevant features, which were later pooled to different scales to improve the detection accuracy of traffic signs.
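The shape bookkeeping of the dense SPP block can be checked with a short, standard-library-only Python sketch. It is not the authors' implementation: the function names are hypothetical, the padding is assumed to be (kernel − 1) / 2 on each side (the text only says that padding was applied before pooling), and padded positions are assumed to be ignored by the maximum.

```python
def pool_output_size(size, kernel, stride=1, padding=None):
    """Spatial output size of a pooling layer (standard floor formula)."""
    if padding is None:
        # Assumption: "same" padding for odd kernels, (kernel - 1) / 2 per side.
        padding = (kernel - 1) // 2
    return (size + 2 * padding - kernel) // stride + 1


def max_pool_same(grid, kernel):
    """Stride-1 max pooling over a 2D list of numbers.

    Windows are clipped at the border, which is equivalent to padding
    with minus infinity, so the output has the same size as the input.
    """
    r = (kernel - 1) // 2
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def dense_spp_shape(height, width, channels, kernels=(3, 5, 7, 9)):
    """Shape after concatenating the identity branch with one pooled branch per kernel."""
    branch_shapes = [(height, width, channels)]  # first branch: input passed through
    for k in kernels:
        branch_shapes.append(
            (pool_output_size(height, k), pool_output_size(width, k), channels)
        )
    # Concatenation along channels requires identical spatial sizes.
    if any(s[:2] != branch_shapes[0][:2] for s in branch_shapes):
        raise ValueError("branches disagree on spatial size")
    return (height, width, sum(s[2] for s in branch_shapes))


print(dense_spp_shape(13, 13, 512))  # (13, 13, 2560)
```

With stride 1 and this padding, every branch keeps the 13 × 13 grid, so the only dimension that grows is the channel count: 5 × 512 = 2560, which the subsequent 1 × 1 convolution reduces to 256.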

Figure 5. Improved dense spatial pyramid pooling.
We enhanced the YOLO head module to increase the YOLOv4-tiny network’s ability to identify traffic signs. Following the enhanced feature extraction network, a YOLO detection layer was added. To create this detection layer, we fine-tuned a second YOLO detection layer and added convolutional layers with channel sizes of 128, 256, 512, and 24. The final output of this detection layer was a high-resolution feature map of 52 × 52 × 24, which enhanced the accuracy of target localization and prediction. The original YOLOv4-tiny can only generate feature maps with dimensions of 13 × 13 × 24 and 26 × 26 × 24. With these improvements, the TSR-YOLO algorithm had three YOLO detection layers: the first output feature maps of 13 × 13 × 24, the second 26 × 26 × 24, and the third 52 × 52 × 24. This method could better detect long-distance traffic signs in complex scenarios and addressed the inaccurate localization and prediction of YOLOv4-tiny for small, distant targets. The network’s three YOLO detection layers were used to process and predict the bounding boxes, objectness scores, class predictions, and anchor boxes, where the anchor boxes were used to identify the bounding boxes for each object of each class in the traffic sign recognition dataset. Because three detection layers were used and there were three classes of traffic signs in the dataset, the number of channels for each detection layer was calculated with the formula (classes + 4 + 1) × 3 before designing each YOLO detection layer, giving (3 + 4 + 1) × 3 = 24 channels. The structure of the TSR-YOLO algorithm after these improvements, together with its detailed network configuration, is given in Figure 6.
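The channel formula and the three output grids can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the 416 × 416 input size and the strides of 32, 16, and 8 are not given in the text and are chosen only because they yield the 13, 26, and 52 grids, and the factor 3 is read here as the number of anchor boxes per detection layer (the usual YOLO convention).

```python
def head_channels(num_classes, anchors_per_layer=3):
    """Channels per detection layer: (classes + 4 box coords + 1 objectness) * anchors.

    Assumption: the factor 3 in the formula is the number of anchor boxes
    assigned to each detection layer.
    """
    return (num_classes + 4 + 1) * anchors_per_layer


def head_output_shapes(input_size=416, strides=(32, 16, 8), num_classes=3):
    """Feature-map shape (height, width, channels) of each detection layer.

    Assumption: a square 416 x 416 input and strides of 32, 16, and 8,
    which are not stated in the text.
    """
    channels = head_channels(num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


print(head_channels(3))      # 24
print(head_output_shapes())  # [(13, 13, 24), (26, 26, 24), (52, 52, 24)]
```

The third, stride-8 layer has 16 times as many grid cells as the 13 × 13 layer, which is why it helps with small, distant signs.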

