
Date: 28.09.2023

Architecture of YOLOv7.

YOLOv7

YOLOv7 is a real-time object detection model proposed in 2022 by the authors of YOLOv4 [7]. At the time of publication, it was reported to surpass all known object detectors in both speed and accuracy in the range of 5 to 160 frames per second (FPS), with the highest accuracy (56.8% AP) among real-time detectors running at 30 FPS or more on a V100 GPU [7].
YOLOv7 achieves these results by using the following techniques:
• An extended efficient layer aggregation network (E-ELAN) that allows the model to learn and converge more efficiently.
• Compound model scaling for concatenation-based models, which scales depth and width together so that the architecture keeps the properties of its original design at different model sizes.
• A trainable bag of freebies: a set of training-time changes that improve the accuracy of the model without increasing the inference cost.
The bag of freebies includes:
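• Planned re-parameterized convolution: multi-branch convolution blocks used during training are merged into a single convolution for inference. YOLOv7 uses RepConv without the identity connection (RepConvN) in layers that already have residual or concatenation connections.
• Coarse label assignment for the auxiliary head and fine label assignment for the lead head: the predictions of the lead head guide label assignment for both heads. The auxiliary head receives a relaxed (coarse) set of positive samples, while the lead head receives the stricter (fine) set.
• Batch normalization in the conv-bn-activation topology: the batch normalization layer is connected directly to the convolutional layer, so that at inference its mean and variance can be folded into the weights and bias of that convolution.
• Implicit knowledge, introduced in YOLOR [8]: learned implicit vectors are combined with the convolution feature map by addition and multiplication. At inference they can be pre-computed and merged into the weights and bias of the adjacent convolutional layer.
• Exponential moving average (EMA) model as the final inference model: a running average of the model weights is kept over the training steps, and this averaged model, rather than the last checkpoint, is used for inference.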
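Two of the items above reduce to simple arithmetic on the weights, which the following sketch illustrates. It is not taken from the YOLOv7 code base: the function names (`fold_batchnorm`, `conv1x1`, `batchnorm`, `ema_update`), the plain-list representation of a 1x1 convolution at a single pixel, and the default `eps` and `decay` values are all assumptions made for illustration.

```python
import math


def conv1x1(weight, bias, x):
    """Apply a 1x1 convolution at one pixel: weight is [out_ch][in_ch]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization with fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN statistics into the preceding convolution (hypothetical helper).

    Since BN(w.x + b) = scale * (w.x) + scale * (b - mean) + beta,
    with scale = gamma / sqrt(var + eps), the BN layer can be removed.
    """
    folded_w, folded_b = [], []
    for row, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([scale * w for w in row])
        folded_b.append(scale * (b - m) + bt)
    return folded_w, folded_b


def ema_update(ema_weights, model_weights, decay=0.999):
    """One EMA step over a flat list of weights (decay value is an assumption)."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]


if __name__ == "__main__":
    weight = [[0.5, -1.0], [2.0, 0.25]]
    bias = [0.1, -0.2]
    gamma, beta = [1.5, 0.8], [0.05, -0.3]
    mean, var = [0.2, 1.0], [0.9, 2.5]
    x = [1.0, -2.0]

    reference = batchnorm(conv1x1(weight, bias, x), gamma, beta, mean, var)
    fw, fb = fold_batchnorm(weight, bias, gamma, beta, mean, var)
    fused = conv1x1(fw, fb, x)
    # The fused convolution should match conv followed by BN up to rounding.
    print(all(abs(a - b) < 1e-9 for a, b in zip(reference, fused)))
```

The folding step changes only how the inference graph is written, not what it computes, which is why it counts as a "freebie". The EMA step is applied once per training iteration, and the averaged weights are the ones kept for inference.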
Overall, YOLOv7 is a significant improvement over previous YOLO models in terms of both speed and accuracy. It is a powerful tool for real-time object detection in a variety of applications.

Fig. 4. YOLOv6 Rep-PAN Neck.
Architecture of YOLOv7. The YOLO network has three main parts, which are shown in Fig. 5:
• The backbone: This is a convolutional neural network that extracts features from the image. These features are called embeddings.
• The neck: This is a collection of neural network layers that combines and mixes the features from the backbone. This helps to improve the accuracy of the predictions.
• The head: This is the part of the network that generates the predictions. It takes the features from the neck and outputs bounding boxes and class probabilities for each object in the image.
The specific architecture of the YOLOv7 network is shown in the figure below. It consists of a backbone built from ELAN blocks (E-ELAN in the larger variants), a neck that fuses features through a feature pyramid with top-down and bottom-up paths, and a head that, in the base model, predicts three anchor boxes per grid cell at each of three scales, each with box coordinates, an objectness score, and class probabilities.
This diagram was created by mmlab, not by the authors of the YOLOv7 paper. Therefore, the naming of some network blocks might not be the same. For example, the ELANBlock in Fig. 6 [10] refers to the E-ELAN module mentioned in the paper.
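To make the size of the head output concrete, the following sketch computes the shape of each prediction map. It rests on assumptions that are not stated in the text above: a 640x640 input, three detection scales with strides 8, 16, and 32, three anchors per grid cell, and 80 classes (as in COCO). The function names are hypothetical.

```python
def head_output_shapes(image_size=640, strides=(8, 16, 32),
                       num_anchors=3, num_classes=80):
    """Return (grid_h, grid_w, channels) for each detection scale."""
    # Each anchor predicts a box (x, y, w, h), an objectness score,
    # and one score per class.
    per_anchor = 4 + 1 + num_classes
    shapes = []
    for stride in strides:
        grid = image_size // stride
        shapes.append((grid, grid, num_anchors * per_anchor))
    return shapes


def total_predictions(shapes, num_anchors=3):
    """Count candidate boxes over all scales, before any filtering."""
    return sum(h * w * num_anchors for h, w, _ in shapes)


if __name__ == "__main__":
    shapes = head_output_shapes()
    for stride, shape in zip((8, 16, 32), shapes):
        print(f"stride {stride}: {shape}")
    print("candidate boxes:", total_predictions(shapes))
```

Under these assumptions the three maps are 80x80, 40x40, and 20x20 cells with 255 channels each, which gives 25,200 candidate boxes per image before confidence thresholding and non-maximum suppression.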
