Abstract: Wideband signal detection frameworks combine deep-learning-based object detection with spectrograms to detect, identify, and localize in time and frequency multiple signals in wideband RF systems. However, directly applying an original network architecture struggles to achieve optimal signal detection performance on actual task datasets. To address this, this paper proposes SignalNet, a network architecture for the voice signal detection task whose components are decoupled for task-oriented optimization according to the characteristics of the voice signals and the task dataset. Specifically, the backbone network, which is responsible for feature extraction, is streamlined; a neck network comprising multi-scale time-frequency feature context fusion and gating attention modules is introduced; and the traditional anchor-based detection head is replaced with an anchor-free one. Experimental results show that the proposed architecture achieves the best detection performance on the voice signal detection task, with mAP reaching 97.42%, while also maintaining fewer model parameters and faster inference speed.