Welcome to the official website of the Journal of Air Force Engineering University!

A Diffusion Model Approach to Malicious Code Dataset Expansion

CLC Number: TP309


    Abstract:

    Supported by big data, deep learning models have in recent years demonstrated excellent capabilities in computer vision and natural language processing. In malicious code image applications, however, training data may well be insufficient. When some malware families have only a limited number of training samples, the distribution of the whole dataset is hard to characterize fully, and a deep learning model may overfit these scarce samples, resulting in poor performance. To address these problems, this paper proposes a dataset expansion method that uses a diffusion model to generate new samples. The method learns the forward process that converts original data into noise, then applies the reverse process to denoise noise samples into new samples that are similar to, but distinct from, those in the original dataset. This alleviates the impact of data imbalance in some families on classification and detection tasks and improves the model's generalization ability.
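The abstract does not state which diffusion formulation the paper uses, so the following is only a minimal sketch of the data-to-noise and noise-to-data idea, assuming a standard DDPM-style linear noise schedule. The function names (`make_schedule`, `forward_noise`, `reverse_step`) and the toy input are hypothetical, and the trained noise-prediction network is replaced here by the true noise purely for illustration.

```python
import math
import random


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used choice; needs num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta          # cumulative product of (1 - beta)
        alpha_bars.append(running)
    return betas, alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Forward process: jump from clean data x0 directly to noise level t."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = math.sqrt(alpha_bars[t])
    b = math.sqrt(1.0 - alpha_bars[t])
    xt = [a * x + b * e for x, e in zip(x0, eps)]
    return xt, eps


def reverse_step(xt, t, eps_pred, betas, alpha_bars, rng):
    """One reverse (denoising) step given a noise estimate eps_pred.

    In a real system eps_pred would come from a trained network.
    """
    coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    scale = 1.0 / math.sqrt(1.0 - betas[t])
    mean = [scale * (x - coef * e) for x, e in zip(xt, eps_pred)]
    if t == 0:
        return mean                    # no noise is added at the final step
    sigma = math.sqrt(betas[t])
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


# Toy stand-in for a flattened malware image with pixels scaled to [-1, 1].
rng = random.Random(0)
betas, alpha_bars = make_schedule()
x0 = [0.5, -0.25, 1.0, 0.0]
x_noisy, eps = forward_noise(x0, 0, alpha_bars, rng)
# Using the true noise as an oracle "prediction" undoes the first noising step.
x_back = reverse_step(x_noisy, 0, eps, betas, alpha_bars, rng)
```

In practice, new samples would be generated by starting from pure Gaussian noise and applying the reverse step from the last timestep down to zero with a learned noise predictor; the oracle above only demonstrates that the two processes are inverses of each other.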

History
  • Online: February 16, 2025