Project Description
Compromised Software Containers (COSOCO) Dataset
Abstract
COSOCO (Compromised Software Containers) is a synthetic dataset of 3364 images representing benign and malware-compromised software containers. Each image represents a dockerized software container that has been converted to an image using byte-to-pixel tools commonly used in malware analysis. Software container records are labelled as (1) benign or (2) compromised: a benign container has commonly used, harmless packages and tools installed, whereas a compromised container has, alongside such harmless packages and tools, an underlying file system affected by an activated malware instance. Each compromised instance is accompanied by a mask, i.e. a black-and-white image marking the pixels that correspond to the files of the underlying system that have been altered by malware. COSOCO aims to support two tasks: identifying compromised software containers via image classification, and identifying compromised files and file system regions inside a container via image segmentation.
The dataset is available through Hugging Face.
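The byte-to-pixel conversion and the accompanying mask described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy example, not the tooling used to build COSOCO: the function names `bytes_to_grayscale` and `build_mask`, the fixed row width, and the idea of passing altered regions as byte-offset ranges are all hypothetical choices made for illustration.

```python
import math


def bytes_to_grayscale(data, width=256):
    """Map each byte to one grayscale pixel (0-255), filling rows left to right.

    The last row is zero-padded. The row width is an assumed parameter.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = max(1, math.ceil(len(data) / width))
    padded = data.ljust(height * width, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def build_mask(n_bytes, altered_ranges, width=256):
    """Build a black-and-white mask with the same layout as the image.

    Pixels whose byte offset falls inside a half-open range [start, end)
    are set to 255 (white); all other pixels are 0 (black).
    The ranges stand in for files altered by malware (hypothetical input).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = max(1, math.ceil(n_bytes / width))
    flat = [0] * (height * width)
    for start, end in altered_ranges:
        for i in range(max(0, start), min(end, n_bytes)):
            flat[i] = 255
    return [flat[r * width:(r + 1) * width] for r in range(height)]


# Toy input: ten bytes standing in for a container file system dump,
# with byte offsets 2..4 treated as altered.
data = bytes(range(10))
image = bytes_to_grayscale(data, width=4)
mask = build_mask(len(data), [(2, 5)], width=4)
```

In this toy layout the image and the mask share the same dimensions, so a white mask pixel at a given position marks the corresponding image pixel as belonging to an altered region, which is the pairing a segmentation model would learn from.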
Acknowledgement
This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101093069 (P2CODE). Disclaimer: Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Commission. Neither the European Union nor the European Commission can be held responsible for them.
Citation & References
A. Nousias, E. Katsaros, E. Syrmos, P. Radoglou-Grammatikis, T. Lagkas, V. Argyriou, I. Moscholios, E. Markakis, S. Goudos, “Malware Detection in Docker Containers: An Image is Worth a Thousand Logs,” ICC 2025 – IEEE International Conference on Communications, Montreal, QC, Canada, 2025, pp. 6401-6407, doi: 10.1109/ICC52391.2025.11161263.