Project Description

APT Sandworm Dataset

Abstract

Due to the rise in Advanced Persistent Threat (APT) attacks, modern digital systems, and critical infrastructures in particular, must be equipped to counteract these stealthy, multistage intrusions. Unlike simple, isolated attack techniques, APTs are orchestrated by highly skilled adversarial groups that follow strategic, logic-driven plans. These groups employ specific tactics, techniques, and procedures (TTPs) to silently infiltrate systems, move laterally across assets, and ultimately execute impactful actions to achieve their objectives.

To study and defend against such threats effectively, datasets are needed that accurately reflect the complexity and sequential nature of real-world APT campaigns. To address this need, this work introduces the APT Sandworm dataset, named after the Sandworm APT group. Sandworm has been active since at least 2015, notably using the BlackEnergy malware to target Ukraine’s energy infrastructure. The group gained further notoriety with the NotPetya attack in 2017, which caused widespread disruption and significant financial losses, particularly in the logistics, shipping, and manufacturing sectors. In 2022, Sandworm was again linked to a cyberattack on Ukraine’s electric power infrastructure. The group's consistent focus on energy-related targets underscores a persistent and strategic intent to disrupt critical infrastructure systems.

Instructions:
The dataset is organised as follows:

    • README file (APT_Dataset_Readme.pdf): Provides a detailed description of the testbed infrastructure, which emulates a realistic critical infrastructure environment under attack. It also outlines the APT attack scenario and the key features of the dataset.
    • PCAP file (SandwormAPT.pcap): Contains the raw network traffic captured during the execution of the APT emulation. This data reflects the observable network-level behaviour of the attack.
    • Network flow dataset (SandwormAPT_flow_labelled.csv): Contains labelled network flow records for the attack procedures that are visible in the captured traffic. Only the steps of the APT campaign that generate detectable network activity are labelled in this file. Other attack stages, such as those occurring at the endpoint or system level, leave no observable network evidence and are therefore not represented in the PCAP or flow data.
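As a minimal sketch of how the labelled flow file might be inspected, the Python snippet below counts flow records per label value using only the standard library. It assumes the label is stored in a column named `Label`; that column name and the helper functions `count_labels` and `count_labels_in_file` are hypothetical and should be checked against the actual CSV header and the README before use.

```python
import csv
from collections import Counter
from typing import TextIO

# ASSUMPTION: hypothetical label column name; verify it against the header of
# SandwormAPT_flow_labelled.csv and the README before relying on it.
LABEL_COLUMN = "Label"


def count_labels(csv_file: TextIO, label_column: str = LABEL_COLUMN) -> Counter:
    """Count how many flow records carry each value of the label column."""
    reader = csv.DictReader(csv_file)
    # Accessing fieldnames reads the header row; it is None for an empty file.
    if reader.fieldnames is None or label_column not in reader.fieldnames:
        raise KeyError(f"column {label_column!r} not found in CSV header")
    # Iterate over the remaining rows and tally the label of each flow record.
    return Counter(row[label_column] for row in reader)


def count_labels_in_file(path: str, label_column: str = LABEL_COLUMN) -> Counter:
    """Open a local copy of the flow CSV (hypothetical path) and count its labels."""
    with open(path, newline="", encoding="utf-8") as f:
        return count_labels(f, label_column)
```

For example, calling `count_labels_in_file("SandwormAPT_flow_labelled.csv").most_common()` on a local copy downloaded from Zenodo would list each label value with its number of flow records, which gives a quick view of how the labelled attack procedures are distributed relative to the rest of the traffic.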

The dataset is available through Zenodo.

Citation & References

E. Iturbe, C. Dalamagkas, P. Radoglou-Grammatikis, E. Rios, and N. Toledo, “A pattern-aware LSTM-based approach for APT detection leveraging a realistic dataset for Critical Infrastructure Security,” Future Generation Computer Systems, vol. 178, p. 108308, May 2026, doi: 10.1016/j.future.2025.108308.