Cybersecurity Datasets:

#20. Vulnerable Smart Contracts (BCCC-VulSCs-2023)

Smart contracts are self-executing programs deployed on blockchain platforms such as Ethereum, designed to automate and enforce digital agreements without intermediaries. However, due to their immutable and public nature, vulnerabilities within their source code can lead to significant financial losses and system compromise. Vulnerable smart contracts pose critical risks to decentralized ecosystems, enabling exploits such as reentrancy, integer overflows, and unprotected self-destruction.

Introduction
The BCCC-VulSCs-2023 dataset is a repository of Solidity-based smart contracts annotated with vulnerability labels. It is designed to support research in vulnerability detection, classification, and profiling using both traditional and learning-based techniques. The dataset includes 36,670 smart contracts, each characterized by 70 extracted features that help distinguish secure from vulnerable code. These features were extracted using a custom analysis tool called SCsVolLyzer-V1.0, developed for granular inspection of smart contract attributes.
Four statistical measures (average, minimum, maximum, and standard deviation) have been computed across all non-binary attributes and are shown in Figures 1 to 4. They give a first view of how widely each feature varies across the dataset.
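As a rough illustration of how such per-feature summaries could be reproduced, the sketch below computes the four measures for every non-binary column of a small in-memory table. The column names and sample rows are hypothetical and are not taken from the dataset, and the choice of population standard deviation is an assumption, since the source does not say which variant was used.

```python
import statistics

def summarize(rows):
    """Return average, minimum, maximum, and standard deviation per non-binary column."""
    summary = {}
    for name in rows[0]:
        values = [float(row[name]) for row in rows]
        # Columns containing only 0 and 1 are treated as binary and skipped.
        if set(values) <= {0.0, 1.0}:
            continue
        summary[name] = {
            "average": statistics.fmean(values),
            "minimum": min(values),
            "maximum": max(values),
            # Assumption: population standard deviation (divide by n).
            "std_dev": statistics.pstdev(values),
        }
    return summary

# Hypothetical rows; real feature names come from Table 1 of the dataset.
rows = [
    {"function_count": 3, "loop_count": 0, "uses_tx_origin": 0},
    {"function_count": 7, "loop_count": 2, "uses_tx_origin": 1},
    {"function_count": 5, "loop_count": 4, "uses_tx_origin": 0},
]

stats = summarize(rows)
for feature, measures in stats.items():
    print(feature, measures)
```

With the real data, the rows would be loaded from the published file instead of being written inline. Switching to `statistics.stdev` gives the sample standard deviation if that matches the published figures better.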

Figure 1: Average

Figure 2: Minimum

Figure 3: Maximum

Figure 4: Standard deviation

Dataset Details
The dataset is imbalanced, reflecting the natural distribution of secure versus vulnerable smart contracts in real-world ecosystems. Figure 5 shows the label distribution: approximately 73.40% of contracts are labeled secure and 26.60% are labeled vulnerable. In absolute numbers, out of 36,670 contracts:
• 26,914 secure contracts
• 9,756 vulnerable contracts
A detailed list of all 70 features is available in Table 1, alongside descriptions derived from SCsVolLyzer-V1.0's static and syntactic analysis of Solidity source code.
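The percentages quoted for the label distribution follow directly from the two counts above. The short check below recomputes them; only the counts come from the text, and the variable names are illustrative.

```python
# Label counts as stated in the dataset description.
counts = {"secure": 26914, "vulnerable": 9756}

total = sum(counts.values())
shares = {label: 100 * n / total for label, n in counts.items()}

print(f"total contracts: {total}")
for label, pct in shares.items():
    print(f"{label}: {counts[label]} contracts ({pct:.2f}%)")
```

The total matches the 36,670 contracts reported for the dataset. The secure share comes out just under 73.4% and the vulnerable share just over 26.6%.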

Figure 5: Labels distribution

Table 1: Extracted Features

These features capture semantic and structural elements such as opcode distribution, inheritance patterns, control-flow complexity, function modifiers, variable declarations, and function visibility settings. This makes the dataset suitable for both shallow and deep learning models, as well as for traditional profiling methods such as rule-based and evolutionary algorithms.


Feature Extraction Methodology
Using the SCsVolLyzer-V1.0 analyzer, each contract is statically parsed to generate a normalized feature vector. These features were chosen based on domain knowledge and prior vulnerability studies, aiming to capture indicators of insecure patterns such as excessive fallback logic, unchecked external calls, and misuse of tx.origin. The features span various categories:
• Control-flow metrics (e.g., number of conditional branches, loops)
• Code complexity measures (e.g., function count, cyclomatic complexity)
• Security-relevant indicators (e.g., use of deprecated constructs, modifier usage)
• Token-related metadata (if applicable)
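To make the idea of a static feature vector concrete, here is a minimal sketch that counts a few of the indicators mentioned above in Solidity source text. It is not SCsVolLyzer-V1.0 and does not reproduce its 70 features. The feature names, regular expressions, and sample contract are all assumptions made for illustration, and a real analyzer would work on a parsed syntax tree rather than on regex matches.

```python
import re

# Hypothetical feature names mapped to simple textual patterns.
PATTERNS = {
    "function_count": re.compile(r"\bfunction\b"),
    "modifier_count": re.compile(r"\bmodifier\s+\w+"),
    "loop_count": re.compile(r"\b(?:for|while)\s*\("),
    "branch_count": re.compile(r"\bif\s*\("),
    "tx_origin_uses": re.compile(r"\btx\.origin\b"),
    "low_level_calls": re.compile(r"\.call\s*[({]"),
}

def strip_comments(source):
    """Remove block and line comments (naive: ignores comment markers inside strings)."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", source)

def extract_features(source):
    """Count pattern occurrences in comment-free source code."""
    code = strip_comments(source)
    return {name: len(pattern.findall(code)) for name, pattern in PATTERNS.items()}

# Small made-up contract used only to exercise the sketch.
SAMPLE = '''
pragma solidity ^0.8.0;
contract Wallet {
    address owner;
    modifier onlyOwner() { require(tx.origin == owner); _; }  // misuse of tx.origin
    function withdraw(uint amount) public onlyOwner {
        if (amount > 0) {
            (bool ok, ) = msg.sender.call{value: amount}("");
            require(ok);
        }
    }
    function sum(uint n) public pure returns (uint s) {
        for (uint i = 0; i < n; i++) { s += i; }
    }
}
'''

features = extract_features(SAMPLE)
print(features)
```

Counts like these would still need to be normalized, for example by contract length, before they resemble the normalized vectors described above. The source does not specify which normalization the tool applies.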

Applications
The BCCC-VulSCs-2023 dataset is intended to support a wide range of research and industrial applications, including:
• Vulnerability classification using ML/DL
• Behavioral profiling of smart contracts
• Feature selection and explainability studies
• Benchmarking secure coding tools
• Genetic algorithm-based vulnerability search
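For the classification use case, the class imbalance noted earlier usually needs some handling. One common option, shown here as an assumption rather than as the method used by the dataset authors, is to weight each class inversely to its frequency using the heuristic total / (number of classes × class count). This is the same formula that scikit-learn documents for its "balanced" class weights. The function name below is illustrative.

```python
def balanced_class_weights(counts):
    """Weight each class inversely to its frequency: total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

# Label counts as stated in the dataset description.
counts = {"secure": 26914, "vulnerable": 9756}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    print(f"{label}: weight {weight:.3f}")
```

With these weights, each class contributes the same total weight to a loss function, so a model is not rewarded for simply predicting the majority "secure" label. Resampling is an alternative that may suit some learners better.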

License
You may redistribute, republish, and mirror the BCCC-VulSCs-2023 dataset in any form. However, any use or redistribution of data must include a citation to the BCCC-VulSCs-2023 dataset and the following paper:

- Sepideh Hajihosseinkhani, Arash Habibi Lashkari, Ali Mizani, “Unveiling Vulnerable Smart Contracts: Toward Profiling Vulnerable Smart Contracts using Genetic Algorithm and Generating Benchmark Dataset”, Blockchain: Research and Applications, Vol. 4, December 2023

You can download this dataset from here.