Data Analysis


Data Description

The dataset comprises records of internet traffic observed by a simple intrusion detection network. These records are the traces left behind by traffic that a real Intrusion Detection System (IDS) encountered. Each record contains 43 features: 41 describe the traffic input itself, and the last two are a label indicating whether the traffic is normal or an attack and a score indicating the severity of the traffic input.
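The record layout described above can be sketched as follows. This is a minimal illustration, assuming each record arrives as a list of 43 string fields with the label and score in the last two positions, and that the score is an integer; the function name `split_record` and the example values are hypothetical.

```python
from typing import List, Tuple

N_FIELDS = 43     # total fields per record, as stated in the text
N_FEATURES = 41   # traffic features; the remaining two are label and score


def split_record(fields: List[str]) -> Tuple[List[str], str, int]:
    """Split one record into (features, label, severity score)."""
    if len(fields) != N_FIELDS:
        raise ValueError(f"expected {N_FIELDS} fields, got {len(fields)}")
    features = fields[:N_FEATURES]
    label = fields[N_FEATURES]            # "normal" or an attack name
    score = int(fields[N_FEATURES + 1])   # assumption: score is an integer
    return features, label, score


# Hypothetical record: 41 placeholder feature values, then label and score.
example = ["0"] * N_FEATURES + ["normal", "20"]
features, label, score = split_record(example)
print(len(features), label, score)
```

Checking the field count up front makes a malformed row fail loudly instead of silently shifting the label and score columns.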



The data set contains four classes of attack: Denial of Service (DoS), Probe, User to Root (U2R), and Remote to Local (R2L). Each is described briefly below:

DoS is an attack that tries to shut down traffic flow to and from the target system.

R2L is an attack that tries to gain local access to a remote machine. An attacker tries to “hack” their way into the network.

U2R is an attack that starts from a normal user account and tries to gain access to the system or network as a super-user (root).

Probe, or surveillance, is an attack that tries to gather information from a network.
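When working with the labels, it can help to collapse individual attack names into these four classes. The sketch below is illustrative only: the text does not list the attack names that appear in the label column, so the names in `ATTACK_CLASS` are assumptions following common KDD-style naming, and the helper `traffic_class` is hypothetical.

```python
# Assumption: label values use KDD-style attack names. Only a few
# illustrative names are listed; a real mapping would need every name
# that actually occurs in the label column.
ATTACK_CLASS = {
    "neptune": "DoS",
    "smurf": "DoS",
    "ipsweep": "Probe",
    "portsweep": "Probe",
    "buffer_overflow": "U2R",
    "rootkit": "U2R",
    "guess_passwd": "R2L",
    "ftp_write": "R2L",
}


def traffic_class(label: str) -> str:
    """Map a raw label to Normal, one of the four attack classes, or Unknown."""
    if label == "normal":
        return "Normal"
    # Unlisted names are reported as Unknown rather than guessed.
    return ATTACK_CLASS.get(label, "Unknown")


print(traffic_class("smurf"))
print(traffic_class("normal"))
```

Returning "Unknown" for unlisted names keeps unmapped labels visible so they can be reviewed rather than silently miscounted.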



The features in a traffic record describe the IDS's encounter with the traffic input and fall into four categories: Intrinsic, Content, Host-based, and Time-based.

By data type, the features in this data set break down as follows:

4 Categorical
6 Binary
23 Discrete
10 Continuous
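As a quick consistency check, the counts above can be tallied against the 43 fields per record. The snippet is a small sketch; the dictionary name `FEATURE_TYPE_COUNTS` is hypothetical, and the counts are taken directly from the list above.

```python
# Counts per data type, copied from the list above.
FEATURE_TYPE_COUNTS = {
    "categorical": 4,
    "binary": 6,
    "discrete": 23,
    "continuous": 10,
}

total_features = sum(FEATURE_TYPE_COUNTS.values())

# Fraction of all fields that each data type accounts for.
shares = {name: count / total_features
          for name, count in FEATURE_TYPE_COUNTS.items()}

for name, count in FEATURE_TYPE_COUNTS.items():
    print(f"{name:<12} {count:>3}  {shares[name]:.1%}")
print(f"{'total':<12} {total_features:>3}")
```

The total matches the 43 fields per record stated earlier, so the breakdown covers every field in a record.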