Confusion Matrix in Cyber Security

Hello everyone, in this blog we are going to discuss what a confusion matrix is and what role it plays in the world of cybersecurity.

So let's start by answering this simple question:

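What is a Confusion Matrix?

A confusion matrix is a table that compares the predictions of a classification model with the actual values, so we can see how many predictions were right and how many were wrong. The basic terms of the confusion matrix are: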

  • True Positive [TP]: The Machine Learning model predicted the positive class, and the actual class was positive.
  • True Negative [TN]: The model predicted the negative class, and the actual class was negative.
  • False Positive [FP]: The model predicted the positive class, but the actual class was negative. This is also called a false alarm.
  • False Negative [FN]: The model predicted the negative class, but the actual class was positive.
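To make the four terms concrete, here is a minimal Python sketch that counts them from two lists of labels. The function name `confusion_counts` and the sample labels are made up for illustration, and the sketch assumes binary labels where 1 means positive and 0 means negative.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1  # predicted positive, actually positive
        elif p == 0 and a == 0:
            tn += 1  # predicted negative, actually negative
        elif p == 1 and a == 0:
            fp += 1  # predicted positive, actually negative (false alarm)
        else:
            fn += 1  # predicted negative, actually positive (missed case)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels, only for illustration
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```

In practice a library routine would usually do this counting, but the loop shows exactly which case falls into which cell.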

There are two types of error in the confusion matrix:

  • False Negative (Type II error) and
  • False Positive (Type I error)

The most dangerous error is the False Negative [FN] error, because the model predicted negative when the actual class was positive. For example, the model predicted that a student failed when the student actually passed.

This error causes problems in the cybersecurity world, where many tools are based on machine learning or AI. A False Negative there means a real attack is classified as normal traffic and goes undetected, which can have dangerous consequences.

Therefore, the role of the confusion matrix is important in the field of machine learning.

Let's understand this concept with the help of an example:

Suppose we create an ML model that predicts whether a given image shows chocolate or not. Let the model make a total of 100 predictions:

True Positive (TP) = 30
True Negative (TN) = 55
False Positive (FP) = 10
False Negative (FN) = 5

To calculate Accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

= (30 + 55) / (30 + 55 + 10 + 5)

= 0.85
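The same arithmetic can be checked with a few lines of Python. This is only a sketch using the counts from the chocolate example. Precision and recall are not worked out in the text above; they are added here using their standard definitions.

```python
# Counts from the chocolate example
tp, tn, fp, fn = 30, 55, 10, 5

total = tp + tn + fp + fn

accuracy = (tp + tn) / total   # share of all predictions that were right
precision = tp / (tp + fp)     # of the images predicted "chocolate", how many really were
recall = tp / (tp + fn)        # of the real chocolate images, how many were found

print(f"Accuracy:  {accuracy:.2f}")   # Accuracy:  0.85
print(f"Precision: {precision:.2f}")  # Precision: 0.75
print(f"Recall:    {recall:.2f}")     # Recall:    0.86
```

Accuracy alone can look good even when the model misses many positive cases, which is why precision and recall are usually reported next to it.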

Implementation of the Confusion Matrix in Monitoring Cyber Attacks:

In the KDD99 dataset, the four attack categories (DoS, U2R, R2L, and probe) are divided into 22 different attack types, which are tabulated below:

In the KDD Cup 99, the criterion used to evaluate the participants' entries is the Cost Per Test (CPT), computed using the confusion matrix and a given cost matrix.
• True Positive (TP): The number of records detected as attack that are actually attacks.
• True Negative (TN): The number of records detected as normal that are actually normal.
• False Positive (FP): The number of records detected as attack that are actually normal (false alarm).
• False Negative (FN): The number of records detected as normal that are actually attacks.
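To show how a cost matrix and a confusion matrix combine into a Cost Per Test, here is a small Python sketch. It assumes the usual form of the calculation: multiply each cell of the confusion matrix by the matching cell of the cost matrix, add everything up, and divide by the number of test records. The two-class layout, the counts, and the cost values below are hypothetical and are not the official KDD Cup 99 matrices.

```python
def cost_per_test(confusion, cost):
    """Average cost per test record.

    confusion[i][j] = number of records of actual class i predicted as class j
    cost[i][j]      = cost of predicting class j when the actual class is i
    """
    total_records = sum(sum(row) for row in confusion)
    total_cost = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return total_cost / total_records


# Hypothetical two-class example: index 0 = normal, index 1 = attack
confusion = [
    [90, 10],  # actual normal: 90 TN, 10 FP (false alarms)
    [5, 95],   # actual attack:  5 FN (missed attacks), 95 TP
]
cost = [
    [0, 1],    # a false alarm costs 1 (assumed value)
    [2, 0],    # a missed attack costs 2 (assumed value)
]

cpt = cost_per_test(confusion, cost)
print(f"Cost per test: {cpt:.2f}")  # Cost per test: 0.10
```

Giving a missed attack a higher cost than a false alarm is one way to reflect that False Negatives are the more dangerous error in security tools.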

Conclusion:

Need for Confusion Matrix in Machine learning:
1. It evaluates the performance of a classification model when it makes predictions on test data, and tells us how good the model is.
2. It tells us not only how many errors the classifier made but also the type of each error, that is, whether it is a type-I error (False Positive) or a type-II error (False Negative).
3. With the help of the confusion matrix, we can calculate different metrics for the model, such as accuracy, precision, etc.
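As a small illustration of point 2, the two error types can each be turned into a rate. The sketch below reuses the counts from the chocolate example and applies the standard definitions of false positive rate and false negative rate; these two rates are not computed in the text above and are shown only as an example of what the matrix makes possible.

```python
# Counts from the chocolate example
tp, tn, fp, fn = 30, 55, 10, 5

# Type-I error rate: share of actual negatives wrongly flagged as positive
false_positive_rate = fp / (fp + tn)

# Type-II error rate: share of actual positives that the model missed
false_negative_rate = fn / (fn + tp)

print(f"False positive rate: {false_positive_rate:.3f}")  # False positive rate: 0.154
print(f"False negative rate: {false_negative_rate:.3f}")  # False negative rate: 0.143
```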