Hello everyone, in this blog we are going to discuss what a confusion matrix is and what role it plays in the world of cybersecurity.
So let's start by answering this simple question:
What is a Confusion Matrix?
A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known. It is the starting point for measuring how accurate a machine learning model is.
The basic terms of a confusion matrix are:
- True Positive [TP]: In TP, the model predicted the positive class and the actual class was positive.
- True Negative [TN]: In TN, the model predicted the negative class and the actual class was negative.
- False Positive [FP]: In FP, the model predicted the positive class but the actual class was negative, also called a false alarm.
- False Negative [FN]: In FN, the model predicted the negative class but the actual class was positive.
There are two types of error in the confusion matrix:
- False Negative and
- False Positive
The most dangerous error is often the False Negative [FN] error, because the model predicted negative when the actual value was positive. For example, if "pass" is the positive class, the model predicted that a student fails when the student actually passed.
This error causes problems in the cybersecurity world, where many tools are based on machine learning or AI: a False Negative means a real attack is classified as normal and goes unnoticed, which can have dangerous consequences.
Therefore the role of the confusion matrix is important in the field of machine learning.
Let's understand this concept with the help of an example:
Suppose we create an ML model that predicts whether a given image is of chocolate or not. Let there be a total of 100 predictions made by the model:
True Positive (TP) = 30
True Negative (TN) = 55
False Positive (FP) = 10
False Negative (FN) = 5
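To calculate accuracy, divide the number of correct predictions by the total number of predictions: Accuracy = (TP + TN) / (TP + TN + FP + FN) = (30 + 55) / 100 = 0.85, so the model is 85% accurate.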
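The accuracy for the chocolate example can be checked with a few lines of Python. This is a minimal sketch that uses only the four counts listed above; the variable names are chosen here for illustration.

```python
# Counts taken from the chocolate example above.
tp, tn, fp, fn = 30, 55, 10, 5

total = tp + tn + fp + fn      # all predictions made by the model
accuracy = (tp + tn) / total   # share of predictions that were correct

print(f"Total predictions: {total}")   # Total predictions: 100
print(f"Accuracy: {accuracy:.2f}")     # Accuracy: 0.85
```

Accuracy alone does not say which kind of error the model makes, which is why the individual cells of the matrix still matter.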
Using the Confusion Matrix to Monitor Cyber Attacks:
The KDD Cup 1999 (KDD99) data set was used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector: a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections. The data set contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
In the KDD99 dataset, the four attack categories (DoS, U2R, R2L, and probe) are divided into 22 different attack types, which are tabulated below:
In the KDD Cup 99, the criterion used to evaluate the participants' entries is the Cost Per Test (CPT), computed using the confusion matrix and a given cost matrix.
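To make the idea of a cost per test concrete, here is a minimal sketch for a simplified two-class case (normal vs. attack). Each cell of the confusion matrix is multiplied by the matching cell of the cost matrix, and the sum is divided by the number of test examples. The cost values below are hypothetical and are not the official KDD Cup 99 cost matrix, the counts are placeholders reused from the earlier example, and `cost_per_test` is a name made up for this illustration.

```python
# Rows are the actual class, columns are the predicted class.
# Index 0 = normal, index 1 = attack.
# Placeholder counts, NOT results from the KDD Cup 99 data.
confusion = [
    [55, 10],  # actual normal: 55 TN, 10 FP
    [5, 30],   # actual attack:  5 FN, 30 TP
]

# Hypothetical costs: correct predictions cost 0, a false alarm costs 1,
# and a missed attack costs 2. These are assumptions, not official values.
cost = [
    [0, 1],
    [2, 0],
]


def cost_per_test(confusion, cost):
    """Average misclassification cost over all test examples."""
    total_cost = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    total_tests = sum(sum(row) for row in confusion)
    return total_cost / total_tests


cpt = cost_per_test(confusion, cost)
print(f"Cost per test: {cpt:.2f}")  # Cost per test: 0.20
```

A lower cost per test is better. Giving missed attacks a higher cost than false alarms is one way to express that the two kinds of error are not equally harmful.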
• True Positive (TP): The number of connections detected as attacks that are actually attacks.
• True Negative (TN): The number of connections detected as normal that are actually normal.
• False Positive (FP): The number of connections detected as attacks that are actually normal (false alarm).
• False Negative (FN): The number of connections detected as normal that are actually attacks.
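The four counts above can be tallied directly from a list of actual labels and a list of predicted labels. The sketch below assumes the labels are the strings "attack" and "normal"; the function name `confusion_counts` and the six sample connections are made up for illustration.

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Count TP, TN, FP and FN, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # attack detected, and it really was an attack
        elif p == positive:
            fp += 1   # attack detected, but the connection was normal
        elif a == positive:
            fn += 1   # normal detected, but it really was an attack
        else:
            tn += 1   # normal detected, and it really was normal
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up labels for six connections.
actual    = ["attack", "normal", "normal", "attack", "normal", "attack"]
predicted = ["attack", "normal", "attack", "normal", "normal", "attack"]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```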
A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It is used to evaluate the performance of a classification model through the calculation of performance metrics like accuracy, precision, recall, and F1-score.
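As a short worked example, precision, recall, and F1-score can be computed from the same four counts used in the chocolate example earlier (TP = 30, TN = 55, FP = 10, FN = 5). This is a sketch using the standard definitions; the variable names are illustrative.

```python
# Counts from the chocolate example earlier in this post.
tp, tn, fp, fn = 30, 55, 10, 5

# Precision: of everything predicted positive, how much was really positive?
precision = tp / (tp + fp)

# Recall: of everything that was really positive, how much did the model find?
recall = tp / (tp + fn)

# F1-score: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.3f}")  # Precision: 0.750
print(f"Recall:    {recall:.3f}")     # Recall:    0.857
print(f"F1-score:  {f1:.3f}")         # F1-score:  0.800
```

In an intrusion detection setting, low recall means attacks are being missed, while low precision means many false alarms.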
Need for a Confusion Matrix in Machine Learning:
1. It evaluates the performance of classification models when they make predictions on test data, and tells us how good our classification model is.
2. It shows not only the errors made by the classifier but also the type of each error: type I (false positive) or type II (false negative).
3. With the help of the confusion matrix, we can calculate different metrics for the model, such as accuracy, precision, etc.
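To illustrate the second point, the two error types can be expressed as rates. The sketch below again uses the counts from the chocolate example and the standard definitions of the false positive rate (type I) and the false negative rate (type II); the variable names are illustrative.

```python
# Counts from the chocolate example earlier in this post.
tp, tn, fp, fn = 30, 55, 10, 5

# Type I error rate: share of actual negatives wrongly predicted positive.
false_positive_rate = fp / (fp + tn)

# Type II error rate: share of actual positives wrongly predicted negative.
false_negative_rate = fn / (fn + tp)

print(f"Type I  (false positive rate): {false_positive_rate:.3f}")  # 0.154
print(f"Type II (false negative rate): {false_negative_rate:.3f}")  # 0.143
```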