Confusion Matrix and Cyber Security

Confusion Matrix: In machine learning, and specifically in the problem of statistical classification, a confusion matrix (also known as an error matrix) is a table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa; both variants are found in the literature.



Confusion Matrix Case Study

Let’s pretend we have a two-class classification problem of predicting whether a photograph contains a boy or a girl.

We have a test dataset of 10 records with expected outcomes and a set of predictions from our classification algorithm.

Expected    Predicted
Boy         Girl
Boy         Boy
Girl        Girl
Boy         Boy
Girl        Boy
Girl        Girl
Girl        Girl
Boy         Boy
Boy         Girl
Girl        Girl

The algorithm got 7 of the 10 predictions correct, which is an accuracy of 70%.

accuracy = total correct predictions / total predictions * 100

accuracy = 7 / 10 * 100 = 70%
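The same calculation can be sketched in a few lines of Python. The two lists below simply transcribe the ten records listed above; the variable names are my own choice for illustration.

```python
# The ten test records from the case study, in the order listed above.
expected  = ["Boy", "Boy", "Girl", "Boy", "Girl", "Girl", "Girl", "Boy", "Boy", "Girl"]
predicted = ["Girl", "Boy", "Girl", "Boy", "Boy", "Girl", "Girl", "Boy", "Girl", "Girl"]

# A prediction is correct when it matches the expected label.
correct = sum(e == p for e, p in zip(expected, predicted))
accuracy = correct / len(expected) * 100

print(f"{correct} of {len(expected)} correct, accuracy = {accuracy:.0f}%")
```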

Some errors were made in this classification, but accuracy alone does not tell us which kind.

First, we must calculate the number of correct predictions for each class:

Boys classified as boys: 3

Girls classified as girls: 4

Now, we can calculate the number of incorrect predictions for each class, organized by the predicted value.

Boys classified as girls: 2

Girls classified as boys: 1
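As a rough sketch, the four counts above can be reproduced by tallying every (expected, predicted) pair with Python's collections.Counter. The lists transcribe the ten records from the case study; the variable names are illustrative only.

```python
from collections import Counter

# The ten test records from the case study.
expected  = ["Boy", "Boy", "Girl", "Boy", "Girl", "Girl", "Girl", "Boy", "Boy", "Girl"]
predicted = ["Girl", "Boy", "Girl", "Boy", "Boy", "Girl", "Girl", "Boy", "Girl", "Girl"]

# Count how often each (expected, predicted) combination occurs.
pairs = Counter(zip(expected, predicted))

for (actual, guess), count in sorted(pairs.items()):
    print(f"{actual}s classified as {guess.lower()}s: {count}")
```

Pairs where both labels match are the correct predictions; the other two pairs are the errors.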

We can now arrange these values into the 2-class confusion matrix:

 

                   Actual Boys    Actual Girls
Predicted Boys     3              1
Predicted Girls    2              4


  • The total number of actual boys in the dataset is the sum of the values in the boys column (3 + 2).
  • The total number of actual girls in the dataset is the sum of the values in the girls column (1 + 4).
  • The correct predictions lie on the diagonal from the top left to the bottom right of the matrix (3 + 4).
  • More errors were made by predicting boys as girls than by predicting girls as boys.
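Here is a minimal sketch of how the matrix and the observations above could be checked in Python. It assumes the layout the bullets describe (one column per actual class, so one row per predicted class); the nested-dictionary representation is just one convenient choice, and note that some libraries use the opposite orientation.

```python
# The ten test records from the case study.
expected  = ["Boy", "Boy", "Girl", "Boy", "Girl", "Girl", "Girl", "Boy", "Boy", "Girl"]
predicted = ["Girl", "Boy", "Girl", "Boy", "Boy", "Girl", "Girl", "Boy", "Girl", "Girl"]

labels = ["Boy", "Girl"]

# matrix[predicted_label][actual_label] -> count (rows = predicted, columns = actual)
matrix = {p: {a: 0 for a in labels} for p in labels}
for actual, guess in zip(expected, predicted):
    matrix[guess][actual] += 1

# Column sums give the number of actual instances per class.
actual_totals = {a: sum(matrix[p][a] for p in labels) for a in labels}

# The diagonal holds the correct predictions.
diagonal = sum(matrix[label][label] for label in labels)

for p in labels:
    print(p, [matrix[p][a] for a in labels])
print("actual totals:", actual_totals, "correct:", diagonal)
```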

Now we can summarize the confusion matrix as follows:

(Figure: Confusion matrix)

  • TP (True Positive): an actual positive correctly predicted as positive
  • FP (False Positive): an actual negative incorrectly predicted as positive
  • FN (False Negative): an actual positive incorrectly predicted as negative
  • TN (True Negative): an actual negative correctly predicted as negative
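To connect these four terms to the boy/girl example, one class has to be picked as the "positive" one. The post does not say which, so the sketch below assumes "Boy" is positive purely for illustration; swapping the constant to "Girl" swaps the roles.

```python
# The ten test records from the case study.
expected  = ["Boy", "Boy", "Girl", "Boy", "Girl", "Girl", "Girl", "Boy", "Boy", "Girl"]
predicted = ["Girl", "Boy", "Girl", "Boy", "Boy", "Girl", "Girl", "Boy", "Girl", "Girl"]

POSITIVE = "Boy"  # assumption: the choice of positive class is not stated in the post

tp = fp = fn = tn = 0
for actual, guess in zip(expected, predicted):
    if guess == POSITIVE and actual == POSITIVE:
        tp += 1  # positive predicted as positive
    elif guess == POSITIVE:
        fp += 1  # negative predicted as positive
    elif actual == POSITIVE:
        fn += 1  # positive predicted as negative
    else:
        tn += 1  # negative predicted as negative

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```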

What is the accuracy of the machine learning model for this classification task?


Accuracy

Accuracy represents the number of correctly classified data instances over the total number of data instances.

In this example, Accuracy = (3 + 4) / (3 + 4 + 1 + 2) = 0.7, or 70% as a percentage.


Precision
Precision = (TP) / (TP + FP)
Precision is the fraction of positive predictions that are actually positive. TP is the number of true positives, and FP is the number of false positives.
A trivial way to have perfect precision is to make one single positive prediction and ensure it is correct. This would not be very useful, since the classifier would ignore all but one positive instance.
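A small sketch of the formula in Python follows. The counts TP = 3 and FP = 1 come from the boy/girl example if "Boy" is treated as the positive class, which is an assumption on my part; the helper name precision and its zero-division fallback are also my own choices.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); returns 0.0 when there are no positive predictions."""
    total_predicted_positive = tp + fp
    return tp / total_predicted_positive if total_predicted_positive else 0.0

# Boy/girl example, assuming "Boy" is the positive class: TP = 3, FP = 1.
example_precision = precision(tp=3, fp=1)

# The trivial case from the text: one positive prediction, and it is correct.
trivial_precision = precision(tp=1, fp=0)

print(example_precision)  # 0.75
print(trivial_precision)  # 1.0
```

The trivial classifier scores a perfect 1.0 even though it found only one positive, which is why precision is rarely reported on its own.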

Recall
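Recall = (TP) / (TP + FN)
Recall is the fraction of actual positives that the classifier correctly identifies. FN is the number of false negatives.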
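Recall can be sketched the same way. Again, TP = 3 and FN = 2 assume that "Boy" is the positive class in the boy/girl example, and the helper name recall is only illustrative. The second call shows the single-prediction classifier mentioned under Precision: it finds just one of the five boys.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); returns 0.0 when there are no actual positives."""
    total_actual_positive = tp + fn
    return tp / total_actual_positive if total_actual_positive else 0.0

# Boy/girl example, assuming "Boy" is the positive class: TP = 3, FN = 2.
example_recall = recall(tp=3, fn=2)

# One correct positive prediction out of five actual boys: TP = 1, FN = 4.
trivial_recall = recall(tp=1, fn=4)

print(example_recall)  # 0.6
print(trivial_recall)  # 0.2
```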

Cyber Security

Cyber attacks are becoming a critical issue for organizational information systems. A number of cyber attack detection and classification methods have been introduced, with varying levels of success, as countermeasures to preserve data integrity and system availability. Classifying attacks against computer networks is becoming a harder problem in the field of network security.


Cyber security is the practice of defending computers, servers, mobile devices, electronic systems, networks, and data from malicious attacks. It's also known as information technology security or electronic information security. The term applies in a variety of contexts, from business to mobile computing, and can be divided into a few common categories.

·         Network security is the practice of securing a computer network from intruders, whether targeted attackers or opportunistic malware.

·         Application security focuses on keeping software and devices free of threats. A compromised application could provide access to the data it's designed to protect. Successful security begins in the design stage, well before a program or device is deployed.

·         Information security protects the integrity and privacy of data, both in storage and in transit.

·         Operational security includes the processes and decisions for handling and protecting data assets. The permissions users have when accessing a network and the procedures that determine how and where data may be stored or shared all fall under this umbrella.

·         Disaster recovery and business continuity define how an organization responds to a cyber-security incident or any other event that causes the loss of operations or data. Disaster recovery policies dictate how the organization restores its operations and information to return to the same operating capacity as before the event. Business continuity is the plan the organization falls back on while trying to operate without certain resources.

Thank you for visiting my blog😊
