Intrusion Detection using Machine Learning
issue 1

Minal Bijwar

BE CSE 4th Year

SRMCEW, Nagpur, India

Gayatri Kuralkar

BE CSE 4th Year

SRMCEW, Nagpur, India

Harshika Kathane

BE CSE 4th Year

SRMCEW, Nagpur, India

Mayuri Padmagirwar

BE CSE 4th Year

SRMCEW, Nagpur, India

Vaishnavi Chaudhari

BE CSE 4th Year

SRMCEW, Nagpur, India


Although advanced Machine Learning (ML) methods have been adopted for intrusion detection, attacks on data remain one of the biggest drawbacks of the Internet. The focus of this project is to detect the attacks that occur in a network. The number of people using social media platforms such as Facebook and Instagram is growing rapidly, and these platforms generate a large amount of data every day. Detecting attacks on this data is a challenging problem. This project detects attacks by analyzing the information present in the KDDCUP dataset, using the Naïve Bayes classification algorithm to classify the network data. The project generates a final report that lists the attacks and the normal events.


A thorough study and analysis of different machine learning techniques has been carried out to discover the causes of the problems that these techniques encounter when detecting intrusive activities. Attack features are mapped and the various attacks are classified with respect to each attack. Some issues related to the detection of low-frequency attacks using a network attack dataset are also discussed, and feasible techniques are proposed for improvement. Machine learning techniques have been studied and compared in terms of their capability to detect the different categories of attacks. Intrusion detection is the process of dynamically keeping track of incidents happening in a computer system or network, analyzing them for indications of possible security events, and often blocking unauthorized access to the network. This can be achieved by automatically gathering information from diverse systems and network sources, and then analyzing the data for possible security problems.


The main aim is the effective classification and prediction of the data, and the improvement of the overall performance of the prediction results. There are two kinds of system, namely the Intrusion Detection System (IDS) and the Intrusion Prevention System (IPS). An IDS is a detective device developed to identify malicious (including policy-violating) events. An IPS is a preventive device implemented to detect as well as block malicious actions. An attacker’s objective is to interrupt or take control of an application, machine or system, thereby gaining access to the rights and permissions available through the target, or to disable the target, resulting in a denial-of-service situation. Most implementations produce false positives, so monitoring engineers unknowingly spend time investigating non-malicious events, as well as false negatives, which might lead to undetected intrusions. The most important thing is the proper configuration of the system, which must reflect the organization’s traffic patterns. Intrusion Detection Systems provide methods to prepare for and deal with various cyber-attacks. Analyzing security problems through information collected from a variety of systems and network sources, monitoring system activity, auditing system configurations and vulnerabilities, assessing the integrity of critical system and data files, performing statistical analysis of event patterns, detecting abnormal activity and auditing operating systems all help in identifying attacks.


  1. Title: Addressing the Class Imbalance Problem in Medical Datasets: It was published in 2012 by M. Mostafizur Rahman and D. N. Davis. A balanced dataset is crucial for creating a training set. Most classification methods aim to optimize the overall accuracy without considering the relative distribution of each class. One of the main causes of reduced generalization in machine learning algorithms is unbalanced real-world data. The main objective of that work is to reduce the ratio gap between the majority classes and the minority class. The approach helps to overcome the class imbalance problem in clinical datasets and other data domains, and is also useful for datasets with uncertain class labels. Its disadvantage is that the outcome labels of most clinical datasets are not consistent with the underlying data, so such datasets are not suitable for conventional over-sampling and under-sampling techniques.
  2. Title: A survey on cloud computing security: It was published in 2010 by R. Kanday. The paper gives a general overview of cloud computing, with a brief discussion of topics including its characteristics, deployment and service models, and drawbacks. Major issues and obstacles that cloud computing faces, such as several security issues and their countermeasures, are also discussed, and the major part of the countermeasures focuses on Intrusion Detection Systems. Covering topics such as Mobile Cloud Computing and the Internet of Things, the survey also gives a general explanation of the applications and potential of integrating cloud computing with any mobile equipment that has Internet connectivity, along with the challenges that lie ahead.
  3. Title: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations: It was published in 2000 by Ian H. Witten and Eibe Frank. The convergence of computing and communication has produced a society that feeds on information, yet most of that information is in raw form. Data can be characterized as recorded facts, and information is the set of patterns, or expectations, that underlie the data. A database contains a large amount of locked-up information, that is, information that is potentially crucial but has not yet been discovered. The book’s stated mission is to bring it forth. Its weather data represents a set of days together with a decision for each as to whether to play the game or not. In these cases the output takes the form of decision trees and classification rules, which are basic knowledge representation styles that many machine learning methods use. The main disadvantage is that the weather problem is a tiny dataset, used repeatedly only to illustrate machine learning methods.




Data selection is the process of selecting the data used for detecting the attacks. In this project, the KDDCUP dataset is used. The dataset contains information such as the duration, flag, service, source bytes, destination bytes and class labels.
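As a rough illustration of data selection, the sketch below reads comma-separated records into dictionaries keyed by the fields named above. The column list is a simplified, hypothetical subset (the real dataset has many more columns), and the two sample rows and the name `load_records` are invented purely for illustration.

```python
import csv
import io

# Hypothetical, simplified column subset based on the fields named in the text.
COLUMNS = ["duration", "service", "flag", "src_bytes", "dst_bytes", "label"]

# Illustrative sample rows only; these are not taken from the real dataset.
SAMPLE = """0,http,SF,181,5450,normal
0,private,S0,0,0,attack
"""

def load_records(text):
    """Parse CSV text into a list of dicts, converting numeric fields to int."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    records = []
    for row in reader:
        for name in ("duration", "src_bytes", "dst_bytes"):
            row[name] = int(row[name])
        records.append(row)
    return records

records = load_records(SAMPLE)
```

In practice the same function could be pointed at the contents of the dataset file, provided the column list is extended to match it.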


Data pre-processing is the process of cleaning the dataset by removing or repairing unwanted data. It has two steps, namely missing data removal and encoding of categorical data. In missing data removal, null or missing values are handled using an imputer library. In encoding categorical data, categorical data is defined as variables with a finite set of label values. Most machine learning algorithms require numerical input and output variables, so integer encoding and one-hot encoding are used to convert categorical data to numerical data.
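The two pre-processing steps can be sketched in plain Python as follows. This is a minimal illustration rather than the project’s actual code: it fills missing values with the column mean (one common imputation strategy, assumed here) and one-hot encodes a categorical column. The sample records and the names `impute_mean` and `one_hot` are hypothetical.

```python
from statistics import mean

# Hypothetical sample records; None marks a missing value.
records = [
    {"duration": 0, "service": "http", "src_bytes": 181},
    {"duration": 2, "service": "ftp", "src_bytes": None},
    {"duration": 0, "service": "smtp", "src_bytes": 239},
]

def impute_mean(rows, column):
    """Return new rows with missing values in a numeric column set to the column mean."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = mean(known)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column):
    """Return new rows where a categorical column is replaced by one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded

clean = one_hot(impute_mean(records, "src_bytes"), "service")
```

After this step every value in `clean` is numeric, which is what most classifiers expect as input.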


Data splitting is the partitioning of the available data into two portions, usually for cross-validation purposes. One portion of the data is used to develop a predictive model and the other to evaluate the model’s performance. Separating the data into training and testing sets is an important part of evaluating data mining models. When a dataset is separated into a training set and a testing set, the larger portion of the data is used for training and the smaller portion is used for testing.
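A minimal sketch of such a split is shown below. The 80/20 ratio, the fixed seed and the function name `train_test_split` are assumptions made for illustration; the text does not state which ratio the project uses.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 dataset records
train, test = train_test_split(rows)
```

Shuffling before splitting avoids a test set made up only of the last records in the file, which matters when the records are stored in some order.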


Feature scaling is a method used to standardize the range of the independent variables or features of the data. It is also known as data normalization or standardization, and is generally performed during the data pre-processing step. It basically normalizes the data within a particular range. It can also help speed up the calculations in an algorithm.
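Two common ways of scaling a feature are sketched below: standardization to zero mean and unit variance, and min-max scaling to the range 0 to 1. Which of the two the project uses is not stated, so both are shown; the function names are illustrative.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max([0, 5, 10])
z_scores = standardize([1, 2, 3, 4])
```

To avoid leaking information from the testing set, the mean, deviation, minimum and maximum would normally be computed on the training set only and then reused for the testing set.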


Naïve Bayes is a probabilistic classifier based on Bayes’ theorem, with the “naïve” assumption that the features are conditionally independent given the class. Unlike methods based on decision planes, such as support vector machines, which separate cases of different class labels by constructing hyperplanes in a multidimensional space, Naïve Bayes estimates the probability of each class from the training data and assigns each case to the most probable class. Naïve Bayes is used here for classification tasks and can handle multiple continuous and categorical variables. The case values can be 0 or 1 for categorical variables.
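Naïve Bayes classification can be sketched as follows. This is a minimal Gaussian Naïve Bayes written from scratch, not the project’s implementation: it applies Bayes’ theorem under the assumption that features are independent given the class, and models each numeric feature with a normal distribution per class. The class name, the two features (duration and source bytes) and the training values are hypothetical.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for numeric feature vectors."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for features, label in zip(X, y):
            grouped[label].append(features)
        self.priors = {}  # class label -> prior probability
        self.stats = {}   # class label -> list of (mean, variance) per feature
        for label, rows in grouped.items():
            self.priors[label] = len(rows) / len(X)
            self.stats[label] = []
            for column in zip(*rows):
                mu = sum(column) / len(column)
                var = sum((v - mu) ** 2 for v in column) / len(column)
                # A small constant avoids division by zero for constant features.
                self.stats[label].append((mu, var + 1e-9))
        return self

    def _log_posterior(self, features, label):
        """Log prior plus the sum of per-feature Gaussian log-likelihoods."""
        total = math.log(self.priors[label])
        for value, (mu, var) in zip(features, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return total

    def predict(self, X):
        """Return the most probable class label for each feature vector."""
        return [max(self.priors, key=lambda c: self._log_posterior(f, c)) for f in X]

# Hypothetical training data: [duration, src_bytes] with a class label.
X_train = [[0.1, 200], [0.2, 220], [5.0, 0], [6.0, 10]]
y_train = ["normal", "normal", "attack", "attack"]

model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[0.15, 210], [5.5, 5]])
```

Working with log probabilities instead of multiplying raw probabilities avoids numerical underflow when there are many features.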


Prediction is the process of predicting the attacks in the network from the dataset. This project aims to predict attacks from the dataset effectively by enhancing the performance of the overall prediction results.


The final result is generated based on the overall classification and prediction. The performance of the proposed approach is evaluated using measures such as true positives, true negatives, false positives, false negatives, accuracy and precision.
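The measures named above can be computed from the actual and predicted labels as in the sketch below. Treating “attack” as the positive class is an assumption, and the label lists are made up purely to show the calculation; they are not results from the project.

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Return (tp, tn, fp, fn) counts, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def precision(tp, fp):
    """Fraction of predicted attacks that were really attacks."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Illustrative labels only.
actual = ["attack", "attack", "normal", "normal", "attack"]
predicted = ["attack", "normal", "normal", "attack", "attack"]

tp, tn, fp, fn = confusion_counts(actual, predicted)
acc = accuracy(tp, tn, fp, fn)
prec = precision(tp, fp)
```

A false positive here is a normal event flagged as an attack, and a false negative is an attack that was missed, which is usually the costlier mistake for an intrusion detection system.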



We reviewed various influential algorithms for intrusion detection based on machine learning techniques. The characteristics of ML techniques make it possible to design intrusion detection systems that have high detection rates and low false positive rates while adapting quickly to changing malicious behaviors. These algorithms are divided into two types of machine learning based schemes, namely Artificial Intelligence (AI) and Computational Intelligence (CI). The two types share several features, such as adaptation, fault tolerance and high computational speed, that help in building efficient intrusion detection systems.


