Anup Bhange¹, Ankita Mahalle², Divisha Samrit³, Nishigandha Wagh⁴, Rucheeka Gothe⁵
²Student, ³Student, ⁴Student, ⁵Student
¹K.D.K.C.E., NAGPUR, INDIA.
Placement of students is one of the most important objectives of an educational institution. In the current manual system, relating students' academic achievement to their selection by companies is a difficult task. The reputation of an institution and its yearly admissions depend on the placement prospects of its students, so every institution strives to strengthen its placement department. The objective of this project is to analyze the previous year's student data, predict the placement possibilities of current students, and thereby help increase the placement percentage of the institution. The study uses the Naïve Bayes, Decision Tree, and Random Forest algorithms to build the prediction model; the model is trained on a training set and its prediction accuracy is measured on a separate test set. The resulting recommendation system predicts whether a current student will be placed and, if so, predicts the company based on the data of previously placed students. The classification algorithms, in particular Random Forest and Decision Tree, predict the results independently, and their efficiency on the dataset is then compared. The model helps the placement cell of an institution identify potential students and focus on improving their technical and social skills.
Index Terms- Naïve Bayes, Decision Tree, Random Forest, Dataset, Data Science.
Placements are considered very important for every college. The success of a college is measured by the campus placement of its students, and students choose a college by looking at its placement percentage. Predicting and analyzing placements therefore helps colleges build their reputation and helps students improve their chances of being placed. The Placement Prediction System predicts whether a student will be placed in a company by applying classification algorithms such as Decision Tree and Random Forest. The main objective of this model is to predict whether a student gets placed in campus recruitment. For this, the data considered is the academic history of students, such as overall percentage, skills, and CRT training. The algorithms are applied to the previous year's student data.
2.1 Data Gathering
In this phase, the previous year's data is gathered from the Training and Placement Office. The combination of various attributes determines whether a student is placed or not. Quantitative aspects such as undergraduate CGPA and the marks obtained in the X and XII standards form the major part of a student's academic record. Qualitative aspects such as communication and programming skills form the backbone of a student's placement prospects, since every recruiting company wants to hire students who have sound technical knowledge and the ability to communicate effectively.
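To make the attributes above concrete, here is a minimal sketch, assuming the Training and Placement Office exports its records as a CSV file. The column names (`cgpa`, `x_percent`, `xii_percent`, `communication`, `programming`, `placed`), the three rows, and the helper `load_placement_data` are hypothetical placeholders, not the actual K.D.K.C.E. data.

```python
import io

import pandas as pd

# Hypothetical CSV export from the Training and Placement Office.
# Column names and rows are made up for illustration only.
RAW_CSV = """cgpa,x_percent,xii_percent,communication,programming,placed
8.1,85,78,good,good,yes
6.4,70,65,average,poor,no
7.5,,72,good,average,yes
"""

# Quantitative attributes (academic scores) and qualitative ones (skill ratings).
QUANTITATIVE = ["cgpa", "x_percent", "xii_percent"]
QUALITATIVE = ["communication", "programming"]
TARGET = "placed"


def load_placement_data(source) -> pd.DataFrame:
    """Read previous-year placement records and check the expected columns exist."""
    df = pd.read_csv(source)
    missing = set(QUANTITATIVE + QUALITATIVE + [TARGET]) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


df = load_placement_data(io.StringIO(RAW_CSV))
print(df.shape)  # (3, 6); the empty X mark in row 3 is read as NaN
```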
2.2 Preprocessing and Cleaning
Data preprocessing is the first step towards building a data science model. It converts the raw dataset into a clean dataset. Data gathered from different sources arrives in a raw format that is not suitable for analysis, so steps such as data cleaning, data integration, transformation, and reduction are executed to convert it into a clean dataset.
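The four preprocessing steps can be sketched with pandas as follows. This is an illustration under assumptions, not the project's actual pipeline: the column names, the `poor`/`average`/`good` skill scale, median imputation, and the hypothetical `student_id` column are all choices made for the example.

```python
import pandas as pd


def preprocess(frames: list[pd.DataFrame]) -> pd.DataFrame:
    # Data integration: combine records gathered from different sources.
    df = pd.concat(frames, ignore_index=True)

    # Data cleaning: drop exact duplicate rows and rows without a label.
    df = df.drop_duplicates().dropna(subset=["placed"]).copy()

    # Fill missing numeric values (e.g. CGPA) with the column median.
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

    # Transformation: map ordinal skill ratings and the yes/no label to numbers.
    skill_scale = {"poor": 0, "average": 1, "good": 2}  # assumed rating scale
    for col in ["communication", "programming"]:
        df[col] = df[col].map(skill_scale)
    df["placed"] = (df["placed"] == "yes").astype(int)

    # Reduction: drop identifier columns that carry no predictive signal.
    return df.drop(columns=["student_id"], errors="ignore").reset_index(drop=True)
```

Median imputation is used here only because it is robust to outliers; the source does not say how missing values were handled.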
2.3 Processing
Processing is the stage in which different algorithms are applied to the data to find the one that gives the best results:
2.3.1 Random Forest: A random forest is an ensemble of decision trees. A large number of relatively uncorrelated models (trees) operating as a committee will generally outperform any of the individual constituent models. In general, the more trees the forest contains, the more robust it is, and accuracy tends to improve as trees are added, although the gain levels off beyond a certain number of trees. Each tree is trained on a random sample of the training data; at prediction time, the random forest algorithm follows this pseudo-code:
- Take the test features, use the rules of each decision tree to predict the outcome, and store the outcome.
- Count the votes for each predicted target.
- Take the predicted target with the most votes as the final outcome of the random forest algorithm.
Each decision tree in the forest may predict a different target (outcome) for the same test features; the votes for each predicted target are then counted.
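The voting procedure can be sketched by training several scikit-learn decision trees on bootstrap samples, each restricted to a random subset of features at every split (`max_features="sqrt"`), and then taking a hard majority vote. The helper names `fit_forest` and `predict_forest` are hypothetical. Note that scikit-learn's own `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than counting hard votes, so this sketch mirrors the pseudo-code above rather than that library's internals.

```python
from collections import Counter

import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees trees on bootstrap samples (X and y are NumPy arrays)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap: sample rows with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",  # random feature subset considered at each split
            random_state=int(rng.integers(1_000_000)),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X_test):
    """Collect every tree's prediction, then return the most-voted target per row."""
    votes = np.array([tree.predict(X_test) for tree in trees])  # shape (n_trees, n_test)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```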
The advantages of the random forest algorithm are as follows:
- Handles thousands of input variables without variable deletion
- Gives estimates of what variables are important in the classification
- Runs efficiently on large databases
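The second advantage, estimating variable importance, is exposed in scikit-learn through the `feature_importances_` attribute of a fitted `RandomForestClassifier`. A minimal sketch, where the function `rank_features` and any feature names passed to it are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, feature_names, n_trees=100, seed=0):
    """Fit a random forest and return (name, importance) pairs, highest first."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    # Impurity-based importances; they sum to 1 across all features.
    pairs = zip(feature_names, forest.feature_importances_)
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```

These importances are computed from the training data and can be biased toward features with many distinct values, so `sklearn.inspection.permutation_importance` is a common cross-check.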
2.3.2 Decision Tree: To predict the class label of a record with a decision tree, we start from the root of the tree and compare the value of the root attribute with the corresponding attribute of the record. Based on the comparison, we follow the branch corresponding to that value and jump to the next node, repeating until a leaf is reached. The goal of using a decision tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data). The tree is built as follows:
- Place the best attribute of the dataset at the root of the tree.
- Split the training set into subsets, made in such a way that each subset contains data with the same value for that attribute.
- Repeat the two steps above on each subset until every branch of the tree ends in a leaf node.
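The three steps above describe a recursive, ID3-style construction. The sketch below assumes categorical attributes and uses information gain (entropy reduction) to choose the "best" attribute; the source does not state which criterion the project used, so that choice is an assumption. Each internal node also stores its majority label as a fallback for attribute values not seen during training.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on `attr`."""
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    """Return a leaf label, or a node (attribute, {value: subtree}, majority_label)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority  # pure subset or no attributes left: make a leaf
    # Step 1: put the best attribute at this node.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    # Step 2: one subset per value of the chosen attribute.
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # Step 3: repeat on each subset until the branches end in leaves.
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, branches, majority)


def predict(tree, record):
    """Start at the root and follow the branch matching the record's value."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        tree = branches.get(record[attr], majority)
    return tree
```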
2.3.3 Naïve Bayes: A Naïve Bayes model is easy to build and particularly useful for very large datasets. Along with its simplicity, Naïve Bayes is known to sometimes outperform even highly sophisticated classification methods. Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x), and P(x|c), as shown in the equation below:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$
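P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).
P(c) is the prior probability of the class.
P(x|c) is the likelihood, i.e., the probability of the predictor given the class.
P(x) is the prior probability of the predictor.
The "naïve" assumption is that the attributes are conditionally independent given the class, so for x = (x_1, ..., x_n) the likelihood factorizes as P(x|c) = P(x_1|c) × P(x_2|c) × ... × P(x_n|c).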
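A minimal sketch of Naïve Bayes for categorical attributes, built from simple counts. For each class it multiplies the prior P(c) by the per-attribute likelihoods P(x_i|c), then divides by the sum over classes (the model's estimate of P(x)) to obtain P(c|x). The Laplace smoothing parameter `alpha`, which keeps an unseen attribute value from zeroing out a class, is an addition not described in the source, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute) -> Counter of values
    values = defaultdict(set)            # attribute -> set of observed values
    for row, c in zip(rows, labels):
        for attr, v in row.items():
            value_counts[(c, attr)][v] += 1
            values[attr].add(v)
    return class_counts, value_counts, values


def posterior(model, row, alpha=1.0):
    """Return P(c|x) for every class using the naive independence assumption."""
    class_counts, value_counts, values = model
    n = sum(class_counts.values())
    scores = {}
    for c, count in class_counts.items():
        score = count / n                            # prior P(c)
        for attr, v in row.items():                  # multiply in each P(x_i|c)
            k = len(values[attr])
            score *= (value_counts[(c, attr)][v] + alpha) / (count + alpha * k)
        scores[c] = score
    evidence = sum(scores.values())                  # estimate of P(x)
    return {c: s / evidence for c, s in scores.items()}
```

With `alpha=0` the estimates are the raw relative frequencies from the training data.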
2.3.4 K-Nearest Neighbor: The K-nearest neighbors (KNN) algorithm uses feature similarity to predict the values of new data points: a new data point is assigned a value based on how closely it matches the points in the training set. Its working can be understood from the following steps:
Step 1 − Any algorithm requires a dataset, so in the first step of KNN we load the training data as well as the test data.
Step 2 − Next, choose the value of K, i.e., the number of nearest data points to consider. K can be any positive integer.
Step 3 − For each point in the test data, do the following:
3.1 − Compute the distance between the test point and each row of the training data using a distance metric such as Euclidean, Manhattan, or Hamming distance. Euclidean distance is the most commonly used.
3.2 − Sort the training rows in ascending order of their distance values.
3.3 − Select the top K rows from the sorted array.
3.4 − Assign a class to the test point based on the most frequent class among these rows.
Step 4 − End
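Steps 1 to 4 can be sketched in NumPy as follows, using Euclidean distance. The function `knn_predict` and the two-feature example values are hypothetical; in practice, scikit-learn's `KNeighborsClassifier` provides the same behaviour. Because Euclidean distance is sensitive to scale, features such as CGPA and percentage marks would normally be standardized before this step.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    predictions = []
    for point in X_test:                                          # Step 3
        dists = np.sqrt(((X_train - point) ** 2).sum(axis=1))     # 3.1 Euclidean distance
        nearest = np.argsort(dists)[:k]                           # 3.2-3.3 sort, keep top K
        labels = [y_train[i] for i in nearest]
        predictions.append(Counter(labels).most_common(1)[0][0])  # 3.4 most frequent class
    return predictions
```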
Campus placement is a process in which companies visit colleges to identify talented and qualified students before they finish their graduation. The proposed system determines the likelihood of placement based on various attributes of the student's profile. First, the placement data of the previous year is gathered and cleaned; during cleaning, missing values are detected and data quality is checked. Next, a modeling technique is selected and the model is built. Finally, the model is evaluated and its results are reported.
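The gather, build, and evaluate workflow can be sketched with scikit-learn. Because the project's dataset is not reproduced here, the example uses `make_classification` to generate a synthetic stand-in for the cleaned placement table, so the resulting accuracies say nothing about the real data. The split ratio, hyperparameters, and the `compare_models` helper are assumptions. Each model is fitted on the training split only and scored on the held-out test split, which lets the algorithms' efficiency be compared on equal terms.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, test_size=0.25, seed=42):
    """Fit each classifier on the training split and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    models = {
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for the cleaned placement table (no real student data).
    X, y = make_classification(n_samples=300, n_features=6, n_informative=4, random_state=0)
    for name, acc in sorted(compare_models(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {acc:.3f}")
```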
Python is a powerful and flexible open-source programming language with many tools for working efficiently with data. The following basic libraries are essential for building the project:
Pandas: Pandas is a Python package for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
Matplotlib: Matplotlib is a 2D plotting library with which various plots can be produced in Python across different environments. Seaborn, a higher-level plotting library for statistical graphics, is built on top of Matplotlib and is an alternative way to produce such plots.
Scikit-learn: Scikit-learn is an easy-to-use Python library for building machine learning models. It is built on NumPy, SciPy, and Matplotlib.
NumPy: NumPy (Numerical Python) is a Python library for scientific computing. Python's built-in types do not include an efficient multidimensional array; NumPy provides one, along with operations for creating and manipulating arrays.
3. RESULTS AND CONCLUSION
This system helps institutions predict students' campus placement, and placement officers can use it to identify the weaknesses of each student. They can also suggest improvements so that students can overcome these weaknesses and perform to the best of their abilities. Of the algorithms considered, Random Forest and KNN give the highest prediction accuracy.
Future enhancements of the project will focus on adding more parameters to predict placement status more accurately. The project can also be extended to suggest remedial actions based on the output generated by the system.
The authors would like to thank K.D.K.C.E. for providing the student data used to create the dataset for this research and development, and the reviewers for their constructive comments.