Netspam Detection using R
issue 1

Kajal Fulariya, Pooja Singh, Tanuja Ambagade, Prof. Annaji M. Kuthe, Mrs. V. Surjuse

Department of Computer Science and Engineering,

S.R.M College of Engineering for Woman, Nagpur, India

ABSTRACT:

Netspam is a novel framework that maps spam detection onto networks: it uses spam features to model review datasets as heterogeneous information networks. Netspam-related technologies are becoming more popular among cyber crime investigation technologies that separate genuine data from spam promotions. Such systems generally identify spammers based on their tweeting history or social attributes, by detecting abnormal behavior, and by classifying tweet-embedded URLs. The purpose of this work is to present a Windows-based real-time application, written in the R language, for Twitter spam detection. Instead of traditional methods (linguistic-based, behavior-based, and graph-based), the R program uses adjectives as keys for detecting spam words and then applies the support vector machine (SVM) algorithm to classify the collected data. The results are then output to the users.

INTRODUCTION:

Spam is widely used to send unsolicited messages to users, who may not be aware that a Netspam system is autonomously checking their tweeting history or social attributes, detecting abnormal behavior, and classifying tweet-embedded URLs. Spam recognition software has many applications in the modern world, such as identifying spam, displaying authenticated users, and helping providers enhance the quality of their products and services. The spam detection systems and applications currently on the market have several deficiencies: low accuracy, no information filtering in online social networks, high time complexity, validation only on restricted training datasets, and poor spammer detection performance in terms of accuracy, precision, and recall. At the same time, demand for a robust spam detection system applicable across industry, organizations, and the public is increasing dramatically.

There are many previous works on spam detection systems, but a robust solution that addresses the deficiencies of current spam detection remains elusive. In this work, an R program is used to develop and implement a Windows-based application intended to address these challenges and problems. R has given satisfactory results and shows promise under the different conditions that affect the detection process, so it is the language chosen here. Tweet processing is handled by a small or medium-sized computerized system in which the user can record incoming messages, notifications, and so on, and manage spam tweets. Users who send unsolicited messages are validated mainly through hashtags. We propose integrating a spam detection technique to validate users when they upload posts or send messages.

LITERATURE SURVEY:

DESCRIPTION:

Online social media websites play a major role in information propagation, which is a crucial resource for producers in their advertising operations as well as for customers in selecting products and services. People rely heavily on written reviews in their decision-making, and positive or negative reviews encourage or discourage them in their selection of products and services. Reviews have thus become a critical factor in the success of a business: positive reviews can bring benefits to a company, while negative reviews can damage its credibility and cause economic losses. The fact that anyone, under any identity, can leave comments as reviews provides a tempting opportunity for spammers to write fake reviews designed to mislead users' opinions. These deceptive reviews are then multiplied by the sharing function of social media and propagated over the web. Reviews written to change users' perception of how good a product or service is are considered spam, and they are often written in exchange for money.

Researchers employ different techniques to find spam profiles in various online social networks (OSNs). We focus only on work that identifies spammers on Twitter, because Twitter is not only a social communication medium but is also used to share and spread information about trending topics in real time. Table 1 summarizes the projects reviewed regarding the detection of spammers on Twitter.

MODULE:

STEP 1 – DATASETS

Datasets are scraped from Twitter using the Twitter Streaming API, which allows third parties to access Twitter's global stream of tweet data. We have collected a stream of Twitter spam. In this dataset, some of the reviews or words are tagged as spam or real.
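As a rough illustration of what such a labeled dataset might look like once loaded into R, the sketch below reads a tiny inline sample. The column names `text` and `label`, the sample tweets, and the `spam.example` URL are all assumptions made for this example; they are not taken from the collected dataset.

```r
# Hypothetical labeled sample; the column names `text` and `label`
# are assumptions, not the layout of the original dataset.
raw <- "text,label
Win a FREE phone now http://spam.example,spam
Great talk at the conference today,real
Amazing cheap followers click http://spam.example,spam
Lunch with the team was fun,real"

tweets <- read.csv(text = raw, stringsAsFactors = FALSE)

# Store the label as a factor so a classifier sees two classes.
tweets$label <- factor(tweets$label, levels = c("real", "spam"))

# Class balance of the sample.
label_counts <- table(tweets$label)
print(label_counts)
```

Checking the class balance early is useful because a heavily skewed sample can make accuracy alone a misleading measure.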

STEP 2 – FEATURE EXTRACTION

Feature extraction is the process of extracting features that a machine learning technique can use to produce accurate results. Commonly used features fall into two categories: content-based and user-based. User-based features are used for detecting spammers, and content-based features are used for detecting spam tweets.
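The two feature categories can be sketched in base R as follows. This is a minimal illustration, not the system's actual feature set: the specific features (URL count, hashtag count, all-caps word count, follower ratio), the helper `count_matches`, and the sample data are assumptions chosen for the example.

```r
# Hypothetical input: two tweets with metadata of the posting account.
tweets <- data.frame(
  text = c("Win a FREE phone now http://spam.example #deal #free",
           "Great talk at the conference today"),
  followers = c(12, 340),
  following = c(900, 180),
  stringsAsFactors = FALSE
)

# Count non-overlapping regex matches in each string.
count_matches <- function(x, pattern) {
  vapply(regmatches(x, gregexpr(pattern, x, perl = TRUE)), length, integer(1))
}

features <- data.frame(
  # Content-based features (describe the tweet itself).
  n_chars    = nchar(tweets$text),
  n_urls     = count_matches(tweets$text, "https?://\\S+"),
  n_hashtags = count_matches(tweets$text, "#\\w+"),
  n_caps     = count_matches(tweets$text, "\\b[A-Z]{2,}\\b"),
  # User-based feature (describes the account); +1 avoids division by zero.
  follower_ratio = tweets$followers / (tweets$following + 1)
)
print(features)
```

Each row of `features` corresponds to one tweet, so the resulting table can be passed directly to a classifier alongside the spam/real label.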

STEP 3- FEATURE SELECTION

Data almost always contains more information than is needed to build a model. In this system more than 100 features will be extracted, and some of them are very sparse in the training dataset, so including them would degrade the model's performance. Feature selection is the step of selecting the features most relevant to a model, or removing unrelated features, to improve the accuracy of the system.
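One simple way to remove sparse features is to drop every column whose share of zero entries exceeds a threshold. The sketch below shows this idea only; the 0.8 cut-off, the feature names, and the sample values are assumptions, and the source does not state which selection method the system uses.

```r
# Hypothetical feature table: rows are tweets, columns are features.
features <- data.frame(
  n_urls     = c(1, 0, 2, 0, 1, 0),
  n_hashtags = c(2, 0, 3, 1, 0, 0),
  rare_token = c(0, 0, 0, 0, 0, 1),  # non-zero in only one tweet
  constant   = c(0, 0, 0, 0, 0, 0)   # carries no information
)

# Fraction of zero entries per feature.
sparsity <- colMeans(features == 0)

# Keep features whose sparsity is below an assumed cut-off of 0.8.
max_sparsity <- 0.8
selected <- features[, sparsity < max_sparsity, drop = FALSE]
print(names(selected))
```

A sparsity filter is only a first pass; it removes uninformative columns cheaply but does not measure how well a feature separates spam from real tweets.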

STEP 4- EVALUATION METRICS

In a spam detection system, precision, recall, F-measure, and accuracy are the common metrics used to evaluate performance.
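These four metrics can be computed from the counts of true and false positives and negatives, treating "spam" as the positive class. The sketch below uses made-up labels and predictions purely to show the arithmetic; they are not results from this system.

```r
# Hypothetical true labels and classifier predictions for eight tweets.
actual    <- c("spam", "spam", "spam", "real", "real", "real", "real", "spam")
predicted <- c("spam", "spam", "real", "real", "spam", "real", "spam", "spam")

# Confusion-matrix counts with "spam" as the positive class.
tp <- sum(predicted == "spam" & actual == "spam")
fp <- sum(predicted == "spam" & actual == "real")
fn <- sum(predicted == "real" & actual == "spam")
tn <- sum(predicted == "real" & actual == "real")

precision <- tp / (tp + fp)                    # flagged tweets that are spam
recall    <- tp / (tp + fn)                    # spam tweets that were flagged
f_measure <- 2 * precision * recall / (precision + recall)
accuracy  <- (tp + tn) / (tp + fp + fn + tn)   # all correct decisions

print(c(precision = precision, recall = recall,
        f_measure = f_measure, accuracy = accuracy))
```

Precision and recall are reported together because a detector can trade one for the other; the F-measure is their harmonic mean and summarizes both in a single number.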

ARCHITECTURE:

The system will integrate with Twitter and will be able to read the tweets for particular hashtags. To restrict spammers, tweets are extracted in a streaming fashion; Twitter provides the Streaming API for developers to access public tweets in real time.
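The hashtag filtering step can be sketched without a live connection. In the example below, the `incoming` vector stands in for a batch of tweets that would arrive from the Streaming API, and the `tracked` hashtags are assumed values; no real Twitter API call is made.

```r
# Hypothetical batch of incoming tweets; in a live system these would
# arrive from the Twitter Streaming API.
incoming <- c(
  "Huge discount today #Deal http://spam.example",
  "Enjoying the match tonight #cricket",
  "No tags in this one",
  "Limited offer #deals #free"
)

# Hashtags the system is asked to monitor (assumed values).
tracked <- c("#deal", "#free")

# Extract the lower-cased hashtags of each tweet.
lowered  <- tolower(incoming)
hashtags <- regmatches(lowered, gregexpr("#\\w+", lowered))

# Keep a tweet if any of its hashtags is in the tracked set.
is_tracked <- vapply(hashtags, function(tags) any(tags %in% tracked), logical(1))
matched <- incoming[is_tracked]
print(matched)
```

Matching is done on whole, lower-cased hashtags, so `#Deal` matches `#deal` while `#deals` does not; only the matched tweets would be passed on to feature extraction.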

IMPLEMENTATION TOOLS:

1. Hardware Requirement

  • Hard Disk : 80 GB
  • RAM : 512 MB
  • Processor : Intel Pentium 4 and above

2. Software Requirement

  • Technology Used : R language
  • Tools : Twitter API, RStudio, R console
  • Operating System : Windows 7 or above

ADVANTAGES:

  • To identify spam and to support different kinds of analysis on this subject.
  • Written reviews also help service providers improve the quality of their products and services.
  • To identify spam users using positive and negative reviews in online social media.
  • To show only trusted evaluations to the users.

FUTURE SCOPE:

Twitter has millions of active users, and this number is constantly increasing. Almost all authors have used a very small test dataset to evaluate the performance of their approach, so larger test datasets are needed to assess any approach properly. There is also a need to improve classifiers to enhance the detection rate.

CONCLUSIONS:

  • Most of the work has been done using classification approaches such as SVM, Decision Tree, Naive Bayes, and Random Forest. Detection has been based on user-based features, content-based features, or a combination of both. A few developers have also introduced new features for detection. All the approaches have been validated on very small datasets and have not been tested with different combinations of spammers and non-spammers.
  • Combining features for spammer detection has shown better performance in terms of accuracy, precision, and recall than using only user-based or only content-based features.

