
SIT717 Assignment 2 - Technical Report

Solution

Task Title: Data Analytic Technical Report

Subject Code: SIT717

Objective: The objective of this part of the assignment is to apply data-analytics techniques to real-world data and extract useful information. Students may use any of the following strategies for this assignment: supervised learning, unsupervised learning, time series prediction, text mining, etc.

 

Overview: This project gives students an opportunity to use data mining and machine
learning methods to discover knowledge from a dataset and to explore applications
for business intelligence. It is the second part of the individual project work, and
you are required to implement the required analysis together with a written report.
This written assessment will be a technical report of no fewer than 3000 words.

 

University: Deakin University  

Tool requirement:

  • Weka Machine Learning Tool: This tool implements most popular machine learning algorithms (supervised and unsupervised).
  • MS-Excel: A spreadsheet helps to filter the data before analysis.

Task Description:

In this portfolio, we present a text-mining approach to this assignment. For this task, we used the labelled Yelp data from the ‘Sentiment Labelled Sentences Data Set’ in the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences). We built a Random Forest-based supervised classifier to identify the sentiment of a comment.

Steps of Development:

  • CSV to ARFF conversion: CSV is a comma-separated representation of the data, which is not the preferred input format in Weka. We converted the CSV to Weka's native format, known as ARFF (.arff).
  • String Tokenization: StringToWordVector is used to convert each string to a word vector.
  • Token Filtering: Stop-word removal, the IDF and TF transforms, and the Lovins stemmer are used for filtering.
  • Dataset Splitting: The data is split into training and test sets in an 80:20 ratio.
  • Classifier Design: J-48 and Random Forest classifiers were designed for this task. Based on performance, we prefer Random Forest.
  • Verification of the Classifier: Using the test dataset, we verify the accuracy of each classifier. The J-48 classifier obtained 54.5% accuracy and the Random Forest classifier 72%.
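The first and fourth steps above can be sketched outside Weka as well. The following is a minimal Python illustration, not the conversion actually used for this report: the function names, the relation name "sentiment", the column layout (text, label) and the sample rows are all assumptions made for the example.

```python
import csv
import io
import random


def rows_to_arff(rows, relation="sentiment"):
    """Render (text, label) pairs as ARFF text with a string attribute and a nominal class."""
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute class {0,1}",
        "",
        "@data",
    ]
    for text, label in rows:
        # Quote each sentence and backslash-escape characters that would end the quote early.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{escaped}',{label}")
    return "\n".join(lines) + "\n"


def split_80_20(rows, seed=1):
    """Shuffle a copy of the rows with a fixed seed and cut it at 80 %."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample rows, not taken from the dataset.
sample_csv = "\n".join([
    "text,label",
    '"Great food, friendly staff",1',
    "Wouldn't go back,0",
])
reader = csv.reader(io.StringIO(sample_csv))
next(reader)  # skip the header row
rows = [(text, label) for text, label in reader]
arff_text = rows_to_arff(rows)
print(arff_text)
```

An 80:20 split that leaves 200 test instances, as in the results reported here, implies 1000 labelled sentences in total, so `split_80_20` would return 800 training rows and 200 test rows for that input.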

 

 

Sample Output:

 

 

 

Result

Summary

                                   J-48          Random forest
Correctly Classified Instances     109 (54.5 %)  144 (72 %)
Incorrectly Classified Instances   91 (45.5 %)   56 (28 %)
Kappa statistic                    0.1209        0.4344
Mean absolute error                0.4404        0.3582
Root mean squared error            0.5042        0.4202
Relative absolute error            87.8155 %     71.4271 %
Root relative squared error        100.5014 %    83.7519 %
Total Number of Instances          200           200
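The accuracy and kappa figures in the summary can be cross-checked from the confusion-matrix counts given in the Confusion Matrix part of this report. The sketch below uses the standard definition of Cohen's kappa; the function name and the row/column convention (rows are actual classes, columns are predicted classes) are assumptions made for the example.

```python
def accuracy_and_kappa(matrix):
    """matrix[i][j] = number of instances of actual class i classified as class j."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: the fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Counts copied from the confusion matrices in this report.
j48 = [[65, 24], [67, 44]]
random_forest = [[62, 27], [29, 82]]

for name, matrix in [("J-48", j48), ("Random forest", random_forest)]:
    acc, kappa = accuracy_and_kappa(matrix)
    print(f"{name}: accuracy={acc:.3f}, kappa={kappa:.4f}")
# J-48: accuracy=0.545, kappa=0.1209
# Random forest: accuracy=0.720, kappa=0.4344
```

Both pairs agree with the reported summary values. A kappa of about 0.12 means J-48 agrees with the true labels only slightly more often than chance would.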

Detailed Accuracy

J-48

             Class 0   Class 1   Weighted Avg
TP Rate      0.730     0.396     0.545
FP Rate      0.604     0.270     0.418
Precision    0.492     0.647     0.578
Recall       0.730     0.396     0.545
F-Measure    0.588     0.492     0.535
MCC          0.133     0.133     0.133
ROC Area     0.638     0.638     0.638
PRC Area     0.548     0.705     0.635

Random forest

             Class 0   Class 1   Weighted Avg
TP Rate      0.697     0.739     0.720
FP Rate      0.261     0.303     0.285
Precision    0.681     0.752     0.721
Recall       0.697     0.739     0.720
F-Measure    0.689     0.745     0.720
MCC          0.434     0.434     0.434
ROC Area     0.814     0.814     0.814
PRC Area     0.754     0.862     0.813

 

Confusion Matrix

J-48

  a  b   <-- classified as
 65 24 |  a = 0
 67 44 |  b = 1

Random forest

  a  b   <-- classified as
 62 27 |  a = 0
 29 82 |  b = 1
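The per-class figures under Detailed Accuracy follow directly from these matrices. The sketch below applies the standard binary definitions; the function name and dictionary keys are assumptions made for the example. ROC Area and PRC Area are not included because they depend on the classifier's scores, which a confusion matrix does not contain.

```python
import math


def class_metrics(matrix, positive):
    """Metrics for one class of a 2x2 matrix (rows = actual, columns = predicted)."""
    negative = 1 - positive
    tp = matrix[positive][positive]
    fn = matrix[positive][negative]
    fp = matrix[negative][positive]
    tn = matrix[negative][negative]
    recall = tp / (tp + fn)      # same as the TP rate
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "tp_rate": recall,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Counts copied from the confusion matrices above.
j48 = [[65, 24], [67, 44]]
random_forest = [[62, 27], [29, 82]]

for name, matrix in [("J-48", j48), ("Random forest", random_forest)]:
    for cls in (0, 1):
        values = class_metrics(matrix, cls)
        shown = ", ".join(f"{key}={value:.3f}" for key, value in values.items())
        print(f"{name} class {cls}: {shown}")
```

For J-48, class 1 recall is 44/111, or about 0.396: the tree labels most positive sentences as negative, which explains its low overall accuracy.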

 

Expert’s Comments: Every machine learning project is challenging, and selecting an appropriate dataset and algorithm is very important. Our team, which has extensive industry experience, helps students achieve the best results on this type of assignment.
