SIT717 Assignment 2 - Technical Report

Solution

Task Title: Data Analytic Technical Report

Subject Code: SIT717

Objective: The objective of this part of the assignment is to apply data-analytics techniques to real-world data and extract useful information. Students may use any of the following strategies: supervised learning, unsupervised learning, time-series prediction, text mining, etc.

Overview: This project gives students an opportunity to apply data mining and machine learning methods to discover knowledge from a dataset and to explore applications in business intelligence. It is the second part of the individual project: you are required to implement the analysis and present it in a written technical report of no fewer than 3000 words.

University: Deakin University  

Tool Requirements:

  • Weka Machine Learning Tool: implements most popular machine learning algorithms, both supervised and unsupervised.
  • MS Excel: the spreadsheet is useful for filtering the data before analysis.

Task Description:

In this portfolio, we take a text-mining approach to the assignment. We use the Yelp labelled data from the ‘Sentiment Labelled Sentences Data Set’ in the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences), in which each review sentence is labelled 0 (negative) or 1 (positive). We built a supervised Random Forest classifier to predict the sentiment of a review sentence.
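
In the UCI archive, the Yelp file stores one review sentence per line, followed by a tab and its 0/1 label. The minimal sketch below assumes that tab-separated layout rather than the intermediate CSV used in this report. It shows the shape of ARFF the CSV-to-ARFF conversion step produces: a string attribute for the review text and a nominal class attribute. The relation name `yelp_sentiment`, the attribute names `text` and `class`, and the helper names are hypothetical choices, not taken from the report.

```python
def arff_quote(text):
    """Wrap a value in single quotes for ARFF, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def yelp_lines_to_arff(lines, relation="yelp_sentiment"):
    """Convert 'sentence<TAB>label' lines into an ARFF document string.

    Assumes each non-empty line ends with a tab followed by the label 0 or 1.
    """
    out = [
        f"@relation {relation}",
        "",
        "@attribute text string",   # the raw review sentence
        "@attribute class {0,1}",   # nominal sentiment label
        "",
        "@data",
    ]
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        sentence, label = line.rsplit("\t", 1)  # label is the last tab-separated field
        label = label.strip()
        if label not in ("0", "1"):
            raise ValueError(f"unexpected label: {label!r}")
        out.append(f"{arff_quote(sentence.strip())},{label}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the dataset.
    sample = ["The food was great.\t1\n", "Service was slow and they didn't care.\t0\n"]
    print(yelp_lines_to_arff(sample))
```

The resulting file can be opened directly in the Weka Explorer. The text stays a single string attribute until the StringToWordVector filter turns it into word features.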

Steps of Development:

  • CSV to ARFF Conversion: CSV (comma-separated values) is not Weka's preferred input format, so the data were converted to ARFF (Attribute-Relation File Format), Weka's native format.
  • String Tokenization: Weka's StringToWordVector filter converts the review text (a string attribute) into a vector of word attributes.
  • Token Filtering: Stop-word removal, the TF and IDF transformations, and the Lovins stemmer were applied when building the word vectors.
  • Dataset Splitting: The data were split into training and test sets in an 80:20 ratio, so the 200 test instances reported in the results imply 800 training instances.
  • Classifier Design: Two classifiers were built: J48 (Weka's implementation of the C4.5 decision tree) and Random Forest. Random Forest was preferred because it performed better.
  • Classifier Verification: Both classifiers were evaluated on the held-out test set. J48 achieved 54.5% accuracy and Random Forest 72%.
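
The tokenization, filtering and splitting steps above can be approximated in plain Python. This is a minimal sketch under stated assumptions, not Weka's implementation:

- It lower-cases each sentence and splits it into alphabetic tokens.
- It drops words found in a tiny hypothetical stop-word list.
- It weights each remaining word as log(1 + f) * log(N / df), following the TF and IDF transformation formulas given in the StringToWordVector documentation.
- It performs a seeded 80:20 shuffle split.

Stemming is omitted because the Lovins stemmer is considerably more involved. `STOP_WORDS`, `tokenize`, `tf_idf_vectors` and `split_train_test` are illustrative names.

```python
import math
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; Weka uses its own lists.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "it", "this", "of", "to"}


def tokenize(text):
    """Lower-case the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(docs):
    """Return one {word: weight} dict per document, weight = log(1 + f) * log(N / df).

    f is the word's count in the document, N the number of documents and
    df the number of documents containing the word. Stemming is not applied.
    """
    token_lists = [[t for t in tokenize(d) if t not in STOP_WORDS] for d in docs]
    n_docs = len(token_lists)
    # Document frequency: count each word at most once per document.
    df = Counter(word for tokens in token_lists for word in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append(
            {w: math.log(1 + f) * math.log(n_docs / df[w]) for w, f in counts.items()}
        )
    return vectors


def split_train_test(items, train_ratio=0.8, seed=1):
    """Shuffle a copy of items with a fixed seed and cut it into train/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    reviews = ["The food was great", "Great service and great food", "The service was slow"]
    for vector in tf_idf_vectors(reviews):
        print({w: round(v, 3) for w, v in sorted(vector.items())})
    train, test = split_train_test(range(1000))
    print(len(train), len(test))
```

Under this weighting, a word that occurs in every document gets an IDF factor of log(1) = 0. Such words therefore contribute nothing, which is the intended effect of the IDF transformation.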

 

 

Sample Output:

Result Summary

                                    J48           Random Forest
Correctly Classified Instances      109 (54.5 %)  144 (72 %)
Incorrectly Classified Instances    91 (45.5 %)   56 (28 %)
Kappa statistic                     0.1209        0.4344
Mean absolute error                 0.4404        0.3582
Root mean squared error             0.5042        0.4202
Relative absolute error             87.8155 %     71.4271 %
Root relative squared error         100.5014 %    83.7519 %
Total Number of Instances           200           200
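
The accuracy and kappa rows follow directly from the confusion-matrix counts reported later in this section. In those matrices, rows are actual classes and columns are predicted classes. Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from the row and column totals: kappa = (p_o - p_e) / (1 - p_e). The sketch below recomputes both figures. The error-based rows (MAE, RMSE and their relative versions) depend on the classifiers' probability estimates, which the report does not include, so they are not reproduced.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are actual classes and columns are predicted classes, as in Weka's output.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n          # p_o
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[j] * col_totals[j] for j in range(k)) / (n * n)  # p_e
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Confusion-matrix counts as reported in this report (rows: actual 0, 1).
J48 = [[65, 24], [67, 44]]
RANDOM_FOREST = [[62, 27], [29, 82]]

if __name__ == "__main__":
    for name, cm in (("J48", J48), ("Random Forest", RANDOM_FOREST)):
        acc, kappa = accuracy_and_kappa(cm)
        print(f"{name}: accuracy = {acc:.3f}, kappa = {kappa:.4f}")
```

With these counts the sketch reproduces the reported 0.1209 and 0.4344. The large kappa gap relative to the accuracy gap shows how much of J48's accuracy is attributable to chance agreement.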

Detailed Accuracy by Class

J48
              Class 0   Class 1   Weighted Avg.
TP Rate       0.730     0.396     0.545
FP Rate       0.604     0.270     0.418
Precision     0.492     0.647     0.578
Recall        0.730     0.396     0.545
F-Measure     0.588     0.492     0.535
MCC           0.133     0.133     0.133
ROC Area      0.638     0.638     0.638
PRC Area      0.548     0.705     0.635

Random Forest
              Class 0   Class 1   Weighted Avg.
TP Rate       0.697     0.739     0.720
FP Rate       0.261     0.303     0.285
Precision     0.681     0.752     0.721
Recall        0.697     0.739     0.720
F-Measure     0.689     0.745     0.720
MCC           0.434     0.434     0.434
ROC Area      0.814     0.814     0.814
PRC Area      0.754     0.862     0.813
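
Each per-class figure in these tables can be derived from a 2x2 confusion matrix by treating one class as positive:

- TP rate (recall) = TP / (TP + FN)
- FP rate = FP / (FP + TN)
- precision = TP / (TP + FP)
- F-measure = the harmonic mean of precision and recall

The weighted average weights each class by its number of actual test instances: 89 for class 0 and 111 for class 1, the row totals of the confusion matrices. MCC is symmetric in the two classes, which is why the same value appears in every column. ROC and PRC areas require ranked probability scores and are not reproduced. The sketch below uses the reported counts; the helper names are illustrative.

```python
import math


def per_class_metrics(matrix, positive):
    """TP rate, FP rate, precision, recall and F-measure for one class of a 2x2 matrix.

    Rows are actual classes and columns predicted classes (Weka's layout).
    """
    negative = 1 - positive
    tp = matrix[positive][positive]
    fn = matrix[positive][negative]
    fp = matrix[negative][positive]
    tn = matrix[negative][negative]
    tp_rate = tp / (tp + fn)  # identical to recall
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "precision": precision,
            "recall": tp_rate, "f_measure": f_measure}


def mcc(matrix):
    """Matthews correlation coefficient of a 2x2 matrix (class 0 taken as positive)."""
    (tp, fn), (fp, tn) = matrix
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom


def weighted_average(matrix, metric):
    """Average a per-class metric, weighting each class by its actual instance count."""
    support = [sum(row) for row in matrix]
    total = sum(support)
    return sum(per_class_metrics(matrix, c)[metric] * support[c] for c in range(2)) / total


J48 = [[65, 24], [67, 44]]
RANDOM_FOREST = [[62, 27], [29, 82]]

if __name__ == "__main__":
    for c in (0, 1):
        m = per_class_metrics(RANDOM_FOREST, c)
        print(f"class {c}: " + ", ".join(f"{k}={v:.3f}" for k, v in m.items()))
    print(f"weighted F-measure = {weighted_average(RANDOM_FOREST, 'f_measure'):.3f}")
    print(f"MCC = {mcc(RANDOM_FOREST):.3f}")
```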

 

Confusion Matrix

J48
   a   b   <-- classified as
  65  24 |  a = 0
  67  44 |  b = 1

Random Forest
   a   b   <-- classified as
  62  27 |  a = 0
  29  82 |  b = 1
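
The test set is not perfectly balanced: from the row totals above, 89 sentences are of class 0 and 111 of class 1. A useful sanity check is therefore the majority-class baseline, the accuracy of always predicting the more frequent class. The minimal sketch below compares both classifiers with that baseline; the helper names are illustrative.

```python
def majority_baseline(matrix):
    """Accuracy of always predicting the most frequent actual class (rows = actual)."""
    support = [sum(row) for row in matrix]
    return max(support) / sum(support)


def accuracy(matrix):
    """Fraction of instances on the diagonal of the confusion matrix."""
    return sum(matrix[i][i] for i in range(len(matrix))) / sum(map(sum, matrix))


# Counts as reported above (rows: actual class 0, 1; columns: predicted a, b).
RESULTS = {
    "J48": [[65, 24], [67, 44]],
    "Random Forest": [[62, 27], [29, 82]],
}

if __name__ == "__main__":
    for name, cm in RESULTS.items():
        base = majority_baseline(cm)
        acc = accuracy(cm)
        print(f"{name}: accuracy {acc:.1%} vs. majority baseline {base:.1%} ({acc - base:+.1%})")
```

With these counts the baseline is 111/200 = 55.5%. J48's 54.5% is slightly below simply predicting class 1 for every sentence, while Random Forest's 72% clearly exceeds it.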

 

Expert’s Comments: Every machine learning project is challenging, and selecting an appropriate dataset and algorithm is critical to its success.
