SIT717 Assignment 2 - Technical Report

Solution

Task Title: Data Analytic Technical Report

Subject Code: SIT717

Objective: The objective of this part of the assignment is to apply data-analytics techniques to real-world data and extract useful information. Students may use any of the following strategies for this assignment: supervised learning, unsupervised learning, time-series prediction, text mining, etc.

Overview: This project gives students an opportunity to use data mining and machine learning methods to discover knowledge from a dataset and to explore applications for business intelligence. It is the second part of the individual project work, and you are required to implement the required analysis together with a written report. This written assessment will be a technical report of no less than 3000 words.

University: Deakin University  

Tool requirement:

  • Weka Machine Learning Tool: This tool implements almost all popular machine learning algorithms (supervised and unsupervised).
  • MS-Excel: A spreadsheet helps to filter the data before analysis.

Task Description:

In this portfolio, we present a text-mining-based approach for this assignment. For this task, we used Yelp’s labelled data from the ‘Sentiment Labelled Sentences Data Set’ in the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences). We built a Random forest based supervised classifier to identify the sentiment of a comment.

Steps of Development:

  • CSV to ARFF conversion: CSV is a comma-separated representation of the data, which is not the preferred input format in Weka. We converted the CSV to Weka's native format, known as ARFF (.arff).
  • String Tokenization: The StringToWordVector filter is used to convert each string into a word vector.
  • Token Filtering: Stop-word removal, IDFTransform, TFTransform and the Lovins stemmer are used for filtering.
  • Dataset Splitting: The data is split into training and test sets in an 80:20 ratio.
  • Classifier Design: J-48 and Random forest classifiers were designed for this task. Based on performance, we prefer Random forest.
  • Verification of the Classifier: Using the test dataset, we verify the accuracy of each classifier. The J-48 classifier obtained 54.5% accuracy and the Random forest classifier 72%.
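
The preprocessing steps above were carried out in Weka. Purely as an illustration of what they do, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for the example: the function names, the sample rows, the tiny stop-word list and the way the TF and IDF transforms are combined (log(1 + tf) multiplied by log(N / df)). Lovins stemming is left out, and the output is not guaranteed to match Weka's StringToWordVector filter.

```python
import csv
import io
import math
import random
from collections import Counter

# Hypothetical two-column CSV (sentence,label); not rows from the real data set.
SAMPLE_CSV = '"The food was great",1\n"The service was slow",0\n'

# Tiny illustrative stop-word list (an assumption; Weka ships its own lists).
STOP_WORDS = {"the", "was", "a", "is"}


def csv_to_arff(csv_text, relation="sentiment"):
    """Render sentence,label rows as ARFF text with a string and a nominal attribute."""
    lines = [f"@relation {relation}", "",
             "@attribute text string",
             "@attribute class {0,1}", "",
             "@data"]
    for text, label in csv.reader(io.StringIO(csv_text)):
        # Quote the sentence and escape backslashes and single quotes.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{escaped}',{label}")
    return "\n".join(lines) + "\n"


def tokenize(text):
    """Lower-case, split on non-letters, and drop stop words."""
    words = "".join(c if c.isalpha() else " " for c in text.lower()).split()
    return [w for w in words if w not in STOP_WORDS]


def tf_idf(docs):
    """Weight each term as log(1 + tf) * log(N / df); returns one dict per document."""
    token_lists = [tokenize(d) for d in docs]
    df = Counter(term for tokens in token_lists for term in set(tokens))
    n = len(docs)
    return [{t: math.log(1 + c) * math.log(n / df[t])
             for t, c in Counter(tokens).items()}
            for tokens in token_lists]


def split_80_20(rows, seed=1):
    """Shuffle a copy of rows and cut it into 80% train and 20% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 80 // 100
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    print(csv_to_arff(SAMPLE_CSV))
    train, test = split_80_20(range(1000))
    print(len(train), len(test))
```

With 1000 rows the split yields 800 training and 200 test rows, the same test-set size as the 200 instances reported in the results.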

 

 

Sample Output:

Result

Summary

                                    J-48            Random forest
Correctly Classified Instances      109 (54.5 %)    144 (72 %)
Incorrectly Classified Instances     91 (45.5 %)     56 (28 %)
Kappa statistic                     0.1209          0.4344
Mean absolute error                 0.4404          0.3582
Root mean squared error             0.5042          0.4202
Relative absolute error             87.8155 %       71.4271 %
Root relative squared error         100.5014 %      83.7519 %
Total Number of Instances           200             200
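
As a check on the summary figures, accuracy and the kappa statistic can be recomputed from the confusion-matrix counts given in this report. The Python sketch below is illustrative only. The function name accuracy_and_kappa is our own and is not part of Weka, and the matrices are assumed to be laid out with actual classes in rows and predicted classes in columns.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts instances of actual class i predicted as class j.
    """
    n = sum(sum(row) for row in matrix)
    # Observed agreement: the fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return observed, (observed - expected) / (1 - expected)


# Counts copied from the confusion matrices in this report.
J48 = [[65, 24], [67, 44]]
RANDOM_FOREST = [[62, 27], [29, 82]]

if __name__ == "__main__":
    for name, m in (("J-48", J48), ("Random forest", RANDOM_FOREST)):
        acc, kappa = accuracy_and_kappa(m)
        print(f"{name}: accuracy={acc:.3f}, kappa={kappa:.4f}")
```

Rounded to four decimals, the kappa values come out as 0.1209 for J-48 and 0.4344 for Random forest, in line with the reported summary.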

Detailed Accuracy

J-48

             Class 0    Class 1    Weighted Avg
TP Rate      0.730      0.396      0.545
FP Rate      0.604      0.270      0.418
Precision    0.492      0.647      0.578
Recall       0.730      0.396      0.545
F-Measure    0.588      0.492      0.535
MCC          0.133      0.133      0.133
ROC Area     0.638      0.638      0.638
PRC Area     0.548      0.705      0.635

Random forest

             Class 0    Class 1    Weighted Avg
TP Rate      0.697      0.739      0.720
FP Rate      0.261      0.303      0.285
Precision    0.681      0.752      0.721
Recall       0.697      0.739      0.720
F-Measure    0.689      0.745      0.720
MCC          0.434      0.434      0.434
ROC Area     0.814      0.814      0.814
PRC Area     0.754      0.862      0.813

Confusion Matrix

J-48

  a  b   <-- classified as
 65 24 |  a = 0
 67 44 |  b = 1

Random forest

  a  b   <-- classified as
 62 27 |  a = 0
 29 82 |  b = 1
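
The per-class figures under Detailed Accuracy follow directly from these confusion matrices. The Python sketch below shows one way to derive them. The helper names per_class_metrics and weighted_average are hypothetical, and the average is assumed to be weighted by the number of actual instances in each class. ROC Area and PRC Area are not covered, since they need the classifier's scores rather than the counts alone.

```python
import math

# Rows are actual classes, columns are predicted classes (as printed above).
J48 = [[65, 24], [67, 44]]
RANDOM_FOREST = [[62, 27], [29, 82]]


def per_class_metrics(matrix, positive):
    """TP rate, FP rate, precision, F-measure and MCC for one class of a 2x2 matrix."""
    negative = 1 - positive
    tp = matrix[positive][positive]
    fn = matrix[positive][negative]
    fp = matrix[negative][positive]
    tn = matrix[negative][negative]
    tp_rate = tp / (tp + fn)      # identical to recall
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "precision": precision,
            "f_measure": f_measure, "mcc": mcc}


def weighted_average(matrix, key):
    """Average one metric over both classes, weighted by actual class size."""
    supports = [sum(row) for row in matrix]
    return sum(per_class_metrics(matrix, c)[key] * supports[c]
               for c in (0, 1)) / sum(supports)


if __name__ == "__main__":
    for name, m in (("J-48", J48), ("Random forest", RANDOM_FOREST)):
        for cls in (0, 1):
            values = per_class_metrics(m, cls)
            print(name, "class", cls,
                  {k: round(v, 3) for k, v in values.items()})
```

For example, class 0 under J-48 has a TP rate of 65 / 89 and a precision of 65 / 132, which round to the 0.730 and 0.492 shown in the table.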

 

Expert’s Comments: Every machine-learning project is challenging, and selecting a suitable dataset and algorithm is very important. Our experienced team, with its extensive industrial background, helps students achieve the best results on this type of assignment.
