
Data exploration using Random Forests


The StumbleUpon web page classification competition on Kaggle ended recently. With some luck, I got into the final top 10%. During the initial data exploration, I tried to derive a set of linguistic features from the text, such as the ratios of nouns, adjectives, and adverbs on each web page. In addition, I suspected that subjectivity might be important, so I added the ratios of positive and negative words to the feature set as well.
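As a rough illustration of how such features can be computed, here is a minimal sketch using NLTK's tokenizer and POS tagger. The sentiment word lists are hypothetical stand-ins; the actual lexicon and pipeline used in the competition are not shown in this post.

```python
import nltk  # assumes the punkt and POS-tagger models have been downloaded

# Hypothetical sentiment word lists, for illustration only.
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "terrible"}

def linguistic_features(text):
    """Compute POS and sentiment-word ratios for one web page's text."""
    tokens = nltk.word_tokenize(text.lower())
    tagged = nltk.pos_tag(tokens)
    n = len(tokens) or 1  # guard against empty pages
    return {
        "noun_ratio": sum(tag.startswith("NN") for _, tag in tagged) / n,
        "adj_ratio":  sum(tag.startswith("JJ") for _, tag in tagged) / n,
        "adv_ratio":  sum(tag.startswith("RB") for _, tag in tagged) / n,
        "pos_ratio":  sum(tok in POSITIVE_WORDS for tok in tokens) / n,
        "neg_ratio":  sum(tok in NEGATIVE_WORDS for tok in tokens) / n,
    }
```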

To see whether these linguistic features are useful, I plugged the data into a Random Forest. A Random Forest is a collection of decision trees: each tree is trained on a bootstrapped sample of the original data and is grown using a random subset of the input variables. In spite of being a black-box model, it is probably one of the best off-the-shelf classifiers, offering good accuracy with virtually no parameter tuning. In addition, because each tree is trained using different variables, a variable importance measure comes as a byproduct [1].
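A minimal sketch of this workflow with scikit-learn, assuming a feature matrix `X`, labels `y`, and a `feature_names` list built from the features above (the hyperparameters here are illustrative, not the ones used in the competition):

```python
from sklearn.ensemble import RandomForestClassifier

# oob_score=True estimates accuracy on out-of-bag samples, i.e. the
# rows left out of each tree's bootstrap sample.
forest = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, y)

print("Estimated accuracy (OOB):", forest.oob_score_)

# Variable importance comes as a byproduct of training.
for name, importance in sorted(zip(feature_names, forest.feature_importances_),
                               key=lambda pair: pair[1], reverse=True):
    print(name, importance)
```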

[Figure: Feature importances measured by Random Forests]

The estimated accuracy of the model is only 62%, which shows that the linguistic variables are not that useful on their own. Still, the picture above surprised me: it showed that positive and negative word usage is not useful, so my subjectivity hypothesis was completely wrong. On the other hand, the ratio of nouns ranked top of the feature list, which was something I hadn't thought of.

[1] Feature importances with forests of trees, scikit-learn 0.14 documentation

