Everything should be made as simple as possible, but not simpler. (Albert Einstein)

Friday, February 27, 2015

Some Introductory Machine Learning Books


Many of the Machine Learning books I have come across are too math-heavy for a programmer. But I noted several introductory books:
  •  Machine Learning, Tom M. Mitchell, McGraw Hill. 
  •  Introduction to Machine Learning 2nd edition, Ethem Alpaydin, MIT Press. (without example code)
  •  Bayesian Reasoning and Machine Learning, David Barber. (a free draft version is available online, the latest draft is dated Dec 13, 2014; ex. code in Matlab with BRMLToolbox.)
  •  Machine Learning, A Probabilistic Perspective, Kevin P Murphy, MIT Press. (ex. code in Matlab with PMTK package.)
  •  Machine Learning, An Algorithmic Perspective, Stephen Marsland, CRC Press. (ex. code in Python)
  •  Machine Learning, Hands-On for Developers and Technical Professionals, Jason Bell, Wiley. (ex. code in Java with Weka toolkit.)
  •  Machine Learning In Action, Peter Harrington, Manning. (ex. code in Python.)
  •  Thoughtful Machine Learning, a Test Driven Approach, Matthew Kirk, O'Reilly. (ex. code in Ruby.)

More programming-oriented books:
Python:
  •  Mastering Machine Learning with scikit-learn, Gavin Hackeling, Packt.
  •  Learning scikit-learn: Machine Learning in Python, Raúl Garreta et al., Packt.
  •  scikit-learn Cookbook, Trent Hauck, Packt.
  •  Building Machine Learning Systems with Python, Willi Richert et al., Packt.
R:
  •  An Introduction to Statistical Learning with Applications in R, Gareth James et al., Springer.
  •  Machine Learning with R, Brett Lantz, Packt.
Scala:
  •  Scala for Machine Learning, Patrick R Nicolas, Packt.

Best ML course, with easy-to-understand, very well-structured video lectures:
Stanford's Prof. Andrew Ng, https://www.coursera.org/course/ml (old regular format with Statement of Accomplishment (SoA), closed since 2015).
The new format of the course is on-demand (self-paced), currently without SoA: https://www.coursera.org/learn/machine-learning .

----

Thursday, February 26, 2015

Handwritten Digits Recognition, Experiment with Octave's Neural Network Package "nnet", and RSNNS

This is a note on implementing handwritten digit recognition with a neural network, using the Octave nnet package (or the MATLAB Neural Network Toolbox).

At the end, I also play around with R code and the RSNNS library (Stuttgart Neural Network Simulator for R).

GitHub, Octave/MATLAB:
    https://github.com/flyingdisc/handwritten-digits-recognition-octave-nnet
GitHub, R - RSNNS:
    https://github.com/flyingdisc/handwritten-digits-recognition-RSNNS  
----

80-20 Rule, Pareto Principle

From Wikipedia's article on the Pareto principle:
"The Pareto principle (also known as the 80–20 rule, the law of the vital few, and the principle of factor sparsity) states that, for many events, roughly 80% of the effects come from 20% of the causes."

Monday, February 9, 2015

Handling US NOAA Storm Database's Exponent Value of PROPDMGEXP and CROPDMGEXP


How To Handle Exponent Value of PROPDMGEXP and CROPDMGEXP of "StormData.csv"

 

Reproducible Research Project 2, Coursera, Johns Hopkins University

U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database

There is confusion about how to handle the exponent values in the PROPDMGEXP and CROPDMGEXP columns of the database, due to the lack of official information on the NOAA website.

This is an attempt to compare the downloaded database with the online version, in order to work out what each value actually means.

This analysis is inspired by a post by David Hood, who is himself a CTA in the Data Science Specialization courses.

At the end of this article, there is a more accurate analysis done by Eddie Song.
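
To make the problem concrete, here is a minimal Python sketch of how an exponent code could be applied to a damage value. It is not the code from the analysis. The function name damage_in_dollars and the table LETTER_MULTIPLIERS are hypothetical, and reading H, K, M and B as hundred, thousand, million and billion is an assumption based on the common interpretation of those letters. Any other code found in the column is deliberately left unresolved, since its meaning is exactly what the comparison with the online database tries to establish.

```python
# Assumed meaning of the letter codes (hypothetical table, not taken from NOAA).
LETTER_MULTIPLIERS = {
    "H": 100,
    "K": 1_000,
    "M": 1_000_000,
    "B": 1_000_000_000,
}


def damage_in_dollars(value, exp):
    """Scale a PROPDMG/CROPDMG value by its exponent code.

    Returns None when the code is not one of the assumed letter codes,
    so that unresolved codes are never silently treated as a number.
    """
    # Normalise the code: treat missing values as empty, ignore case and spaces.
    code = (exp or "").strip().upper()
    multiplier = LETTER_MULTIPLIERS.get(code)
    if multiplier is None:
        return None
    return value * multiplier


if __name__ == "__main__":
    # A few sample (value, code) pairs, including codes the sketch cannot resolve.
    for value, exp in [(25, "K"), (2.5, "m"), (10, "?"), (10, "")]:
        print(value, repr(exp), damage_in_dollars(value, exp))
```

Returning None for the unresolved codes keeps those rows visible, so they can be counted and handled separately once their meaning has been decided.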

GitHub PDF and Markdown repository:
 https://github.com/flyingdisc/RepData_PeerAssessment
Rpubs: http://rpubs.com/flyingdisc/PROPDMGEXP
Reproducible report of the project: http://rpubs.com/flyingdisc/RepProject2