Everything should be made as simple as possible, but not simpler. (Albert Einstein)
Showing posts with label R. Show all posts
Showing posts with label R. Show all posts

Thursday, December 8, 2016

Probabilistic Graphical Models: Bayesian Networks Example With R, Python, SAMIAM

Here I try to practice examples of Bayesian networks as explained in the book written by Stanford Professor Daphne Koller, and Hebrew Professor Nir Friedman., “Probabilistic Graphical Models: Principle and Techniques”, chapter 3.2.   

In the book, the final values of probabilities is given as it is, without derivation of the formulas. Here I tried to expand the formulas, as well to verify the results with the help of R, Python and SamIam.

Course: PGM Specialization (MATLAB / Octave). 
Tools: R, Python, SamIam.  

The DAG of student BN example is, 

 


Sunday, October 4, 2015

Singular Value Decomposition and Dimensionality Reduction, Using R and Cat Image for Illustration Purposes

Singular Value Decomposition and Dimensionality Reduction, Using R and Cat Image for Illustration Purposes, by Soesilo Wijono,

SVD (singular value decomposition) is an important method used in data science, especially data mining. It can be used, e.g., in dimensionality reduction for recommender system.
Imagine online store, e.g. Amazon, to have million of items, and million of users. In order to perform algorithm for the recommender system, matrix to be used would have million by million dimension. Which is very expensive computation.
Theory for dimensionality reduction is everywhere, so we won’t repeat it again in here. Just remember the basic equation:
X = U A V.T
U matrix has dimension of n x n.
V matrix has dimension of d x d.
A matrix is diagonal matrix with dimension of n x d.
(T represents matrix transpose.)
We want to reduce the dimension of X matrix.
This is an illustration of the method by using a PNG cat image. To help understanding the method visually. In which we’ll use image raw data. In real world, the image data can be replaced by any data, e.g. items x users matrix used in an recommender system, etc.

Monday, February 9, 2015

Handling US NOAA Storm Database's Exponent Value of PROPDMGEXP and CROPDMGEXP


How To Handle Exponent Value of PROPDMGEXP and CROPDMGEXP of "StormData.csv"

 

Reproducible Research Project 2, Coursera, Johns Hopkins University

U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database

There is confusion on how to handle exponent value of PROPDMGEXP and CROPDMGEXP columns of the database. Due to lack of official information in the NOAA website.

This is an attempt to compare downloaded database with the online version, to find conclusion what is meaning of each value actually.

This analysis is inspired by a post made by David Hood, himself is CTA in the Data Science Specialization courses.

At the end of this article, there is more accurate analysis done by Eddie Song.   

GitHub PDF and Markdown repository:
 https://github.com/flyingdisc/RepData_PeerAssessment
Rpubs: http://rpubs.com/flyingdisc/PROPDMGEXP
Reproducible report of the project: http://rpubs.com/flyingdisc/RepProject2