Everything should be made as simple as possible, but not simpler. (Albert Einstein)

Sunday, October 25, 2015

Singular Value Decomposition with Numpy & Scipy

Following the previous post, "Singular Value Decomposition and Dimensionality, Using R...", here is another approach, this time using NumPy and SciPy.

One application is Latent Semantic Analysis (LSA, also called Latent Semantic Indexing, LSI) on a term-document matrix.

The first input is a list of documents and the second is a list of terms. We build a matrix in which each cell answers the question "is term t in document d?": it is 1 if term t is found in document d, and 0 otherwise. In this case, documents are the features (columns) and terms are the observations (rows).
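As a minimal sketch of the matrix described above, the snippet below builds a binary term-document matrix with NumPy. The documents and terms are hypothetical toy data, and matching on whitespace-separated whole words is an assumption made for illustration, not necessarily how the original post tokenizes text.

```python
import numpy as np

# Hypothetical toy data (not taken from the original post).
documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]
terms = ["cat", "dog", "mat", "pets"]

# Rows are terms (observations), columns are documents (features).
# A cell is 1 if the term occurs as a whole word in the document, else 0.
term_doc = np.array(
    [[1 if term in doc.split() else 0 for doc in documents] for term in terms]
)

print(term_doc)
```

Because matching is on whole words, "cats" in the third document does not count as an occurrence of "cat"; a real pipeline would usually add stemming or lemmatization.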

This technique is commonly used in NLP (Natural Language Processing) to compute text similarity.
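One common way to get text similarity out of such a matrix is sketched below: take the SVD, keep the first k singular values, and compare documents by cosine similarity in the reduced space. The matrix values, the choice k = 2, and the helper name `cosine_similarity` are all assumptions for illustration; the original post may proceed differently.

```python
import numpy as np

# Hypothetical binary term-document matrix: 5 terms (rows) x 4 documents (columns).
term_doc = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Thin SVD: term_doc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)

k = 2  # number of latent dimensions to keep (arbitrary choice for this sketch)

# Each document becomes a k-dimensional vector (one row per document).
doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (hypothetical helper)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_01 = cosine_similarity(doc_vectors[0], doc_vectors[1])  # documents sharing terms
sim_03 = cosine_similarity(doc_vectors[0], doc_vectors[3])  # documents sharing no terms
print(sim_01, sim_03)
```

Documents 0 and 1 share two terms while documents 0 and 3 share none, so the first similarity comes out higher than the second.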

Sunday, October 4, 2015

Singular Value Decomposition and Dimensionality Reduction, Using R and Cat Image for Illustration Purposes

By Soesilo Wijono

SVD (singular value decomposition) is an important method in data science, especially in data mining. It can be used, for example, for dimensionality reduction in a recommender system.
Imagine an online store, e.g. Amazon, with millions of items and millions of users. The matrix needed by the recommender algorithm would be on the order of a million by a million, which is very expensive to compute with.
The theory of dimensionality reduction is covered everywhere, so we won’t repeat it here. Just remember the basic equation:
X = U A V.T
U is a matrix of dimension n x n.
V is a matrix of dimension d x d.
A is a diagonal matrix of dimension n x d (the same shape as X), holding the singular values.
(T represents matrix transpose.)
We want to reduce the dimension of the matrix X.
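The shapes above, and what reducing the dimension means in practice, can be checked with a short NumPy sketch. The random matrix, its size (n = 6, d = 4), and the choice k = 2 are assumptions for illustration only; the original post works in R on image data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4
X = rng.random((n, d))  # hypothetical stand-in for real data

# Full SVD: U is n x n, Vt (that is, V.T) is d x d,
# and s holds the min(n, d) singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Build the n x d diagonal matrix A from the singular values.
A = np.zeros((n, d))
A[:len(s), :len(s)] = np.diag(s)

# X = U A V.T holds up to floating-point error.
X_rebuilt = U @ A @ Vt
print(np.allclose(X, X_rebuilt))

# Dimensionality reduction: keep only the k largest singular values.
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(X_k.shape, np.linalg.matrix_rank(X_k))
```

`X_k` has the same shape as `X` but rank k, so it can be stored as the three smaller factors `U[:, :k]`, `s[:k]`, and `Vt[:k, :]` instead of the full matrix.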
To help understand the method visually, this post illustrates it with a PNG image of a cat, using the image's raw pixel data. In the real world, the image data can be replaced by any data, e.g. the items x users matrix used in a recommender system.