Everything should be made as simple as possible, but not simpler. (Albert Einstein)

Sunday, October 4, 2015

Singular Value Decomposition and Dimensionality Reduction, Using R and Cat Image for Illustration Purposes

By Soesilo Wijono

SVD (singular value decomposition) is an important method in data science, especially data mining. It can be used, for example, for dimensionality reduction in a recommender system.
Imagine an online store such as Amazon, with millions of items and millions of users. A recommender algorithm running on the raw data would operate on a millions-by-millions matrix, which is computationally very expensive.
The theory behind dimensionality reduction is covered everywhere, so we won't repeat it here. Just remember the basic equation:
X = U A V.T
where, for an n x d matrix X (full SVD):
U has dimension n x n,
V has dimension d x d,
A is a diagonal matrix of singular values with dimension n x d.
(.T represents the matrix transpose. Note that R's svd() returns the compact form, in which U and V each have min(n, d) columns.)
We want to reduce the dimension of the X matrix.
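As a quick sanity check before touching the image, here is a minimal sketch on a small random matrix showing that svd() returns the three factors and that multiplying them back recovers X up to floating-point error:

```r
# Sanity check on a small random matrix (not the image yet).
set.seed(42)
X = matrix(rnorm(4 * 3), nrow=4, ncol=3)
s = svd(X)                             # R returns the compact SVD
dim(s$u)                               # 4 x 3
dim(s$v)                               # 3 x 3
X_rec = s$u %*% diag(s$d) %*% t(s$v)   # reassemble U A V.T
max(abs(X - X_rec))                    # ~1e-15, i.e. exact up to rounding
```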
This post illustrates the method with a PNG cat image, to help understand it visually using raw image data. In the real world the image can be replaced by any data matrix, e.g. the items x users matrix of a recommender system.

require(ripa)     # provides imagematrix()
require(png)      # provides readPNG()
options(warn=-1)  # suppress warnings (e.g. about pixel values outside [0, 1])
img = readPNG('tcat.png')
img = imagematrix(img, type="rgb")
plot(img) 

This is the X matrix, img. Its dimensions are:
dim(img) 
## [1] 150 300   3
Let’s perform the SVD. Here we have a 3D array, since the image is in RGB format: a red, a green, and a blue channel. We decompose each channel separately.
red   = img[, , 1]
green = img[, , 2]
blue  = img[, , 3]
s_red = svd(red) 
s_green = svd(green) 
s_blue = svd(blue) 
The dimensions of each result are:
dim(s_red$u)
## [1] 150 150
dim(s_red$v)
## [1] 300 150
length(s_red$d)
## [1] 150
There are 150 singular values.
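How many of those 150 values actually matter? A common rule of thumb keeps enough singular values to retain most of the "energy" (the sum of squared singular values). Here is a sketch using a random matrix of the same 150 x 300 shape as a stand-in for a channel; a real image would typically concentrate its energy in far fewer values:

```r
# Stand-in for an image channel: a random 150 x 300 matrix.
set.seed(1)
M = matrix(runif(150 * 300), nrow=150)
s = svd(M)
# Cumulative fraction of energy retained by the leading k singular values:
energy = cumsum(s$d^2) / sum(s$d^2)
which(energy >= 0.90)[1]   # smallest k that retains 90% of the energy
```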
Declare a helper function that keeps only the first d singular values (and the corresponding columns of u and v):
reduce = function(s, d) {
  # Keep the first d left and right singular vectors,
  u = as.matrix(s$u[, 1:d])
  v = as.matrix(s$v[, 1:d])
  # and the top-left d x d block of the diagonal matrix A.
  A = as.matrix(diag(s$d)[1:d, 1:d])
  return(list(u=u, v=v, A=A))
}
Reduce each channel from rank 150 down to 30, i.e. 20% of the singular values:
newdim = 30
new_R = reduce(s_red, newdim) 
new_G = reduce(s_green, newdim) 
new_B = reduce(s_blue, newdim) 
The dimensions of the reduced matrices are:
dim(new_R$u)
## [1] 150  30
dim(new_R$v)
## [1] 300  30
dim(new_R$A)
## [1] 30 30
Reconstruct the “dimensionality reduced” image (the reconstruction can slightly overshoot the [0, 1] pixel range):
newimg = array(0, dim=c(nrow(img), ncol(img), 3)) 

new_R = new_R$u %*% new_R$A %*% t(new_R$v) 
new_G = new_G$u %*% new_G$A %*% t(new_G$v) 
new_B = new_B$u %*% new_B$A %*% t(new_B$v) 

newimg[1:nrow(img), 1:ncol(img), 1] = new_R
newimg[1:nrow(img), 1:ncol(img), 2] = new_G
newimg[1:nrow(img), 1:ncol(img), 3] = new_B

newimg = imagematrix(newimg, type="rgb")
plot(newimg)

Reduce even further, from 150 down to 20, which is around 13%:
newdim = 20 
new_R = reduce(s_red, newdim) 
new_G = reduce(s_green, newdim) 
new_B = reduce(s_blue, newdim) 
The dimensions of the reduced matrices are:
dim(new_R$u)
## [1] 150  20
dim(new_R$v)
## [1] 300  20
dim(new_R$A)
## [1] 20 20
Reconstruct the “dimensionality reduced” image:
newimg = array(0, dim=c(nrow(img), ncol(img), 3)) 

new_R = new_R$u %*% new_R$A %*% t(new_R$v) 
new_G = new_G$u %*% new_G$A %*% t(new_G$v) 
new_B = new_B$u %*% new_B$A %*% t(new_B$v) 

newimg[1:nrow(img), 1:ncol(img), 1] = new_R
newimg[1:nrow(img), 1:ncol(img), 2] = new_G
newimg[1:nrow(img), 1:ncol(img), 3] = new_B

newimg = imagematrix(newimg, type="rgb")
plot(newimg)

Conclusion:

With only 13% - 20% of the original dimensions we can reproduce a good approximation of the data. Dimensionality reduction also removes noise and linearly dependent components, by keeping only the largest singular values.
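The saving is easy to quantify: storing a rank-k truncation of one n x d channel needs k(n + d + 1) numbers (the k columns of U and V plus the k singular values) instead of nd. For the 150 x 300 channels above:

```r
# Storage cost of the rank-k factors versus the full channel.
n = 150; d = 300
full = n * d                   # 45000 entries in the original channel
k30  = 30 * (n + d + 1)        # 13530 entries for the rank-30 factors
k20  = 20 * (n + d + 1)        #  9020 entries for the rank-20 factors
c(k30 / full, k20 / full)      # roughly 0.30 and 0.20
```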

Example with user - movie ratings:

This is a small example from Stanford’s Mining of Massive Datasets (MMDS) course:
# user - movie ratings matrix: 
# columns represent movies, rows represent users 
xc = c(1, 1, 1, 0, 0,
       3, 3, 3, 0, 0,
       4, 4, 4, 0, 0,
       5, 5, 5, 0, 0,
       0, 2, 0, 4, 4,
       0, 0, 0, 5, 5,
       0, 1, 0, 2, 2)
x = matrix(xc, nrow=7, ncol=5, byrow=TRUE)
# SVD 
s = svd(x)
Left singular vectors,
s$u 
##             [,1]        [,2]        [,3]          [,4]          [,5]
## [1,] -0.13759913 -0.02361145 -0.01080847  3.054209e-01  4.467149e-01
## [2,] -0.41279738 -0.07083435 -0.03242542  8.345818e-01 -4.642400e-02
## [3,] -0.55039650 -0.09444581 -0.04323389 -1.701251e-01 -7.266700e-01
## [4,] -0.68799563 -0.11805726 -0.05404236 -4.257332e-01  5.198474e-01
## [5,] -0.15277509  0.59110096  0.65365084 -1.420305e-17 -4.556360e-17
## [6,] -0.07221651  0.73131186 -0.67820922  3.553473e-17 -1.169854e-16
## [7,] -0.07638754  0.29555048  0.32682542 -7.101524e-18 -6.441516e-17
Right singular vectors,
s$v 
##             [,1]        [,2]       [,3]          [,4]          [,5]
## [1,] -0.56225841 -0.12664138 -0.4096675 -7.071068e-01  0.000000e+00
## [2,] -0.59285990  0.02877058  0.8047915 -4.081479e-16  5.384491e-17
## [3,] -0.56225841 -0.12664138 -0.4096675  7.071068e-01 -5.384491e-17
## [4,] -0.09013354  0.69537622 -0.0912571 -4.130810e-17  7.071068e-01
## [5,] -0.09013354  0.69537622 -0.0912571 -4.130810e-17 -7.071068e-01
The singular values, arranged as a diagonal matrix (note the last two are numerically zero, so x has rank 3):
diag(s$d)
##          [,1]     [,2]    [,3]         [,4]         [,5]
## [1,] 12.48101 0.000000 0.00000 0.000000e+00 0.000000e+00
## [2,]  0.00000 9.508614 0.00000 0.000000e+00 0.000000e+00
## [3,]  0.00000 0.000000 1.34556 0.000000e+00 0.000000e+00
## [4,]  0.00000 0.000000 0.00000 2.885057e-16 0.000000e+00
## [5,]  0.00000 0.000000 0.00000 0.000000e+00 7.362016e-33
Reduce the rank from 5 down to 2:
new_s = reduce(s, 2)
new_s$u 
##             [,1]        [,2]
## [1,] -0.13759913 -0.02361145
## [2,] -0.41279738 -0.07083435
## [3,] -0.55039650 -0.09444581
## [4,] -0.68799563 -0.11805726
## [5,] -0.15277509  0.59110096
## [6,] -0.07221651  0.73131186
## [7,] -0.07638754  0.29555048
new_s$v 
##             [,1]        [,2]
## [1,] -0.56225841 -0.12664138
## [2,] -0.59285990  0.02877058
## [3,] -0.56225841 -0.12664138
## [4,] -0.09013354  0.69537622
## [5,] -0.09013354  0.69537622
new_s$A 
##          [,1]     [,2]
## [1,] 12.48101 0.000000
## [2,]  0.00000 9.508614
Reconstruction,
new_x = new_s$u %*% new_s$A %*% t(new_s$v) 
Compare x and new_x element by element:
x
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    1    1    0    0
## [2,]    3    3    3    0    0
## [3,]    4    4    4    0    0
## [4,]    5    5    5    0    0
## [5,]    0    2    0    4    4
## [6,]    0    0    0    5    5
## [7,]    0    1    0    2    2
new_x
##            [,1]      [,2]       [,3]         [,4]         [,5]
## [1,]  0.9940420 1.0117044  0.9940420 -0.001327193 -0.001327193
## [2,]  2.9821261 3.0351133  2.9821261 -0.003981578 -0.003981578
## [3,]  3.9761681 4.0468178  3.9761681 -0.005308770 -0.005308770
## [4,]  4.9702101 5.0585222  4.9702101 -0.006635963 -0.006635963
## [5,]  0.3603133 1.2921647  0.3603133  4.080263014  4.080263014
## [6,] -0.3738507 0.7344294 -0.3738507  4.916721417  4.916721417
## [7,]  0.1801567 0.6460824  0.1801567  2.040131507  2.040131507
Frobenius (Euclidean) norm of the reconstruction error:
norm(x-new_x, type="F") 
## [1] 1.34556
Note that the error equals the largest discarded singular value: by the Eckart-Young theorem, the Frobenius error of the best rank-k approximation is the square root of the sum of the squared discarded singular values, and here the fourth and fifth values are numerically zero.
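The reduced factors are also what a recommender actually uses. Here is a sketch, following the MMDS course's "concept space" idea, for a hypothetical new user who only rated the first movie (the variable names are illustrative, not from the course material):

```r
# Rebuild the ratings matrix and take its top-2 right singular vectors.
xc = c(1, 1, 1, 0, 0,
       3, 3, 3, 0, 0,
       4, 4, 4, 0, 0,
       5, 5, 5, 0, 0,
       0, 2, 0, 4, 4,
       0, 0, 0, 5, 5,
       0, 1, 0, 2, 2)
x = matrix(xc, nrow=7, ncol=5, byrow=TRUE)
s = svd(x)
V2 = s$v[, 1:2]          # movie "concepts" (here: sci-fi vs romance)
# Hypothetical new user: rated only movie 1, with 5 stars.
q = c(5, 0, 0, 0, 0)
concept = q %*% V2       # the user's position in 2-D concept space
pred = concept %*% t(V2) # mapped back to movie space
round(pred, 2)           # movies 1-3 score high, movies 4-5 near zero
```

Even though the user rated only one movie, the projection through the concept space predicts a strong affinity for the other two movies in the same group.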