
Enrico Rubboli

essay Dec 1, 2015 3 min

Linear Regression in Go - Part 3

In the previous posts we talked about how to predict a continuous value using a linear function, and how to measure the error given a matrix of test data and a hypothesis vector of values $\theta$.

In this post we're describing gradient descent: a procedure that iteratively moves the vector $\theta$ toward its optimal values (or a local minimum).

Just to keep things simple, let's assume we have a vector $\theta$ with just two dimensions (the slope and intercept of a simple line). If we plot $\theta_0$, $\theta_1$ and the value of the cost function, we'll have something like:
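As a quick reminder of what we're minimizing, the squared-error cost from the previous post can be sketched in plain Go like this (function and variable names here are illustrative, not taken from the post's repository):

```go
package main

import "fmt"

// costJ computes the squared-error cost for a line with intercept theta0
// and slope theta1: J = 1/(2m) * sum((theta0 + theta1*x[i] - y[i])^2).
func costJ(theta0, theta1 float64, x, y []float64) float64 {
	m := float64(len(x))
	sum := 0.0
	for i := range x {
		d := theta0 + theta1*x[i] - y[i]
		sum += d * d
	}
	return sum / (2 * m)
}

func main() {
	x := []float64{1, 2, 3}
	y := []float64{2, 4, 6} // exactly y = 2x

	// A perfect fit gives zero cost; any other theta gives a positive cost.
	fmt.Println(costJ(0, 2, x, y))
	fmt.Println(costJ(0, 0, x, y) > 0)
}
```

Plotting this cost over a grid of $\theta_0$, $\theta_1$ values is what produces the bowl-shaped surface in the graph above.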

Cost Function Graph

Given any random starting point $\langle\theta_0, \theta_1\rangle$, the partial derivatives of the cost function give us the direction of steepest ascent, pointing away from the local minimum, so we can take a step of size $\alpha$ in the opposite direction. This is our gradient descent in mathematical terms:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

The size of the step we take toward the local minimum at every iteration, $\alpha$, is called the learning rate (or sometimes step size). For now, let's assume it's a fixed value, and the same goes for the number of iterations we need to perform in order to get to the local minimum.

Working out the partial derivatives gives the following:

$$\begin{align*} \text{repeat until convergence: } \lbrace & \\ \theta_0 := {} & \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left(h_\theta(x_{i}) - y_{i}\right) \\ \theta_1 := {} & \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left(h_\theta(x_{i}) - y_{i}\right) x_{i} \\ \rbrace & \end{align*}$$

or, in a more general way (if we assume $x_0^{(i)} = 1$):

$$\begin{align*} & \text{repeat until convergence: } \lbrace \\ & \quad \theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} & \text{for } j := 0..n \\ & \rbrace \end{align*}$$
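Before vectorizing, the element-wise rule above can be sketched directly with plain Go slices. This is a hypothetical helper (not from the post's repository) that performs one simultaneous update of all $\theta_j$, assuming the bias term $x_0 = 1$ is already prepended to each row of X:

```go
package main

import "fmt"

// gradientStep returns theta after one simultaneous update following the
// element-wise rule: theta_j -= alpha/m * sum_i (h(x_i) - y_i) * x_ij.
// Each row of X is expected to start with the bias term x0 = 1.
func gradientStep(theta []float64, X [][]float64, y []float64, alpha float64) []float64 {
	m := float64(len(X))
	next := make([]float64, len(theta))
	for j := range theta {
		var grad float64
		for i, row := range X {
			// hypothesis h(x) = theta . x
			var h float64
			for k, v := range row {
				h += theta[k] * v
			}
			grad += (h - y[i]) * row[j]
		}
		// write into a fresh slice so all theta_j update simultaneously
		next[j] = theta[j] - alpha*grad/m
	}
	return next
}

func main() {
	X := [][]float64{{1, 1}, {1, 2}, {1, 3}}
	y := []float64{1, 2, 3} // the line y = x, so theta should approach <0, 1>
	theta := []float64{0, 0}
	for i := 0; i < 2000; i++ {
		theta = gradientStep(theta, X, y, 0.1)
	}
	fmt.Println(theta[0], theta[1])
}
```

Note that the new values are written into a separate slice: updating `theta` in place would let later components of the gradient see partially updated values, which breaks the "simultaneous update" required by the rule.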

But what we want to implement is a vectorized version of that formula, which is:

$$\theta := \theta - \frac{\alpha}{m} X^{T} (X\theta - \vec{y})$$

Again, using vectors, the result is much more readable and easier to implement. We can finally get to our actual Go implementation:

// m = Number of Training Examples
// n = Number of Features
m, n := X.Dims()
h := mat64.NewVector(m, nil)
new_theta := mat64.NewVector(n, nil)
partials := mat64.NewVector(n, nil)

for i := 0; i < numIters; i++ {
    h.MulVec(X, new_theta)

    // h now holds (X*theta - y) / m: the prediction error scaled by 1/m
    for el := 0; el < m; el++ {
        val := (h.At(el, 0) - y.At(el, 0)) / float64(m)
        h.SetVec(el, val)
    }

    // partials = X^T * h, the gradient of the cost function
    partials.MulVec(X.T(), h)

    // Update theta values
    for el := 0; el < n; el++ {
        new_val := new_theta.At(el, 0) - (alpha * partials.At(el, 0))
        new_theta.SetVec(el, new_val)
    }
}

h is a *mat64.Vector and numIters is a constant int. You can find the full implementation here, and tests here.

A good visual explanation is in the following video by Prof. Alexander Ihler:

Edit (Feb 6, 2016): I've fixed the code so now h and new_theta are of type Vector instead of DenseMatrix.