
Enrico·rubbo.li

Tech · Biohack · Markets · Opinions Enrico Rubboli, propr. Dubai, UAE
tech Oct 1, 2015 3 min

Linear Regression in Go - Part 1

Python is becoming the de facto standard for Big Data and Machine Learning, in particular thanks to amazing tools like IPython Notebook, which helps you visualize your data, and scikit-learn, which implements some of the most popular machine learning algorithms.

So implementing an ML algorithm in Go is purely an exercise.

What is Linear Regression

Linear regression is a supervised machine learning algorithm used to predict a continuous value; for example, it can be used to predict prices in the market.

The term supervised refers to the fact that the algorithm needs to be trained on a dataset of examples whose correct outputs are already known; we'll see more examples of supervised algorithms in the future.

Here is a plot of real data on house prices in Windsor, ON: the X axis is the lot size and the Y axis is the price. As we can see, bigger lots tend to cost more. The red line is our best guess at the relationship:

Figure: Windsor, ON house prices, lot size (X) vs. price (Y); the red line is the hypothesis.

This red line is called the hypothesis function (or prediction), and it has the form:

$$h_\theta(x) = \theta_0 + \theta_1 x$$
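To make the formula concrete, here is a minimal plain-Go sketch of this single-feature hypothesis, without gonum. The function name `Hypothesis1` and the weights used in `main` are hypothetical, chosen only for illustration; they are not fitted to the Windsor data.

```go
package main

import "fmt"

// Hypothesis1 computes h_θ(x) = θ0 + θ1·x for a single feature x
// (for example, the lot size). The name is hypothetical.
func Hypothesis1(theta0, theta1, x float64) float64 {
	return theta0 + theta1*x
}

func main() {
	// Illustrative weights only; they are not fitted to any real data.
	fmt.Println(Hypothesis1(10, 2, 5)) // 20
}
```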

where $x$ is our feature (the lot size) and the result $h_\theta(x)$ is our price prediction.

But we could have more features, like the number of bathrooms or the number of bedrooms; we can even use polynomial functions of the features. A more complicated example is:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_2 + \theta_4 x_2^2$$

Here $x_1$ is still our lot size, but it now enters the function as a quadratic (through both $x_1$ and $x_1^2$), and $x_2$ might be the number of bedrooms, also included as a quadratic.
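A minimal sketch of this five-weight hypothesis in plain Go, assuming a hypothetical `PolyHypothesis` function that takes θ as a fixed-size array (so the compiler enforces exactly five weights) plus the two raw features. The weights and inputs in `main` are illustrative, not fitted.

```go
package main

import "fmt"

// PolyHypothesis evaluates
//   h_θ(x) = θ0 + θ1·x1 + θ2·x1² + θ3·x2 + θ4·x2²
// where x1 is the lot size and x2 the number of bedrooms.
// The [5]float64 array type guarantees exactly five weights.
func PolyHypothesis(theta [5]float64, x1, x2 float64) float64 {
	return theta[0] +
		theta[1]*x1 + theta[2]*x1*x1 +
		theta[3]*x2 + theta[4]*x2*x2
}

func main() {
	theta := [5]float64{1, 2, 3, 4, 5} // illustrative weights, not fitted
	fmt.Println(PolyHypothesis(theta, 2, 3)) // 74
}
```

Note that the function is quadratic in the features but still linear in $\theta$, which is why fitting it remains a linear regression problem.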

In this case, the job of the machine learning algorithm is to find the weights that make this function fit the training data best, that is, the vector $\theta = \langle\theta_0, \theta_1, \theta_2, \theta_3, \theta_4\rangle$.

Using Matrices and Vectors

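If we treat every term as a separate feature (renaming $x_1^2$, $x_2$ and $x_2^2$ as $x_2$, $x_3$ and $x_4$) and define an extra feature $x_0$ that is always equal to 1, we can rewrite the hypothesis function as follows: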

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 = \sum_{j=0}^{n} \theta_j x_j$$
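The summation translates directly into a loop over $j$. This plain-Go sketch uses a hypothetical `HypothesisSum` and assumes the caller has already placed $x_0 = 1$ at index 0; the sample `x` in `main` encodes a lot size of 2 and 3 bedrooms as $[1, 2, 2^2, 3, 3^2]$, with illustrative weights.

```go
package main

import "fmt"

// HypothesisSum computes h_θ(x) = Σ_{j=0}^{n} θ_j·x_j.
// x must already contain x0 = 1 at index 0, and theta and x
// must both have length n+1; otherwise an error is returned.
func HypothesisSum(theta, x []float64) (float64, error) {
	if len(theta) != len(x) {
		return 0, fmt.Errorf("length mismatch: len(theta)=%d, len(x)=%d", len(theta), len(x))
	}
	sum := 0.0
	for j := range theta {
		sum += theta[j] * x[j]
	}
	return sum, nil
}

func main() {
	theta := []float64{1, 2, 3, 4, 5} // illustrative weights θ0..θ4
	x := []float64{1, 2, 4, 3, 9}     // x0 = 1, lot size 2 and 2², 3 bedrooms and 3²
	h, err := HypothesisSum(theta, x)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(h) // 74
}
```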

where $n$ is the number of features (here $n = 4$) and $x$ is the feature vector:

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}, \qquad x_0 = 1$$
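In code, building this vector mostly means prepending the constant 1 to the raw features. Here is a sketch with a hypothetical `WithBias` helper; the usage in `main` also expands an illustrative lot size and bedroom count into the squared terms of the quadratic example.

```go
package main

import "fmt"

// WithBias returns x = [x0, x1, ..., xn] with x0 = 1 prepended to the
// raw features x1..xn. It allocates a new slice, so the input is not modified.
func WithBias(features []float64) []float64 {
	x := make([]float64, len(features)+1)
	x[0] = 1
	copy(x[1:], features)
	return x
}

func main() {
	lotSize, bedrooms := 2.0, 3.0 // illustrative values
	// Each polynomial term becomes its own feature:
	// x1 = lot size, x2 = lot size², x3 = bedrooms, x4 = bedrooms².
	x := WithBias([]float64{lotSize, lotSize * lotSize, bedrooms, bedrooms * bedrooms})
	fmt.Println(x) // [1 2 4 3 9]
}
```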

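Since $x_0 = 1$, the rewritten form is equal to the original hypothesis, and collecting the weights into a vector $\theta = [\theta_0, \ldots, \theta_n]^T$ as well gives the vectorized form: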

$$h_\theta(x) = \theta^T x$$
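The vectorized form also scales from one example to a whole dataset: stacking the $m$ feature vectors as the rows of a matrix $X$ gives every prediction at once as $X\theta$. Below is a plain-Go sketch with a hypothetical `PredictAll`; a BLAS-backed library such as gonum would typically perform the same matrix-vector product more efficiently.

```go
package main

import "fmt"

// PredictAll computes h = Xθ, one prediction per training example.
// Each row of X is one example's feature vector (with x0 = 1), so
// entry i of the result is θᵀx^(i). Every row must have len(theta) entries.
func PredictAll(X [][]float64, theta []float64) ([]float64, error) {
	h := make([]float64, len(X))
	for i, row := range X {
		if len(row) != len(theta) {
			return nil, fmt.Errorf("row %d has %d features, want %d", i, len(row), len(theta))
		}
		for j, xj := range row {
			h[i] += theta[j] * xj
		}
	}
	return h, nil
}

func main() {
	theta := []float64{10, 2} // illustrative θ0, θ1
	X := [][]float64{
		{1, 5}, // x0 = 1, x1 = 5
		{1, 7}, // x0 = 1, x1 = 7
	}
	h, err := PredictAll(X, theta)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(h) // [20 24]
}
```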

This is not just easier to read: it is also independent of the number of features, and it can benefit from computationally optimized routines like the ones in the gonum matrix package. gonum uses BLAS and LAPACK implementations; you can find more details here.

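This first post ends with the hypothesis function written in Go, taking advantage of the mat64.Dot function:

```go
import "github.com/gonum/matrix/mat64"

// Hypothesis returns θᵀx: the dot product of x (with x0 = 1) and theta.
func Hypothesis(x, theta *mat64.Vector) float64 {
	return mat64.Dot(x, theta)
}
```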

You can find the whole file here and its test here.

In the next post about linear regression we'll implement the cost function and gradient descent. The cost function measures the error of a specific set of weights $\theta$, while gradient descent is an algorithm that iteratively updates $\theta$ until it converges to the optimal values.
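As a preview, a common choice of cost function for linear regression is the squared-error cost $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$ over $m$ training examples. The sketch below assumes that definition (the next post's exact formulation may differ) and uses a hypothetical `Cost` function in plain Go with illustrative data.

```go
package main

import (
	"errors"
	"fmt"
)

// Cost computes the squared-error cost (an assumed, common definition)
//   J(θ) = 1/(2m) · Σ_i (θᵀx^(i) − y^(i))²
// over m examples. Each row of X must include x0 = 1 and have len(theta) entries.
func Cost(X [][]float64, y, theta []float64) (float64, error) {
	m := len(X)
	if m == 0 || len(y) != m {
		return 0, errors.New("X and y must be non-empty and have the same length")
	}
	sum := 0.0
	for i, row := range X {
		if len(row) != len(theta) {
			return 0, fmt.Errorf("row %d has %d features, want %d", i, len(row), len(theta))
		}
		h := 0.0 // prediction θᵀx^(i)
		for j, xj := range row {
			h += theta[j] * xj
		}
		diff := h - y[i]
		sum += diff * diff
	}
	return sum / (2 * float64(m)), nil
}

func main() {
	X := [][]float64{{1, 1}, {1, 3}} // x0 = 1 plus one feature x1
	y := []float64{2, 6}             // illustrative targets that satisfy y = 2·x1
	for _, theta := range [][]float64{{0, 2}, {0, 1}} {
		cost, err := Cost(X, y, theta)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(theta, cost) // "[0 2] 0", then "[0 1] 2.5"
	}
}
```

A perfect fit gives zero cost, and any other $\theta$ gives a positive value, which is what gradient descent will try to minimize.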

You can find part 2 here.