Linear Regression in Go - Part 1

Python is becoming the de facto standard for Big Data and Machine Learning, largely thanks to tools such as IPython Notebook, which helps you visualize your data, and scikit-learn, which implements some of the most popular machine learning algorithms.

So implementing an ML algorithm in Go is purely an exercise.

What is Linear Regression

Linear regression is a supervised machine learning algorithm used to predict a continuous value; for example, it can be used to predict prices in the market.

The term supervised refers to the fact that the algorithm needs to be trained on a labeled dataset, that is, examples for which the correct output is already known; we’ll see more examples of supervised algorithms in the future.

Here is a plot of real house price data from Windsor, ON: the X axis is lot size and the Y axis is price. As we can see, bigger lots tend to cost more. The red line is our best guess at the relationship:

[Figure: Windsor, ON house prices, lot size (X axis) vs. price (Y axis); the red line is the hypothesis]

This red line is called the hypothesis function (or prediction) and looks like:

$h_\theta(x) = \theta_0 + \theta_1 x$

where $x$ is our feature (the lot size) and the result $h_\theta(x)$ is our price prediction.
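As a minimal sketch, the single-feature hypothesis can be written directly in plain Go. The function name `Predict` and the weights below are hypothetical, chosen only for illustration; they are not fitted to the Windsor data.

```go
package main

import "fmt"

// Predict evaluates the single-feature hypothesis h(x) = theta0 + theta1*x.
func Predict(theta0, theta1, x float64) float64 {
	return theta0 + theta1*x
}

func main() {
	// Hypothetical weights, for illustration only (not learned from data).
	theta0, theta1 := 30000.0, 6.0
	lotSize := 5000.0

	fmt.Printf("predicted price: %.0f\n", Predict(theta0, theta1, lotSize))
	// prints "predicted price: 60000"
}
```

Here $\theta_0$ is the intercept (the predicted price when the lot size is zero) and $\theta_1$ is the slope of the red line.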

But we could have more features, like the number of bathrooms or the number of bedrooms; we can even use polynomial functions of the features. A more complicated example is:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_2 + \theta_4 x_2^2$

Here $x_1$ is still our lot size, which now also appears as a squared term, and $x_2$ might be the number of bedrooms.

In this case, the machine learning algorithm's job is to find the weights that make this function fit the training data best, that is, the vector $\theta = \langle\theta_0,\theta_1,\theta_2,\theta_3,\theta_4\rangle$.
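The two-feature quadratic hypothesis above can be sketched the same way. The name `PredictQuadratic` and the values of $\theta$ are assumptions made up for this example; a real $\theta$ would come out of training.

```go
package main

import "fmt"

// PredictQuadratic evaluates
// h(x) = theta[0] + theta[1]*x1 + theta[2]*x1^2 + theta[3]*x2 + theta[4]*x2^2.
func PredictQuadratic(theta [5]float64, x1, x2 float64) float64 {
	return theta[0] + theta[1]*x1 + theta[2]*x1*x1 + theta[3]*x2 + theta[4]*x2*x2
}

func main() {
	// Hypothetical weights, for illustration only.
	theta := [5]float64{1, 2, 3, 4, 5}

	// 1 + 2*2 + 3*4 + 4*3 + 5*9 = 74
	fmt.Println(PredictQuadratic(theta, 2, 3)) // prints 74
}
```

Note that the function body has to change every time a term is added or removed, which is the problem the vectorized form addresses.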

Using Matrices and Vectors

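If we define an extra value $x_0$ equal to 1 and relabel the terms so that each one, including the squared ones, is a feature in its own right ($x_1, \dots, x_4$), we can rewrite the hypothesis function as follows: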

$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 = \displaystyle\sum_{j=0}^{n}\theta_j x_j$

where $n$ is the number of features (here $n = 4$) and $x$ is the feature vector:

$x = \begin{bmatrix} x_0 = 1 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$

Since $x_0 = 1$, the term $\theta_0 x_0$ is just $\theta_0$, so this equation is equivalent to the original one and we can write it in vectorized form:

$h_\theta(x) = \theta^T x$

This is not just easier to read; it is also independent of the number of features, and it can benefit from computationally optimized routines such as those in the gonum matrix package, which builds on BLAS and LAPACK implementations. You can find more details here.
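To make the vectorized form concrete without any library, here is a small sketch using plain slices. The helpers `WithBias` and `Dot` are hypothetical names introduced for this example, and the weights are made up; the point is only that prepending $x_0 = 1$ turns the hypothesis into a single dot product that works for any number of features.

```go
package main

import "fmt"

// WithBias returns a new slice holding x_0 = 1 followed by the given features.
func WithBias(features []float64) []float64 {
	x := make([]float64, 0, len(features)+1)
	x = append(x, 1)
	return append(x, features...)
}

// Dot returns the sum over j of theta[j]*x[j], i.e. theta^T x.
// It panics if the two slices have different lengths.
func Dot(theta, x []float64) float64 {
	if len(theta) != len(x) {
		panic("dot: length mismatch")
	}
	var sum float64
	for j := range theta {
		sum += theta[j] * x[j]
	}
	return sum
}

func main() {
	// Hypothetical weights, for illustration only.
	theta := []float64{1, 2, 3, 4, 5}

	// Features for a lot size of 2 and 3 bedrooms, squared terms included:
	// x1 = 2, x2 = 2^2 = 4, x3 = 3, x4 = 3^2 = 9.
	x := WithBias([]float64{2, 4, 3, 9})

	// 1*1 + 2*2 + 3*4 + 4*3 + 5*9 = 74
	fmt.Println(Dot(theta, x)) // prints 74
}
```

The loop in `Dot` is the summation $\sum_{j=0}^{n}\theta_j x_j$ spelled out; an optimized library routine computes the same quantity.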

This first post ends with the hypothesis function written in Go taking advantage of the mat64.Dot function:

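import "github.com/gonum/matrix/mat64"

// Hypothesis returns theta^T x, the prediction for the feature vector x.
func Hypothesis(x, theta *mat64.Vector) float64 {
    return mat64.Dot(x, theta)
}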

You can find the whole file here and its test here.

In the next post about linear regression we’ll implement the cost function and gradient descent. The cost function measures the error of a specific set of $\theta$ values, while gradient descent is an iterative procedure that moves $\theta$ toward the optimal values.

You can find part 2 here.