Explanation

The perceptron algorithm, first introduced by Frank Rosenblatt, is a linear classifier.
This means that for a given input \(x \in \mathbb{R}^n\), the perceptron assigns an output label \(y \in \{-1,1\}\) based on a linear function of the form:

\begin{align} f(x) &= \text{sgn}(w_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n) \end{align}

or, written more concisely:

\begin{align} f(x) &= \text{sgn}(w^Tx + w_0) \end{align}

This function represents a hyperplane which divides the space into two halves. (In practice the bias \(w_0\) is often absorbed into \(w\) by appending a constant feature of 1 to every input, which is why it is dropped in the learning loop below.)
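To make the decision function concrete, here is a minimal sketch in Python (NumPy, the name `predict`, and the convention that \(\text{sgn}(0) = +1\) are my assumptions, not part of the original demo):

```python
import numpy as np

def predict(w, w0, x):
    """Classify x as +1 or -1 using the hyperplane w^T x + w0 = 0.
    Ties (w^T x + w0 == 0) are sent to +1 by convention."""
    return 1 if np.dot(w, x) + w0 >= 0 else -1

# A 2-D example: the line x_1 - 0.5 x_2 + 0.1 = 0
print(predict(np.array([1.0, -0.5]), 0.1, np.array([0.3, 0.8])))  # -> 1
```

This still leaves the question: how do we learn this hyperplane? We want to do something like the following (from now on, \(w_t\) will denote \(w\) at time \(t\), not the \(t\)-th component of the vector):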

  1. Initialize \(w_0 = \vec{0} \in \mathbb{R}^n\)
  2. For each training example \((x_i,y_i)\):
    • Predict \(y' = \text{sgn}(w_t^Tx_i)\)
    • If \(y_i \neq y'\):
      • Update \(w_{t+1} = w_t + \textit{something}\)

This is all fine and dandy, but how do we update the weights? We want to adjust them based on how bad the classification error was on a given input, i.e. by \(y_ix_i\). We can read this as: if we made an error on input \(x_i\), we need to move \(w\) in the direction of \(y_ix_i\). We also want the learning to happen smoothly. In other words, we don’t want the perceptron moving wildly every time it makes an error; we want it to take small steps. Thus we multiply \(y_ix_i\) by some constant \(\eta\), usually less than one. (In the demonstration below, \(\eta = 0.01\).) Putting this all together, we see that the update rule is:

\begin{align} w_{t+1} &= w_t + \eta y_ix_i \end{align}
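Putting the loop and the update rule together, here is a minimal sketch of the full algorithm in Python (NumPy and the name `train_perceptron` are my own; the bias is absorbed into \(w\) via a constant 1 feature, as noted above):

```python
import numpy as np

def train_perceptron(X, y, eta=0.01, passes=1):
    """Online perceptron. X: (m, n) array of inputs; y: (m,) array of
    labels in {-1, +1}. Returns the learned weight vector (bias last)."""
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # absorb bias as a 1-feature
    w = np.zeros(X.shape[1])                      # step 1: initialize w = 0
    for _ in range(passes):
        for x_i, y_i in zip(X, y):                # step 2: one example at a time
            y_pred = 1 if np.dot(w, x_i) >= 0 else -1
            if y_pred != y_i:                     # misclassified, so update:
                w += eta * y_i * x_i              # w_{t+1} = w_t + eta * y * x
    return w
```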

I’ve created a visualization to demonstrate how the algorithm updates as new examples are seen. In the demonstration below, points are randomly generated and assigned a label according to the true function (the black line): points are positive if they lie to the right of the line and negative otherwise.

The perceptron makes three complete passes over the data before the visualization resets.
The perceptron learns online: each time a new point appears, the algorithm re-classifies all previously seen points. Every time it makes an error, it performs an additive update, which can be seen when the red line moves.
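To tie the demo back to the code, here is a sketch of how such data could be generated and fit; the true line, point count, and random seed are hypothetical choices of mine, and `train_perceptron` is the sketch from above:

```python
import numpy as np

rng = np.random.default_rng(0)

true_w = np.array([1.0, -0.5])        # stands in for the black line (assumed)
X = rng.uniform(-1, 1, size=(50, 2))  # 50 random 2-D points (assumed)
y = np.where(X @ true_w >= 0, 1, -1)  # label each point by the true boundary

# Three complete passes, matching the visualization; eta = 0.01 as above.
w = train_perceptron(X, y, eta=0.01, passes=3)

Xb = np.hstack([X, np.ones((len(X), 1))])          # same bias feature as training
errors = np.sum(np.where(Xb @ w >= 0, 1, -1) != y)
print(f"misclassified after 3 passes: {errors}")
```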

Figure Key

  • Black Line - True decision boundary from which the data was labeled
  • Red Line - Decision boundary learned by the perceptron
  • "+" - Points classified as positive by the perceptron
  • "-" - Points classified as negative by the perceptron