The Intuition Behind the Perceptron

Using geometry to understand artificial neurons

Nina Almaamary

--


The Basics

A perceptron is a simple type of artificial neuron: it receives inputs, processes them, and then decides whether to fire or not fire (output 1 or 0). The inputs to a perceptron can be represented by a vector x, where:

x = (x₁, x₂, …, xₙ)

Each input has a corresponding weight assigned to it, which signifies the importance of that input. The weights can be represented by a vector w, where:

w = (w₁, w₂, …, wₙ)

Consider the task of admitting (1) or rejecting (0) a student applying to college A. Our inputs would be the student’s academic features, such as their GRE score and their high school grade, and each feature has a weight assigned to it. Let’s say college A cares about the GRE score more than the high school grade; in other words, the weight w₁ corresponding to the GRE score x₁ will have a higher value than the weight w₂ corresponding to the high school grade x₂.

Perceptron rule

We can construct a rule from these variables: if the weighted sum is larger than the threshold, the student gets admitted, else rejected:

output = 1 if w₁x₁ + w₂x₂ > threshold, and 0 otherwise

However, the above formula only considers 2 inputs. What if we had more than 2? In this case we can represent the condition as a dot product between vector 𝑥 and vector 𝑤, where:

w · x = w₁x₁ + w₂x₂ + … + wₙxₙ

and we can take the threshold to the other side of the inequality and replace it with 𝑏, where:

b = −threshold

And now the perceptron rule looks like this:

output = 1 if w · x + b > 0, and 0 otherwise

The 𝑏 variable is known as the bias, and it decides how easy it is for a perceptron to output a 1. If we had instead defined the rule with the inequality reversed (outputting 1 when w · x + b < 0), the bias would decide how easy it is to output a 0; however, the notation above is most commonly used and is the standard for defining the perceptron rule. Another way to look at the rule is to think of a line/hyperplane, a decision boundary, that separates the two classes (admitted or rejected).
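As a quick illustration, here is a minimal sketch of the rule in Python; the weights and feature values are made up for the college-admission example:

import numpy as np

def perceptron_output(x, w, b):
    # Fire (1) if the weighted sum plus the bias is positive, otherwise stay silent (0).
    return 1 if np.dot(w, x) + b > 0 else 0

# Made-up numbers: x = (GRE score, high school grade), both scaled to the range 0-1,
# with the GRE weighted more heavily than the grade.
w = np.array([0.7, 0.3])
b = -0.5  # the bias, i.e. the negative of the threshold

print(perceptron_output(np.array([0.9, 0.4]), w, b))  # 1 -> admitted
print(perceptron_output(np.array([0.2, 0.3]), w, b))  # 0 -> rejected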

One thing to keep in mind is that vector x is known to us, and vector w is what we want the perceptron to tell us. Returning to our example, assume that we don’t know the function that college A uses for admission; instead we have the data of all the admitted and rejected students. We could use the perceptron rule to construct a decision boundary (a function) that separates the admitted students from the rejected ones. To put it another way, we let the perceptron use the data (GRE, high school grade) as inputs to learn the set of weights that best represents a linear split of the given data, as in the sketch below.
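The rule above only tells us how to use a given w and b; one classic way to actually learn them from labelled data is the perceptron learning rule. Below is a minimal sketch of it with made-up admission data:

import numpy as np

# Made-up admission data: each row is (GRE score, high school grade), scaled to 0-1.
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.7, 0.6],   # admitted students
              [0.3, 0.4], [0.2, 0.5], [0.4, 0.3]])  # rejected students
y = np.array([1, 1, 1, 0, 0, 0])

w = np.zeros(2)  # start with all weights at zero
b = 0.0          # start with a zero bias
lr = 0.1         # learning rate

for epoch in range(20):
    for x_i, y_i in zip(X, y):
        prediction = 1 if np.dot(w, x_i) + b > 0 else 0
        error = y_i - prediction        # -1, 0 or +1
        w += lr * error * x_i           # nudge the boundary towards misclassified points
        b += lr * error

print(w, b)  # a weight vector and bias that linearly separate the two groups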

The Bias

The shaded green area represents the students’ GRE and high school scores that got them admitted to college A. With high values of 𝑏 this shaded area gets larger, and the majority of the students lie inside the admitted area. In other words, it is easier for a perceptron to output a 1 (easier to get admitted, because the admission area encompasses a wider range of values for both the GRE and the high school grade).

high values for bias 𝑏 in 2D

Similarly, for small values of 𝑏 the admitted area gets smaller, and hence the majority of students lie inside the rejected area. In other words, it will be harder for a perceptron to output a 1 (harder to get admitted, because the admission area encompasses a smaller range of values for both the GRE and the high school grade).

low values for bias 𝑏 in 2D

Now, this is a hypothetical decision boundary; obviously student scores cannot be negative, but the same logic still applies. It also applies when working in an n-dimensional space: instead of a line you have a hyperplane.
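To make the effect of the bias concrete, here is a tiny sketch (weights and scores are made up) in which the very same student is admitted under a high bias and rejected under a low one:

import numpy as np

def perceptron_output(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

x = np.array([0.5, 0.5])  # a hypothetical student's (GRE score, high school grade)
w = np.array([0.7, 0.3])

print(perceptron_output(x, w, b=0.2))   # 1 -> a high bias makes admission easy
print(perceptron_output(x, w, b=-0.8))  # 0 -> a low bias makes admission hard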

The weights

Line passing through the origin

If we construct a decision boundary that passes through the origin, the value of 𝑏 must equal 0. This way we can understand what our weights represent graphically, and then generalize to cases where 𝑏 is not 0. For simplicity we will stick to 2D space (GRE and high school grade).

Equation of a line passing through the origin in 2D: w₁x₁ + w₂x₂ = 0

We previously discussed that we can represent the weighted sum as the dot product between vectors x and w:

w · x = w₁x₁ + w₂x₂ = 0

If the dot product of two non-zero vectors equals 0, then these vectors are perpendicular to each other. Let w₁ = w₂ = 1; then:

x₁ + x₂ = 0

A vector that is perpendicular to a surface is also known as the normal vector, and any vector x that satisfies the equation above will form a 90° angle with w (our normal vector). The normal vector w points towards the admitted students (1) and is always perpendicular to the direction of the line.

Weight vector pointing towards positive class

Student A, with a high school grade of 4 and a GRE score of 2, i.e. (4, 2), will form an angle < 90° with w. In general, any vector x that forms an angle < 90° with w will lie in the admitted area; similarly, any vector x that forms an angle > 90° will lie in the rejected area, and any vector x that forms an angle of exactly 90° will lie on the line.

≈ 18.2° between weight vector and vector x
≈ 108.7° between weight vector and vector x

We can conclude that, depending on the angle between the weight vector (the normal vector) and the inputs, a perceptron will output a 0 or a 1: 0 if the angle is > 90° and 1 if the angle is < 90°. Now let’s give 𝑏 a non-zero value and try to reach the same conclusion.
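We can check this angle-based view numerically. In the sketch below w = (1, 1) as above and student A is the point (4, 2); the other two points are made up, one on the rejected side and one lying exactly on the line:

import numpy as np

def angle_with(w, x):
    # Angle in degrees between the weight (normal) vector w and an input vector x.
    cos_theta = np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x))
    return np.degrees(np.arccos(cos_theta))

w = np.array([1.0, 1.0])            # normal vector of the line x1 + x2 = 0

points = [np.array([4.0, 2.0]),     # student A: angle < 90 degrees -> admitted
          np.array([-2.0, 1.0]),    # made-up point: angle > 90 degrees -> rejected
          np.array([2.0, -2.0])]    # made-up point on the line: angle = 90 degrees

for x in points:
    print(round(angle_with(w, x), 1), 1 if np.dot(w, x) > 0 else 0)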

Line not passing through the origin

The new line has 𝑏 = -2, and w₁ = w₂ = 1:

x₁ + x₂ − 2 = 0

parallel lines

If you compare the previous equation with the new one, you’ll notice that the two lines are parallel to each other. We can still use the same dot-product notation by introducing a new dimension x₀ that has a constant value of 1, with w₀ = 𝑏.

The illustration on the left shows the equation of the line passing through the origin, and the one on the right shows the new equation x₁ + x₂ − 2 = 0.

Equations after adding the new dimension x₀ = 1 (blue axis)

We shifted the new line 2 units upward only; in other words, we did not change the orientation (slope) of our plane. Since the weight vector is perpendicular to the direction of the previous line, this weight vector must also be perpendicular to the direction of the new line, because all planes with the same normal vector are parallel. Similarly, the weight vector will be perpendicular to the input vectors that satisfy the above equation.

Same normal vector w
w is perpendicular to the input vector
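In code, folding the bias into the weight vector with the extra dimension x₀ = 1 looks like this (a minimal sketch using the line x₁ + x₂ − 2 = 0 from above):

import numpy as np

w = np.array([1.0, 1.0])  # weights of the line x1 + x2 - 2 = 0
b = -2.0

# Absorb the bias into the weights by prepending w0 = b and feeding a constant input x0 = 1.
w_aug = np.concatenate(([b], w))       # (w0, w1, w2)

x = np.array([3.0, 1.0])               # an arbitrary input point
x_aug = np.concatenate(([1.0], x))     # (x0, x1, x2) with x0 = 1

print(np.dot(w, x) + b)      # 2.0
print(np.dot(w_aug, x_aug))  # 2.0 -> exactly the same weighted sum, with the bias absorbed into w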

This video and this one from Khan Academy do an amazing job of showing mathematically that any vector lying in the plane is perpendicular to the normal vector.

Conclusion

  1. We saw geometrically that the bias decides how easy it is for an artificial neuron to output a 1.
  2. We saw geometrically that the angle between the weight vector and the inputs decides whether the neuron outputs a 1 or a 0.
  3. The goal of the perceptron is to find a set of weights that change the orientation (slope) of the line/plane to best fit the data at hand.

I really hope this post was useful and helped you understand how a perceptron works 😊. Good luck in your deep learning journey!
