Step 2 – how two children solve the XOR problem every day
Let's see how two children solve the XOR problem using a plain, everyday example. I strongly recommend this method: I have taken very complex problems, broken them down into small parts at a child's level, and often solved them in a few minutes. Then you get sarcastic responses from others, such as, "Is that all you did?" But the sarcasm vanishes when the solution works over and over again in high-level corporate projects.
First, let's convert the XOR problem into a candy problem at a store. Two children go to the store and want to buy candy, but they only have enough money for one pack. They have to agree on a choice between two packs of different candy: let's say pack one is chocolate and pack two is chewing gum. In the discussion between the two children, 1 means yes and 0 means no. Their budget limits their options:
- Going to the store and buying neither chocolate nor chewing gum = no, no (0,0). That's not an option for these children! So the answer is false.
- Going to the store and buying both chocolate and chewing gum = yes, yes (1,1). That would be fantastic, but it's not possible. It's too expensive. So, the answer is unfortunately false.
- Going to the store and buying either chocolate or chewing gum = yes or no/no or yes (1,0 or 0,1). That's possible. So, the answer is true.
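As a quick sanity check, this candy decision is exactly the XOR truth table. Here is a minimal sketch in Python (the ^ operator is Python's built-in XOR):

for x1 in (0, 1):
    for x2 in (0, 1):
        # true (1) only when exactly one child gets a yes
        print(x1, x2, "->", x1 ^ x2)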
Sipping my coffee in 1969, I imagine the two children. The elder one is reasonable. The younger one doesn't really know how to count yet and wants to buy both packs of candy.
I decide to write that down on my piece of paper:
- x1 (the elder child's decision, yes or no, 1 or 0) * w1 (what the elder child thinks). What the elder child is thinking thus comes down to x1 * w1.
The elder child weighs a decision like we all do every day, such as purchasing a car (x = 0 or 1) multiplied by the cost (w1).
- x2 (the younger child's decision, yes or no, 1 or 0) * w3 (what the younger child thinks). Likewise, what the younger child is thinking comes down to x2 * w3.
Theory: x1 and x2 are the inputs. h1 and h2 are neurons (the result of a calculation). Since h1 and h2 contain calculations that are not visible during the process, they are hidden neurons. h1 and h2 thus form a hidden layer.
Now I imagine the two children talking to each other.
Hold it a minute! This means that now each child is communicating with the other:
- x1 (the elder child) says w2 to the younger child. Thus, w2 means: this is what I think and am telling you.
- x2 (the younger child) says to the elder child: please add my views to your decision. This is represented by w4.
I now have the first two equations expressed in high-school-level code. Each one is what a child thinks + what the other child says, asking for that to be taken into account:
h1 = (x1 * w1) + (x2 * w4)  #II.A. weight of hidden neuron h1
h2 = (x2 * w3) + (x1 * w2)  #II.B. weight of hidden neuron h2
h1 sums up what is going on in one child's mind: personal opinion + other child's opinion.
h2 sums up what is going on in the other child's mind and conversation: personal opinion + other child's opinion.
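To make the two equations concrete, here is a quick numeric check in Python. The 0.5 values are an assumption at this point in the story; they happen to be the starting weights I pick a bit later:

x1, x2 = 1, 1               # both children say yes
w1 = w2 = w3 = w4 = 0.5     # assumed starting weights
h1 = (x1 * w1) + (x2 * w4)  # personal opinion + the other child's opinion
h2 = (x2 * w3) + (x1 * w2)
print(h1, h2)               # 1.0 1.0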
Theory. The calculation now contains two input values and one hidden layer. Since in the next step we are going to apply calculations to h1 and h2, we are in a feedforward neural network. We are moving from the input to another layer, which will lead us to another layer, and so on. This process of going from one layer to another is the basis of deep learning. The more layers you have, the deeper the network is. The reason h1 and h2 form a hidden layer is that their output is just the input of another layer.
I don't have time to deal with the complicated numbers of an activation function such as the logistic sigmoid, so I simply check whether each output value has reached 1 or not:
if h1 >= 1 then y1 = 1; if h1 < 1 then y1 = 0
if h2 >= 1 then y2 = 1; if h2 < 1 then y2 = 0
Theory: y1 and y2 form a second hidden layer. These variables can be scalars, vectors, or matrices. They are neurons.
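In Python, this crude stand-in for an activation function is a one-line step function (a sketch; the name step is mine):

def step(h):
    # 1 if the value reaches the threshold of 1, else 0
    return 1 if h >= 1 else 0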
Now, a problem comes up. Who is right? The elder child or the younger child?
The only way to find out seems to be to play around with the weights, with W representing all of the weights.
I decide that, at this point, I like both children. Why would I have to hurt one of them? So, from now on, w3 = w2 and w4 = w1. After all, I don't have a computer, and my time-travel window is consuming a lot of energy. I'm going to be pulled back soon.
Now, somebody has to be an influencer. Let's leave this hard task to the elder child. The elder child, being more reasonable, will continuously deliver the bad news: "You have to subtract something from your choice," represented by a minus (-) sign.
Each time they reach the decision points h1 and h2, the elder child applies a critical, negative view on purchasing packs of candy: a weight is subtracted from everything that comes up, to be sure not to go over the budget. The elder child's opinion is biased, so let's call the variable a bias, b1. Since the younger child's opinion is biased as well, let's call this view a bias too, b2. Since the elder child's view is always negative, -b1 will be applied to all of the elder child's thoughts.
When we apply this decision process to their views, we obtain h1 * (-b1) for the elder child and h2 * b2 for the younger child.
Then, we just have to sum the results and use the same rule as before: if the sum is >= 1, then the threshold has been reached. The full calculation is shown in the solution sheet that follows.
Since I don't have a computer, I decide to start finding the weights in a practical manner, setting the weights and biases to 0.5: w1 = 0.5, w2 = 0.5, b1 = 0.5.
It's not a full program yet, but the theory is done.
Only the communication going on between the two children makes the difference, so after a first try I focus on modifying only w2 and b1. An hour later, on paper, it works!
I can't believe that this is all there is to it. I copy the mathematical process on a clean sheet of paper:
Solution to the XOR implementation with a feedforward neural network (FNN)

#I. Setting the first weights to start the process
w1 = 0.5; w2 = 0.5; b1 = 0.5
w3 = w2; w4 = w1; b2 = b1
#II. Hidden layer 1 and its output
h1 = (x1 * w1) + (x2 * w4)  #II.A. weight of hidden neuron h1
h2 = (x2 * w3) + (x1 * w2)  #II.B. weight of hidden neuron h2
#III. Threshold I, hidden layer 2
if h1 >= 1: h1 = 1
if h1 < 1: h1 = 0
if h2 >= 1: h2 = 1
if h2 < 1: h2 = 0
h1 = h1 * -b1  # the elder child's negative bias
h2 = h2 * b2   # the younger child's positive bias
#IV. Threshold II and final output y
y = h1 + h2
if y >= 1: y = 1
if y < 1: y = 0
#V. Change the critical weights and try again until a solution is found
w2 = w2 + 0.5
b1 = b1 + 0.5
I'm overexcited by the solution. I need to get this little sheet of paper to a newspaper to get it published and change the course of history. I rush to the door, open it but find myself back in the present! I jump and wake up in my bedroom sweating.
I rush to my laptop while this time-travel dream is fresh in my mind to get it into Python for this book.
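Before the full program, here is a minimal runnable sketch of the sheet in Python. The forward function and the trial loop are my wording of steps I to V; I also assume that b2 follows b1 in the same way that w3 follows w2 and w4 follows w1:

def forward(x1, x2, w1, w2, b1):
    w3, w4, b2 = w2, w1, b1     # shared weights; b2 = b1 is an assumption
    h1 = (x1 * w1) + (x2 * w4)  # II. hidden layer 1 and its output
    h2 = (x2 * w3) + (x1 * w2)
    h1 = 1 if h1 >= 1 else 0    # III. threshold I, hidden layer 2
    h2 = 1 if h2 >= 1 else 0
    h1 = h1 * -b1               # the elder child's negative bias
    h2 = h2 * b2                # the younger child's positive bias
    y = h1 + h2                 # IV. threshold II and final output y
    return 1 if y >= 1 else 0

w1, w2, b1 = 0.5, 0.5, 0.5      # I. setting the first weights
cases = [(0, 0), (0, 1), (1, 0), (1, 1)]
while any(forward(x1, x2, w1, w2, b1) != (x1 ^ x2) for x1, x2 in cases):
    w2 += 0.5                   # V. change the critical weights and try again
    b1 += 0.5

for x1, x2 in cases:
    print(x1, x2, "->", forward(x1, x2, w1, w2, b1))

With these equations, a single adjustment (w2 = 1, b1 = 1) makes all four candy cases come out exactly as the XOR table above.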
Why wasn't this deceptively simple solution found in 1969? Because it seems simple today but wasn't at the time, like all the inventions of our genius predecessors. Nothing is easy at all in artificial intelligence and mathematics.