In this post I will describe the network architecture used in our first neural network.
How does a neural network work?
There are dozens of different neural network types. For simplicity, let's discuss just one of them.
The Python code below defines the model using Keras's Sequential API:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(8, input_dim=8, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dense(3, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
A visual representation of this network is shown here:
This is a fully-connected, sequential neural network model. It has an input layer with 8 neurons, an output (last) layer with 1 neuron that uses the 'sigmoid' activation function, and two layers between them. The second layer, with 6 neurons, and the third layer, with 3 neurons, sit between the first and last layers; these are the so-called "hidden" layers.
The first three layers of our network use the ReLU activation function (activation='relu'). ReLU stands for rectified linear unit and is one type of activation function (others exist as well); mathematically, it is defined as y = max(0, x). ReLU is linear (the identity) for all positive values and zero for all negative values. It is the most commonly used activation function in neural networks, especially in CNNs. If you are unsure which activation function to use in your network, ReLU is usually a good first choice. Visually, it looks like the following:
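The definition y = max(0, x) can be sketched in a couple of lines of NumPy (assuming NumPy is available; this is just an illustration, not how Keras implements it internally):

```python
import numpy as np

def relu(x):
    # Identity for positive inputs, zero for negative inputs
    return np.maximum(0, x)

# Negative inputs are clipped to 0; positive inputs pass through unchanged
print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])).tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```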
The last layer, the one with just a single neuron, uses the "sigmoid" activation function.
The sigmoid function, defined as σ(x) = 1 / (1 + e⁻ˣ), gives an 'S'-shaped curve.
This curve has finite limits:
‘0’ as x approaches −∞
‘1’ as x approaches +∞
The output of the sigmoid function when x = 0 is 0.5. Thus, if the output is greater than 0.5, we can classify the outcome as 1 (or YES), and if it is less than 0.5, we can classify it as 0 (or NO).
For example, if the output is 0.65, we can state it in terms of probability:
"There is a 65 percent chance that, based on your medical data, you have diabetes."
Thus the output of the sigmoid function is not limited to classifying YES/NO; it can also be used to estimate the probability of YES/NO.
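The thresholding rule above can be sketched in plain Python (the 0.62 input here is just a made-up example value):

```python
import math

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))  # 0.5, the decision boundary

output = sigmoid(0.62)          # a probability-like score in (0, 1)
label = 1 if output > 0.5 else 0  # classify as YES (1) or NO (0)
print(label)  # 1
```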
Visually, the sigmoid activation function looks like the following:
So, after passing the input data through several layers (the first layer and the hidden layers), the last layer uses the sigmoid activation to decide whether the data is closer to 1 (Yes) or 0 (No). The sigmoid activation function is used in network architectures where we have to predict only two states: either the data is closer to 1 (Yes, probability above 0.5) or closer to 0 (No, probability below 0.5). Sigmoid is usually not used in network architectures where several output states (several labels) are possible.
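To make the whole forward pass concrete, here is a minimal NumPy sketch of the same 8 → 8 → 6 → 3 → 1 architecture. The weights are randomly initialized rather than trained, so the output is illustrative only and not what the Keras model above would predict:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Layer sizes mirror the Keras model: 8 input features, then 8, 6, 3, 1 neurons
sizes = [8, 8, 6, 3, 1]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    # Hidden layers use ReLU; the final single-neuron layer uses sigmoid
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    return sigmoid(x @ weights[-1] + biases[-1])

sample = rng.normal(size=8)            # one example with 8 features
prob = float(forward(sample)[0])       # a value in (0, 1)
prediction = 1 if prob > 0.5 else 0    # threshold at 0.5, as described above
print(prob, prediction)
```

Training would adjust `weights` and `biases` so that `prob` matches the true labels; Keras handles that for us with `model.compile(...)` and `model.fit(...)`.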