The problem of overfitting.
Overfitting is the state of a neural network whose weights have adjusted too closely to one particular training set: on that training set the network shows near-perfect performance, while on any new data its performance is dramatically worse. It happens because the network fits the specifics of one particular dataset so tightly that it loses the ability to "generalize" to other, similar data, so on anything unseen it performs very poorly.
Intuitively, you can compare an overfitted neural network to a tailored suit made for one particular person based on that person's body measurements. Obviously such a "custom" suit fits that person's body perfectly, precisely correct in every measurement, while the same suit produced in mass quantities will NOT fit the "average" person well.
To reduce overfitting, a technique called "dropout" is used. Dropout randomly disables (sets to zero) neurons on the chosen layers at each training step. Because a different random subset of neurons is active at every step, the network's connections are effectively re-configured again and again, so the network cannot over-optimize itself for one particular dataset and is forced to learn more general features.
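The effect of a single dropout step can be sketched in plain NumPy. This is only an illustration of the masking idea, not the Keras implementation; the layer size and values are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.1                       # probability of disabling each neuron

# activations of one hidden layer for one training step (toy values)
activations = np.ones(10)

# each neuron is kept with probability 1 - rate, dropped (set to zero) otherwise
mask = rng.random(activations.shape) >= rate
dropped = activations * mask

print(dropped)                   # some positions are zeroed at random
```

On the next training step a new random mask is drawn, so a different subset of neurons is disabled.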
Dropout is NOT a "silver bullet" that will automatically make your network perform much better; on some datasets it can even degrade performance. So you should check by trial and error whether dropout improves your network's performance on your particular dataset.
For the neural network architecture that I use on my datafor.art blog, dropout is added to the network with the two following lines of code (yes, just 2 lines):
from keras.layers import Dropout
model.add(Dropout(0.1))
Obviously, the first line of code, "from keras.layers import Dropout", should be located at the beginning of your code. The second line, "model.add(Dropout(0.1))", should be used for each layer on which you want dropout: in the Python code it goes on the line right after the line that defines the neuron layer itself, for example after the line "model.add(Dense(6, activation='relu'))" from our first network example at http://datafor.art/first-real-neural-network/
In the code "model.add(Dropout(0.1))", the value 0.1 is the dropout rate: the probability with which each neuron on that layer is set to zero (i.e. disabled) on each training step. So 0.1 means a 10% probability, i.e. each neuron on the given layer will be disabled (set to zero) on roughly 10% of all training steps. Note that in current Keras versions this argument is named rate (very old versions called it p), and that dropout is active only during training, not when the trained model makes predictions.
As a result, the neural net from our first example (http://datafor.art/first-real-neural-network/) will have 2 additional lines of code and will finally look like this:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout  # here we import the Dropout layer from the Keras library
import numpy

# --line below loads the *.csv file into memory, data columns separated by commas
dataset_train = numpy.loadtxt("diabet_train.csv", delimiter=",")  # here we load data into a NumPy array

# --two lines below split the input data into actual training data and labels for each record
X_train = dataset_train[:, 0:8]  # actual training data: the first 8 columns
Y_train = dataset_train[:, 8]    # labels (classification) for each record: the last column,
                                 # i.e. column number 9, right after the first 8 training columns

# ---start of the code for the actual neural net architecture construction----
model = Sequential()
model.add(Dense(8, input_dim=8, activation='relu'))
model.add(Dense(6, activation='relu'))
model.add(Dropout(0.1))  # here we add dropout for the first hidden layer with 6 neurons:
                         # each neuron on this layer is disabled (set to zero)
                         # with 10% probability on each training step
model.add(Dense(3, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# ---end of the actual neural net architecture-----

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=5000, batch_size=500)  # here the actual training happens
model.save('my_model.h5')  # here we save our model and all trained "weights" to HDD/SSD disk

# evaluate the model on the training data (performance on new, unseen data will be lower)
scores = model.evaluate(X_train, Y_train)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))
Usually the dropout rate is in the 10%-30% range; the simplest way to optimize it is to start from 20% dropout and then play with this parameter to reach the best network performance. Again, you can play with this value:
model.add(Dropout(0.1)) # try values in the 0.1 - 0.3 range
You can add dropout to each network layer except the last output layer.
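As a sketch of that rule, here is a pure-NumPy forward pass loosely mirroring the 8-6-3-1 architecture above, with dropout after each hidden layer but never on the sigmoid output. The weights are random toy values for illustration only; real Keras Dropout additionally rescales the kept activations by 1/(1 - rate) during training and is switched off automatically at prediction time:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, w):
    return np.maximum(x @ w, 0.0)           # toy ReLU layer, no biases

def dropout(x, rate, training):
    if not training:
        return x                             # dropout is applied only while training
    mask = rng.random(x.shape) >= rate       # drop each neuron with probability `rate`
    return x * mask

# toy random weights for an 8-6-3-1 network
w1 = rng.normal(size=(8, 6))
w2 = rng.normal(size=(6, 3))
w3 = rng.normal(size=(3, 1))

def forward(x, training):
    h = dropout(dense_relu(x, w1), 0.1, training)  # dropout on first hidden layer
    h = dropout(dense_relu(h, w2), 0.1, training)  # dropout on second hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ w3)))         # sigmoid output layer, NO dropout here

x = rng.normal(size=(1, 8))
print(forward(x, training=True))    # training: random neurons disabled
print(forward(x, training=False))   # prediction: all neurons active
```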