Before saving your paper documents in a product like Sismics Docs, it's nice to evaluate the quality of the scans first.
For one of our projects, we had to automate this evaluation using a neural network. The obvious framework for machine learning these days is Keras. All of the following code works with Keras 2.0 and TensorFlow as the backend.
To follow this article, you will need:
- A machine with Python 3, Keras 2 and TensorFlow installed
- Preferably a configured Nvidia GPU to speed up the learning process. We used a GTX 970, and training took only a few minutes for 100 epochs
- Some stained and clean documents
The full working code is in this GitHub repository: https://github.com/sismics/keras-neural-net-image-classification-stain
We decided to train our model on “homemade” data, so we didn’t have a lot of data at our disposal. We simply took a few clean documents and did the dirty work ourselves 😉
Once this was done, we scanned those dirty documents, along with some clean ones, and sliced the scans into 10,000 64×64 images. The slicing both increases the input data volume and decreases the input image resolution. After some testing, 5,000 images seems to be the minimum needed to achieve reasonable accuracy, but as always, the more the better. We then manually classified those images into two folders, “stain” and “clean”.
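The article doesn’t show the slicing step itself; a minimal sketch, assuming each scan is loaded as a 2D grayscale NumPy array and cut into non-overlapping 64×64 patches (partial tiles at the edges are simply discarded):

```python
import numpy as np

def slice_scan(image, tile=64):
    """Slice a grayscale scan (2D array) into non-overlapping tile x tile patches.
    Edge regions that don't fill a whole tile are discarded."""
    h, w = image.shape
    return [image[y:y + tile, x:x + tile]
            for y in range(0, h - tile + 1, tile)
            for x in range(0, w - tile + 1, tile)]

# Hypothetical 300x200 scan: yields 4 rows x 3 columns of patches
scan = np.zeros((300, 200), dtype=np.uint8)
patches = slice_scan(scan)
print(len(patches))  # 12
```

Each patch can then be saved as a PNG into the `data/train/stain` or `data/train/clean` folder after manual review.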
To reduce overfitting of our model, and to increase the variation in our images, we used the Keras image data generator.
```python
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest',
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2)

train_generator = train_datagen.flow_from_directory(
    'data/train',  # this is the target directory
    target_size=(64, 64),  # all images will be resized to 64x64
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='binary')
```
This generator takes our images, applies random transformations such as zooming and rotating, and then feeds them into the model during training.
The model has three convolution layers, each with a ReLU activation followed by a max-pooling layer, as recommended in this official Keras article.
```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(64, 64, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
```
Training the model was quite fast on a GPU, and after 100 epochs we got over 85% accuracy on our validation data. The validation accuracy is higher than the training accuracy and the validation loss is lower than the training loss, a good sign that our model is not overfitting.
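The training call itself is not shown above; here is a minimal sketch using the Keras 2 `fit_generator` API. It assumes a `validation_generator` built the same way as `train_generator` (from a `data/validation` directory) and an 80/20 split of the 10,000 patches; the step counts and the weights filename are illustrative, not from the original project.

```python
# Hypothetical training call: `model`, `batch_size`, `train_generator` and a
# matching `validation_generator` are assumed to be defined as above. The step
# counts assume an 80/20 train/validation split of the 10,000 patches.
model.fit_generator(
    train_generator,
    steps_per_epoch=8000 // batch_size,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=2000 // batch_size)

model.save_weights('stain_model.h5')  # hypothetical filename, kept for inference
```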
The end result gives quite good information about the quality of our input scan.
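The article doesn’t specify how per-patch predictions are combined into a verdict for the whole scan. One simple way, assuming the model’s sigmoid output is interpreted as the stain probability of each patch, is to report the fraction of patches predicted clean:

```python
import numpy as np

def document_quality(patch_scores, threshold=0.5):
    """Aggregate per-patch stain probabilities (sigmoid outputs in [0, 1])
    into a document-level quality estimate: the fraction of clean patches."""
    patch_scores = np.asarray(patch_scores)
    clean = patch_scores < threshold  # below the threshold = predicted clean
    return clean.mean()

# Hypothetical scores, e.g. from model.predict() over all patches of one scan
scores = [0.05, 0.1, 0.9, 0.2, 0.8, 0.1]
print(document_quality(scores))  # 4 of 6 patches are clean -> 4/6
```

A document could then be flagged for re-scanning when this quality score falls below some chosen cutoff.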
After that, we developed a small interface to test our trained model, using Bootstrap 4 and Vue.js with Flask as the backend.
As further improvements, we could think of:
- Obviously, add more data, more kinds of stains, more edge cases
- Tweak the hyperparameters to achieve better accuracy
- We explicitly chose to grayscale our images, but keeping the color information might be a better idea