Cassava Leaf Disease Classification with Deep Learning: Part II

lovable lazy
11 min read · Mar 16, 2021

Authored by Yunxuan Zeng and Siyu Shen, graduate students at the Data Science Initiative, Brown University

Introduction

As discussed in our previous post, cassava leaf disease is a major cause of cassava yield loss. The goal of this project is to build robust deep learning models that classify each cassava image into one of several categories of cassava leaf disease.

In our initial blog post, we built a baseline model from scratch. We have since replaced it with a pre-trained VGG16 as our baseline, which reaches a validation accuracy of roughly 60%. To obtain better performance, we applied transfer learning and hyperparameter tuning to build more robust models.

This second blog post documents our team’s work on the Kaggle competition. Please feel free to look at the code in our team GitHub repository. We would be very glad to receive your comments and suggestions!

This blog will be divided into the following parts:

  • TFRecords
  • Image Augmentation
  • Transfer Learning and Model Performance
  • Hyperparameter Tuning
  • Possible Next Steps

TFRecords

The TFRecord format is a simple format for storing a sequence of binary records. Since this dataset is fairly large, using TFRecords has a significant impact on the training time of the model: binary data takes less space on disk and less time to copy, so it can be read from disk more efficiently.

Since a TFRecord stores data as a sequence of binary strings, the structure of the data needs to be specified in order to load the dataset from the TFRecord format. We built several functions to decode the JPEG-encoded images into uint8 tensors, parse the TFRecord format, and load the training, validation, and test datasets.
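As a rough illustration, here is a minimal sketch of such loading functions with tf.data. It assumes the competition’s TFRecords store each example under the keys "image" (a JPEG byte string) and "target" (an integer label) and that the images are 512x512; the exact keys and resolution should be checked against how the TFRecords were written.

import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE
IMAGE_SIZE = [512, 512]  # assumed resolution of the images in the TFRecords

def decode_image(image_data):
    # Decode a JPEG-encoded byte string into a uint8 tensor,
    # resize to a fixed shape, and scale the values to [0, 1]
    image = tf.image.decode_jpeg(image_data, channels=3)
    image = tf.image.resize(image, IMAGE_SIZE)  # returns float32 in [0, 255]
    return image / 255.0

def read_labeled_tfrecord(example):
    # The feature spec must match the keys used when the TFRecords were written
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "target": tf.io.FixedLenFeature([], tf.int64),
    }
    example = tf.io.parse_single_example(example, features)
    image = decode_image(example["image"])
    label = tf.cast(example["target"], tf.int32)
    return image, label

def load_dataset(filenames):
    # Read multiple TFRecord files in parallel and parse each record
    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTOTUNE)
    dataset = dataset.map(read_labeled_tfrecord, num_parallel_calls=AUTOTUNE)
    return dataset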

Image Augmentation

As mentioned before, image augmentation is a very useful technique that applies various transformations to the images in order to expand the original dataset and make the model more robust. In this project, we built one function that applies a set of random transformations, then used the “map” function when building the training dataset so that augmentation is applied to the training data only. The random transformations we tried, such as random flipping, random cropping, and random brightness, are shown in the code chunk below.
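The sketch below illustrates this idea with tf.image, reusing the load_dataset helper from the TFRecord sketch above; the brightness delta, crop strategy, shuffle buffer, and batch size are illustrative values rather than our exact settings.

def augment_image(image, label):
    # Random transformations applied to training images only
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    image = tf.image.random_brightness(image, max_delta=0.1)  # hypothetical delta
    # Zoom out slightly, then take a random crop back to the target size
    image = tf.image.resize(image, [576, 576])
    image = tf.image.random_crop(image, size=[*IMAGE_SIZE, 3])
    return image, label

def get_training_dataset(filenames, batch_size=128):
    # Augmentation is mapped over the training split only;
    # the validation and test datasets skip this step
    dataset = load_dataset(filenames)
    dataset = dataset.map(augment_image, num_parallel_calls=AUTOTUNE)
    dataset = dataset.shuffle(2048).batch(batch_size).prefetch(AUTOTUNE)
    return dataset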

Transfer Learning and Model Performance

By definition, transfer learning is an initialization technique in machine learning that reuses a pre-trained model and adapts its weights to a new task. This is a very popular approach, especially in deep learning. Transfer learning has several benefits: it not only lets us reuse knowledge from previously learned tasks and apply it to a new, related one, but also speeds up training and improves the performance of our deep learning models.

In this project, our team has tried several pre-trained models, including InceptionV3, VGG16, Inception-ResNet, ResNet-50, MobileNetV2, ResNet101V2, and Xception. In this blog, we will discuss our current three best models.

InceptionV3

The first pre-trained model we tried in this project is InceptionV3, a widely used image recognition model. InceptionV3 has 42 layers, yet its computational cost is only about 2.5 times that of the 22-layer GoogLeNet [2]. The model is composed of several symmetric and asymmetric building blocks, such as convolutions, average pooling, max pooling, dropout, concatenations, and fully connected layers [1]. The structure of InceptionV3 is shown below:

InceptionV3 Architecture [1]

Our approach was to first set it as the base model and fix its weights and biases by freezing all layers in InceptionV3. We then added several layers on top of the base model. By running a small number of epochs, we tried various combinations of convolutional layers, pooling layers, and fully connected layers. One combination that gave our current best results on the whole dataset was to apply GlobalMaxPooling immediately after the base model, which aggressively summarizes the presence of each feature in an image [3]. We then added a flatten layer to transform the output into a 1-dimensional array, followed by one dense layer and one dropout layer to reduce overfitting. Finally, a softmax layer outputs the predicted class. After tuning several parameters, we settled on a dropout rate of 0.5, a batch size of 128, and a learning rate of 0.00001. A sketch of this setup is shown below, followed by the training history.
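The following is a minimal Keras sketch of this architecture. The 512x512 input size, the 256-neuron dense layer, the Adam optimizer, and the sparse categorical cross-entropy loss are illustrative assumptions; the frozen base, GlobalMaxPooling, dropout rate of 0.5, and learning rate of 1e-5 follow the description above.

import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import InceptionV3

IMAGE_SIZE = [512, 512]  # assumed input resolution
NUM_CLASSES = 5          # five cassava leaf categories in the competition

# Frozen InceptionV3 base initialized with ImageNet weights
base_model = InceptionV3(include_top=False, weights="imagenet",
                         input_shape=(*IMAGE_SIZE, 3))
base_model.trainable = False  # fix the pre-trained weights and biases

model = models.Sequential([
    base_model,
    layers.GlobalMaxPooling2D(),           # summarize the presence of each feature
    layers.Flatten(),                      # flatten into a 1-D array
    layers.Dense(256, activation="relu"),  # hypothetical neuron count
    layers.Dropout(0.5),                   # dropout rate chosen after tuning
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])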

Training History of InceptionV3

From the left figure above, we can see that this model produces a consistent validation accuracy of about 0.76. There are still some overfitting issues after 100 epochs, where the training accuracy is slightly higher than the validation accuracy. In addition, both training and validation accuracy exhibit mild oscillation from epoch to epoch. One possible reason is that there are too many parameters in the last layers, which could cause the parameters to oscillate between two local plateaus.

From the right figure above, we can see that the training loss continues to decrease with experience, and the validation loss exhibits similar behavior as the training loss. Both of them achieve a point of stability. We are confident that this is also a sign of a good fit.

VGG16

VGG16 is regarded as one of the excellent vision model architectures, where 16 refers to the number of layers with trainable weights. It consists of a stack of convolution and max-pooling layers. Unlike many other pre-trained models, VGG16 uses only 3x3 kernel filters in its convolution layers [4]. The detailed structure of VGG16 is shown below:

VGG16 Architecture [5]

Our approach was to set it as the base model and retrain the existing weights of the VGG16 network to classify the cassava image dataset. In this way, our model can build on the knowledge learned from the ImageNet database. We then added several layers on top of the base model. As with InceptionV3, we tried different combinations of convolutional layers, pooling layers, and fully connected layers in order to find a good configuration. As a result, we chose to add a flatten layer first, followed by two dense layers, each followed by a dropout layer to counter possible overfitting. Finally, a softmax layer outputs the predicted class. After tuning several parameters, we settled on a dropout rate of 0.5, a batch size of 128, and a learning rate of 0.000001. A sketch of this setup is shown below, followed by the training history.
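A minimal Keras sketch of this configuration follows, reusing the imports and constants from the InceptionV3 sketch above. The base model is left trainable so the ImageNet weights are fine-tuned; the 256 and 128 neuron counts come from the tuning section below, while the 512x512 input, the Adam optimizer, and the loss are illustrative assumptions.

from tensorflow.keras.applications import VGG16

# Trainable VGG16 base: the ImageNet weights are fine-tuned on the cassava images
base_model = VGG16(include_top=False, weights="imagenet",
                   input_shape=(*IMAGE_SIZE, 3))
base_model.trainable = True

model = models.Sequential([
    base_model,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=optimizers.Adam(learning_rate=1e-6),  # rate reported above
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])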

Training History of VGG16
Epoch History of VGG16

From the figures above, we can see that this model produces a consistent validation accuracy of about 0.82, which is higher than that of InceptionV3. Similarly, there are still overfitting issues, especially after 100 epochs. Moreover, both training accuracy and validation accuracy oscillate strongly from epoch to epoch. One possible reason is that we added two dense layers with 256 and 128 neurons respectively, which introduces a large number of parameters and may cause the oscillations.

From the right figure above, we can see that the training loss continues to decrease, while the validation loss decreases to a point and then begins to increase. This is clearly a sign of overfitting. Moreover, the validation loss oscillates more severely than the training loss.

ResNet-50

ResNet-50 is a convolutional neural network with 50 layers. As the figure below shows, its architecture can be split into four stages. The network starts with a convolutional layer and max pooling; stage 1 begins after the max pooling and continues with three residual blocks containing three layers each. In the figure, the curved arrows represent identity connections, and the dashed arrows mark residual blocks where the dimensions change. At the end, the network has an average pooling layer followed by a fully connected layer.

ResNet-50 Architecture [6]

Following the same approach as with InceptionV3, we froze all layers in ResNet-50 to fix its weights and parameters. After spot-checking several combinations of convolutional layers, pooling layers, and fully connected layers, we applied GlobalMaxPooling immediately after the base model to summarize the presence of each feature in an image. We then added one dense layer followed by one dropout layer to deal with overfitting. Finally, a softmax layer outputs the predicted class. After tuning several parameters, we chose a dropout rate of 0.6, a batch size of 128, and a learning rate of 0.0001. A sketch of this setup is shown below, followed by the training history.
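Here is a minimal Keras sketch of this head, again reusing the imports and constants from the sketches above. The 256-neuron dense layer, the Adam optimizer, and the loss are illustrative assumptions, while the frozen base, GlobalMaxPooling, dropout rate of 0.6, and learning rate of 1e-4 follow the description above.

from tensorflow.keras.applications import ResNet50

# Frozen ResNet-50 base: only the newly added head is trained
base_model = ResNet50(include_top=False, weights="imagenet",
                      input_shape=(*IMAGE_SIZE, 3))
base_model.trainable = False

model = models.Sequential([
    base_model,
    layers.GlobalMaxPooling2D(),
    layers.Dense(256, activation="relu"),  # hypothetical neuron count
    layers.Dropout(0.6),                   # dropout rate chosen after tuning
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])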

Training History of ResNet-50

From the left figure above, we can see that this model produces a consistent validation accuracy of about 0.78, which is higher than that of InceptionV3 but lower than that of VGG16. Similarly, there are still some overfitting issues, especially after 15 epochs. However, there is no serious oscillation in either accuracy or loss.

From the right figure above, we can see that both training and validation loss continue to decrease until they reach a point of stability. The shapes of the two curves are very similar, and there is only a small gap between the training and validation loss. Additional training causes slight overfitting, with the validation loss ending a little higher than the training loss, but overall these are the learning curves of a good fit.

Hyperparameter Tuning

Hyperparameter tuning is the process of choosing the optimal hyperparameters for a learning algorithm. A good combination of hyperparameters can enhance model performance and make the predictions more accurate. In this project, we performed hyperparameter tuning on VGG16 and ResNet-50, because both of them perform better than the other models at the current stage.

Taking VGG16 as an example, we built a pipeline to tune the learning rate (ranging from 1e-3 to 1e-7), the dropout rate (ranging from 0.2 to 0.7), and the number of neurons in each added layer. A sketch of such a tuning loop is shown below.
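For illustration, a simplified version of the pipeline might look like the grid search below; build_vgg16_model, train_dataset, and valid_dataset are hypothetical placeholder names for the model-building and data-loading helpers, not the exact functions in our repository.

import itertools

# Grid search over learning rate and dropout rate, keeping the best validation accuracy
learning_rates = [1e-3, 1e-4, 1e-5, 1e-6, 1e-7]
dropout_rates = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

results = {}
for lr, dropout in itertools.product(learning_rates, dropout_rates):
    model = build_vgg16_model(learning_rate=lr, dropout_rate=dropout)  # hypothetical helper
    history = model.fit(train_dataset, validation_data=valid_dataset,
                        epochs=10)  # a small number of epochs for spot-checking
    results[(lr, dropout)] = max(history.history["val_accuracy"])

best_lr, best_dropout = max(results, key=results.get)
print("Best learning rate:", best_lr, "best dropout rate:", best_dropout)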

VGG16 Learning Rate Hyperparameter Tuning

As we know, the learning rate determines the step size of gradient descent in the search for a global minimum of the loss function. A large learning rate can make us jump over minima, while a small learning rate takes a long time to converge or gets stuck in a local minimum. From the plot above, we can see that a learning rate of 1e-5 gives higher training and validation accuracy, so we chose it as the learning rate for this model.

VGG16 Dropout Rate Hyperparameter Tuning

In addition, dropout zeroes out a fraction of the nodes in each layer before computing the subsequent layer, and scales up the remaining nodes so that the hidden representations stay roughly unbiased. Dropout therefore forces the network to learn redundant representations and prevents co-adaptation of features. From the figure above, we can see that dropout rates of 0.5 and 0.6 give better training and validation accuracy, so we tried both of them to train our models.

VGG16 Neurons Hyperparameter Tuning

On the other hand, different choices for the number of neurons in each layer can also lead to different results. We tuned the number of neurons in the two added dense layers. From the figure above, we can see that combinations such as 256 and 128, 256 and 64, as well as 512 and 96 perform comparatively better than the others. We decided to go with 256 and 128 at the current stage and will try other combinations in the future.

Possible Next Steps…

Although we have successfully built some models, the accuracy at this stage is still not high. The possible steps in the future are shown below:

  1. Hyperparameter Tuning: As we can see, hyperparameter tuning plays an important role in model performance. In the next step, we are going to tune more parameters on VGG16, InceptionV3, and ResNet-50 in order to boost our model performance.
  2. Deal with oscillation behavior of loss and accuracy: From the plots above, both loss and accuracy are oscillating with respect to the number of epochs. In the next step, we are going to find some possible approaches to deal with it.
  3. Improve the overall accuracy: Although we have achieved a validation accuracy of around 0.80, there is still room for further improvement. In the future, besides trying different combinations of hyperparameters, we will focus on how to add extra layers to the base model to improve performance. In addition, we also want to try to combine two different powerful pre-trained models and see whether this could make more accurate predictions on the whole dataset.
  4. Overfitting Issues: For our three models, all of them have overfitting issues. So in the future, we will find out some effective strategies to deal with overfitting.
  5. Kaggle Score: Although we achieved a validation accuracy of roughly 0.82, the same model received a comparatively low score on the official Kaggle leaderboard. In the next step, we will try to figure out the reason.

Thank you for reading our blog. Please do not hesitate to contact us if you have any suggestions or concerns. We appreciate your ideas!

Reference

  1. Advanced guide to Inception v3 on Cloud TPU | Google Cloud. (n.d.). Retrieved March 15, 2021, from https://cloud.google.com/tpu/docs/inception-v3-advanced
  2. Ramcharan, A., Baranowski, K., McCloskey, P., Ahmed, B., Legg, J., & Hughes, D. (2017). Using transfer learning for image-based cassava disease detection. Frontiers in Plant Science, 8. https://doi.org/10.3389/fpls.2017.01852
  3. Brownlee, J. (2019, July 5). A gentle introduction to pooling layers for convolutional neural networks. Retrieved March 15, 2021, from https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/
  4. VGG16 – Convolutional network for classification and detection. (2021, February 24). Retrieved March 15, 2021, from https://neurohive.io/en/popular-networks/vgg16/
  5. VGG16 architecture diagram [Image]. (n.d.). Retrieved March 15, 2021, from https://images.app.goo.gl/m2MPPfgxrfwm5apo8 (original image: https://miro.medium.com/max/850/1*_lg1i7wv1plpzp2f4mlrvw.png)
  6. Ghassemi, S., & Magli, E. (2019, June 14). Convolutional neural networks for on-board cloud screening. Retrieved March 15, 2021, from https://www.mdpi.com/2072-4292/11/12/1417/htm
  7. Getting started: TPU + Cassava Leaf Disease. (n.d.). Retrieved March 15, 2021, from https://www.kaggle.com/jessemostipak/getting-started-tpus-cassava-leaf-disease
