Cassava Leaf Disease Classification with Deep Learning: Part I

Authored by Yunxuan Zeng, Siyu Shen and Yuchen Hua, graduate students at the Data Science Initiative, Brown University

lovable lazy
Feb 22, 2021

Do you know cassava? Cassava is an important crop that provides a basic diet for millions of people in Africa. It is a major food source for local farmers because it can be cultivated under severe conditions. However, cassava leaf disease is a major cause of crop loss. To address this problem, our team will develop several deep learning models that classify each cassava image into a category of cassava leaf disease so that the plants can be treated accordingly.

This first post documents our team’s work on the Kaggle competition. Please feel free to look at the code in our team’s GitHub repository. We are glad to receive all your comments and suggestions! The post is divided into the following parts:

  • Introduction to the Dataset
  • Exploratory Data Analysis (EDA)
  • Baseline Model Analysis
  • TensorBoard
  • Possible Next Steps

Introduction

The dataset comes from Kaggle and contains 21,397 images in the training set and around 15,000 images in the test set. There are two variables in the dataset: Image_id is the image file name, and label is the ID of the disease category. There are 5 classes, shown below:

Table 1: Cassava Leaf Diseases Labels

Exploratory Data Analysis

Missing Values

A check of the fraction of missing values in each variable and the fraction of points with missing values shows that there is no need to handle missing values.

Table 2: Missing Value Information
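A check like this can be sketched with pandas. The snippet below is only an illustration, not our exact notebook code: the helper name `missing_value_report` and the tiny stand-in table are hypothetical, and the lowercase column names are an assumption about how the label file is laid out.

```python
import pandas as pd


def missing_value_report(df):
    """Return the missing fraction per column and the fraction of rows with any missing value."""
    per_column = df.isnull().mean()
    rows_with_missing = df.isnull().any(axis=1).mean()
    return per_column, rows_with_missing


# Tiny stand-in for the label table (hypothetical rows, not real data)
toy = pd.DataFrame({
    "image_id": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
    "label": [3, 0, 3, 1],
})

per_column, rows_with_missing = missing_value_report(toy)
print(per_column)
print("fraction of rows with missing values:", rows_with_missing)
```

If both numbers are zero, as they are for this stand-in table, no imputation or row dropping is needed.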

Imbalance

The figure below shows that this image dataset is imbalanced. “Cassava Mosaic Disease (CMD)” has 13,158 instances, which account for about 61.5% of the whole dataset, while “Cassava Bacterial Blight (CBB)” has only 1,087 instances, which account for 5.1% of the whole dataset.

Figure 1: Cassava Leaf Diseases Frequency Bar Plot

Images of Each Class

Next, let’s take a look at different classes of Cassava Leaf Diseases by randomly picking some images.

Figure 2: Cassava Leaf Diseases of Each Class

RGB Information

As we know, computers store images as a mosaic of tiny squares called pixels (picture elements), and information is extracted from an image through them. Each pixel is a combination of three colors: Red, Green, and Blue. In an RGB image, each of the three colors can take 256 different intensity (brightness) values because each is represented by an 8-bit number. The table below shows the RGB information of a random image [1].

Figure 3: a Cassava Leaf Disease Image
Table 3: Basic Properties of an Image

As each pixel of the image is described by three integers, the image is stored as a three-layered matrix, where index 0 is the red channel, 1 is the green channel, and 2 is the blue channel. Figure 4 below gives a quick view of each channel in this image.

Figure 4: RGB Channels of an Image

In addition, this image is also split into separate color components: Red, Green, and Blue.

Figure 5: Split an Image into Three Layers
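The channel indexing described in this section can be sketched with NumPy. This is a minimal illustration that uses a small random array as a stand-in for a real photo; the helper `keep_channel` is a hypothetical name introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a loaded RGB image: height x width x 3 channels, 8-bit values
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

# Basic properties: shape, data type, and value range
print(image.shape, image.dtype, image.min(), image.max())

# Index 0 is red, 1 is green, 2 is blue
red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]


def keep_channel(img, channel):
    """Return a copy of img in which every channel except `channel` is set to zero."""
    out = np.zeros_like(img)
    out[:, :, channel] = img[:, :, channel]
    return out


red_only = keep_channel(image, 0)
green_only = keep_channel(image, 1)
blue_only = keep_channel(image, 2)
```

Note that images loaded with OpenCV’s `imread` arrive in BGR order, so the indices for red and blue are swapped in that case.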

Colored 3D Scatter Plot

From the RGB Information section, we already know that RGB is the most common color space. However, there are many other color spaces that suit specific goals. In this blog, we use the RGB and HSV color spaces to visualize the color distribution of the image shown in Figure 6 and to judge how easily its colors can be segmented.

Figure 6: Cassava Leaf

RGB Color Space

In RGB color space, an image is split with respect to the RGB channels, so each axis in Figure 7 represents one of the channels. In the figure below, the green points spread across nearly the whole plot, so it would be hard to segment the leaf in RGB space based on these RGB values [2].

Figure 7: an Image in RGB Color Space

HSV Color Space

Unlike RGB color space, HSV is a cylindrical color space. Images in this space are split with respect to Hue, Saturation, and Value (Brightness). Hue is the angular dimension. Value is the vertical axis, where smaller values are darker and larger values are brighter. Saturation is the third axis, running from least saturated at the vertical axis to most saturated farthest from the center [2].

In the figure below, the leaf’s greens are much more localized. In other words, it’s easier to separate colors visually. In addition, although the saturation and value of the greens change, they mostly fall within a specific range along the hue axis. Therefore, it should be easier to do color segmentation or extract important information in this space [2].

Figure 8: an Image in HSV Color Space
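To see why greens cluster along the hue axis, here is a minimal sketch using Python’s standard `colorsys` module. The three green tones are made-up values chosen for illustration, not pixels sampled from our image.

```python
import colorsys

# Hypothetical dark, medium, and bright greens as 8-bit RGB triples
greens = {
    "dark": (20, 90, 30),
    "mid": (60, 160, 70),
    "bright": (150, 230, 150),
}

hsv = {}
for name, (r, g, b) in greens.items():
    # colorsys expects floats in [0, 1] and returns h, s, v in [0, 1]
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hsv[name] = (h, s, v)
    print(f"{name}: hue={h * 360:.0f} deg, saturation={s:.2f}, value={v:.2f}")

hues = [h for h, _, _ in hsv.values()]
hue_spread_degrees = (max(hues) - min(hues)) * 360
print(f"hue spread: {hue_spread_degrees:.1f} deg")
```

The three tones differ widely in saturation and value but stay within a few degrees of hue, which is the property that makes a hue range a convenient segmentation threshold. Keep in mind that libraries scale hue differently; OpenCV’s 8-bit HSV images, for example, store hue as 0 to 179.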

Image Augmentation

Image augmentation is an important technique in image classification projects. It lets us apply various transformations to images in order to expand the original dataset, reduce memory overhead, and make the model more robust. In this project, “ImageDataGenerator” has been used to apply different random transformations to the original images, including rotations, shifts, flips, brightness, zoom, and shear. We chose “ImageDataGenerator” because it can provide real-time data augmentation during future model training. In this section, we share some interesting transformations; feel free to check all types of transformations in our GitHub repository [3].

Random Shifts

Image shift is an augmentation method that changes the position of objects in an image. One reason to use it is that the object is not always centered in the image. In “ImageDataGenerator”, the parameters “width_shift_range” and “height_shift_range” set the maximum shift as a fraction of the total width and total height, and every pixel is moved by the same randomly chosen offset. The figure below shows how shifts work on this image.

Figure 9: Random Shifts
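What a shift does to the pixel grid can be sketched in plain NumPy. This is an illustration of the idea only, not the Keras implementation: the helper `shift_image`, the 0.2 range, and the choice to repeat border pixels in the uncovered area are all assumptions made for this example.

```python
import numpy as np


def shift_image(image, dx, dy):
    """Shift content dx pixels right and dy pixels down, repeating border pixels."""
    height, width = image.shape[:2]
    # For every output position, look up the source position and clamp it to the image
    rows = np.clip(np.arange(height) - dy, 0, height - 1)
    cols = np.clip(np.arange(width) - dx, 0, width - 1)
    return image[rows][:, cols]


rng = np.random.default_rng(0)
image = np.zeros((5, 6, 3), dtype=np.uint8)
image[2, 3] = 255  # one bright pixel makes the movement easy to see

shift_range = 0.2  # hypothetical fraction of width/height
dx = int(round(rng.uniform(-shift_range, shift_range) * image.shape[1]))
dy = int(round(rng.uniform(-shift_range, shift_range) * image.shape[0]))
shifted = shift_image(image, dx, dy)
print("offset:", dx, dy, "shape:", shifted.shape)
```

The output keeps the original shape, so shifted images can be fed to the model without any resizing.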

Random Brightness

Image brightness is an augmentation method that changes how bright an image is. One reason to use it is that the object is sometimes not shown clearly under extreme lighting conditions such as darkness. In “ImageDataGenerator”, the parameter “brightness_range” sets the range from which a brightness shift value is randomly picked. The figure below shows how brightness works on this image.

Figure 10: Random Brightness
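As a rough sketch of the idea, brightness can be changed by scaling pixel intensities with a random factor and clipping the result to the valid 8-bit range. The helper `adjust_brightness` and the range (0.5, 1.5) are assumptions for illustration, and this is not a copy of what Keras does internally.

```python
import numpy as np


def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor` and clip back to the 0-255 range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
image = np.full((2, 2, 3), 100, dtype=np.uint8)  # uniform gray stand-in image

brightness_range = (0.5, 1.5)  # hypothetical range; below 1 darkens, above 1 brightens
factor = rng.uniform(*brightness_range)
augmented = adjust_brightness(image, factor)
print(f"factor={factor:.2f}, pixel value {image[0, 0, 0]} -> {augmented[0, 0, 0]}")
```

Clipping matters: without it, large factors would overflow the 8-bit range and wrap bright pixels around to dark values.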

Random Zoom

Image zoom is another augmentation method, which either zooms in on or zooms out of an image. In “ImageDataGenerator”, the parameter “zoom_range” is used to randomly perform zoom. The figure below shows how zoom works on this image.

Figure 11: Random Zoom

Put All Things Together

Let’s see how a combination of these image augmentation transformations works on this image!

Figure 12: Combined Transformation

Baseline Model Analysis

From the EDA, it’s clear that this is an imbalanced dataset. If the “DummyClassifier” from the sklearn package were used to always predict the most frequent class, it would achieve a baseline accuracy of 61.5%.
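The 61.5% figure can be reproduced with a few lines of NumPy. The sketch below mimics a most-frequent-class predictor by hand; the label array is reconstructed from the two numbers reported in the EDA (13,158 CMD images out of 21,397), with every other class lumped together as "other" because only the majority share matters here.

```python
import numpy as np

total_images = 21_397
cmd_images = 13_158

# Reconstructed labels: the majority class versus everything else
labels = np.array(["CMD"] * cmd_images + ["other"] * (total_images - cmd_images))

# Find the most frequent class and predict it for every image
values, counts = np.unique(labels, return_counts=True)
majority = values[np.argmax(counts)]
predictions = np.full(labels.shape, majority)

baseline_accuracy = float(np.mean(predictions == labels))
print(f"majority class: {majority}, baseline accuracy: {baseline_accuracy:.3f}")
```

This is the same rule that scikit-learn’s `DummyClassifier(strategy="most_frequent")` applies, so any real model has to beat this number to be useful.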

Choose a Simple CNN Model

In deep learning, Convolutional Neural Networks (CNNs) are feed-forward neural networks. A CNN follows a hierarchical model: stacked layers extract increasingly abstract features, and fully connected layers at the end produce the output. In practice, CNNs are widely used for image classification and recognition because of their high accuracy [5].

A simple CNN is efficient and can still produce adequate results, so it is a good choice of baseline model for this project. The architecture of this CNN is shown in Figure 13 [4]. The model has 3 convolutional layers and 3 max-pooling layers and no non-trainable parameters. The 3 convolutional layers are meant to extract important features at different scales, and the max-pooling after each convolutional layer keeps the most relevant features to pass on to the next layer.

Figure 13: Baseline Model Architecture
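To make the structure concrete, the sketch below traces feature-map sizes and parameter counts through three convolution and max-pooling pairs. Every number in it is an assumption chosen for illustration (a 128x128 RGB input, 32/64/128 filters, 3x3 kernels without padding, 2x2 pooling); the actual values are the ones in Figure 13.

```python
def conv_output(size, kernel=3):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Spatial size after non-overlapping max pooling."""
    return size // pool


def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels


size, channels = 128, 3  # hypothetical input: 128x128 RGB
total_params = 0
summary = []
for filters in (32, 64, 128):  # hypothetical filter counts
    params = conv_params(channels, filters)
    size = conv_output(size)
    summary.append(("conv", size, filters, params))
    size = pool_output(size)
    summary.append(("pool", size, filters, 0))  # pooling has no parameters
    total_params += params
    channels = filters

for kind, out_size, out_channels, params in summary:
    print(f"{kind}: {out_size}x{out_size}x{out_channels}, params={params}")
print("convolutional parameters in total:", total_params)
```

All of these parameters are trainable, and the pooling layers add none, which is consistent with a model that has no non-trainable parameters.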

Result

This simple CNN model was trained for 50 epochs (batch size = 32, training steps = 20, and validation steps = 20). To monitor the training process, a TensorBoard callback was added for later visualization. The last 10 training epochs are shown in Figure 14.
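To put these settings in perspective, the arithmetic below works out how many images each epoch draws. It is a sketch that assumes every step draws one full batch and that all 21,397 training images are in the pool; how the validation data was split off is not covered here, so the share is approximate.

```python
batch_size = 32
train_steps = 20
epochs = 50
training_images = 21_397  # assumes the full training set is available

images_per_epoch = batch_size * train_steps
share_per_epoch = images_per_epoch / training_images
images_drawn_in_total = images_per_epoch * epochs

print(f"images per epoch: {images_per_epoch}")
print(f"share of the training set per epoch: {share_per_epoch:.1%}")
print(f"images drawn over {epochs} epochs: {images_drawn_in_total}")
```

Under these assumptions, each epoch covers only a small part of the training set, which is worth keeping in mind when reading the curves that follow.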

Figure 14: Last 10 Training Baseline Model Results

From the figure above, we can see that the baseline model’s accuracy is between 0.61 and 0.63, and the validation loss decreases as the number of epochs increases. Later on, we will try different pre-trained models as well as combinations of tuning parameters to improve model performance.

TensorBoard

With TensorBoard, it’s easier to monitor the training process. For this baseline model, accuracy and loss are monitored over the 50-epoch training process. For a better view, feel free to check the logs at tensorboard.dev.

Figure 15: Baseline Model Training Process (Accuracy)

From Figure 15, the training accuracy is not stable during the 50-epoch training process: it bounces between 0.595 and 0.635. The validation accuracy, however, is relatively stable. It begins at around 0.6 and improves over the 50 epochs, and at some epochs it exceeds 0.635.

Figure 16: Baseline Model Training Process (Loss)

From Figure 16, both training and validation losses drop during the training process. Both begin at around 1.20 and decrease to roughly 1.1.

Possible Next Steps…

We have now completed the exploratory data analysis and a baseline model. The possible next steps are listed below:

  1. Deal with the imbalanced data:
    As we can see, our data is highly imbalanced. Since the dataset is large, we may try methods such as under-sampling algorithms.
  2. Use data augmentation:
    We will try to augment our images in real time while building CNN models. We believe this technique can help our model generalize and could improve overall model performance.
  3. Implement different models:
    We are also planning to apply different widely used image-classification models, such as InceptionV3, ResNet, and AlexNet. Moreover, we may use transfer learning to build our own model on top of these pre-trained models.
  4. Hyperparameter tuning:
    As we know, hyperparameter tuning plays an important role in model performance and optimization. We will explore different combinations of hyperparameters such as the learning rate, the number of layers in the network, and the size of the hidden layers. If there is an overfitting problem, we may also implement early stopping or adjust the dropout rates.
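As a sketch of the first step, random under-sampling can be written in a few lines of NumPy: every class is cut down to the size of the smallest one. The helper `undersample_indices` and the toy label array are hypothetical, and this is only one of several possible under-sampling strategies.

```python
import numpy as np


def undersample_indices(labels, rng):
    """Return sorted indices that keep the same number of samples for every class."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.min()  # size of the smallest class
    keep = [
        rng.choice(np.flatnonzero(labels == c), size=target, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(keep))


rng = np.random.default_rng(42)
labels = np.array([3] * 8 + [0] * 2 + [4] * 4)  # toy imbalanced labels

idx = undersample_indices(labels, rng)
balanced = labels[idx]
print(np.unique(balanced, return_counts=True))
```

The cost is discarded data: if CBB with its 1,087 images turns out to be the smallest class, balancing all 5 classes this way would keep at most 5 x 1,087 = 5,435 of the 21,397 training images.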

References

  1. Basic image data analysis using numpy and Opencv — Part 1. (n.d.). Retrieved February 21, 2021, from https://www.kdnuggets.com/2018/07/basic-image-data-analysis-numpy-opencv-p1.html
  2. Real Python. (2020, November 07). Image segmentation using color spaces in opencv + python. Retrieved February 21, 2021, from https://realpython.com/python-opencv-color-spaces/
  3. Chollet, F. (n.d.). The Keras BLOG. Retrieved February 21, 2021, from https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
  4. Lys620. (2021, February 13). Cassava-leaf-disease_simplecnn. Retrieved February 21, 2021, from https://www.kaggle.com/lys620/cassava-leaf-disease-simplecnn
  5. Maladkar, K. (2020, November 21). Overview of convolutional neural network in image classification. Analytics India Magazine. https://analyticsindiamag.com/convolutional-neural-network-image-classification-overview/

Written by lovable lazy

We are lovable and lazy guys from the Data Science Initiative at Brown University.