
Data Purchasing Challenge 2022

A sneak peek at image samples from the Round 2 dataset.

This notebook helps you visualise images from the different classes and their combinations.

sagar_rathod

Quickly take a look at the image samples of different class labels.

This notebook will help you understand images from the different classes, in particular the two classes newly introduced in Round 2 of this challenge: 'stray_particle' and 'discoloration'.

We use the deepml Python library to visualize these images quickly.

In [ ]:
!pip install deepml
In [1]:
import pandas as pd
import deepml
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl

#mpl.rcParams['text.color'] = 'white'
In [2]:
train_df = pd.read_csv("data-purchasing-challenge-2022-starter-kit/data/training/labels.csv")
train_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   filename        1000 non-null   object
 1   scratch_small   1000 non-null   int64 
 2   scratch_large   1000 non-null   int64 
 3   dent_small      1000 non-null   int64 
 4   dent_large      1000 non-null   int64 
 5   stray_particle  1000 non-null   int64 
 6   discoloration   1000 non-null   int64 
dtypes: int64(6), object(1)
memory usage: 54.8+ KB
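Before going further, it can be worth checking that the label file is clean. The sketch below is a hypothetical check (the helper `check_label_frame` and the toy frame are illustrative, not part of the starter kit). It assumes each filename should appear only once and that every label column should hold only 0 or 1, as the `int64` columns above suggest.

```python
import pandas as pd


def check_label_frame(df: pd.DataFrame, label_columns: list) -> None:
    """Hypothetical sanity check: unique filenames and strictly binary labels."""
    # Each image is expected to appear only once in the label file.
    if df["filename"].duplicated().any():
        raise ValueError("duplicate filenames found")
    # Every label column is expected to contain only 0 or 1.
    if not df[label_columns].isin([0, 1]).all().all():
        raise ValueError("label columns must contain only 0/1")


# Toy frame with the same layout as labels.csv (values are made up).
toy = pd.DataFrame({
    "filename": ["a.png", "b.png"],
    "scratch_small": [0, 1],
    "stray_particle": [1, 0],
})
check_label_frame(toy, ["scratch_small", "stray_particle"])  # passes silently
```

On the real data this would be called as `check_label_frame(train_df, train_df.columns[1:].tolist())`, before any derived columns are added.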
In [3]:
train_df.head()
Out[3]:
         filename  scratch_small  scratch_large  dent_small  dent_large  stray_particle  discoloration
0  np7x98vV9L.png              0              0           1           0               0              0
1  eJL9eBxtwi.png              1              0           0           0               0              0
2  Mm0wzMknhT.png              0              0           0           0               0              0
3  UJhpQVf8LP.png              0              0           0           0               0              0
4  5vpsw4NX6n.png              0              0           0           0               0              0

Create an additional class called 'no_defect' for image samples containing no defects.

In [4]:
train_df['no_defect'] = (~train_df.iloc[:, 1:].any(axis=1)).astype(int)
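A minimal illustration of what the cell above computes, on made-up rows: `any(axis=1)` is True when a row has at least one label set, and `~` flips it, so `no_defect` is 1 only when every defect column is 0. This sketch selects the label columns by name (the `label_cols` list is an assumption standing in for the six defect columns), because the positional `iloc[:, 1:]` would also pick up `no_defect` itself if the cell were run a second time.

```python
import pandas as pd

# Made-up rows: the first image has one defect, the second has none.
toy = pd.DataFrame({
    "filename": ["a.png", "b.png"],
    "scratch_small": [1, 0],
    "dent_small": [0, 0],
})

# Hypothetical explicit list of defect columns (safe to re-run).
label_cols = ["scratch_small", "dent_small"]

has_any_defect = toy[label_cols].any(axis=1)      # True if at least one label is 1
toy["no_defect"] = (~has_any_defect).astype(int)  # 1 only when all labels are 0
print(toy)
```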
In [5]:
classes = train_df.columns[1:].tolist()
classes
Out[5]:
['scratch_small',
 'scratch_large',
 'dent_small',
 'dent_large',
 'stray_particle',
 'discoloration',
 'no_defect']

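Since this is a multi-label classification challenge (a single image can carry several defect labels at once), let's look at the distribution of joined class labels, i.e. the full label combination of each image.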

In [6]:
train_df['joined_label'] = train_df[classes].apply(
    lambda row: " ".join([c for c in classes if row[c]]), axis=1
)
train_df.head()
Out[6]:
         filename  scratch_small  scratch_large  dent_small  dent_large  stray_particle  discoloration  no_defect   joined_label
0  np7x98vV9L.png              0              0           1           0               0              0          0     dent_small
1  eJL9eBxtwi.png              1              0           0           0               0              0          0  scratch_small
2  Mm0wzMknhT.png              0              0           0           0               0              0          1      no_defect
3  UJhpQVf8LP.png              0              0           0           0               0              0          1      no_defect
4  5vpsw4NX6n.png              0              0           0           0               0              0          1      no_defect
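The row-wise `apply` above is easy to read but loops in Python. As one possible vectorised alternative (a sketch on a toy frame, not the notebook's method), the 0/1 label matrix can be multiplied with an array of `"name "` strings: with object dtype, `1 * "name "` is the name and `0 * "name "` is an empty string, so a matrix-vector product concatenates the active labels. The result should match the `apply` version.

```python
import numpy as np
import pandas as pd

classes = ["scratch_small", "dent_small", "no_defect"]
toy = pd.DataFrame([[1, 1, 0], [0, 0, 1], [0, 1, 0]], columns=classes)

# Row-wise apply, as in the notebook cell above.
via_apply = toy[classes].apply(
    lambda row: " ".join([c for c in classes if row[c]]), axis=1
)

# Vectorised variant: cast the 0/1 matrix to Python ints (object dtype) and
# take a dot product with "name " strings; per row this concatenates
# 1 * "name " (the name) or 0 * "name " (""), then strips the trailing space.
flags = toy[classes].to_numpy().astype(object)
suffixed = np.array([c + " " for c in classes], dtype=object)
via_dot = pd.Series(flags.dot(suffixed), index=toy.index).str.rstrip()

print(via_dot.tolist())
```

On `train_df`, the same expression with the notebook's `classes` list should reproduce the `joined_label` column.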
In [7]:
train_df['joined_label'].value_counts()
Out[7]:
stray_particle                                                                    524
no_defect                                                                         202
scratch_small dent_small stray_particle discoloration                              34
dent_small                                                                         30
scratch_small dent_large stray_particle discoloration                              27
dent_large stray_particle                                                          23
scratch_small scratch_large dent_small stray_particle                              23
scratch_small dent_small stray_particle                                            21
scratch_small dent_small                                                           19
scratch_small scratch_large                                                        16
scratch_small                                                                      14
scratch_small scratch_large dent_small dent_large stray_particle discoloration     12
scratch_large                                                                       6
dent_large                                                                          6
dent_small discoloration                                                            5
dent_small stray_particle                                                           5
scratch_small scratch_large dent_large stray_particle discoloration                 5
scratch_large dent_small                                                            4
dent_large stray_particle discoloration                                             4
scratch_small scratch_large dent_large                                              3
scratch_small dent_large                                                            3
scratch_large dent_small discoloration                                              2
scratch_small scratch_large dent_small stray_particle discoloration                 2
scratch_small scratch_large dent_small dent_large discoloration                     2
scratch_small stray_particle                                                        2
scratch_small scratch_large dent_small dent_large stray_particle                    2
scratch_small scratch_large dent_small discoloration                                1
dent_small dent_large discoloration                                                 1
discoloration                                                                       1
scratch_small dent_small dent_large stray_particle                                  1
Name: joined_label, dtype: int64
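Because a single class is spread across many combinations above, the joined counts do not directly show how often each defect occurs overall. A short sketch on made-up data (column names mirror the notebook; the values are invented) sums each label column for per-class totals, and sums across the defect columns for the number of defect labels per image.

```python
import pandas as pd

classes = ["scratch_small", "stray_particle", "discoloration", "no_defect"]
toy = pd.DataFrame(
    [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]],
    columns=classes,
)

# Number of images carrying each individual label, most frequent first.
per_class = toy[classes].sum().sort_values(ascending=False)

# Number of defect labels per image ('no_defect' excluded), tallied.
defect_cols = [c for c in classes if c != "no_defect"]
labels_per_image = toy[defect_cols].sum(axis=1).value_counts().sort_index()

print(per_class)
print(labels_per_image)
```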
In [8]:
plt.figure(figsize=(10,15))
sns.countplot(y='joined_label', data=train_df)
Out[8]:
<AxesSubplot:xlabel='count', ylabel='joined_label'>
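With 30 combinations, the count plot is easier to scan when the bars are sorted by frequency. seaborn's `countplot` accepts an `order=` argument, so passing `train_df['joined_label'].value_counts().index` should sort it. The sketch below shows the same idea with plain matplotlib on invented labels, using the non-interactive `Agg` backend so it also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up joined labels standing in for train_df['joined_label'].
joined = pd.Series([
    "stray_particle", "no_defect", "stray_particle",
    "dent_small stray_particle", "stray_particle", "no_defect",
])

counts = joined.value_counts()  # most frequent first

fig, ax = plt.subplots(figsize=(10, 4))
# barh draws the first category at the bottom, so reverse the order
# to put the most frequent combination at the top.
ax.barh(counts.index[::-1].tolist(), counts.values[::-1])
ax.set_xlabel("count")
ax.set_ylabel("joined_label")
fig.tight_layout()
```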
In [9]:
from deepml.visualize import show_images_from_dataframe

Random samples from the training CSV file

In [10]:
train_image_dir = "data-purchasing-challenge-2022-starter-kit/data/training/images"
show_images_from_dataframe(train_df, img_dir=train_image_dir, image_file_name_column='filename',
                           label_column='joined_label', samples=10, cols=2, figsize=(10, 30))
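To look specifically at the new Round 2 classes, one option is to filter the DataFrame before plotting. The helper `sample_with_label` below is hypothetical: it keeps the rows whose label column is 1 and draws up to `n` of them reproducibly via `random_state`.

```python
import pandas as pd


def sample_with_label(df: pd.DataFrame, label: str, n: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: up to n random rows whose `label` column equals 1."""
    subset = df[df[label] == 1]
    # Never ask for more rows than exist for this label.
    return subset.sample(n=min(n, len(subset)), random_state=seed)


# Made-up rows with the notebook's column names.
toy = pd.DataFrame({
    "filename": ["a.png", "b.png", "c.png", "d.png"],
    "discoloration": [1, 0, 1, 1],
    "stray_particle": [0, 1, 1, 0],
})
picked = sample_with_label(toy, "discoloration", n=2)
print(picked["filename"].tolist())
```

Assuming `show_images_from_dataframe` accepts any DataFrame with the same columns (as the call above suggests), the result could be passed in place of `train_df`, e.g. `sample_with_label(train_df, 'discoloration', 10)`, keeping `samples` no larger than the number of rows returned.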