
Seismic Facies Identification Challenge

[Explainer] W&B, Catalyst with EDA, Extra Attributes

EDA, Extra Features with wandb API and Catalyst training for rapid experimentation

jyot_makadiya

This notebook introduces two tools for rapid experimentation and better configuration of your training: Catalyst and Weights & Biases (wandb). In this competition they give a decent score while keeping the flexibility of a custom training loop. I also summarize the important EDA and extra attributes, but only briefly, since most of the existing work has already done a beautiful job of explaining the data exploration part; instead, I focus on introducing these new tools alongside my approach to this challenge.

Seismic Facies Identification Challenge - Explainer Notebook with Added Perks

- Jyot Makadiya

In this notebook, we look at the Seismic Facies Identification Challenge, which is a 3D semantic segmentation problem that, with some tweaking, can be converted into a 2D semantic segmentation task. The notebook gives an idea of previous work (notebooks on this page), summarizes a few very basic EDA and extra-attribute ideas, and finally introduces a flexible, systematic approach for training models and tracking progress.

  • What is 2D semantic segmentation? In simple words, semantic segmentation is an image recognition task with a twist: instead of assigning a single class to the whole image, a class has to be identified and tagged for every pixel, so what we get in the end is a mask the size of the original image, segmented into the different classes. Since this requires a lot of labelled data, various tools are used to prepare segmentation masks from images.
  • What is W&B? Weights & Biases (wandb) is a free service and set of tools that makes it easy to track all of your data science experiments, together with data visualization tools. In this notebook we present a very simple introduction to wandb, logging only loss and accuracy, but it can easily be extended to store hyperparameter info, save models, and more.
  • What is smp and why are we using it? Segmentation Models PyTorch (smp) is a package that provides pretrained, well-known architectures for segmentation tasks. It is easy to use and easy to debug, which makes it a popular choice among researchers for rapid experimentation.
  • What is the need for preprocessing (in our case, augmentations and normalization)? I have tried to keep this notebook balanced: the preprocessing is not too complex, yet enough to get a good score. Preprocessing is crucial in tasks where performance can be improved with domain insight. The aim of augmentation is to get more training data while reducing overfitting, and we apply normalization so that the training values sit on an even scale rather than being concentrated around the raw mean. A combined minimal sketch of smp, Albumentations, and wandb usage follows this list.
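
To make these pieces concrete, here is a minimal, hypothetical sketch (not the exact training code used later in the notebook) that builds an smp U-Net, defines an Albumentations pipeline with normalization, and logs a metric to wandb. The encoder name, crop size, class count, normalization statistics, and project name are placeholders you would adapt to this dataset.

import albumentations as A
import segmentation_models_pytorch as smp
import wandb

# Pretrained U-Net from smp: one input channel (seismic amplitude), six facies classes.
model = smp.Unet(
    encoder_name="resnet34",       # placeholder encoder; any smp encoder name works
    encoder_weights="imagenet",
    in_channels=1,
    classes=6,
)

# Augmentations + normalization; the mean/std here are placeholders, in practice
# you would plug in the statistics computed in the EDA section below.
train_transforms = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomCrop(height=448, width=448),
    A.Normalize(mean=0.0, std=1.0, max_pixel_value=1.0),
])

# Minimal wandb usage: start a run, log metrics from inside the training loop, finish.
run = wandb.init(project="seismic-facies-identification", config={"encoder": "resnet34"})
wandb.log({"train/loss": 0.123, "epoch": 0})  # dummy values, only to show the logging call
run.finish()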

Get the dataset from Google Drive (Colab)

In [1]:
#getting the dataset from the drive
from google.colab import drive
drive.mount('/content/gdrive/')
Mounted at /content/gdrive/
In [2]:
!pip install segmentation_models_pytorch argus pytorch_toolbelt wandb catalyst --upgrade albumentations
Collecting segmentation_models_pytorch
  Downloading segmentation_models_pytorch-0.2.0-py3-none-any.whl (87 kB)
     |████████████████████████████████| 87 kB 3.3 MB/s 
Collecting argus
  Downloading argus-0.0.11-py2.py3-none-any.whl (69 kB)
     |████████████████████████████████| 69 kB 6.4 MB/s 
Collecting pytorch_toolbelt
  Downloading pytorch_toolbelt-0.4.4-py3-none-any.whl (159 kB)
     |████████████████████████████████| 159 kB 17.3 MB/s 
Collecting wandb
  Downloading wandb-0.12.6-py2.py3-none-any.whl (1.7 MB)
     |████████████████████████████████| 1.7 MB 29.9 MB/s 
Collecting catalyst
  Downloading catalyst-21.10-py2.py3-none-any.whl (576 kB)
     |████████████████████████████████| 576 kB 24.0 MB/s 
Requirement already satisfied: albumentations in /usr/local/lib/python3.7/dist-packages (0.1.12)
Collecting albumentations
  Downloading albumentations-1.1.0-py3-none-any.whl (102 kB)
     |████████████████████████████████| 102 kB 51.4 MB/s 
Requirement already satisfied: torchvision>=0.5.0 in /usr/local/lib/python3.7/dist-packages (from segmentation_models_pytorch) (0.10.0+cu111)
Collecting efficientnet-pytorch==0.6.3
  Downloading efficientnet_pytorch-0.6.3.tar.gz (16 kB)
Collecting pretrainedmodels==0.7.4
  Downloading pretrainedmodels-0.7.4.tar.gz (58 kB)
     |████████████████████████████████| 58 kB 6.1 MB/s 
Collecting timm==0.4.12
  Downloading timm-0.4.12-py3-none-any.whl (376 kB)
     |████████████████████████████████| 376 kB 45.3 MB/s 
Requirement already satisfied: torch in /usr/local/lib/python3.7/dist-packages (from efficientnet-pytorch==0.6.3->segmentation_models_pytorch) (1.9.0+cu111)
Collecting munch
  Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from pretrainedmodels==0.7.4->segmentation_models_pytorch) (4.62.3)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch->efficientnet-pytorch==0.6.3->segmentation_models_pytorch) (3.7.4.3)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torchvision>=0.5.0->segmentation_models_pytorch) (1.19.5)
Requirement already satisfied: pillow>=5.3.0 in /usr/local/lib/python3.7/dist-packages (from torchvision>=0.5.0->segmentation_models_pytorch) (7.1.2)
Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.7/dist-packages (from argus) (1.4.1)
Requirement already satisfied: pandas>=0.20.2 in /usr/local/lib/python3.7/dist-packages (from argus) (1.1.5)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.20.2->argus) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.20.2->argus) (2.8.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas>=0.20.2->argus) (1.15.0)
Requirement already satisfied: opencv-python>=4.1 in /usr/local/lib/python3.7/dist-packages (from pytorch_toolbelt) (4.1.2.30)
Requirement already satisfied: requests<3,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from wandb) (2.23.0)
Collecting sentry-sdk>=1.0.0
  Downloading sentry_sdk-1.4.3-py2.py3-none-any.whl (139 kB)
     |████████████████████████████████| 139 kB 51.0 MB/s 
Requirement already satisfied: promise<3,>=2.0 in /usr/local/lib/python3.7/dist-packages (from wandb) (2.3)
Collecting shortuuid>=0.5.0
  Downloading shortuuid-1.0.7-py3-none-any.whl (8.6 kB)
Collecting subprocess32>=3.5.3
  Downloading subprocess32-3.5.4.tar.gz (97 kB)
     |████████████████████████████████| 97 kB 7.1 MB/s 
Requirement already satisfied: psutil>=5.0.0 in /usr/local/lib/python3.7/dist-packages (from wandb) (5.4.8)
Requirement already satisfied: PyYAML in /usr/local/lib/python3.7/dist-packages (from wandb) (3.13)
Collecting GitPython>=1.0.0
  Downloading GitPython-3.1.24-py3-none-any.whl (180 kB)
     |████████████████████████████████| 180 kB 49.1 MB/s 
Collecting docker-pycreds>=0.4.0
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting configparser>=3.8.1
  Downloading configparser-5.1.0-py3-none-any.whl (19 kB)
Collecting yaspin>=1.0.0
  Downloading yaspin-2.1.0-py3-none-any.whl (18 kB)
Requirement already satisfied: protobuf>=3.12.0 in /usr/local/lib/python3.7/dist-packages (from wandb) (3.17.3)
Requirement already satisfied: Click!=8.0.0,>=7.0 in /usr/local/lib/python3.7/dist-packages (from wandb) (7.1.2)
Collecting pathtools
  Downloading pathtools-0.1.2.tar.gz (11 kB)
Collecting gitdb<5,>=4.0.1
  Downloading gitdb-4.0.9-py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 1.7 MB/s 
Collecting smmap<6,>=3.0.1
  Downloading smmap-5.0.0-py3-none-any.whl (24 kB)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.0.0->wandb) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.0.0->wandb) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.0.0->wandb) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.0.0->wandb) (2021.5.30)
Requirement already satisfied: termcolor<2.0.0,>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from yaspin>=1.0.0->wandb) (1.1.0)
Collecting tensorboardX<2.3.0,>=2.1.0
  Downloading tensorboardX-2.2-py2.py3-none-any.whl (120 kB)
     |████████████████████████████████| 120 kB 48.4 MB/s 
Collecting PyYAML
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
     |████████████████████████████████| 596 kB 48.0 MB/s 
Collecting hydra-slayer>=0.1.1
  Downloading hydra_slayer-0.3.0-py3-none-any.whl (12 kB)
Requirement already satisfied: scikit-image>=0.16.1 in /usr/local/lib/python3.7/dist-packages (from albumentations) (0.16.2)
Collecting opencv-python-headless>=4.1.1
  Downloading opencv_python_headless-4.5.4.58-cp37-cp37m-manylinux2014_x86_64.whl (47.6 MB)
     |████████████████████████████████| 47.6 MB 36 kB/s 
Collecting qudida>=0.0.4
  Downloading qudida-0.0.4-py3-none-any.whl (3.5 kB)
Requirement already satisfied: scikit-learn>=0.19.1 in /usr/local/lib/python3.7/dist-packages (from qudida>=0.0.4->albumentations) (0.22.2.post1)
Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.16.1->albumentations) (2.6.3)
Requirement already satisfied: matplotlib!=3.0.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.16.1->albumentations) (3.2.2)
Requirement already satisfied: imageio>=2.3.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.16.1->albumentations) (2.4.1)
Requirement already satisfied: PyWavelets>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image>=0.16.1->albumentations) (1.1.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image>=0.16.1->albumentations) (1.3.2)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image>=0.16.1->albumentations) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image>=0.16.1->albumentations) (2.4.7)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.19.1->qudida>=0.0.4->albumentations) (1.0.1)
Building wheels for collected packages: efficientnet-pytorch, pretrainedmodels, subprocess32, pathtools
  Building wheel for efficientnet-pytorch (setup.py) ... done
  Created wheel for efficientnet-pytorch: filename=efficientnet_pytorch-0.6.3-py3-none-any.whl size=12421 sha256=0ac75d235a75f1bc5b4c35c493d7434010a9d19805abe63199cb74d0c158f1c1
  Stored in directory: /root/.cache/pip/wheels/90/6b/0c/f0ad36d00310e65390b0d4c9218ae6250ac579c92540c9097a
  Building wheel for pretrainedmodels (setup.py) ... done
  Created wheel for pretrainedmodels: filename=pretrainedmodels-0.7.4-py3-none-any.whl size=60965 sha256=3052e7733fc7b1e47ae0d87ac96e884e90b03b01ca7c116174f875377556fc01
  Stored in directory: /root/.cache/pip/wheels/ed/27/e8/9543d42de2740d3544db96aefef63bda3f2c1761b3334f4873
  Building wheel for subprocess32 (setup.py) ... done
  Created wheel for subprocess32: filename=subprocess32-3.5.4-py3-none-any.whl size=6502 sha256=582be0733a7c07a1a528ec91c965dab8e9527a2d28a9186f988ecac7729a8fe6
  Stored in directory: /root/.cache/pip/wheels/50/ca/fa/8fca8d246e64f19488d07567547ddec8eb084e8c0d7a59226a
  Building wheel for pathtools (setup.py) ... done
  Created wheel for pathtools: filename=pathtools-0.1.2-py3-none-any.whl size=8807 sha256=c26f2c780d4164d14588ef628c77712b2adc1c0d0219fdd4289475fa372d33fb
  Stored in directory: /root/.cache/pip/wheels/3e/31/09/fa59cef12cdcfecc627b3d24273699f390e71828921b2cbba2
Successfully built efficientnet-pytorch pretrainedmodels subprocess32 pathtools
Installing collected packages: smmap, opencv-python-headless, munch, gitdb, yaspin, timm, tensorboardX, subprocess32, shortuuid, sentry-sdk, qudida, PyYAML, pretrainedmodels, pathtools, hydra-slayer, GitPython, efficientnet-pytorch, docker-pycreds, configparser, wandb, segmentation-models-pytorch, pytorch-toolbelt, catalyst, argus, albumentations
  Attempting uninstall: PyYAML
    Found existing installation: PyYAML 3.13
    Uninstalling PyYAML-3.13:
      Successfully uninstalled PyYAML-3.13
  Attempting uninstall: albumentations
    Found existing installation: albumentations 0.1.12
    Uninstalling albumentations-0.1.12:
      Successfully uninstalled albumentations-0.1.12
Successfully installed GitPython-3.1.24 PyYAML-6.0 albumentations-1.1.0 argus-0.0.11 catalyst-21.10 configparser-5.1.0 docker-pycreds-0.4.0 efficientnet-pytorch-0.6.3 gitdb-4.0.9 hydra-slayer-0.3.0 munch-2.5.0 opencv-python-headless-4.5.4.58 pathtools-0.1.2 pretrainedmodels-0.7.4 pytorch-toolbelt-0.4.4 qudida-0.0.4 segmentation-models-pytorch-0.2.0 sentry-sdk-1.4.3 shortuuid-1.0.7 smmap-5.0.0 subprocess32-3.5.4 tensorboardX-2.2 timm-0.4.12 wandb-0.12.6 yaspin-2.1.0
In [3]:
!pip install git+https://github.com/gazprom-neft/seismiqb.git
Collecting git+https://github.com/gazprom-neft/seismiqb.git
  Cloning https://github.com/gazprom-neft/seismiqb.git to /tmp/pip-req-build-mjpw3gof
  Running command git clone -q https://github.com/gazprom-neft/seismiqb.git /tmp/pip-req-build-mjpw3gof
  Running command git submodule update --init --recursive -q
Requirement already satisfied: numpy>=1.16.0 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (1.19.5)
Requirement already satisfied: scipy>=1.3.3 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (1.4.1)
Requirement already satisfied: pandas>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (1.1.5)
Requirement already satisfied: numexpr>=2.7 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (2.7.3)
Requirement already satisfied: bottleneck>=1.3 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (1.3.2)
Requirement already satisfied: matplotlib>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (3.2.2)
Requirement already satisfied: tqdm>=4.50.0 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (4.62.3)
Collecting segyio>=1.8.3
  Downloading segyio-1.9.7-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (83 kB)
     |████████████████████████████████| 83 kB 1.6 MB/s 
Requirement already satisfied: scikit-learn>=0.21.3 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (0.22.2.post1)
Requirement already satisfied: numba>=0.43.0 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (0.51.2)
Requirement already satisfied: scikit_image>=0.16.2 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (0.16.2)
Requirement already satisfied: nbconvert>=5.6.1 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (5.6.1)
Requirement already satisfied: plotly>=4.3.0 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (4.4.1)
Requirement already satisfied: feather_format>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (0.4.1)
Requirement already satisfied: dask[dataframe]>=2.8.1 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (2.12.0)
Collecting fpdf>=1.7.2
  Downloading fpdf-1.7.2.tar.gz (39 kB)
Requirement already satisfied: opencv_python>=4.1.2.30 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (4.1.2.30)
Requirement already satisfied: h5py>=2.10.0 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (3.1.0)
Collecting h5pickle>=0.2.0
  Downloading h5pickle-0.4.2-py3-none-any.whl (4.0 kB)
Collecting nvidia_smi>=0.1.3
  Downloading nvidia_smi-0.1.3-py36-none-any.whl (11 kB)
Requirement already satisfied: nvidia-ml-py3>=7.3 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (7.352.0)
Collecting ipython>=7.10.0
  Downloading ipython-7.29.0-py3-none-any.whl (790 kB)
     |████████████████████████████████| 790 kB 17.6 MB/s 
Collecting Pillow>=8.0.1
  Downloading Pillow-8.4.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
     |████████████████████████████████| 3.1 MB 26.7 MB/s 
Collecting psutil>=5.6.7
  Downloading psutil-5.8.0-cp37-cp37m-manylinux2010_x86_64.whl (296 kB)
     |████████████████████████████████| 296 kB 41.0 MB/s 
Requirement already satisfied: seaborn>=0.9.0 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (0.11.2)
Requirement already satisfied: dill>=0.3.1.1 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (0.3.4)
Requirement already satisfied: requests>=2.22.0 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (2.23.0)
Collecting blosc>=1.8.1
  Downloading blosc-1.10.6-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.6 MB)
     |████████████████████████████████| 2.6 MB 33.7 MB/s 
Collecting pytest>=5.3.1
  Downloading pytest-6.2.5-py3-none-any.whl (280 kB)
     |████████████████████████████████| 280 kB 44.6 MB/s 
Requirement already satisfied: torch>=1.3.0 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (1.9.0+cu111)
Requirement already satisfied: ipywidgets>=7.0 in /usr/local/lib/python3.7/dist-packages (from seismiQB==0.1.0) (7.6.5)
Collecting partd>=0.3.10
  Downloading partd-1.2.0-py3-none-any.whl (19 kB)
Requirement already satisfied: toolz>=0.7.3 in /usr/local/lib/python3.7/dist-packages (from dask[dataframe]>=2.8.1->seismiQB==0.1.0) (0.11.1)
Collecting fsspec>=0.6.0
  Downloading fsspec-2021.11.0-py3-none-any.whl (132 kB)
     |████████████████████████████████| 132 kB 43.6 MB/s 
Requirement already satisfied: pyarrow>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from feather_format>=0.4.0->seismiQB==0.1.0) (3.0.0)
Requirement already satisfied: cachetools in /usr/local/lib/python3.7/dist-packages (from h5pickle>=0.2.0->seismiQB==0.1.0) (4.2.4)
Requirement already satisfied: cached-property in /usr/local/lib/python3.7/dist-packages (from h5py>=2.10.0->seismiQB==0.1.0) (1.5.2)
Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.7/dist-packages (from ipython>=7.10.0->seismiQB==0.1.0) (5.1.0)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.7/dist-packages (from ipython>=7.10.0->seismiQB==0.1.0) (0.7.5)
Requirement already satisfied: backcall in /usr/local/lib/python3.7/dist-packages (from ipython>=7.10.0->seismiQB==0.1.0) (0.2.0)
Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.7/dist-packages (from ipython>=7.10.0->seismiQB==0.1.0) (57.4.0)
Requirement already satisfied: decorator in /usr/local/lib/python3.7/dist-packages (from ipython>=7.10.0->seismiQB==0.1.0) (4.4.2)
Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.7/dist-packages (from ipython>=7.10.0->seismiQB==0.1.0) (4.8.0)
Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.7/dist-packages (from ipython>=7.10.0->seismiQB==0.1.0) (0.1.3)
Requirement already satisfied: jedi>=0.16 in /usr/local/lib/python3.7/dist-packages (from ipython>=7.10.0->seismiQB==0.1.0) (0.18.0)
Collecting prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0
  Downloading prompt_toolkit-3.0.22-py3-none-any.whl (374 kB)
     |████████████████████████████████| 374 kB 33.2 MB/s 
Requirement already satisfied: pygments in /usr/local/lib/python3.7/dist-packages (from ipython>=7.10.0->seismiQB==0.1.0) (2.6.1)
Requirement already satisfied: ipython-genutils~=0.2.0 in /usr/local/lib/python3.7/dist-packages (from ipywidgets>=7.0->seismiQB==0.1.0) (0.2.0)
Requirement already satisfied: nbformat>=4.2.0 in /usr/local/lib/python3.7/dist-packages (from ipywidgets>=7.0->seismiQB==0.1.0) (5.1.3)
Requirement already satisfied: ipykernel>=4.5.1 in /usr/local/lib/python3.7/dist-packages (from ipywidgets>=7.0->seismiQB==0.1.0) (4.10.1)
Requirement already satisfied: widgetsnbextension~=3.5.0 in /usr/local/lib/python3.7/dist-packages (from ipywidgets>=7.0->seismiQB==0.1.0) (3.5.1)
Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from ipywidgets>=7.0->seismiQB==0.1.0) (1.0.2)
Requirement already satisfied: jupyter-client in /usr/local/lib/python3.7/dist-packages (from ipykernel>=4.5.1->ipywidgets>=7.0->seismiQB==0.1.0) (5.3.5)
Requirement already satisfied: tornado>=4.0 in /usr/local/lib/python3.7/dist-packages (from ipykernel>=4.5.1->ipywidgets>=7.0->seismiQB==0.1.0) (5.1.1)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /usr/local/lib/python3.7/dist-packages (from jedi>=0.16->ipython>=7.10.0->seismiQB==0.1.0) (0.8.2)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.2->seismiQB==0.1.0) (2.4.7)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.2->seismiQB==0.1.0) (2.8.2)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.2->seismiQB==0.1.0) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.2->seismiQB==0.1.0) (1.3.2)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from cycler>=0.10->matplotlib>=3.0.2->seismiQB==0.1.0) (1.15.0)
Requirement already satisfied: bleach in /usr/local/lib/python3.7/dist-packages (from nbconvert>=5.6.1->seismiQB==0.1.0) (4.1.0)
Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.7/dist-packages (from nbconvert>=5.6.1->seismiQB==0.1.0) (0.8.4)
Requirement already satisfied: testpath in /usr/local/lib/python3.7/dist-packages (from nbconvert>=5.6.1->seismiQB==0.1.0) (0.5.0)
Requirement already satisfied: entrypoints>=0.2.2 in /usr/local/lib/python3.7/dist-packages (from nbconvert>=5.6.1->seismiQB==0.1.0) (0.3)
Requirement already satisfied: jupyter-core in /usr/local/lib/python3.7/dist-packages (from nbconvert>=5.6.1->seismiQB==0.1.0) (4.8.1)
Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.7/dist-packages (from nbconvert>=5.6.1->seismiQB==0.1.0) (1.5.0)
Requirement already satisfied: jinja2>=2.4 in /usr/local/lib/python3.7/dist-packages (from nbconvert>=5.6.1->seismiQB==0.1.0) (2.11.3)
Requirement already satisfied: defusedxml in /usr/local/lib/python3.7/dist-packages (from nbconvert>=5.6.1->seismiQB==0.1.0) (0.7.1)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.7/dist-packages (from jinja2>=2.4->nbconvert>=5.6.1->seismiQB==0.1.0) (2.0.1)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /usr/local/lib/python3.7/dist-packages (from nbformat>=4.2.0->ipywidgets>=7.0->seismiQB==0.1.0) (2.6.0)
Requirement already satisfied: llvmlite<0.35,>=0.34.0.dev0 in /usr/local/lib/python3.7/dist-packages (from numba>=0.43.0->seismiQB==0.1.0) (0.34.0)
Collecting sorcery>=0.1.0
  Downloading sorcery-0.2.1.tar.gz (14 kB)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=1.0.0->seismiQB==0.1.0) (2018.9)
Collecting locket
  Downloading locket-0.2.1-py2.py3-none-any.whl (4.1 kB)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.7/dist-packages (from pexpect>4.3->ipython>=7.10.0->seismiQB==0.1.0) (0.7.0)
Requirement already satisfied: retrying>=1.3.3 in /usr/local/lib/python3.7/dist-packages (from plotly>=4.3.0->seismiQB==0.1.0) (1.3.3)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.7/dist-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=7.10.0->seismiQB==0.1.0) (0.2.5)
Requirement already satisfied: importlib-metadata>=0.12 in /usr/local/lib/python3.7/dist-packages (from pytest>=5.3.1->seismiQB==0.1.0) (4.8.1)
Collecting pluggy<2.0,>=0.12
  Downloading pluggy-1.0.0-py2.py3-none-any.whl (13 kB)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from pytest>=5.3.1->seismiQB==0.1.0) (21.0)
Requirement already satisfied: toml in /usr/local/lib/python3.7/dist-packages (from pytest>=5.3.1->seismiQB==0.1.0) (0.10.2)
Requirement already satisfied: py>=1.8.2 in /usr/local/lib/python3.7/dist-packages (from pytest>=5.3.1->seismiQB==0.1.0) (1.10.0)
Requirement already satisfied: attrs>=19.2.0 in /usr/local/lib/python3.7/dist-packages (from pytest>=5.3.1->seismiQB==0.1.0) (21.2.0)
Requirement already satisfied: iniconfig in /usr/local/lib/python3.7/dist-packages (from pytest>=5.3.1->seismiQB==0.1.0) (1.1.1)
Requirement already satisfied: typing-extensions>=3.6.4 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=0.12->pytest>=5.3.1->seismiQB==0.1.0) (3.7.4.3)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=0.12->pytest>=5.3.1->seismiQB==0.1.0) (3.6.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->seismiQB==0.1.0) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->seismiQB==0.1.0) (2021.5.30)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->seismiQB==0.1.0) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->seismiQB==0.1.0) (3.0.4)
Requirement already satisfied: imageio>=2.3.0 in /usr/local/lib/python3.7/dist-packages (from scikit_image>=0.16.2->seismiQB==0.1.0) (2.4.1)
Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.7/dist-packages (from scikit_image>=0.16.2->seismiQB==0.1.0) (2.6.3)
Requirement already satisfied: PyWavelets>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from scikit_image>=0.16.2->seismiQB==0.1.0) (1.1.1)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.21.3->seismiQB==0.1.0) (1.0.1)
Collecting executing
  Downloading executing-0.8.2-py2.py3-none-any.whl (16 kB)
Collecting littleutils>=0.2.1
  Downloading littleutils-0.2.2.tar.gz (6.6 kB)
Collecting asttokens
  Downloading asttokens-2.0.5-py2.py3-none-any.whl (20 kB)
Requirement already satisfied: wrapt in /usr/local/lib/python3.7/dist-packages (from sorcery>=0.1.0->nvidia_smi>=0.1.3->seismiQB==0.1.0) (1.12.1)
Requirement already satisfied: notebook>=4.4.1 in /usr/local/lib/python3.7/dist-packages (from widgetsnbextension~=3.5.0->ipywidgets>=7.0->seismiQB==0.1.0) (5.3.1)
Requirement already satisfied: terminado>=0.8.1 in /usr/local/lib/python3.7/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0->seismiQB==0.1.0) (0.12.1)
Requirement already satisfied: Send2Trash in /usr/local/lib/python3.7/dist-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0->seismiQB==0.1.0) (1.8.0)
Requirement already satisfied: pyzmq>=13 in /usr/local/lib/python3.7/dist-packages (from jupyter-client->ipykernel>=4.5.1->ipywidgets>=7.0->seismiQB==0.1.0) (22.3.0)
Requirement already satisfied: webencodings in /usr/local/lib/python3.7/dist-packages (from bleach->nbconvert>=5.6.1->seismiQB==0.1.0) (0.5.1)
Building wheels for collected packages: seismiQB, fpdf, sorcery, littleutils
  Building wheel for seismiQB (setup.py) ... done
  Created wheel for seismiQB: filename=seismiQB-0.1.0-py3-none-any.whl size=602264 sha256=f12858e1e360382e497aa81d56155f9dfcd61bc3298dbc3aeca60ee9801fc136
  Stored in directory: /tmp/pip-ephem-wheel-cache-025omhjl/wheels/b3/90/9f/a8018f3983b2de90b64c2514a318cf521c2877c602d2dbff43
  Building wheel for fpdf (setup.py) ... done
  Created wheel for fpdf: filename=fpdf-1.7.2-py2.py3-none-any.whl size=40722 sha256=f279b745f970a85ed74a2fa3cf28583bcaea97a422a8aa59607166ebaf7920d0
  Stored in directory: /root/.cache/pip/wheels/d7/ca/c8/86467e7957bbbcbdf4cf4870fc7dc95e9a16404b2e3c3a98c3
  Building wheel for sorcery (setup.py) ... done
  Created wheel for sorcery: filename=sorcery-0.2.1-py3-none-any.whl size=10765 sha256=26c4ade2c6aafdab06ad8156cc210470491f3d5c6f47e9f2aeb5d55a1f0b4192
  Stored in directory: /root/.cache/pip/wheels/cb/86/e4/6588cc95965cfc294f8f1797cfdb117c025752819b47193543
  Building wheel for littleutils (setup.py) ... done
  Created wheel for littleutils: filename=littleutils-0.2.2-py3-none-any.whl size=7048 sha256=272653872740bcd71b21b89649c8c2228f58a91033c0d6156bacb5809e3856a5
  Stored in directory: /root/.cache/pip/wheels/d6/64/cd/32819b511a488e4993f2fab909a95330289c3f4e0f6ef4676d
Successfully built seismiQB fpdf sorcery littleutils
Installing collected packages: prompt-toolkit, ipython, pluggy, Pillow, locket, littleutils, executing, asttokens, sorcery, pytest, partd, fsspec, segyio, psutil, nvidia-smi, h5pickle, fpdf, blosc, seismiQB
  Attempting uninstall: prompt-toolkit
    Found existing installation: prompt-toolkit 1.0.18
    Uninstalling prompt-toolkit-1.0.18:
      Successfully uninstalled prompt-toolkit-1.0.18
  Attempting uninstall: ipython
    Found existing installation: ipython 5.5.0
    Uninstalling ipython-5.5.0:
      Successfully uninstalled ipython-5.5.0
  Attempting uninstall: pluggy
    Found existing installation: pluggy 0.7.1
    Uninstalling pluggy-0.7.1:
      Successfully uninstalled pluggy-0.7.1
  Attempting uninstall: Pillow
    Found existing installation: Pillow 7.1.2
    Uninstalling Pillow-7.1.2:
      Successfully uninstalled Pillow-7.1.2
  Attempting uninstall: pytest
    Found existing installation: pytest 3.6.4
    Uninstalling pytest-3.6.4:
      Successfully uninstalled pytest-3.6.4
  Attempting uninstall: psutil
    Found existing installation: psutil 5.4.8
    Uninstalling psutil-5.4.8:
      Successfully uninstalled psutil-5.4.8
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyter-console 5.2.0 requires prompt-toolkit<2.0.0,>=1.0.0, but you have prompt-toolkit 3.0.22 which is incompatible.
google-colab 1.0.0 requires ipython~=5.5.0, but you have ipython 7.29.0 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
Successfully installed Pillow-8.4.0 asttokens-2.0.5 blosc-1.10.6 executing-0.8.2 fpdf-1.7.2 fsspec-2021.11.0 h5pickle-0.4.2 ipython-7.29.0 littleutils-0.2.2 locket-0.2.1 nvidia-smi-0.1.3 partd-1.2.0 pluggy-1.0.0 prompt-toolkit-3.0.22 psutil-5.8.0 pytest-6.2.5 segyio-1.9.7 seismiQB-0.1.0 sorcery-0.2.1

Note: I already have the dataset downloaded in my Drive, so please make sure to put the competition dataset there and adjust the dataset path accordingly.

In [4]:
#copy the dataset
!mkdir -p seismic-facies-identification-challenge
!rsync -avhW --compress-level=2 --info=progress2 /content/gdrive/MyDrive/Datasets/AIcrowd/seismic-facies-identification/data /content/seismic-facies-identification-challenge/data
sending incremental file list
created directory /content/seismic-facies-identification-challenge/data
data/
data/data_test_1.npz
        731.38M  13%   20.27MB/s    0:00:34 (xfr#1, to-chk=0/8)
data/data_test_2.npz
          1.78G  32%   20.16MB/s    0:01:24 (xfr#2, to-chk=5/8)
data/data_train.npz
          3.49G  63%   19.85MB/s    0:02:47 (xfr#3, to-chk=4/8)
data/data_train_processed.npz
          5.22G  95%   19.89MB/s    0:04:10 (xfr#4, to-chk=3/8)
data/labels_train.npz
          5.23G  95%   19.83MB/s    0:04:11 (xfr#5, to-chk=2/8)
data/sample_submission_1.npz
          5.33G  97%   19.59MB/s    0:04:19 (xfr#6, to-chk=1/8)
data/sample_submission_2.npz
          5.48G 100%   19.42MB/s    0:04:29 (xfr#7, to-chk=0/8)

sent 5.47G bytes  received 229 bytes  20.14M bytes/sec
total size is 5.48G  speedup is 1.00

Importing packages/libraries

Import the packages that are common to the training and prediction phases here.

In [5]:
#importing libraries, fundamentals and EDA related
import numpy as np
from sklearn.model_selection import StratifiedKFold
import matplotlib.pyplot as plt
import torch
torch.backends.cudnn.benchmark = True
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset, Subset

import segmentation_models_pytorch as smp
# import argus
# from argus.callbacks import MonitorCheckpoint, EarlyStopping, LoggingToFile, ReduceLROnPlateau
# import albumentations as A
# from pytorch_toolbelt.inference.tiles import ImageSlicer, CudaTileMerger
# from pytorch_toolbelt.losses import LovaszLoss
In [6]:
#catalyst related imports
import os
from tempfile import TemporaryDirectory

from pytest import mark
from torch import nn, optim
from torch.utils.data import DataLoader

import catalyst
from catalyst import dl, utils, metrics
from catalyst.contrib.datasets import MNIST
from catalyst.data.transforms import ToTensor
from catalyst.settings import IS_CUDA_AVAILABLE, NUM_CUDA_DEVICES, SETTINGS
In [7]:
#misc imports
import albumentations as A

from copy import copy

import numpy as np
import matplotlib.pyplot as plt
import gc
import tqdm
import cv2

#setting the seed for catalyst
SEED = 42
utils.set_global_seed(SEED)
utils.prepare_cudnn(deterministic=True)

Loading train and test dataset

We use NumPy to load the 3D volumes from the compressed .npz files.

In [8]:
%%time
#loading training and test data
train_data_full = np.load('/content/seismic-facies-identification-challenge/data/data/data_train.npz', allow_pickle=True, mmap_mode='r')['data']
train_label_full = np.load('/content/seismic-facies-identification-challenge/data/data/labels_train.npz', allow_pickle=True, mmap_mode='r')['labels']

test_img = np.load('/content/seismic-facies-identification-challenge/data/data/data_test_2.npz', allow_pickle=True, mmap_mode='r')['data']
CPU times: user 23.5 s, sys: 4.36 s, total: 27.8 s
Wall time: 1min 9s

EDA

Now let's take a look at the training data, starting with basic info about it, then plotting the train data distribution with respect to mean ± 3 std, along with its log-scaled version.
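
As a quick sanity check before plotting, a short hypothetical snippet like the one below prints the array shapes, dtypes, and label values (variable names follow the loading cell above):

# Quick sanity check of the loaded volumes (variables from the loading cell above).
print("train data  :", train_data_full.shape, train_data_full.dtype)
print("train labels:", train_label_full.shape, train_label_full.dtype)
print("test data   :", test_img.shape, test_img.dtype)
print("label values:", np.unique(train_label_full))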

In [9]:
%%time
mean, std = train_data_full.mean(), train_data_full.std()

ranges = train_data_full.min(), train_data_full.max()
values = np.unique(train_label_full)

data_histograms = [train_data_full[train_label_full == value].flatten()
                   for value in values]
CPU times: user 41.8 s, sys: 2.31 s, total: 44.1 s
Wall time: 43.9 s
In [10]:
plt.figure(figsize=(8, 5))
plt.hist(train_data_full.flatten(), bins=100, color = 'b', alpha=0.2)

plt.axvline(mean, color='r', linestyle='dashed', linewidth=2)
plt.axvline(mean + 3*std, color='r', linestyle='dashed', linewidth=1)
plt.axvline(mean - 3*std, color='r', linestyle='dashed', linewidth=1)
plt.show()

plt.figure(figsize=(8, 5))
plt.hist(train_data_full.flatten(), bins=100, log=True, color = 'b', alpha=0.2)

plt.axvline(mean, color='r', linestyle='dashed', linewidth=2)
plt.axvline(mean + 3*std, color='r', linestyle='dashed', linewidth=1)
plt.axvline(mean - 3*std, color='r', linestyle='dashed', linewidth=1)
plt.show()

Notice that the log-scaled distribution is much more evenly spread than the original values, from which we can conclude that the training data contains a small number of very high-frequency values, most likely concentrated around the mean (the plot makes this quite clear). Also, the values are not skewed around the mean; rather, they are split evenly on both sides.
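
To put a rough number on that concentration, a small sketch like this (assuming the mean and std computed above) reports the fraction of samples falling within ±3 standard deviations of the mean:

# Fraction of amplitude values within mean ± 3*std (mean/std from the cell above).
within_3std = np.mean(np.abs(train_data_full - mean) <= 3 * std)
print(f"fraction of values within ±3 std: {within_3std:.4f}")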

Now let's take a look at what the per-label amplitude distributions look like.

In [ ]:
CLASS_LABELS = [
    'Basement/other',
    'Slope Mudstone A',
    'Mass Transport\n Deposit',
    'Slope Mudstone B',
    'Slope Valley',
    'Submarine Canyon\n System'
]

fig, ax = plt.subplots(3, 2, figsize=(14, 10))

for i, value in enumerate(values):
    data = data_histograms[i]
    mean_, std_ = data.mean(), data.std()
    ax_ = ax[i // 2, i % 2]
    
    label_name = CLASS_LABELS[value-1].replace('\n', '')
    
    ax_.hist(data, log=True, bins=50, range=ranges, color='b',
                     label=f'value {value}: {label_name}', alpha=0.2)
    ax_.axvline(mean_, color='r', linestyle='dashed', linewidth=2)
    ax_.axvline(mean_ + 3*std_, color='r', linestyle='dashed', linewidth=1)
    ax_.axvline(mean_ - 3*std_, color='r', linestyle='dashed', linewidth=1)
    
    ax_.legend(loc='best')
    
    print(f'{value} ::: mean is {mean_:4.4} ::: std is {std_:4.4} ::: {label_name}')

fig.show()
1 ::: mean is -0.8623 ::: std is 213.2 ::: Basement/other
2 ::: mean is 0.9905 ::: std is 379.5 ::: Slope Mudstone A
3 ::: mean is -5.073 ::: std is 468.9 ::: Mass Transport Deposit
4 ::: mean is 2.11 ::: std is 453.0 ::: Slope Mudstone B
5 ::: mean is -18.73 ::: std is 655.1 ::: Slope Valley
6 ::: mean is 1.998 ::: std is 372.3 ::: Submarine Canyon System

Above we looked at the individual label distributions along with their mean and std ranges. The six classes look more or less the same, with a few differences. Now we are going to visualize how a 2D slice of the 3D data can be represented, and we plot that using the code below. The code is adapted from a great EDA notebook by dipam_chakraborty; make sure to check out their work here, which provides great insights and is fun to read as well.

In [11]:
#let's look at 2d section of 3d data using below slice visualization 
fig, ax = plt.subplots(1,3, sharey=True);
fig.set_size_inches(20, 8);
fig.suptitle("2D slice of the 3D seismic data volume for better visualization", fontsize=20);

ax[0].imshow(train_data_full[:, :, 100], cmap='terrain');
ax[0].set_ylabel('Z Axis: Top - Bottom', fontsize=14);
ax[1].imshow(train_label_full[:, :, 100]);
ax[2].imshow(train_data_full[:, :, 100], cmap='terrain');
ax[2].imshow(train_label_full[:, :, 100], alpha=0.4, cmap='twilight');

for i in range(3):
    ax[i].set_xlabel('X Axis: West - East', fontsize=14);

EDA - Extra attributes

Moving on to attributes, we will explore a few operations that can be used to increase the number of channels in the input. Although these additions may lead to improved performance, DL models (NNs) can approximate them, so the performance advantage may not be significant (apart from some speedup during training). Please make sure to check out the in-depth EDA in this great notebook by leocd here, and the wonderful work of sergeytsimfer in this discussion thread; I have used an adapted version of a few of the attributes they used.

Here we use the Hilbert transform to create a new attribute, as seen in the next cell's output.

In [16]:
from seismiqb import plot_image, Horizon, HorizonMetrics, SeismicGeometry
In [17]:
import scipy
from scipy.ndimage import gaussian_filter1d
from scipy.signal import hilbert
from scipy.ndimage.filters import convolve

slide = train_data_full[380, :, :]
In [18]:
plot_image(slide, cmap='gray', colorbar=False, figsize=(12, 8), title='Original slide')

hilbert_scipy = hilbert(slide, axis=1)

plot_image(hilbert_scipy.imag, cmap='gray', colorbar=False, figsize=(12, 8), title='Transformed slide')
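
If you want to feed such an attribute to the network, one hypothetical option is to stack it as an extra input channel next to the raw amplitudes, as in the minimal sketch below (the attribute choice and the normalization scheme are placeholders):

# Stack the raw slide and the Hilbert-based instantaneous amplitude (envelope)
# into a 2-channel array; the model's in_channels would then be set to 2.
envelope = np.abs(hilbert_scipy)                 # instantaneous amplitude attribute
stacked = np.stack([slide, envelope], axis=0)    # shape: (2, height, width)
# Per-channel normalization (placeholder scheme; adapt to your pipeline).
stacked = (stacked - stacked.mean(axis=(1, 2), keepdims=True)) / (
    stacked.std(axis=(1, 2), keepdims=True) + 1e-8
)
print(stacked.shape)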