Starter Code for Sentiment Classification

In this baseline, we train an sklearn model to perform multi-class classification of sentiment from face embeddings.

Downloading Dataset

Installing aicrowd-cli to download the puzzle dataset

In [ ]:

!pip install aicrowd-cli

# Make sure to re-run the code below whenever you restart the Colab notebook
%load_ext aicrowd.magic

Collecting aicrowd-cli
  Downloading aicrowd_cli-0.1.10-py3-none-any.whl (44 kB)
     |████████████████████████████████| 44 kB 1.2 MB/s 
Collecting requests<3,>=2.25.1
  Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 1.1 MB/s 
Requirement already satisfied: toml<1,>=0.10.2 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (0.10.2)
Collecting requests-toolbelt<1,>=0.9.1
  Downloading requests_toolbelt-0.9.1-py2.py3-none-any.whl (54 kB)
     |████████████████████████████████| 54 kB 1.8 MB/s 
Collecting pyzmq==22.1.0
  Downloading pyzmq-22.1.0-cp37-cp37m-manylinux1_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 34.2 MB/s 
Requirement already satisfied: click<8,>=7.1.2 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: tqdm<5,>=4.56.0 in /usr/local/lib/python3.7/dist-packages (from aicrowd-cli) (4.62.3)
Collecting GitPython==3.1.18
  Downloading GitPython-3.1.18-py3-none-any.whl (170 kB)
     |████████████████████████████████| 170 kB 42.5 MB/s 
Collecting rich<11,>=10.0.0
  Downloading rich-10.16.2-py3-none-any.whl (214 kB)
     |████████████████████████████████| 214 kB 45.1 MB/s 
Collecting gitdb<5,>=4.0.1
  Downloading gitdb-4.0.9-py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 1.5 MB/s 
Requirement already satisfied: typing-extensions>=3.7.4.0 in /usr/local/lib/python3.7/dist-packages (from GitPython==3.1.18->aicrowd-cli) (3.10.0.2)
Collecting smmap<6,>=3.0.1
  Downloading smmap-5.0.0-py3-none-any.whl (24 kB)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.10.8)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.10)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.10)
Collecting commonmark<0.10.0,>=0.9.0
  Downloading commonmark-0.9.1-py2.py3-none-any.whl (51 kB)
     |████████████████████████████████| 51 kB 4.9 MB/s 
Collecting colorama<0.5.0,>=0.4.0
  Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /usr/local/lib/python3.7/dist-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.6.1)
Installing collected packages: smmap, requests, gitdb, commonmark, colorama, rich, requests-toolbelt, pyzmq, GitPython, aicrowd-cli
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
  Attempting uninstall: pyzmq
    Found existing installation: pyzmq 22.3.0
    Uninstalling pyzmq-22.3.0:
      Successfully uninstalled pyzmq-22.3.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.27.1 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
Successfully installed GitPython-3.1.18 aicrowd-cli-0.1.10 colorama-0.4.4 commonmark-0.9.1 gitdb-4.0.9 pyzmq-22.1.0 requests-2.27.1 requests-toolbelt-0.9.1 rich-10.16.2 smmap-5.0.0

In [ ]:

# Logging in to our AIcrowd account. Make sure you have accepted the puzzle rules before logging in!

%aicrowd login

Please login here: https://api.aicrowd.com/auth/Ccc71z-PcuOkgECKjwpao6gTSxwtBWr9hLN8kqaGynw
API Key valid
Saved API Key successfully!

In [ ]:

# Creating a new data directory and downloading the dataset 

!rm -rf data
!mkdir data
%aicrowd ds dl -c sentiment-classification -o data

Importing Libraries

In this baseline, we use sklearn's RandomForestClassifier to classify sentiment from face embeddings.

In [ ]:

import pandas as pd
import os
import numpy as np
from ast import literal_eval
import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, accuracy_score

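# random.seed only affects Python's built-in random module; sklearn draws from
# NumPy's global RNG when random_state is not set, so seed NumPy as well.
random.seed(42)
np.random.seed(42)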
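Seeding Python's random module on its own does not make the forest reproducible, because sklearn takes its randomness from NumPy. A more explicit option is to pass random_state to RandomForestClassifier. The minimal sketch below uses synthetic data from make_classification as a hypothetical stand-in for the face embeddings and checks that two forests fitted with the same seed agree exactly:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the face embeddings: 300 samples, 16 features, 3 classes.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=16, n_informative=8, n_classes=3, random_state=0
)

def fit_forest(seed):
    # random_state controls the bootstrap samples and feature sub-sampling of every tree.
    return RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_demo, y_demo)

first = fit_forest(42)
second = fit_forest(42)

# Same seed -> identical trees -> identical probability estimates.
same = np.array_equal(first.predict_proba(X_demo), second.predict_proba(X_demo))
print(same)
```

Applied to this notebook, that would mean writing RandomForestClassifier(random_state=42) in the training cell; the exact validation scores would then differ from the unseeded run shown here.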

Reading Dataset

As mentioned in the challenge README, three sets are provided: train, validation, and test. The test embeddings come in sample_submission.csv, which we fill in with our predictions at the end.

In [ ]:

# Reading the CSV files

train = pd.read_csv("data/train.csv")
val = pd.read_csv("data/val.csv")
submission = pd.read_csv("data/sample_submission.csv")

train

Out[ ]:

	embeddings	label
0	[0.3206779360771179, 0.988215982913971, 1.0441...	positive
1	[0.05074610561132431, 1.0742985010147095, 0.60...	negative
2	[0.41962647438049316, 0.4505457878112793, 1.39...	negative
3	[0.4361684024333954, 0.19191382825374603, 0.83...	positive
4	[0.6382085084915161, 0.8352395296096802, 0.393...	neutral
...	...	...
4995	[2.2057647705078125, 1.1072001457214355, 0.435...	neutral
4996	[0.6344252228736877, 1.164398193359375, 0.7155...	negative
4997	[0.9160683155059814, 0.39996421337127686, 0.82...	negative
4998	[0.006456990726292133, 0.18667978048324585, 0....	positive
4999	[1.337027668952942, 0.8853631615638733, 0.6706...	negative

5000 rows × 2 columns
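Since the task has three classes (positive, negative, neutral), it can help to check how balanced they are before training; the weighted F1 score used later averages per-class scores by class frequency. A minimal sketch on a tiny hypothetical stand-in for train, built from the first five rows shown above with the embeddings shortened:

```python
import pandas as pd

# Hypothetical stand-in for train; the real embeddings are much longer lists.
train_demo = pd.DataFrame({
    "embeddings": ["[0.32, 0.98]", "[0.05, 1.07]", "[0.41, 0.45]", "[0.43, 0.19]", "[0.63, 0.83]"],
    "label": ["positive", "negative", "negative", "positive", "neutral"],
})

# Absolute count and relative share of each sentiment class.
counts = train_demo["label"].value_counts()
shares = train_demo["label"].value_counts(normalize=True)
print(counts)
print(shares)
```

On the real data, train['label'].value_counts(normalize=True) gives the same view for all 5000 rows.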

In [ ]:

# Getting the features and labels from each set.
# Each embedding is stored as a string, so literal_eval parses it back into a list of floats.

X = [literal_eval(embedding) for embedding in train['embeddings'].values]
y = train['label'].values

X_val = [literal_eval(embedding) for embedding in val['embeddings'].values]
y_val = val['label'].values
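literal_eval works, but it can be slow on thousands of long lists. Assuming every cell is a plain list of numbers, as in the preview above, the strings are also valid JSON, so a sketch like the following parses them with json.loads into a single 2-D NumPy array whose shape is easy to check. The three-row column and the parse_embeddings helper are hypothetical:

```python
import json

import numpy as np
import pandas as pd

# Hypothetical stand-in for train['embeddings']: each cell is a string holding a list of floats.
embeddings = pd.Series(["[0.32, 0.98, 1.04]", "[0.05, 1.07, 0.60]", "[0.41, 0.45, 1.39]"])

def parse_embeddings(column: pd.Series) -> np.ndarray:
    """Parse a column of list-like strings into an (n_samples, n_dims) float32 array."""
    # Assumes each string is a JSON-compatible list of numbers, as in the preview above.
    rows = [json.loads(s) for s in column]
    # np.asarray raises ValueError if the rows have different lengths.
    return np.asarray(rows, dtype=np.float32)

X_demo = parse_embeddings(embeddings)
print(X_demo.shape)  # (3, 3)
```

On the real data this would be X = parse_embeddings(train['embeddings']), and X.shape[1] gives the embedding dimension.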

Training the model

Here, we train the model on the training set.

In [ ]:

model = RandomForestClassifier()
model

Out[ ]:

RandomForestClassifier()

In [ ]:

model.fit(X, y)

Out[ ]:

RandomForestClassifier()

Testing the Model

Here, we evaluate the model on the validation set.

In [ ]:

y_pred = model.predict(X_val)

print(f"F1 Score : {f1_score(y_val, y_pred, average='weighted')}")
print(f"Accuracy Score : {accuracy_score(y_val, y_pred)}")

F1 Score : 0.6795114432460195
Accuracy Score : 0.685
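The F1 score above uses average='weighted': a one-vs-rest F1 is computed for each class, and the results are averaged with weights proportional to each class's support (its number of true samples), i.e. the sum over classes of (support_c / N) * F1_c. A small worked example with hypothetical labels recomputes that average by hand and compares it with sklearn's weighted and macro (unweighted) variants:

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical true and predicted labels for six samples.
y_true = np.array(["positive", "negative", "negative", "negative", "neutral", "positive"])
y_pred = np.array(["positive", "negative", "negative", "positive", "neutral", "negative"])

labels = np.unique(y_true)  # sorted: negative, neutral, positive
# One-vs-rest F1 per class, in the order of labels: 2/3, 1, 1/2.
per_class = f1_score(y_true, y_pred, labels=labels, average=None)
# Support: number of true samples in each class: 3, 1, 2.
support = np.array([(y_true == c).sum() for c in labels])

# Weighted F1 = sum over classes of (support_c / N) * F1_c
manual_weighted = np.sum(per_class * support) / support.sum()
sk_weighted = f1_score(y_true, y_pred, average="weighted")  # 2/3
sk_macro = f1_score(y_true, y_pred, average="macro")        # 13/18, plain mean over classes

print(manual_weighted, sk_weighted, sk_macro)
```

Because the weighted average leans on the larger classes, calling f1_score(y_val, y_pred, average=None) on the real validation set is a quick way to see which sentiment the baseline handles worst.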

Generating the Predictions

Generating predictions on the test data to make a submission to the puzzle.

In [ ]:

submission_embeddings = [literal_eval(embedding) for embedding in submission['embeddings'].values]

predictions = model.predict(submission_embeddings)
predictions.shape

Out[ ]:

(3001,)
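Before writing the file, a cheap sanity check is that there is exactly one prediction per submission row and that every predicted label is a class seen during training. A minimal sketch with a hypothetical helper, predictions_look_valid, and small stand-ins for submission and predictions:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for submission, predictions, and the training label set.
submission_demo = pd.DataFrame({"embeddings": ["[0.08, 0.31]", "[0.68, 1.19]", "[0.15, 0.79]"]})
predictions_demo = np.array(["positive", "neutral", "neutral"])
train_labels = {"positive", "negative", "neutral"}

def predictions_look_valid(submission, predictions, allowed_labels) -> bool:
    """Return True if predictions line up with the submission rows and use known labels."""
    # One prediction per submission row, in the same order.
    if len(predictions) != len(submission):
        return False
    # Every predicted label must be one of the classes seen during training.
    return set(predictions) <= set(allowed_labels)

print(predictions_look_valid(submission_demo, predictions_demo, train_labels))
```

In the notebook itself, the equivalent call would be predictions_look_valid(submission, predictions, set(y)).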

In [ ]:

submission['label'] = predictions
submission

Out[ ]:

	embeddings	label
0	[0.08109518140554428, 0.3090009093284607, 1.36...	positive
1	[0.6809610724449158, 1.1909409761428833, 0.892...	neutral
2	[0.14851869642734528, 0.7872061133384705, 0.89...	neutral
3	[0.44697386026382446, 0.36429283022880554, 0.7...	neutral
4	[1.8009324073791504, 0.26081395149230957, 0.40...	negative
...	...	...
2996	[0.9138844609260559, 0.9460961222648621, 0.571...	negative
2997	[0.7667452096939087, 0.7896291613578796, 0.648...	negative
2998	[0.8158280849456787, 2.404792070388794, 0.9924...	neutral
2999	[0.4161085784435272, 0.3146701455116272, 1.139...	positive
3000	[0.7037264108657837, 0.6421875357627869, 1.215...	negative

3001 rows × 2 columns

Saving the Predictions

In [ ]:

# Saving the predictions
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"))
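By default, DataFrame.to_csv also writes the row index as an unnamed first column; passing index=False leaves it out. Whether the puzzle's grader expects that column is not stated in this notebook, so the sketch below only illustrates the difference on a hypothetical two-row frame:

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the submission DataFrame.
df = pd.DataFrame({
    "embeddings": ["[0.08, 0.31]", "[0.68, 1.19]"],
    "label": ["positive", "neutral"],
})

with_index = df.to_csv()                # default index=True: row index becomes an unnamed first column
without_index = df.to_csv(index=False)  # only the DataFrame's own columns

print(with_index.splitlines()[0])     # ,embeddings,label
print(without_index.splitlines()[0])  # embeddings,label

# Reading the default output back shows the extra column pandas creates for the index.
print(list(pd.read_csv(io.StringIO(with_index)).columns))  # ['Unnamed: 0', 'embeddings', 'label']
```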

Submitting our Predictions

In [ ]:

%aicrowd notebook submit -c sentiment-classification -a assets --no-verify

Using notebook: [Baseline] Sentiment Classification for submission...
Removing existing files from submission directory...
Scrubbing API keys from the notebook...
Collecting notebook...

                                                       ╭─────────────────────────╮                                                       
                                                       │ Successfully submitted! │                                                       
                                                       ╰─────────────────────────╯

                                                             Important links                                                             
┌──────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│  This submission │ https://www.aicrowd.com/challenges/ai-blitz-xiii/problems/sentiment-classification/submissions/172578              │
│                  │                                                                                                                    │
│  All submissions │ https://www.aicrowd.com/challenges/ai-blitz-xiii/problems/sentiment-classification/submissions?my_submissions=true │
│                  │                                                                                                                    │
│      Leaderboard │ https://www.aicrowd.com/challenges/ai-blitz-xiii/problems/sentiment-classification/leaderboards                    │
│                  │                                                                                                                    │
│ Discussion forum │ https://discourse.aicrowd.com/c/ai-blitz-xiii                                                                      │
│                  │                                                                                                                    │
│   Challenge page │ https://www.aicrowd.com/challenges/ai-blitz-xiii/problems/sentiment-classification                                 │
└──────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Congratulations on making your first submission to the puzzle 🎉! Let's continue the journey by improving the baseline and making more submissions. Don't be shy about asking questions about any errors you run into, or about any part of this notebook, on the discussion forum or the AIcrowd Discord server; AIcrew will be happy to help you :)

Have a cool new idea that you want to see in the next blitz? Let us know!
