ADCLK

[Getting Started Notebook] AD Click Challenge

This is baseline code to get you started with the challenge.

gauransh_k

You can use this code to start understanding the data and to create a baseline model for further improvements.

Starter Code for ADCLK Practice Challenge

Note: Create a copy of the notebook and use the copy for submission. Go to File > Save a Copy in Drive to create a new copy.

Downloading Dataset

Installing aicrowd-cli

In [1]:
!pip install aicrowd-cli
%load_ext aicrowd.magic
Requirement already satisfied: aicrowd-cli in /home/gauransh/anaconda3/lib/python3.8/site-packages (0.1.10)
Requirement already satisfied: pyzmq==22.1.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (22.1.0)
Requirement already satisfied: rich<11,>=10.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (10.15.2)
Requirement already satisfied: tqdm<5,>=4.56.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (4.62.2)
Requirement already satisfied: GitPython==3.1.18 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (3.1.18)
Requirement already satisfied: toml<1,>=0.10.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.10.2)
Requirement already satisfied: click<8,>=7.1.2 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (7.1.2)
Requirement already satisfied: requests<3,>=2.25.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (2.26.0)
Requirement already satisfied: requests-toolbelt<1,>=0.9.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from aicrowd-cli) (0.9.1)
Requirement already satisfied: gitdb<5,>=4.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from GitPython==3.1.18->aicrowd-cli) (4.0.9)
Requirement already satisfied: smmap<6,>=3.0.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->GitPython==3.1.18->aicrowd-cli) (5.0.0)
Requirement already satisfied: idna<4,>=2.5 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (3.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2021.5.30)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (2.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from requests<3,>=2.25.1->aicrowd-cli) (1.26.6)
Requirement already satisfied: pygments<3.0.0,>=2.6.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (2.10.0)
Requirement already satisfied: commonmark<0.10.0,>=0.9.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.9.1)
Requirement already satisfied: colorama<0.5.0,>=0.4.0 in /home/gauransh/anaconda3/lib/python3.8/site-packages (from rich<11,>=10.0.0->aicrowd-cli) (0.4.4)
In [2]:
%aicrowd login
Please login here: https://api.aicrowd.com/auth/Ku7MAu_xqi81weJcT4Hr4OgTSLAxRwBJrQPn4F7zPYE
Opening in existing browser session.
API Key valid
Saved API Key successfully!
In [6]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c adclk -o data

Importing Libraries

In this baseline, we will use the scikit-learn (sklearn) library to train the model and generate the predictions.

In [43]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
import os
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

Reading the dataset

Here, we will read the train.csv which contains both training samples & labels, and test.csv which contains testing samples.

In [30]:
# Reading the CSV
train_data_df = pd.read_csv("data/train.csv", encoding='ISO-8859-1')
test_data_df = pd.read_csv("data/test.csv", encoding='ISO-8859-1')

# train_data.shape, test_data.shape
display(train_data_df.head())
display(test_data_df.head())
   click  impression      url_hash     ad_id  advertiser_id  depth  position  query_id  keyword_id  title_id  description_id   user_id
0      0           1  5.660000e+18  21442160          37070      2         2      1430        4232    889814          712389       249
1      0           1  9.750000e+18  10850149          29713      2         1      2457         503      1904            2155  15084327
2      0           1  9.570000e+18   1973398           1339      2         2   2088431         675       450             750   1300996
3      0           2  1.660000e+18  21248222           2298      2         1   7151787        2167      4258            4840         0
4      0           1  2.040000e+18  21194514          34292      2         2   9133331       11660     59369           53523   3047683

   impression      url_hash     ad_id  advertiser_id  depth  position  query_id  keyword_id  title_id  description_id   user_id
0           1  2.290000e+18  20588302          33793      2         2   4686234        5025     11882             306   2944128
1           1  1.100000e+19  20022493            591      1         1  12608028        4252      6745            6528   1170705
2           1  1.210000e+19  20183538          27961      1         1       517        4479      3066            3335  23907337
3           1  7.900000e+18  20896468           2010      2         1       839        2009      1497              70  15077755
4           1  1.740000e+19  20017080          23798      2         2    248043       14912         4           13851   3385114
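
Before modelling, it can help to check column types, missing values, and how the labels are distributed. The sketch below does this on a small frame built from a few columns of the five training rows displayed above; `head_df` is a name introduced only for this illustration, and five rows say nothing reliable about the full dataset.

```python
import pandas as pd

# A few columns of the five training rows shown in the notebook output.
# head_df is a hypothetical name used only for this illustration.
head_df = pd.DataFrame({
    "click":      [0, 0, 0, 0, 0],
    "impression": [1, 1, 1, 2, 1],
    "depth":      [2, 2, 2, 2, 2],
    "position":   [2, 1, 2, 1, 2],
})

# Column types and missing-value counts per column.
print(head_df.dtypes)
print(head_df.isna().sum())

# Share of each label value; on real data this reveals class imbalance.
label_share = head_df["click"].value_counts(normalize=True)
print(label_share)
```

Running the same three calls on `train_data_df` would give the picture for the full training set.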

Data Preprocessing

In [31]:
# Separating data from the dataframe for final training
X = train_data_df.loc[:,train_data_df.columns != "click"].to_numpy()
y = train_data_df["click"].to_numpy()
print(X.shape, y.shape)
(40000, 11) (40000,)

Splitting the data

In [32]:
# Splitting the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
print(X_train.shape)
print(y_train.shape)
(32000, 11)
(32000,)
In [33]:
X_train[0], y_train[0]
Out[33]:
(array([7.00000e+00, 1.44000e+19, 2.91172e+06, 2.37780e+04, 2.00000e+00,
        1.00000e+00, 1.28300e+03, 3.70000e+01, 8.10000e+01, 1.07000e+02,
        9.43530e+04]),
 1)
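
The split above is reshuffled on every run, so the validation score changes from run to run. A minimal sketch of a reproducible, label-stratified split follows. It runs on synthetic stand-in arrays (`X_demo`, `y_demo`, and the 20% positive rate are assumptions, not challenge data), and the seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and labels
# (assumption: 11 features, roughly 20% positive labels).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 11))
y_demo = (rng.random(1000) < 0.2).astype(int)

# random_state fixes the shuffle so the split is repeatable;
# stratify keeps the label ratio (almost) equal in both parts.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_va.shape)
print("positive rate (train):", y_tr.mean())
print("positive rate (val):  ", y_va.mean())
```

In the notebook itself, the same two keyword arguments could be passed to the existing `train_test_split(X, y, test_size=0.2)` call.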

Training the Model

In [46]:
model = GaussianNB()
model.fit(X_train, y_train)
Out[46]:
GaussianNB()
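
One thing to watch with this model: scikit-learn's `GaussianNB` adds a smoothing term to every feature variance, defined as a portion (`var_smoothing`) of the largest feature variance. With a column on the scale of `url_hash` (around 1e18 in the output above), that term may dwarf the variance of every other column and leave them with little influence. The sketch below illustrates the effect on synthetic data, not the challenge data. All names in it are hypothetical, and standardizing with `StandardScaler` is one possible remedy rather than the notebook's method.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): one informative small-scale feature and
# one uninformative feature on a ~1e18 scale, similar to a hash column.
rng = np.random.default_rng(0)
n = 2000
y_demo = rng.integers(0, 2, size=n)
informative = rng.normal(loc=2.0 * y_demo, scale=1.0)
huge_noise = rng.normal(loc=0.0, scale=1e18, size=n)
X_demo = np.column_stack([informative, huge_noise])

# Simple holdout: first 1500 rows for training, last 500 for validation.
X_tr, X_va = X_demo[:1500], X_demo[1500:]
y_tr, y_va = y_demo[:1500], y_demo[1500:]

# Same classifier, with and without standardizing the features first.
raw_nb = GaussianNB().fit(X_tr, y_tr)
scaled_nb = make_pipeline(StandardScaler(), GaussianNB()).fit(X_tr, y_tr)

raw_acc = raw_nb.score(X_va, y_va)
scaled_acc = scaled_nb.score(X_va, y_va)
print(f"raw features:    {raw_acc:.3f}")
print(f"scaled features: {scaled_acc:.3f}")
```

Whether scaling helps on the real data is something to verify on the validation set; the ID-like columns may also be better treated as categorical than as Gaussian features.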

Validation

In [47]:
model.score(X_val, y_val)
Out[47]:
0.827375
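
Accuracy alone can be misleading when one label dominates, so it is worth comparing the score with a model that always predicts the most frequent class. The sketch below shows that comparison on synthetic labels. The 15% positive rate and all variable names are assumptions, and F1 is used only as an example of a second metric, not as the challenge's official metric.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

# Synthetic, imbalanced labels (assumption: ~15% positives).
# The features are constant because the dummy model ignores them.
rng = np.random.default_rng(0)
y_tr = (rng.random(1600) < 0.15).astype(int)
y_va = (rng.random(400) < 0.15).astype(int)
X_tr = np.zeros((1600, 1))
X_va = np.zeros((400, 1))

# Always predicts the most frequent training label.
majority = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
pred = majority.predict(X_va)

majority_acc = accuracy_score(y_va, pred)
majority_f1 = f1_score(y_va, pred, zero_division=0)
print(f"majority-class accuracy: {majority_acc:.3f}")
print(f"majority-class F1:       {majority_f1:.3f}")
```

If the real model's validation accuracy is close to the majority-class accuracy on the same split, it has learned little beyond the label prior.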

So, we are done with the baseline. Let's run it on the real test data and see how to submit the predictions to the challenge.

Predictions

In [48]:
# Separating data from the dataframe for final testing
X_test = test_data_df.to_numpy()
print(X_test.shape)
(10000, 11)
In [49]:
# Predicting the labels
predictions = model.predict(X_test)
predictions.shape
Out[49]:
(10000,)
In [50]:
# Converting the predictions array into a pandas DataFrame
submission = pd.DataFrame({"click":predictions})
submission
Out[50]:
click
0 0
1 0
2 0
3 0
4 0
... ...
9995 0
9996 0
9997 0
9998 0
9999 0

10000 rows × 1 columns

In [51]:
# Saving the pandas dataframe
!rm -rf assets
!mkdir assets
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
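
A quick sanity check of the saved file before uploading can catch a wrong column name or row count. The helper below, `check_submission`, is a hypothetical function written for this illustration. It assumes the expected format is a single `click` column with one row per test sample, as produced above. The demo writes to a temporary directory.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def check_submission(path, expected_rows, column="click"):
    """Hypothetical helper: return a list of problems found in a submission CSV."""
    df = pd.read_csv(path)
    problems = []
    if list(df.columns) != [column]:
        problems.append(f"expected a single '{column}' column, got {list(df.columns)}")
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(df)}")
    if df.isna().any().any():
        problems.append("file contains missing values")
    return problems


# Demo on temporary files so nothing in assets/ is touched.
with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.csv")
    bad_path = os.path.join(tmp, "bad.csv")
    pd.DataFrame({"click": np.zeros(10, dtype=int)}).to_csv(good_path, index=False)
    pd.DataFrame({"label": np.zeros(8, dtype=int)}).to_csv(bad_path, index=False)

    good_report = check_submission(good_path, expected_rows=10)
    bad_report = check_submission(bad_path, expected_rows=10)

print(good_report)  # empty list: no problems found
print(bad_report)   # wrong column name and wrong row count
```

In the notebook, the call would be `check_submission(os.path.join("assets", "submission.csv"), expected_rows=len(test_data_df))`.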

Submitting our Predictions

Note : Please save the notebook before submitting it (Ctrl + S)

In [53]:
!aicrowd submission create -c adclk -f assets/submission.csv
submission.csv ━━━━━━━━━━━━━━━━━━━━ 100.0% • 21.7/20.0 KB • 494.1 kB/s • 0:00:00
                                  ╭─────────────────────────╮                                  
                                  │ Successfully submitted! │                                  
                                  ╰─────────────────────────╯                                  
                                        Important links                                        
┌──────────────────┬──────────────────────────────────────────────────────────────────────────┐
│  This submission │ https://www.aicrowd.com/challenges/adclk/submissions/167543              │
│                  │                                                                          │
│  All submissions │ https://www.aicrowd.com/challenges/adclk/submissions?my_submissions=true │
│                  │                                                                          │
│      Leaderboard │ https://www.aicrowd.com/challenges/adclk/leaderboards                    │
│                  │                                                                          │
│ Discussion forum │ https://discourse.aicrowd.com/c/adclk                                    │
│                  │                                                                          │
│   Challenge page │ https://www.aicrowd.com/challenges/adclk                                 │
└──────────────────┴──────────────────────────────────────────────────────────────────────────┘
{'submission_id': 167543, 'created_at': '2021-12-11T15:38:36.079Z'}