Age Prediction
[ Baseline ] Age Prediction
This baseline for Age Prediction uses a random forest classifier on image pixels to predict age.
We are going to use a very naive approach here.
We will reduce the number of pixels, flatten them into a list, and then fit a random forest classifier model.
Getting Started with Age Prediction
In this puzzle, we have to predict the age from the given human faces.
This is a starter kit explaining how to download the data and submit directly from this notebook.
!pip install aicrowd-cli
%load_ext aicrowd.magic
Login to AIcrowd ㊗
%aicrowd login
Download Dataset
We will create a folder named data and download the files there.
import os
os.getcwd()
!mkdir data
%aicrowd ds dl -c age-prediction -o data
!unzip data/train.zip -d data/train > /dev/null
!unzip data/val.zip -d data/val > /dev/null
!unzip data/test.zip -d data/test > /dev/null
Importing Libraries:
import pandas as pd
import numpy as np
import os
from PIL import Image
from tqdm import tqdm
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
Diving into the dataset 🕵️
train_df = pd.read_csv("data/train.csv")
val_df = pd.read_csv("data/val.csv")
test_df = pd.read_csv("data/sample_submission.csv")
print(train_df.head(3))
The train set has 4000 data points, the validation set has 2000, and the test set has 3000.
print(train_df.shape[0])
print(val_df.shape[0])
print(test_df.shape[0])
The target labels are '0-10' to '90-100'. So there are 10 target labels.
train_df.age.unique()
train_df['ImageID'][0]
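Before modeling, it can help to check how the images are spread across the age bins, since an imbalanced target affects both the classifier and a weighted F1 score. The sketch below uses a small made-up frame (`toy_df` is hypothetical, not the competition data) with the same `ImageID` and `age` columns; on the real data the same call would be made on `train_df`.

```python
import pandas as pd

# Hypothetical stand-in for train_df: same column names, invented rows.
toy_df = pd.DataFrame({
    "ImageID": ["a", "b", "c", "d", "e", "f"],
    "age": ["20-30", "20-30", "30-40", "20-30", "0-10", "30-40"],
})

# Fraction of images per age bin, most frequent first.
class_share = toy_df["age"].value_counts(normalize=True)
print(class_share)
```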
Modeling
We are going to use a very naive approach here.
Each image is converted to grayscale, resized to 190×190, and reduced to a single row of 190 pixel values (row 10). That row is the feature vector fed to a random forest classifier model.
def preprocessor(image_path, dataframe):
    """Add an 'imgData' column of pixel features, one entry per ImageID."""
    imgdatas = []
    # Go through each image listed in the dataframe
    for i in tqdm(range(dataframe.shape[0]), total=len(dataframe)):
        # Read the image
        imgdata = Image.open(os.path.join(image_path, dataframe['ImageID'][i] + '.jpg'))
        # Convert to grayscale
        imgdata = imgdata.convert('L')
        # Resize the image to a fixed shape -> 190×190 (you can choose any shape)
        imgdata = imgdata.resize((190, 190))
        imgdata = np.asarray(imgdata)
        # Keep only pixel row 10 (190 values) as the feature vector for the model
        imgdata = np.squeeze(imgdata[10, :])
        imgdatas.append(imgdata)
    dataframe['imgData'] = imgdatas
    return dataframe
base_path = 'data'
preprocessor(os.path.join(base_path,'train'), train_df)
preprocessor(os.path.join(base_path,'test'), test_df)
preprocessor(os.path.join(base_path,'val'), val_df)
train_df['imgData'][12].shape
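The preprocessor above keeps a single pixel row, so each image becomes a 190-value vector. One possible alternative, shown here only as a sketch, is to shrink the whole image to a small thumbnail and flatten it, so the features cover the entire face. The helper name `thumbnail_features`, the 32×32 size, and the random test image are assumptions for illustration, not part of the baseline.

```python
import numpy as np
from PIL import Image

def thumbnail_features(img, size=(32, 32)):
    """Grayscale, downsample, and flatten a PIL image into a 1-D vector."""
    small = img.convert("L").resize(size)
    return np.asarray(small, dtype=np.uint8).ravel()

# Random RGB image standing in for a face photo (illustration only).
rng = np.random.default_rng(0)
fake_img = Image.fromarray(rng.integers(0, 256, size=(200, 150, 3), dtype=np.uint8))

features = thumbnail_features(fake_img)
print(features.shape)
```

A 32×32 thumbnail yields 1,024 features per image, which is still small enough for a random forest to train quickly on a few thousand images.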
train_x = train_df.imgData
train_y = train_df.age
age_predictor = RandomForestClassifier(max_features=0.15, random_state=2)
age_predictor.fit(list(train_x),train_y)
print(age_predictor.score(list(train_x),train_y))
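The score printed here is accuracy on the training images themselves, which a random forest can push close to 1.0 by memorizing its inputs. The sketch below illustrates this on synthetic data with random labels (all arrays are invented; only the `RandomForestClassifier` settings mirror the cell above): training accuracy is high while held-out accuracy stays near chance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic "pixel rows": 300 samples of 190 values, labels assigned at random.
X = rng.integers(0, 256, size=(300, 190))
y = rng.integers(0, 10, size=300)

X_fit, y_fit = X[:200], y[:200]
X_held, y_held = X[200:], y[200:]

model = RandomForestClassifier(max_features=0.15, random_state=2)
model.fit(X_fit, y_fit)

train_acc = model.score(X_fit, y_fit)    # accuracy on data the model has seen
held_acc = model.score(X_held, y_held)   # accuracy on unseen data
print(train_acc, held_acc)
```

For this reason the validation score is the more useful number when comparing changes to the features or the model.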
val_x = val_df.imgData
val_y = val_df.age
val_predict = age_predictor.predict(list(val_x))
# f1_score expects the true labels first, then the predictions
print(f1_score(val_y, val_predict, average='weighted'))
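scikit-learn's `f1_score` takes the true labels first and the predictions second. With `average='weighted'`, each class is weighted by its count in the true labels, so the argument order can change the result. A small sketch with invented labels:

```python
from sklearn.metrics import f1_score

# Invented labels for illustration.
y_true = ["0-10", "0-10", "0-10", "10-20"]
y_pred = ["0-10", "0-10", "10-20", "10-20"]

# Weighted by class counts in y_true (the documented usage).
f1_true_first = f1_score(y_true, y_pred, average="weighted")
# Arguments swapped: the weights now come from the predicted labels.
f1_swapped = f1_score(y_pred, y_true, average="weighted")
print(f1_true_first, f1_swapped)
```

The per-class F1 values are the same in both calls; only the class weights differ, which is why the two averages disagree.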
Generating Prediction File
Now that we have created the baseline predictions, let's submit them.
test_x = test_df.imgData
test_predict = age_predictor.predict(list(test_x))
submission = pd.read_csv('data/sample_submission.csv')
submission['age'] = test_predict
!rm -rf assets
!mkdir assets
# index=False keeps the file in the same column layout as sample_submission.csv
submission.to_csv(os.path.join("assets", "submission.csv"), index=False)
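Before submitting, a quick sanity check on the file can catch formatting slips. This sketch assumes the submission should contain exactly the `ImageID` and `age` columns and only labels from ten bins of the form '0-10', '10-20', up to '90-100'. `check_submission` is a hypothetical helper, and the grader's actual requirements may differ.

```python
import pandas as pd

# Assumed label set: ten bins from '0-10' to '90-100'.
VALID_LABELS = {f"{lo}-{lo + 10}" for lo in range(0, 100, 10)}

def check_submission(df):
    """Return a list of problems found in a submission frame (empty if none)."""
    problems = []
    if list(df.columns) != ["ImageID", "age"]:
        problems.append(f"unexpected columns: {list(df.columns)}")
    if "age" in df.columns:
        if df["age"].isna().any():
            problems.append("missing predictions")
        unknown = set(df["age"].dropna()) - VALID_LABELS
        if unknown:
            problems.append(f"unknown labels: {sorted(unknown)}")
    return problems

# Invented rows for illustration.
good = pd.DataFrame({"ImageID": ["x1", "x2"], "age": ["20-30", "90-100"]})
bad = pd.DataFrame({"ImageID": ["x1"], "age": ["25"]})
print(check_submission(good), check_submission(bad))
```

On the real file, the same function could be called on `pd.read_csv("assets/submission.csv")`.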
Submitting our Predictions
Note: Please save the notebook before submitting it (Ctrl + S).
%aicrowd notebook submit -c age-prediction -a assets --no-verify