Quick data explorations in Google Colab using R
Prepare the notebook 🛠
In [ ]:
cat(system('curl -sL https://gitlab.aicrowd.com/jyotish/pricing-game-notebook-scripts/raw/r-functions/r/setup.sh > setup.sh && bash setup.sh', intern=TRUE), sep='\n')
source("aicrowd_helpers.R")
TRAINING_DATA_PATH = 'training.csv'
AICROWD_API_KEY = '' # You can get the key from https://aicrowd.com/participants/me
download_aicrowd_dataset(AICROWD_API_KEY)
In [ ]:
options(width = 130)
options(warn = -1)
Packages 🗃
Install and load here all the packages the notebook needs.
Note: Installing packages for the first time can take a while.
In [ ]:
install_packages <- function() {
  install.packages("skimr")
  install.packages("corrr")
  install.packages("tidyverse")
}
install_packages()
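Because installation is the slow step, re-running the cell above reinstalls everything each time. A minimal sketch of one way to skip packages that are already available follows; `missing_packages` and `install_if_missing` are hypothetical helper names, not part of the original notebook or of any package.

```r
# Hypothetical helpers (not from the original notebook).

# Return the subset of `pkgs` whose namespace cannot be loaded.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!available])
}

# Install only the packages that are missing; returns their names invisibly.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Harmless demonstration with packages that ship with R: nothing is installed.
install_if_missing(c("stats", "utils"))
```

In the notebook this would be called as `install_if_missing(c("skimr", "corrr", "tidyverse"))`, which on a fresh Colab runtime still installs all three but does nothing on later runs of the same session.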
In [ ]:
global_imports <- function() {
  library(skimr)
  library(corrr)
  library(tidyverse)
}
global_imports()
Loading the data 📲
In [ ]:
# Load the dataset.
train_data = read_csv(TRAINING_DATA_PATH)
What does the data look like? 🔍
In [ ]:
skim(train_data)
In [35]:
options(width = 100)
glimpse(train_data)
Let's look at some charts!
In [ ]:
# remove id_policy and convert character columns to factors
train_clean <- train_data %>%
  select(-id_policy) %>%
  mutate(across(where(is.character), as.factor))
Categorical Variables
In [ ]:
# One bar chart per factor column, each facet with its own axes.
train_clean %>%
  keep(is.factor) %>%
  gather() %>%
  ggplot() +
  geom_bar(mapping = aes(x = value, fill = key), color = "black") +
  facet_wrap(~ key, scales = "free") +
  theme(legend.position = "none",  # "none" is the documented value for hiding the legend
        plot.title.position = "plot") +
  labs(title = "Categorical Variable Distributions")
Numeric Variables
In [ ]:
# One histogram per numeric column, each facet with its own axes.
train_clean %>%
  keep(is.numeric) %>%
  gather() %>%
  ggplot() +
  geom_histogram(mapping = aes(x = value, fill = key), color = "black") +
  facet_wrap(~ key, scales = "free") +
  scale_x_continuous(n.breaks = 2) +
  theme(legend.position = "none",  # "none" is the documented value for hiding the legend
        plot.title.position = "plot") +
  labs(title = "Numeric Variable Distributions")
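Both plots reshape the data with `keep()` and `gather()`. The tidyr documentation marks `gather()` as superseded in favour of `pivot_longer()`, so here is a minimal sketch of the same wide-to-long reshape in that style. The `toy` tibble and its column names are made up for illustration and merely stand in for `train_clean`.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for train_clean; these columns are invented for the example.
toy <- tibble(
  age   = c(25, 40, 61),
  power = c(75, 110, 90),
  fuel  = factor(c("Diesel", "Gasoline", "Diesel"))
)

# Keep the numeric columns and stack them into key/value pairs,
# mirroring what keep(is.numeric) %>% gather() produces.
toy_long <- toy %>%
  select(where(is.numeric)) %>%
  pivot_longer(everything(), names_to = "key", values_to = "value")

print(toy_long)
```

The resulting `key` and `value` columns have the same names that `gather()` produces by default, so the `ggplot()` calls above should work unchanged on data reshaped this way.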
Correlations
In [ ]:
# Network view: only correlations with absolute value of at least 0.2 are drawn.
train_clean %>%
  keep(is.numeric) %>%
  corrr::correlate() %>%
  corrr::network_plot(min_cor = 0.2)
In [ ]:
# Table view: reorder columns, blank the upper triangle, and format for printing.
train_clean %>%
  keep(is.numeric) %>%
  corrr::correlate() %>%
  corrr::rearrange() %>%
  corrr::shave() %>%
  corrr::fashion()
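By default `corrr::correlate()` computes Pearson correlations on pairwise complete observations. When the histograms above show strongly skewed variables, a rank-based measure can be a useful cross-check. The sketch below passes `method = "spearman"` and uses the built-in `mtcars` data only as a stand-in for the numeric columns of `train_clean`.

```r
library(corrr)

# Stand-in data: three numeric columns from the built-in mtcars dataset.
numeric_cols <- mtcars[, c("mpg", "hp", "wt")]

# Spearman (rank) correlations; quiet = TRUE suppresses the informational message.
rank_cors <- correlate(numeric_cols, method = "spearman", quiet = TRUE)

# Blank the upper triangle so each pair appears once, as in the table above.
print(shave(rank_cors))
```

If the Spearman and Pearson tables disagree noticeably for a pair of variables, that pair is worth a closer look with a scatter plot.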