Lingua Franca Translation
Getting Started Notebook for Lingua Franca Translation
A getting started notebook for the challenge.
Getting Started with Lingua Franca Translation
In this puzzle, we have to translate from the crowd-talk language into English. There are multiple ways to build the translator:
- Using Dictionary and Mapping
- Using LSTM
- Using Transformers
In this starter notebook, we'll go with the dictionary and mapping approach. We'll build word lists for both English and the crowd-talk language, then map each crowd-talk word to an English word.
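To make the idea concrete before touching the real data, here is a minimal sketch of dictionary-based translation. The crowd-talk words and the helper `translate_tokens` are invented for illustration and do not come from the dataset.

```python
# Toy parallel data; these "crowd-talk" words are invented for illustration only.
toy_crowdtalk = ["zub", "kalo", "mip"]
toy_english = ["the", "cat", "sleeps"]

# Pair each crowd-talk word with the English word at the same position.
toy_mapping = dict(zip(toy_crowdtalk, toy_english))


def translate_tokens(tokens, mapping, unknown=" "):
    """Replace each token with its mapped word, or `unknown` if it is missing."""
    return " ".join(mapping.get(token, unknown) for token in tokens)


print(translate_tokens(["zub", "kalo", "mip"], toy_mapping))  # the cat sleeps
```

The rest of the notebook follows the same pattern, with the mapping built from the training data instead of a toy list.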
In [ ]:
%%capture
!pip install aicrowd-cli
%load_ext aicrowd.magic
Login to AIcrowd ㊗¶
In [ ]:
%aicrowd login
Download Dataset¶
We will create a folder named data and download the files there.
In [ ]:
!rm -rf data
!mkdir data
%aicrowd ds dl -c lingua-franca-translation -o data
Importing Necessary Libraries¶
In [ ]:
import os
import pandas as pd
import gensim
from sklearn.metrics.pairwise import cosine_similarity
Diving into the Dataset¶
In [ ]:
train_df = pd.read_csv("data/train.csv")
In [ ]:
train_df.head()
Out[ ]:
In [ ]:
english = train_df.english.values
crowdtalk = train_df.crowdtalk.values
In [ ]:
english
Out[ ]:
In [ ]:
processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in english]
eng_word_list = [word for words in processedLines for word in words]
In [ ]:
processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in crowdtalk]
crowdtalk_word_list = [word for words in processedLines for word in words]
In [ ]:
dict1 = dict(zip(crowdtalk_word_list, eng_word_list))
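Note that zipping the two flattened word lists pairs words purely by position, so the pairs drift as soon as a crowd-talk sentence and its English translation have different token counts, and a later pair overwrites an earlier one for the same key. The sketch below is one possible alternative, not the method used in this notebook: it aligns words sentence by sentence, skips sentence pairs of unequal length, and keeps the most frequent English word for each crowd-talk word. The function name `build_mapping`, the whitespace tokenization, and the toy sentences are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_mapping(source_sentences, target_sentences):
    """Map each source word to the target word it is most often aligned with."""
    counts = defaultdict(Counter)
    for src, tgt in zip(source_sentences, target_sentences):
        # Simple whitespace tokenization (assumption; the notebook uses gensim).
        src_tokens = src.lower().split()
        tgt_tokens = tgt.lower().split()
        # Skip pairs whose lengths differ: position-wise alignment is unreliable there.
        if len(src_tokens) != len(tgt_tokens):
            continue
        for s, t in zip(src_tokens, tgt_tokens):
            counts[s][t] += 1
    # Keep the most frequent target word for every source word.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


# Invented toy sentences, for illustration only.
toy_src = ["zub kalo", "zub mip", "zub kalo mip extra"]
toy_tgt = ["the cat", "the dog", "too short"]
print(build_mapping(toy_src, toy_tgt))
```

Whether this helps on the actual data depends on how often the two languages align word for word, which the notebook does not establish.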
Prediction Phase ✈¶
In [ ]:
test_df = pd.read_csv("data/test.csv")
In [ ]:
test_df.crowdtalk[3984]
Out[ ]:
In [ ]:
crowdtalk = test_df.crowdtalk.values
In [ ]:
processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in crowdtalk]
Create the translated sentences by replacing each crowd-talk word with its corresponding English word from the dictionary mapping created above. Words that are not in the dictionary are replaced with a blank.
In [ ]:
sentence = []
for i in processedLines:
    sentence_part = []
    word = ''
    for j in i:
        if j in dict1:
            # Known crowd-talk word: use its English counterpart
            word = ''.join(dict1[j])
        else:
            # Unknown word: fall back to a blank
            word = ''.join(' ')
        sentence_part.append(word)
    temp = ' '.join(sentence_part)
    sentence.append(temp)
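Because unknown words are silently replaced with blanks, it can be useful to check how many test tokens the dictionary actually covers. The helper below is a hypothetical sketch (the name `dictionary_coverage` is not part of the notebook); it takes tokenized sentences and a mapping such as `processedLines` and `dict1`.

```python
def dictionary_coverage(tokenized_sentences, mapping):
    """Return the fraction of tokens that have an entry in `mapping`."""
    total = sum(len(tokens) for tokens in tokenized_sentences)
    if total == 0:
        return 0.0
    known = sum(
        1 for tokens in tokenized_sentences for token in tokens if token in mapping
    )
    return known / total


# Invented toy input, for illustration only.
toy_sentences = [["zub", "kalo"], ["mip"]]
toy_mapping = {"zub": "the", "mip": "sleeps"}
print(dictionary_coverage(toy_sentences, toy_mapping))
```

In the notebook this could be called as `dictionary_coverage(processedLines, dict1)`; a low value would suggest that many predictions consist mostly of blanks.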
In [ ]:
test_df['prediction'] = sentence
In [ ]:
test_df.head()
Out[ ]:
Save the predictions in the assets directory as submission.csv.
In [ ]:
!rm -rf assets
!mkdir assets
test_df.to_csv(os.path.join("assets", "submission.csv"), index=False)
Submitting our Predictions¶
Note: Please save the notebook before submitting it (Ctrl + S).
In [ ]:
%aicrowd notebook submit -c lingua-franca-translation -a assets --no-verify