Lingua Franca Translation
Modified Getting Started Notebook for Lingua Franca Translation
A getting started notebook for the challenge.
Create dict1 from the first word of each sentence.
BLEU = 0.080
Getting Started with Lingua Franca Translation
In this puzzle, we have to translate from the crowd-talk language into English. There are multiple ways to build the translator:
- Using Dictionary and Mapping
- Using LSTM
- Using Transformers
In this starter notebook, we'll go with the dictionary-and-mapping approach. Here we'll build a dictionary that maps crowd-talk words to English words.
%%capture
!pip install aicrowd-cli
%load_ext aicrowd.magic
Login to AIcrowd ㊗
%aicrowd login
Download Dataset
We will create a folder named data and download the files there.
!rm -rf data
!mkdir data
%aicrowd ds dl -c lingua-franca-translation -o data
Importing Necessary Libraries
import os
import pandas as pd
import gensim
from sklearn.metrics.pairwise import cosine_similarity
Diving into the dataset:
train_df = pd.read_csv("data/train.csv")
train_df
english = train_df.english.values
crowdtalk = train_df.crowdtalk.values
english
# Tokenize every English sentence into lowercase word tokens.
processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in english]
#eng_word_list = [word for words in processedLines for word in words]
eng_word_list = [words[0] for words in processedLines]  # first word of each sentence only (BLEU = 0.080)
# Tokenize every crowd-talk sentence the same way.
processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in crowdtalk]
#crowdtalk_word_list = [word for words in processedLines for word in words]
crowdtalk_word_list = [words[0] for words in processedLines]  # first word of each sentence only (BLEU = 0.080)
# Map each crowd-talk first word to the English first word of the same sentence.
dict1 = dict(zip(crowdtalk_word_list, eng_word_list))
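Note that `dict(zip(...))` keeps only the last English word seen for each crowd-talk key, so a first word that recurs across sentences is overwritten by later pairs. One possible variation, sketched here on invented toy pairs, is to count the pairings and keep the most frequent one. The helper name `build_first_word_map` and the whitespace tokenizer are assumptions made for this illustration and are not part of the original notebook; in the notebook, the gensim-tokenized lists would be passed in instead.

```python
from collections import Counter, defaultdict

def build_first_word_map(source_tokens, target_tokens):
    """Map each source first word to its most frequent target first word.

    Hypothetical helper: both arguments are lists of token lists,
    aligned sentence by sentence.
    """
    counts = defaultdict(Counter)
    for src, tgt in zip(source_tokens, target_tokens):
        # Skip pairs where tokenization left either side empty.
        if not src or not tgt:
            continue
        counts[src[0]][tgt[0]] += 1
    # most_common(1) returns [(word, count)] for the top pairing.
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}

# Toy data (invented for this example, not taken from the dataset).
toy_crowdtalk = ["wraov driourth", "wraov kleotl", "treuns schleangly", "wraov proong"]
toy_english = ["the cat", "the dog", "a bird", "this fish"]

# Plain whitespace tokenization stands in for gensim here (assumption).
src_tokens = [s.lower().split() for s in toy_crowdtalk]
tgt_tokens = [s.lower().split() for s in toy_english]

toy_map = build_first_word_map(src_tokens, tgt_tokens)
print(toy_map)
```

On this toy data the counted map pairs "wraov" with "the" (seen twice), whereas a plain `dict(zip(...))` over the same first words would end up with "this", the last pairing seen.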
Prediction Phase ✈
test_df = pd.read_csv("data/test.csv")
test_df.crowdtalk[3984]
crowdtalk = test_df.crowdtalk.values
processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in crowdtalk]
Build each output sentence by replacing every crowd-talk word with its English counterpart from the dictionary created above. Words missing from the dictionary are replaced by a blank.
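sentence = []
for i in processedLines:
    sentence_part = []
    word = ''
    for k, j in enumerate(i):
        if j in dict1:
            # Known crowd-talk word: use its English mapping.
            word = ''.join(dict1[j])
        else:
            # Unknown word: emit a blank placeholder.
            word = ''.join(' ')
        sentence_part.append(word)
    # Join the translated words into one sentence.
    temp = ' '.join(sentence_part)
    sentence.append(temp)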
test_df['prediction'] = sentence
test_df
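To make the behaviour of the lookup loop concrete, here is a minimal sketch on an invented two-entry dictionary. The function name `translate_tokens` and the toy words are assumptions for illustration only; the logic mirrors the loop above, with `dict.get` standing in for the `if`/`else` branch.

```python
def translate_tokens(tokens, mapping, unknown=" "):
    """Replace each token with its mapped word, or `unknown` if absent."""
    return " ".join(mapping.get(tok, unknown) for tok in tokens)

# Toy dictionary (invented, not from the dataset).
toy_dict = {"wraov": "the", "treuns": "a"}

known = translate_tokens(["wraov", "treuns"], toy_dict)
mixed = translate_tokens(["wraov", "zzz", "treuns"], toy_dict)

print(repr(known))
print(repr(mixed))

# Optional clean-up: collapse the runs of spaces left by unknown words.
cleaned = " ".join(mixed.split())
print(repr(cleaned))
```

Because `dict1` holds only first words, many tokens in a test sentence are likely to be unknown, so each prediction may contain long runs of blanks; collapsing them as in `cleaned` is one optional tidy-up step.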
Save the predictions in the assets directory as submission.csv.
!rm -rf assets
!mkdir assets
test_df.to_csv(os.path.join("assets", "submission.csv"), index=False)
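The score quoted at the top of this notebook is a BLEU value. The challenge's official scorer is not shown here, so the following is only a rough proxy: a simplified sentence-level BLEU-1 (clipped unigram precision multiplied by a brevity penalty) that could be used for a quick sanity check on sentence pairs whose English reference is known. The function name `bleu1` and the example sentences are assumptions made for this sketch.

```python
import math
from collections import Counter

def bleu1(reference, hypothesis):
    """Simplified BLEU-1: clipped unigram precision times brevity penalty."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref or not hyp:
        return 0.0
    ref_counts = Counter(ref)
    hyp_counts = Counter(hyp)
    # Clip each hypothesis word count by its count in the reference.
    overlap = sum(min(n, ref_counts[w]) for w, n in hyp_counts.items())
    precision = overlap / len(hyp)
    # Penalize hypotheses that are shorter than the reference.
    if len(hyp) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(hyp))
    return bp * precision

# Invented example sentences.
perfect = bleu1("the cat sat", "the cat sat")
first_word_only = bleu1("the cat sat on the mat", "the")
print(perfect, first_word_only)
```

A prediction that gets only the first word right has perfect unigram precision but is hit hard by the brevity penalty, which is consistent with a first-word-only dictionary producing a low score.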
Submitting our Predictions
Note: Please save the notebook before submitting it (Ctrl + S).
%aicrowd notebook submit -c lingua-franca-translation -a assets --no-verify