Getting Started with Lingua Franca Translation

In this puzzle, we have to translate from the crowd-talk language into English. There are multiple ways to build the translator:

Using Dictionary and Mapping
Using LSTM
Using Transformers

In this starter notebook, we'll go with the dictionary and mapping approach. Here we'll create a dictionary of words for both English and the crowd-talk language.

Download the files 💾

Download AIcrowd CLI

We first install aicrowd-cli, which lets you download the dataset and later make a submission directly from the notebook.

In [ ]:

%%capture
!pip install aicrowd-cli
%load_ext aicrowd.magic

In [ ]:

%aicrowd login

Please login here: https://api.aicrowd.com/auth/NPz72ux6cPJoh9ZbLHQWW3v_BO3gSIlOlqpxPVjWbjo
API Key valid
Saved API Key successfully!

Download Dataset

We will create a folder named data and download the files there.

In [ ]:

!rm -rf data
!mkdir data
%aicrowd ds dl -c lingua-franca-translation -o data

Importing Necessary Libraries

In [1]:

import os
import pandas as pd
import gensim
from sklearn.metrics.pairwise import cosine_similarity

Diving into the dataset

In [2]:

train_df = pd.read_csv("data/train.csv")

In [3]:

train_df

Out[3]:

	id	crowdtalk	english
0	31989	wraov driourth wreury hyuirf schneiald chix lo...	upon this ladder one of them mounted
1	29884	treuns schleangly kriaors draotz pfiews schlio...	and solicited at the court of Augustus to be p...
2	26126	toirts choolt chiugy knusm squiend sriohl gheold	but how am I sunk!
3	44183	schlioncy yoik yahoos dynuewn maery schlioncy ...	the Yahoos draw home the sheaves in carriages
4	19108	treuns schleangly tsiens mcgaantz schmeecks tr...	and placed his hated hands before my eyes
...	...	...	...
11950	50106	hydriaond cieurry mcdaabs swiings schlioncy yo...	about five hundred leagues to the east
11951	14786	treuns schleangly criaody treuns schleangly wr...	) and two and a half in breadth
11952	16903	toirts choolt cycluierg triild schuony hypuids...	“But my toils now drew near a close
11953	68451	toantz spluiey gheuck schoutch spluiey gheuck ...	going as soon as I was dressed to pay my atten...
11954	30895	shriedy hyoirds splauetch sooc kniousts schlai...	for there was no sign of any violence except t...

11955 rows × 3 columns

In [4]:

english = train_df.english.values
crowdtalk = train_df.crowdtalk.values

In [5]:

english

Out[5]:

array(['upon this ladder one of them mounted',
       'and solicited at the court of Augustus to be preferred to a greater ship',
       'but how am I sunk!', ..., '“But my toils now drew near a close',
       'going as soon as I was dressed to pay my attendance upon his honour',
       'for there was no sign of any violence except the black mark of fingers on his neck.'],
      dtype=object)

In [6]:

processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in english]
#eng_word_list = [word for words in processedLines for word in words]

# Keep only the first token of each sentence (BLEU = 0.080)
eng_word_list = [words[0] for words in processedLines]

In [7]:

processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in crowdtalk]
#crowdtalk_word_list = [word for words in processedLines for word in words]

# Keep only the first token of each sentence (BLEU = 0.080)
crowdtalk_word_list = [words[0] for words in processedLines]

In [8]:

dict1 = dict(zip(crowdtalk_word_list, eng_word_list))
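Note that `dict(zip(...))` keeps only the last English word seen for each crowd-talk word, so a later pair silently overwrites an earlier one. The following is a minimal sketch of one possible alternative, assuming it is preferable to keep the most frequent pairing instead. The helper name `build_majority_dict` is hypothetical, and the toy lists reuse the first tokens of the first five training rows shown above plus one invented conflicting pair (`treuns` with `then`) purely for illustration.

```python
from collections import Counter, defaultdict


def build_majority_dict(source_words, target_words):
    """Map each source word to the target word it is paired with most often."""
    counts = defaultdict(Counter)
    for src, tgt in zip(source_words, target_words):
        counts[src][tgt] += 1
    # most_common(1) returns [(word, count)]; ties go to the pair seen first
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


# First tokens of the first five training rows, plus one invented conflict
toy_crowdtalk = ["wraov", "treuns", "toirts", "schlioncy", "treuns", "treuns"]
toy_english = ["upon", "and", "but", "the", "and", "then"]

majority = build_majority_dict(toy_crowdtalk, toy_english)
last_wins = dict(zip(toy_crowdtalk, toy_english))

print(majority["treuns"])   # most frequent pairing
print(last_wins["treuns"])  # last pairing seen
```

With the notebook's variables, the call would be `build_majority_dict(crowdtalk_word_list, eng_word_list)`. Whether this improves the score on this dataset has not been measured here.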

Prediction Phase ✈

In [9]:

test_df = pd.read_csv("data/test.csv")

In [10]:

test_df.crowdtalk[3984]

Out[10]:

'zoetz treiahl typeauty squiend sriohl daonts schloors rhiuny'

In [11]:

crowdtalk = test_df.crowdtalk.values

In [12]:

processedLines = [gensim.utils.simple_preprocess(sentence) for sentence in crowdtalk]
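Before translating, it can help to check how many test tokens the dictionary actually covers, since every token without an entry produces no English word. The sketch below is an illustrative addition, not part of the original notebook: the function name `dictionary_coverage` and the toy data are assumptions.

```python
def dictionary_coverage(tokenized_sentences, mapping):
    """Return the fraction of tokens that have an entry in `mapping`."""
    total = 0
    known = 0
    for tokens in tokenized_sentences:
        total += len(tokens)
        known += sum(1 for token in tokens if token in mapping)
    # Guard against empty input so we never divide by zero
    return known / total if total else 0.0


# Hypothetical toy data; in the notebook you would pass processedLines and dict1
toy_mapping = {"treuns": "and", "toirts": "but"}
toy_sentences = [["treuns", "schleangly"], ["toirts", "choolt"], []]
coverage = dictionary_coverage(toy_sentences, toy_mapping)
print(f"coverage: {coverage:.2f}")
```

A low value would suggest that a dictionary built from first tokens only leaves most of each test sentence untranslated.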

Creating sentences by matching each crowd-talk word in a sentence to its corresponding English word, using the dictionary mapping created above.

In [13]:

sentence = []

for tokens in processedLines:
  # Look up each crowd-talk token; unknown tokens become a single space
  sentence_part = [dict1.get(token, ' ') for token in tokens]
  # Join per sentence, so an empty token list yields an empty prediction
  sentence.append(' '.join(sentence_part))
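The loop above maps every unknown token to a single space, so joining with spaces leaves runs of blanks in the prediction. A minimal sketch of a variant, assuming unknown tokens should simply be skipped; the helper name `translate_tokens` and the toy mapping are hypothetical.

```python
def translate_tokens(tokens, mapping):
    """Translate known tokens and skip unknown ones, so no blank runs appear."""
    return " ".join(mapping[token] for token in tokens if token in mapping)


toy_mapping = {"treuns": "and", "schlioncy": "the"}  # hypothetical subset of dict1
toy_tokens = ["treuns", "zzz", "schlioncy"]          # "zzz" is an unknown token

skipped = translate_tokens(toy_tokens, toy_mapping)
# Same lookup as the notebook's loop: unknown tokens become a single space
padded = " ".join(toy_mapping.get(token, " ") for token in toy_tokens)

print(repr(skipped))
print(repr(padded))
```

In the notebook this would be used as `sentence = [translate_tokens(tokens, dict1) for tokens in processedLines]`. Whether the extra whitespace affects the leaderboard metric is not stated in the source, so treat this as a tidiness change only.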

In [14]:

test_df['prediction'] = sentence

In [15]:

test_df

Out[15]:

	id	crowdtalk	prediction
0	27226	treuns schleangly throuys praests qeipp cyclui...	and my of
1	31034	feosch treuns schleangly gliath spluiey gheuck...	scared and as was only
2	35270	scraocs knaedly squiend sriohl clield whaioght...	when only found on my
3	23380	sqaups schlioncy yoik gnoirk cziourk schnaunk ...	according the to he had given
4	92117	schlioncy yoik psycheiancy mcountz pously mcna...	the very that
...	...	...	...
3980	22854	scraocs knaedly daioc mceab spriaonn schmeips ...	when it did not rain
3981	24201	toirts choolt blointly spriaonn schmeips krous...	but she did not
3982	33494	scraocs knaedly daioc mceab sooc kniousts clie...	when it was found could only neither...
3983	28988	czogy stoorty wheians veurg mcmoorth dwiountz ...	by which they
3984	25337	zoetz treiahl typeauty squiend sriohl daonts s...	till could only reach

3985 rows × 3 columns

Saving the predictions in the assets directory as submission.csv.

In [16]:

!rm -rf assets
!mkdir assets
test_df.to_csv(os.path.join("assets", "submission.csv"), index=False)

"rm" is not recognized as an internal or external command,
operable program or batch file.
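The shell commands `rm` and `mkdir` used in the cell are not available in every environment (the Windows command prompt, for example, has no `rm`). A portable sketch using only the Python standard library follows; the helper name `reset_directory` is an assumption introduced here for illustration.

```python
import os
import shutil


def reset_directory(path):
    """Delete `path` if it exists, then recreate it as an empty directory."""
    # ignore_errors=True makes this a no-op when the directory is missing
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path, exist_ok=True)
    return path
```

In the notebook, `reset_directory("assets")` would replace the two shell lines, followed by the same `test_df.to_csv(...)` call.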

Submitting our Predictions

Note: Please save the notebook before submitting it (Ctrl + S).

In [ ]:

%aicrowd notebook submit -c lingua-franca-translation -a assets --no-verify

Using notebook: getting-started-notebook-for-lingua-franca-transalation.ipynb for submission...
Scrubbing API keys from the notebook...
Collecting notebook...

                                                       ╭─────────────────────────╮                                                       
                                                       │ Successfully submitted! │                                                       
                                                       ╰─────────────────────────╯

                                                             Important links                                                             
┌──────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│  This submission │ https://www.aicrowd.com/challenges/ai-blitz-xii/problems/lingua-franca-translation/submissions/169598              │
│                  │                                                                                                                    │
│  All submissions │ https://www.aicrowd.com/challenges/ai-blitz-xii/problems/lingua-franca-translation/submissions?my_submissions=true │
│                  │                                                                                                                    │
│      Leaderboard │ https://www.aicrowd.com/challenges/ai-blitz-xii/problems/lingua-franca-translation/leaderboards                    │
│                  │                                                                                                                    │
│ Discussion forum │ https://discourse.aicrowd.com/c/ai-blitz-xii                                                                       │
│                  │                                                                                                                    │
│   Challenge page │ https://www.aicrowd.com/challenges/ai-blitz-xii/problems/lingua-franca-translation                                 │
└──────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
