Question

I have a file that contains four columns and a number of sentences separated by empty lines, as follows:
========
1 mi spa DET
2 entonces spa ADV
3 ahora spa ADV
4 you eng PRON
5 want eng VERB
6 to eng PART
7 speak eng VERB
8 Spanish eng NOUN
9 ! punct PUNCT

1 because eng SCONJ
2 it eng PRON
3 does eng AUX
4 n't eng PART
5 ... punct PUNCT
6 eso spa PRON
7 no spa ADV
8 importa spa VERB
9 . punct PUNCT
========

The task is to develop a script that does the following:

1 - Split the file into two files based on the language-id (the third column). The output of this script should be the following:

File-1 for Spanish language should contain:

1 mi spa DET
2 entonces spa ADV
3 ahora spa ADV

6 eso spa PRON
7 no spa ADV
8 importa spa VERB
9 . punct PUNCT

File-2 for English language should contain:

4 you eng PRON
5 want eng VERB
6 to eng PART
7 speak eng VERB
8 Spanish eng NOUN
9 ! punct PUNCT

1 because eng SCONJ
2 it eng PRON
3 does eng AUX
4 n't eng PART
5 ... punct PUNCT

2 - The script should have an option to reconstruct the original input file from the two generated files (file-1 and file-2). To make that possible, you should keep track of the sentence number (you may add another column that assigns an index to each sentence), the chunk number, and the index of the word in the sentence.

3 - The script should also have an option to translate words from one language to another using the Google Translate API. For example, if I want to translate the whole original file to English, the script should replace the Spanish words with their English equivalents. To do so, the script should translate at the word level (one word at a time), which preserves the alignment between the columns. Translating whole chunks may cause problems, because the number of translated words may not match the number of input words.
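One way the reconstruction in requirement 2 could work is sketched below. It assumes each generated file prefixes every row with two extra columns, a sentence number and a chunk number, followed by the original four columns; that layout and the names `parse_tagged` and `reconstruct` are illustrative assumptions, not part of the original task. Because the word index is already the first original column, sorting on (sentence number, word index) is enough to restore the order. Columns are joined with tabs here, which is also an assumption.

```python
def parse_tagged(lines):
    """Parse rows of the assumed form: sentence, chunk, index, word, lang, pos."""
    rows = []
    for line in lines:
        fields = line.split()
        if fields:  # blank lines only separate chunks, so skip them
            sent, chunk, idx, *rest = fields
            rows.append((int(sent), int(chunk), int(idx), rest))
    return rows


def reconstruct(*tagged_files):
    """Merge any number of tagged files back into the original row order."""
    rows = sorted(
        (row for lines in tagged_files for row in parse_tagged(lines)),
        key=lambda r: (r[0], r[2]),  # sentence number, then word index
    )
    out, prev_sent = [], None
    for sent, _chunk, idx, rest in rows:
        if prev_sent is not None and sent != prev_sent:
            out.append("")  # empty line between sentences, as in the input
        out.append("\t".join([str(idx), *rest]))
        prev_sent = sent
    return "\n".join(out)


# Tiny hand-written excerpts of the two tagged files (assumed layout).
spa_file = ["1\t1\t1\tmi\tspa\tDET", "", "2\t2\t6\teso\tspa\tPRON"]
eng_file = ["1\t2\t4\tyou\teng\tPRON", "", "2\t1\t1\tbecause\teng\tSCONJ"]
print(reconstruct(spa_file, eng_file))
```

The chunk number is not needed for the merge itself; it is kept so that each generated file can show where one chunk ends and the next begins.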

================
Special case: if you have input like the following:

word1 lang-1
word2 lang-1
word3 lang-2
word4 lang-1
word5 lang-1
word6 lang-2
word7 lang-2

I’m expecting the output to be like this:

File-1 for the lang-1 language should contain:

word1 lang-1
word2 lang-1

word4 lang-1
word5 lang-1

File-2 for the lang-2 language should contain:

word3 lang-2

word6 lang-2
word7 lang-2
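The chunking behaviour shown in these expected outputs can be sketched roughly as follows. This is a minimal illustration, not the purchased solution: the names `split_by_language`, `render`, `NEUTRAL` and the `lang_col` parameter are hypothetical, and it assumes that rows tagged `punct` join the chunk that precedes them (which matches the expected files in the question) and that a `punct` row at the very start of a sentence opens a chunk under its own id. Each row is tagged with a sentence number and a chunk number so that the original order stays recoverable.

```python
from collections import defaultdict

NEUTRAL = {"punct"}  # assumption: ids that join the chunk before them


def split_by_language(lines, lang_col=2):
    """Group rows by language id; tag each row with (sentence, chunk)."""
    files = defaultdict(list)
    sent_id, chunk_id, current = 1, 0, None
    for line in lines:
        fields = line.split()
        if not fields:
            # A blank line ends the sentence (repeated blanks count once).
            if current is not None:
                sent_id, chunk_id, current = sent_id + 1, 0, None
            continue
        lang = fields[lang_col]
        if lang in NEUTRAL and current is not None:
            lang = current  # punctuation stays with the preceding chunk
        if lang != current:
            chunk_id += 1   # a language switch opens a new chunk
            current = lang
        files[lang].append((sent_id, chunk_id, *fields))
    return dict(files)


def render(rows):
    """Format tagged rows, leaving an empty line between chunks."""
    out, prev = [], None
    for sent, chunk, *fields in rows:
        if prev is not None and (sent, chunk) != prev:
            out.append("")
        out.append("\t".join([str(sent), str(chunk), *fields]))
        prev = (sent, chunk)
    return "\n".join(out)


# The two-column special case from the question: language id is column 1.
SPECIAL = [
    "word1 lang-1", "word2 lang-1", "word3 lang-2", "word4 lang-1",
    "word5 lang-1", "word6 lang-2", "word7 lang-2",
]
files = split_by_language(SPECIAL, lang_col=1)
print(render(files["lang-1"]))
print()
print(render(files["lang-2"]))
```

For the four-column files in the main example, the default `lang_col=2` selects the third column.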

Note: if the Google Translate API returns no translation for a word, keep the original word.
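The word-level translation with this fallback might look like the sketch below. To stay independent of any particular Google Translate client, the translation call is passed in as a parameter; `translate_rows`, `translate_word` and `FAKE_DICTIONARY` are hypothetical names, and the lookup table is only a stand-in so the example runs offline. The sketch also assumes that the language id column is left unchanged after translation, which the task does not specify.

```python
def translate_rows(rows, translate_word, source_lang="spa"):
    """Translate the word column of rows tagged source_lang, one word at a time."""
    translated_rows = []
    for index, word, lang, pos in rows:
        if lang == source_lang:
            try:
                candidate = translate_word(word)
            except Exception:
                candidate = None  # treat a failed call as "no translation"
            # Fall back to the original word when nothing usable came back.
            if candidate and candidate.strip():
                word = candidate.strip()
        translated_rows.append((index, word, lang, pos))
    return translated_rows


# Stand-in for a real client: a small lookup table with one missing entry.
FAKE_DICTIONARY = {"mi": "my", "ahora": "now"}

rows = [
    ("1", "mi", "spa", "DET"),
    ("2", "entonces", "spa", "ADV"),
    ("3", "ahora", "spa", "ADV"),
    ("4", "you", "eng", "PRON"),
]
for row in translate_rows(rows, FAKE_DICTIONARY.get):
    print("\t".join(row))
```

With a real client, `translate_word` would wrap the API call and return the translated text. A translation may come back as several words, so a tab-separated output is safer than a space-separated one if column alignment matters.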

Solution Preview

import os
from googletrans import Translator
import time

def split_the_file(file="text.txt"):
    # split the original file into per-language lists of lines

    # read the original file (the path comes from the `file` argument)
    with open(file, encoding="utf-8") as f:
        text = f.readlines()

    eng_lang = []
    spa_lang = []

    # iterate over all lines in the text and append them to
    # eng_lang or spa_lang depending on the language id (third column)
    for word in text:
        if word != "\n":
            lang_id = word.split("\t")[2]
            # as written, a line with any other id (e.g. punct) lands in both lists
            if lang_id != "spa":
                eng_lang.append(word)
            if lang_id != "eng":
                spa_lang.append(word)
        else:
            # keep sentence boundaries in both lists
            eng_lang.append("\n")
            spa_lang.append("\n")...
