handling arrays with awk

Hi,

I have an issue that I am trying to resolve using arrays in awk. I have two files; the first one is a dictionary with this format:

FILE 1 (dictionary)

'Abrir' 'Open'
'Aceptar' 'Accept'

Every line has two fields: the same word in two languages.

The second file is a simple list of words, each of which can be in either language of the dictionary or not present in the dictionary at all.

FILE 2 (wordlist)

'Open'
'Aceptar'
'Absoluto'
...

I need to split this second file into three parts:

  • words in dictionary, first language
  • words in dictionary, second language
  • words not in dictionary

What I have done with awk:

BEGIN {
dictfile=ARGV[1];
listfile=ARGV[2];
}
{
if (FILENAME == dictfile) {
dic[$1] = $2;
tran[$2];
}
else {  #FILENAME = listfile;
if ($1 in dic) {
print "word in dictionary", $1, dic[$1];
}
else {
if ($1 in tran) {
print "word already translated", $1;
}
else {
print "word not in dictionary", $1;
}
}
}
}

The problem comes with the case of words already translated. It seems that the array "tran" is not properly constructed, and I can't fix it.

I know the code is quite messy, my apologies. I would appreciate any help and/or suggestions.

In that one post, I think you referred to the same array by three different names.

tran, trans, or tras? You need to pick one and stick with it. :wink:

There may also be some issue with regard to the structure of the nested if/else statements, but without code tags to preserve indentation (assuming it was there to begin with), it's a pain to read.
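For what it's worth, this is roughly how I would lay out the same logic with indentation and an else-if chain. It is only a sketch: dict.txt and words.txt are made-up names holding the sample lines from the first post, and result.txt is just somewhere to collect the output.

```shell
# Sample data copied from the first post (file names are placeholders)
printf "%s\n" "'Abrir' 'Open'" "'Aceptar' 'Accept'" > dict.txt
printf "%s\n" "'Open'" "'Aceptar'" "'Absoluto'" > words.txt

awk '
FILENAME == ARGV[1] {       # lines of the first file: the dictionary
    dic[$1] = $2            # first-language word -> second-language word
    tran[$2]                # referencing the element is enough to create it
    next                    # do not fall through to the word-list rule
}
{                           # lines of the second file: the word list
    if ($1 in dic) {
        print "word in dictionary", $1, dic[$1]
    } else if ($1 in tran) {
        print "word already translated", $1
    } else {
        print "word not in dictionary", $1
    }
}
' dict.txt words.txt > result.txt

cat result.txt
```

With the three sample words this should report 'Open' as already translated, 'Aceptar' as in the dictionary (with 'Accept' next to it) and 'Absoluto' as not in the dictionary.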

Regards,
Alister

Is this what you're looking for?

awk '
NR==FNR{a[$1];next}
$1 in a {print $1 > "InDictionary"; next}
$2 in a {print $2 > "Translated"; next}
{print > "NotInDictionary"}
' file2 file1
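One thing to watch with the snippet above: the last rule runs on lines of file1, so it writes dictionary entries that matched nothing in the list rather than list words that are missing from the dictionary. The "next" in the first rule also means the second field is never checked on a line whose first field already matched. If the third output should hold the words from the list that are not in the dictionary, a variant along these lines might be closer. It is only a sketch, and the array names "list" and "found" and the sample data are my own assumptions:

```shell
# Sample data copied from the first post
printf "%s\n" "'Abrir' 'Open'" "'Aceptar' 'Accept'" > file1
printf "%s\n" "'Open'" "'Aceptar'" "'Absoluto'" > file2

awk '
NR == FNR { list[$1]; next }    # first file read (file2): remember every listed word
$1 in list { print $1 > "InDictionary"; found[$1] }   # listed word, first language
$2 in list { print $2 > "Translated";   found[$2] }   # listed word, second language
END {
    # whatever was listed but never matched a dictionary field
    for (w in list)
        if (!(w in found))
            print w > "NotInDictionary"
}
' file2 file1
```

With the sample files this should leave 'Aceptar' in InDictionary, 'Open' in Translated and 'Absoluto' in NotInDictionary. Note that "for (w in list)" visits the words in no particular order, so pipe that file through sort if the order matters.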

That is a nice and simple solution, just switching the order of the input files. Thanks a lot, Franklin52.

Actually, the original code worked as well in a different environment (it seems there was some problem with the machine's local configuration). But the suggested solution is much more elegant and efficient.

Sorry for the lack of code tags and indentation in the first post.