handling arrays with awk

gmartinez · May 24, 2010, 1:33pm

Hi,

I have an issue that I am trying to resolve using arrays in awk. I have two files, the first one is a dictionary with this format:

FILE 1 (dictionary)

'Abrir' 'Open'
'Aceptar' 'Accept'

Every line has two fields, a word in two languages.

The second file is a simple list of words. Each word may be in either language of the dictionary, or may not be in the dictionary at all.

FILE 2 (wordlist)

'Open'
'Aceptar'
'Absoluto'
...

I need to split this second file into three parts:

words in dictionary, first language
words in dictionary, second language
words not in dictionary

What I have done with awk:

BEGIN {
dictfile=ARGV[1];
listfile=ARGV[2];
}
{
if  (FILENAME == dictfile) {
dic[$1] = $2;
tran[$2];
}
else {  #FILENAME = listfile;
if ($1 in dic) {
print "word in dictionary", $1, dic[$1];
}
else { 
if ($1 in tran) {
print "word already translated", $1;
}
else {
print "word not in dictionary", $1;
}
}
}
}

The problem comes with the case of words already translated. Seems like the array "tran" is not properly constructed, and I can't fix it.

I know the code is quite messy, my apologies. I would appreciate any help or suggestions.

alister · May 24, 2010, 1:49pm

In that one post, I think you referred to the same array by three different names.

tran, trans, or tras? You need to pick one and stick with it.
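The reason this matters: awk does not complain about a misspelled array name, it quietly treats it as a new, empty array, so the membership test just fails. A minimal sketch of that effect, where the misspelling "trans" and the sample word are introduced on purpose for illustration:

```sh
# Hypothetical demo: the key is stored in "tran" but also looked up in "trans".
out=$(printf "%s\n" "'Open'" | awk '
{ tran[$1] }   # referencing tran[$1] creates the key
{ printf "%s in tran\n",  (($1 in tran)  ? "found" : "missing") }
{ printf "%s in trans\n", (($1 in trans) ? "found" : "missing") }
')
printf "%s\n" "$out"
# found in tran
# missing in trans
```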

There may also be some issue with regard to the structure of the nested if/else statements, but without code tags to preserve indentation (assuming it was there to begin with), it's a pain to read.

Regards,
Alister

Franklin52 · May 25, 2010, 5:09am

Is this what you're looking for?

awk '
NR==FNR{a[$1];next}
$1 in a {print $1 > "InDictionary"; next}
$2 in a {print $2 > "Translated"; next}
{print > "NotInDictionary"}
' file2 file1
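One caveat with this argument order: the last rule runs on dictionary lines, so the third output file receives dictionary entries that have no match in the word list, not list words that are missing from the dictionary. If the latter is what is needed, a possible sketch is to read the dictionary first and classify each word of the list. The file names dict.txt, words.txt and the three output names are made up for the example, and the sample data comes from the first post:

```sh
# Sample data from the first post; all file names here are invented.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf "%s\n" "'Abrir' 'Open'" "'Aceptar' 'Accept'" > dict.txt
printf "%s\n" "'Open'" "'Aceptar'" "'Absoluto'" > words.txt

awk '
# First file (dictionary): remember both columns, then skip to the next line.
NR == FNR    { first[$1] = $2; second[$2]; next }
# Second file (word list): classify each word.
$1 in first  { print $1, first[$1] > "first_language.txt"; next }
$1 in second { print $1 > "second_language.txt"; next }
             { print $1 > "not_in_dictionary.txt" }
' dict.txt words.txt
```

With that sample data, 'Aceptar' goes to first_language.txt together with its translation, 'Open' goes to second_language.txt, and 'Absoluto' goes to not_in_dictionary.txt. The NR == FNR test assumes the dictionary file is not empty.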

gmartinez · May 25, 2010, 6:04am

That was a nice and simple solution, just switching the order of the input files, thanks a lot Franklin52.

Actually the original code worked as well, in a different environment (it seems there was some problem with the machine's local configuration). But the suggested solution is much more elegant and efficient.

Sorry for the lack of code tags and indentation in the first post.