Convert values in an array using a loop

Geneanalyst · February 19, 2018, 11:47pm

I have a headerless array of 1000 columns x 100000 rows. The array contains only 4 values: 0/0, 0/1, 1/1 and ./.

Here are the first 4 rows and 3 columns of the input array:

0/0    0/0    1/1
0/1    0/1    0/1
0/0    ./.    0/0
0/0    0/0    0/0

I would like to convert the values in the array as follows and print them to a new array:

0/0 --> 0
0/1 --> 1
1/1 --> 2
./. --> 9

What would be an efficient way to do this using AWK? I am guessing a loop that goes through all columns and changes the values as shown above.

The output array should look like:

0    0    2
1    1    1
0    9    0
0    0    0

Thanks in advance.

Don_Cragun · February 20, 2018, 1:19am

As with any thread posted in this forum, there is information that we need to be able to suggest how you might want to proceed:

What operating system are you using?
What shell are you using?
In what form is this headerless array that you have? (Is it in a shell array variable? Is it hard-coded into an awk BEGIN clause? Is it in a file?)
What is the name of this shell array variable, awk array variable, file, or whatever?
In what form do you want the output array to be produced and what name should it have?
And, most importantly, what have you tried to solve this problem on your own?

Aia · February 20, 2018, 2:26am

Some python3?

conversion_table = { '0/0': 0, '0/1': 1, '1/1': 2, './.': 9 }
with open('columns_by_rows.file') as read_f:
    for line in read_f:
        columns = [conversion_table[f] for f in line.split()]
        print(*columns, sep='    ')

Save as conv.py
Run as python3 conv.py

Geneanalyst · February 20, 2018, 5:41am

aia:

Some python3?

conversion_table = { '0/0': 0, '0/1': 1, '1/1': 2, './.': 9 }
with open('columns_by_rows.file') as read_f:
    for line in read_f:
        columns = [conversion_table[f] for f in line.split()]
        print(*columns, sep='    ')

Save as conv.py
Run as python3 conv.py

Works great. How would you save the output to a text file, Output.txt instead of printing to screen since this is a huge array?

I think I figured it out by changing the last line in your code to:

print(*columns, sep='    ', file=open("output.txt", "a"))

Any problems with doing this?

---------- Post updated at 05:41 AM ---------- Previous update was at 05:38 AM ----------

Sorry Don. Here is the info:

Ubuntu 16.04 LTS
Unix
Input is a text file, Input.txt
Output should be a text file, Output.txt
I have tried If Else statements in AWK

RudiC · February 20, 2018, 6:15am

Try also

awk '
BEGIN   {for (n=split("0/0 0/1 1/1 \\./.", TMP); n; n--) CAT[TMP[n]] = n-1
        }
        {for (c in CAT) gsub (c, CAT[c])
         gsub (3, 9)
        }
1 
' file
0    0    2
1    1    1
0    9    0
0    0    0

Redirect stdout to Output.txt if happy with the result.

rdrtx1 · February 20, 2018, 8:23am

awk -v values="0/0:0,0/1:1,1/1:2,./.:9" '
BEGIN {
   n=split(values,arr,"[:,]");
}
{  for (i=1; i<=n; i+=2) gsub(arr[i], arr[i+1]);
   print $0;
}
' infile

Geneanalyst · February 20, 2018, 9:30am

Thanks to everyone. All 3 scripts work great !

RudiC · February 20, 2018, 9:32am

@rdrtx1: nice! works!

But: the ./.:9 pair only works because it's the last value pair operated upon. With its two . wildcards, that gsub replaces each and every entry that has not been changed before. Escape at least one of the dots to be on the safe side.
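
To illustrate the wildcard point, here is a minimal sketch in Python rather than awk (an unescaped `.` means "any character" in both regex dialects). The sample row and the variable names are made up for the demonstration; this is not a replacement for the awk script above.

```python
import re

# Hypothetical sample row containing three different genotype codes.
row = "0/0    ./.    1/1"

# Unescaped: each "." matches any character, so "./." also matches "0/0" and "1/1".
unescaped = re.sub(r"./.", "9", row)

# Escaped: only a literal "./." is replaced.
escaped = re.sub(r"\./\.", "9", row)

print(unescaped)  # 9    9    9
print(escaped)    # 0/0    9    1/1
```

If the unescaped pattern ran first instead of last, every entry would already be 9 before the other pairs were tried.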

rdrtx1 · February 20, 2018, 9:43am

Not sure if it's actually a dot, or whether any value not previously matched should get a nine. To escape or not to escape, that is the question. Use as needed.
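
The two readings could be sketched in Python like this, reusing the `conversion_table` from the earlier post. The function names `convert_strict` and `convert_lenient` are hypothetical, and which behaviour is wanted depends on the data.

```python
conversion_table = {'0/0': 0, '0/1': 1, '1/1': 2, './.': 9}

def convert_strict(line):
    # Only the four known codes are accepted; anything else raises KeyError.
    return [conversion_table[token] for token in line.split()]

def convert_lenient(line, default=9):
    # Any token that is not in the table is mapped to `default`.
    return [conversion_table.get(token, default) for token in line.split()]

# "1/0" is not in the table, so the lenient version turns it into 9.
print(convert_lenient("0/0 ./. 1/0"))  # [0, 9, 9]
```

The strict version has the advantage of flagging unexpected values in the input instead of silently turning them into 9.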

Aia · February 20, 2018, 11:54am

geneanalyst:

Works great. How would you save the output to a text file, Output.txt instead of printing to screen since this is a huge array?

I think I figured it out by changing the last line in your code to:
print(*columns, sep='    ', file=open("output.txt", "a"))
Any problems with doing this?

The issue with that is that the file output.txt is opened again for every row and then appended to.

Another way would be to use shell redirection of stdout:

python3 conv.py > output.txt

From the program itself it could be like:

conversion_table = { '0/0': 0, '0/1': 1, '1/1': 2, './.': 9 }
with open('columns_by_rows.file') as read_f:
    with open('output.txt', 'w') as write_f:
        for line in read_f:
            columns = [conversion_table[f] for f in line.split()]
            row = '    '.join([str(n) for n in columns]) + '\n'
            write_f.write(row)