Convert values in an array using a loop

I have a headerless array of 1000 columns x 100000 rows. The array only contains 4 values: 0/0, 0/1, 1/1, and ./.

Here I am showing the first few rows and columns of the input array:

0/0    0/0    1/1
0/1    0/1    0/1
0/0    ./.    0/0
0/0    0/0    0/0

I would like to convert the values in the array as follows and print them to a new array:

0/0 --> 0
0/1 --> 1
1/1 --> 2
./. --> 9

What would be an efficient way to do this using awk? I am guessing a loop that goes through all columns and changes the values as shown above.

The output array should look like:

0    0    2
1    1    1
0    9    0
0    0    0

Thanks in advance.

As with any thread posted in this forum, we need some information to be able to suggest how you might want to proceed:

  1. What operating system are you using?
  2. What shell are you using?
  3. In what form is this headerless array that you have? (Is it in a shell array variable? Is it hard-coded into an awk BEGIN clause? Is it in a file?)
  4. What is the name of this shell array variable, awk array variable, file, or whatever?
  5. In what form do you want the output array to be produced and what name should it have?
  6. And, most importantly, what have you tried to solve this problem on your own?

Some python3?

conversion_table = { '0/0': 0, '0/1': 1, '1/1': 2, './.': 9 }
with open('columns_by_rows.file') as read_f:
    for line in read_f:
        columns = [conversion_table[f] for f in line.split()]
        print(*columns, sep='    ')

Save as conv.py
Run as python3 conv.py


Works great. How would you save the output to a text file, Output.txt, instead of printing to the screen, since this is a huge array?

I think I figured it out by changing the last line in your code to:

print(*columns, sep='    ', file=open("output.txt", "a"))

Any problems with doing this?


Sorry Don. Here is the info:

  1. Ubuntu 16.04 LTS
  2. Unix
  3. Input is a text file, Input.txt
  4. Output should be a text file, Output.txt
  5. I have tried if/else statements in awk

Try also

awk '
BEGIN   {for (n=split("0/0 0/1 1/1 \\./.", TMP); n; n--) CAT[TMP[n]] = n-1
        }
        {for (c in CAT) gsub (c, CAT[c])
         gsub (3, 9)
        }
1 
' file
0    0    2
1    1    1
0    9    0
0    0    0

Redirect stdout to Output.txt if happy with the result.
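That redirection could look like this sketch (conv.awk is a hypothetical file holding the script above, and file is the input name used there):

```shell
# Run the conversion script and send its stdout to Output.txt
awk -f conv.awk file > Output.txt
```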

awk -v values="0/0:0,0/1:1,1/1:2,./.:9" '
BEGIN {
   n=split(values,arr,"[:,]");
}
{  for (i=1; i<=n; i+=2) gsub(arr[i], arr[i+1]);
   print $0;
}
' infile

Thanks to everyone. All 3 scripts work great!

@rdrtx1: nice! works!

But: the ./.:9 pair only works because it's the last value pair operated upon; since its pattern uses two . wildcards, gsub would otherwise replace each and every entry that has not been changed before. Escape at least one of the dots to be on the safe side.

Not sure if it's actually a dot, or whether any value not previously matched should get a nine. To escape or not to escape, that is the question. Use as needed.
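For the cautious variant, escaping the dots in the value list might look like this sketch (same script as above, only the ./. pattern changed; infile stands for the input file, as in the original post):

```shell
# Same mapping, but with the dots escaped so the ./.:9 pair only
# matches a literal "./." instead of any char, slash, any char.
# awk processes escape sequences in -v assignments, so \\. arrives as \.
awk -v values='0/0:0,0/1:1,1/1:2,\\./\\.:9' '
BEGIN {
   n=split(values,arr,"[:,]");
}
{  for (i=1; i<=n; i+=2) gsub(arr[i], arr[i+1]);
   print $0;
}
' infile
```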

The issue with that is that the file output.txt is reopened and appended to on every row.

Another way would be to use the shell redirection of stdout

python3 conv.py > output.txt

From the program itself it could be like:

conversion_table = { '0/0': 0, '0/1': 1, '1/1': 2, './.': 9 }
with open('columns_by_rows.file') as read_f:
    with open('output.txt', 'w') as write_f:
        for line in read_f:
            columns = [conversion_table[f] for f in line.split()]
            row = '    '.join([str(n) for n in columns]) + '\n'
            write_f.write(row)