I have a headerless array of 1000 columns x 100000 rows. The array contains only 4 values (0/0, 0/1, 1/1, ./.).
Here are the first 4 rows and 3 columns of the input array:
0/0 0/0 1/1
0/1 0/1 0/1
0/0 ./. 0/0
0/0 0/0 0/0
I would like to convert the values in the array as follows and print them to a new array:
0/0 --> 0
0/1 --> 1
1/1 --> 2
./. --> 9
What would be an efficient way to do this using AWK? I am guessing a loop that goes through all columns and changes the values as shown above.
The output array should look like:
0 0 2
1 1 1
0 9 0
0 0 0
Thanks in advance.
As with any thread posted in this forum, there is information that we need to be able to suggest how you might want to proceed:
What operating system are you using?
What shell are you using?
In what form is this headerless array that you have? (Is it in a shell array variable? Is it hard-coded into an awk BEGIN clause? Is it in a file?)
What is the name of this shell array variable, awk array variable, file, or whatever?
In what form do you want the output array to be produced and what name should it have?
And, most importantly, what have you tried to solve this problem on your own?
Aia
February 20, 2018, 2:26am
3
Some python3?
conversion_table = { '0/0': 0, '0/1': 1, '1/1': 2, './.': 9 }
with open('columns_by_rows.file') as read_f:
    for line in read_f:
        columns = [conversion_table[f] for f in line.split()]
        print(*columns, sep=' ')
Save as conv.py
Run as python3 conv.py
aia:
Some python3?
conversion_table = { '0/0': 0, '0/1': 1, '1/1': 2, './.': 9 }
with open('columns_by_rows.file') as read_f:
    for line in read_f:
        columns = [conversion_table[f] for f in line.split()]
        print(*columns, sep=' ')
Save as conv.py
Run as python3 conv.py
Works great. How would you save the output to a text file, Output.txt instead of printing to screen since this is a huge array?
I think I figured it out by changing the last line in your code to:
print(*columns, sep=' ', file=open("output.txt", "a"))
Any problems with doing this?
---------- Post updated at 05:41 AM ---------- Previous update was at 05:38 AM ----------
Sorry Don. Here is the info:
Ubuntu 16.04 LTS
Unix
Input is a text file, Input.txt
Output should be a text file, Output.txt
I have tried if/else statements in AWK
RudiC
February 20, 2018, 6:15am
5
Try also
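awk '
BEGIN {
        # Map each pattern to its position minus one: 0/0 -> 0, 0/1 -> 1, 1/1 -> 2, \./. -> 3
        for (n = split("0/0 0/1 1/1 \\./.", TMP); n; n--) CAT[TMP[n]] = n-1
}
{
        for (c in CAT) gsub(c, CAT[c])
        gsub(3, 9)      # the missing-value code 3 becomes 9
}
1
' file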
0 0 2
1 1 1
0 9 0
0 0 0
Redirect stdout to Output.txt if you are happy with the result.
rdrtx1
February 20, 2018, 8:23am
6
awk -v values="0/0:0,0/1:1,1/1:2,./.:9" '
BEGIN {
        n = split(values, arr, "[:,]");
}
{
        for (i = 1; i <= n; i += 2) gsub(arr[i], arr[i+1]);
        print $0;
}
' infile
Thanks to everyone. All 3 scripts work great!
RudiC
February 20, 2018, 9:32am
8
@rdrtx1: nice! works!
But: the ./.:9 pair only works because it's the last value pair operated upon. With its two . wildcards, gsub replaces each and every entry that has not been changed before. Escape at least one of them to be on the safe side.
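To illustrate the point about the two wildcards, here is a small Python sketch that mimics the repeated gsub calls with re.sub. The function name convert, the PAIRS list and the sample row are made up for this illustration; they are not taken from either awk script.

```python
import re

# Replacement pairs in the same order as the awk -v values=... string.
PAIRS = [("0/0", "0"), ("0/1", "1"), ("1/1", "2"), ("./.", "9")]

def convert(line, pairs, escape):
    """Apply each pattern to the whole line in turn, like repeated gsub calls."""
    for pattern, replacement in pairs:
        if escape:
            pattern = re.escape(pattern)  # treat '.' as a literal dot
        line = re.sub(pattern, replacement, line)
    return line

row = "0/0 ./. 1/1"

# Missing-value pair last: the unescaped pattern happens to work.
print(convert(row, PAIRS, escape=False))       # 0 9 2

# Missing-value pair first: './.' also matches '0/0' and '1/1'.
reordered = [PAIRS[3]] + PAIRS[:3]
print(convert(row, reordered, escape=False))   # 9 9 9

# Escaped, the order no longer matters.
print(convert(row, reordered, escape=True))    # 0 9 2
```

So the unescaped version is correct only as long as the ./. pair stays last in the list.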
rdrtx1
February 20, 2018, 9:43am
9
Not sure if it's actually a dot, or if any value not previously matched gets a nine. Escape or not to escape. That is the question. Use as needed.
Aia
February 20, 2018, 11:54am
10
geneanalyst:
Works great. How would you save the output to a text file, Output.txt instead of printing to screen since this is a huge array?
I think I figured it out by changing the last line in your code to:
print(*columns, sep=' ', file=open("output.txt", "a"))
Any problems with doing this?
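The issue with that is that output.txt is opened again for every row and never explicitly closed. Since it is opened in append mode, running the script a second time also adds to the previous output instead of replacing it.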
Another way would be to use the shell redirection of stdout
python3 conv.py > output.txt
From the program itself it could be like:
conversion_table = { '0/0': 0, '0/1': 1, '1/1': 2, './.': 9 }
with open('columns_by_rows.file') as read_f:
    with open('output.txt', 'w') as write_f:
        for line in read_f:
            columns = [conversion_table[f] for f in line.split()]
            row = ' '.join([str(n) for n in columns]) + '\n'
            write_f.write(row)
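For anyone wondering how much the open mode matters on a rerun, here is a rough sketch that runs the same conversion twice with 'w' and twice with 'a' and counts the output lines. The helper names (convert_file, count_lines, demo) and the temporary file names are invented for this demonstration, and the two-row input is just a sample.

```python
import os
import tempfile

CONVERSION = {"0/0": "0", "0/1": "1", "1/1": "2", "./.": "9"}

def convert_file(in_path, out_path, mode):
    """Convert in_path to out_path, opening the output once with the given mode."""
    with open(in_path) as read_f, open(out_path, mode) as write_f:
        for line in read_f:
            write_f.write(" ".join(CONVERSION[f] for f in line.split()) + "\n")

def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path) as f:
        return sum(1 for _ in f)

def demo():
    """Run the conversion twice per mode and return the resulting line counts."""
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "input.txt")
        with open(in_path, "w") as f:
            f.write("0/0 0/0 1/1\n0/1 0/1 0/1\n")
        counts = {}
        for mode in ("w", "a"):
            out_path = os.path.join(tmp, "output_" + mode + ".txt")
            convert_file(in_path, out_path, mode)
            convert_file(in_path, out_path, mode)  # simulate a rerun
            counts[mode] = count_lines(out_path)
        return counts

print(demo())  # {'w': 2, 'a': 4}
```

With 'w' the second run replaces the first result; with 'a' the two-row input ends up as four rows.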