Your code does work. thanks so much!
But i have another issue. If i want to get output on a specified columns, how should i do it? if i set the NF==2, i will get the result for any 2 columns that have "M" but i would not know which columns they are. unless i print the whole thing and check which columns. But, this going to make me check all of them and i have huge files to work on. Therefore, i need to set the specific columns. Let say, i want to get ID which $2 and $3 have "M". How should i do it? thanks
i tried your above codes, but the result is wrong. if possible, can you please explain to me the coding?
also, just for clarification, for the example that i gave earlier ($2 and $3), it means that i want to extract ID in condition where both columns has "M" at the same time. i might need to extract ID where $2 and $5 have "M" at the same time too. thanks
Your code does work for the first condition that i mentioned above. But if i choose $2 and $3 to have "M" at the same time, it didnt work. ok, i will give another example here:-
ID Pat1 Pat2 Pro1 use1
add11 M M
add17 M M M
add32 M M
add66 M M
add55 M M M
add47 M M
i guess it is the problem with FS. Below is cat results:-
ID use1 P00dfd|Pat_2 P00dddf|PRO1 P00ddff|Pat1 $
Add7G00710 M $
Add6G11920 M M $
Add6G00630 M $
Add5G10140 M $
Add4G00470 M $
Add3G02060 M $
Add2G17360 M M $
Add2G12550 M $
Add2G05350 M M $
Add6G00710 M $
Add5G01350 M $
Add4G14230 M $
Add4G06080 M M M M $
Add4G03920 M $
Add3G11760 M $
Add3G02890 M $
Add1G11830 M M M $
Add2G12710 M $
Add5G13290 M M $
Actually my data was in vertical. I used an awk script to make it horizontal so that it is easier for me to see where "M" is. But, when i do column -t to clean it, the alignment goes haywire. I am still trying to figure out how to deal with this. If you have suggestion, pls let me know. Thanks
I don't know what you mean by vertical vs horizontal, but what you quoted is not TAB separated fields, but rather multi-space separated fields.
Pls post the original file before any transformations.
Add7G00710 213 235 P00dddf|PRO1 M
Add6G11920 65 511 use1 M
Add6G11920 133 149 P00dddf|PRO1 M
Add6G00630 78 510 P00dddf|PRO1 M
Add5G10140 93 539 P00dddf|PRO1 M
Add4G00470 10 471 P00dddf|PRO1 M
Add4G00470 109 351 P00dddf|PRO1 M
Add3G02060 1 299 P00dddf|PRO1 M
Add2G17360 1 116 P00dfd|Pat_2 M
Add2G17360 27 478 P00dddf|PRO1 M
Add2G12550 5 22 P00dddf|PRO1 M
Add2G05350 1 450 use1 M
Add2G05350 88 624 P00dfd|Pat_2 M
Add6G00710 11 447 P00dddf|PRO1 M
Add5G01350 149 164 P00dddf|PRO1 M
Add4G14230 11 447 P00dddf|PRO1 M
Add4G06080 132 157 use1 M
Add4G06080 113 514 P00dfd|Pat_2 M
Add4G06080 24 481 P00dddf|PRO1 M
Add4G06080 320 337 P00ddff|Pat1 M
Add4G03920 75 507 P00dddf|PRO1 M
Add3G11760 27 530 P00dddf|PRO1 M
Add3G02890 19 473 P00dddf|PRO1 M
Add1G11830 36 472 P00dddf|PRO1 M
Add1G11830 66 561 P00ddff|Pat1 M
Add1G11830 27 536 P00ddfd|Pat_2 M
Add2G12710 23 543 P00dddf|PRO1 M
Add5G13290 5 22 use1 M
Add5G13290 37 54 P00dfd|Pat_2 M
There are duplicates as they are different in $2 and $3. But, whichever duplicates, i just take one. for example, Add4G00470 in $1 has appeared twice due to value in $2 and $3. Since they are in the same group (in $4), i take it as 1 only to my output. Thanks.
Please note that your latest sample differs considerably from your samples in posts #1, #7 and #11 and these also differ from one another. Is this the correct sample this time?
Sorry for the confusion. The first sample (#1) is just a toy example. When things didnt work out, then i gave the real sample post (#7 and #11). However, since there is issue with field separator, vgersh99 asked for the original file. Therefore, the latest data is the original data before being transformed into #7 or #11 as per request. Thanks
Hi,
From what I understand, you want to print first column whenever there are exactly two "M"s in 2nd, 3rd, 4th and 5th columns combined. If that is the case (and file is tab delimited), I would create a korn shell script as below and name it as script.ksh :
#!/bin/ksh
infile=$1
nawk -F" " <${infile} \
'BEGIN{cnt=0} \
NR>=2{hdr=$1; if ( $2 == "M" ) cnt++;
if ( $3 == "M" ) cnt++;
if ( $4 == "M" ) cnt++;
if ( $5 == "M" ) cnt++;
if ( cnt == 2 ) print hdr; cnt=0}'
Then I execute this script as below:
script.ksh infile
where infile is the tab delimited file of interest.
Hope this helps
thanks so much for the codes. I have been thinking about the FS issue. I have decided to work on the original file post #13 (before transformation) to get what i wanted instead of input file in post #11. It is much easier i guess as the original file is tab delimeted. Thanks