gctex
April 25, 2011, 10:30am
1
I have a test file with the following 2 columns:
Col 1 | Col 2
T1 | 1 <= remove
T5 | 1
T4 | 2
T1 | 3
T3 | 3
T4 | 1 <= remove
T1 | 2 <= remove
T3 | 2 <= remove
T3 | 1 <= remove
T2 | 1
I need to remove any sub-branches. For example, T4 in the left column appears above with a value of 2 in the right column, so any other occurrences of T4 with a lesser value in the right column should be removed. Similarly, T1 | 1 and T1 | 2 need to be removed because there is a T1 | 3. For each entry in Column 1, only the row with the highest value in Column 2 should be retained.
Expected final list:
T5 | 1
T4 | 2
T1 | 3
T3 | 3
T2 | 1
awk -F"|" '$2 > a[$1]{a[$1]=$NF} END{for(i in a)print i FS a[i]}' file
gctex
April 25, 2011, 10:23pm
3
Thanks, it works, but it prints this way :
T1 | 3
T2 | 1
T3 | 3
T4 | 2
T5 | 1
Can we print it without altering the original sort order?
Also, entries in the first column that have a value greater than 1 in the second column need to be indented, i.e., T4, T1 and T3.
(The original file had the indentation, but for some reason it gets removed when the code is posted.)
---------- Post updated at 09:23 PM ---------- Previous update was at 12:18 PM ----------
Frankin, thanks for adding code tags to my post. So can we print it the way I want it?
Try this,
awk -F"|" 'NR==FNR{if(a[$1]){ if(a[$1]<$2) {a[$1]=$2;b[$1]=NR}} else {a[$1]=$2;b[$1]=NR}}
NR>FNR{if(b[$1]==FNR){print}}' infile infile
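For anyone reading along, here is a commented sketch of the same two-pass idea, run against the sample data from the first post. The temporary file and the names `max`, `line`, `key` and `val` are my own choices for illustration. Trimming whitespace from the tag and adding 0 to the value are assumptions about how the real file is laid out, not something the one-liner above does.

```shell
# Sample data from the first post, written to a temporary (hypothetical) file.
data=$(mktemp)
cat > "$data" <<'EOF'
T1 | 1
T5 | 1
T4 | 2
T1 | 3
T3 | 3
T4 | 1
T1 | 2
T3 | 2
T3 | 1
T2 | 1
EOF

# The same file is given twice so awk reads it in two passes.
result=$(awk -F'|' '
    # Runs for every line: normalise the tag and force the value to a number.
    { key = $1; gsub(/[ \t]/, "", key); val = $2 + 0 }

    # Pass 1 (NR == FNR): remember the highest value per tag and its line number.
    NR == FNR {
        if (!(key in max) || val > max[key]) { max[key] = val; line[key] = FNR }
        next
    }

    # Pass 2: print a line only if it is the remembered line for its tag.
    line[key] == FNR
' "$data" "$data")

printf '%s\n' "$result"
rm -f "$data"
```

Because the second pass reads the file top to bottom, the surviving lines come out in their original order with no extra sorting.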
##--get unique tags
for i in ` cat testfile.txt | awk '{print $1}'|sort -u`
do
grep $i testfile.txt >temp.txt
cat temp.txt | sort -n |tail -1 >>finaldata.txt
done
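A slightly more defensive take on the loop above, as a sketch only. It sorts on the second `|`-separated field instead of the whole line, and anchors the `grep` to the start of the line so that a tag such as T1 would not also match T10. The plain `sort -n` version seems to work here only because every value is a single digit. The temporary file from `mktemp` is hypothetical, standing in for the real input.

```shell
# Sample data from the first post, written to a temporary (hypothetical) file.
data=$(mktemp)
cat > "$data" <<'EOF'
T1 | 1
T5 | 1
T4 | 2
T1 | 3
T3 | 3
T4 | 1
T1 | 2
T3 | 2
T3 | 1
T2 | 1
EOF

result=$(
    # Unique tags from column 1, with surrounding blanks removed.
    awk -F'|' '{ gsub(/[ \t]/, "", $1); print $1 }' "$data" | sort -u |
    while read -r tag; do
        # Keep only this tag's lines, sort them numerically on column 2,
        # and take the last one (the highest value).
        grep "^[[:space:]]*$tag[[:space:]]*|" "$data" |
            sort -t'|' -k2,2n | tail -n 1
    done
)

printf '%s\n' "$result"
rm -f "$data"
```

Note that `sort -u` on the tags means the result comes out in alphabetical tag order, not in the order of the original file.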
gctex
April 26, 2011, 10:59am
6
Not sure, this is what I am getting:
!. srt1.sh
T1 | 1
T5 | 1
T4 | 2
T1 | 3
T3 | 3
T4 | 1
T1 | 2
T3 | 2
T3 | 1
T2 | 1
T1 | 1
T5 | 1
T4 | 2
T1 | 3
T3 | 3
T4 | 1
T1 | 2
T3 | 2
T3 | 1
T2 | 1
!cat srt1.sh
awk -F"|" 'NR==FNR{if(a[$1]){ if(a[$1]<$2) {a[$1]=$2;b[$1]=NR}} else {a[$1]=$2;b[$1]=NR}}
NR>FNR{if(b[$1]==FNR){print}}' fp1.txt fp1.txt
---------- Post updated at 09:59 AM ---------- Previous update was at 09:56 AM ----------
This is what I am getting:
!. srt.sh
T1 | 1
T2 | 1
T3 | 1
T4 | 1
T5 | 1
!cat srt.sh
for i in `cat fp1.txt | awk '{print $1}'|sort -u`
do
grep $i fp1.txt >temp.txt
cat temp.txt | sort -n |tail -1 >>finaldata.txt
done
cat finaldata.txt
I got the desired output with the script below.
Script:
$ cat test.sh
rm finaldata.txt
##--get unique tags
for i in ` cat tt.txt | awk '{print $1}'|sort -u`
do
grep $i tt.txt >temp.txt
cat temp.txt | sort -n |tail -1 >>finaldata.txt
done
cat finaldata.txt
I tried it with this test file:
$ cat tt.txt
T1 | 1
T5 | 1
T4 | 2
T1 | 3
T3 | 3
T4 | 1
T1 | 2
T3 | 2
T3 | 1
T2 | 1
T1 | 1
T5 | 1
T4 | 2
T1 | 3
T3 | 3
T4 | 1
T1 | 2
T3 | 2
T3 | 1
T2 | 1
Output:
$ sh test.sh
T1 | 3
T2 | 1
T3 | 3
T4 | 2
T5 | 1
gctex
April 27, 2011, 7:50am
8
T1 is a sub-branch of T4, just like T3, so it needs to appear alongside T3. The final order has to be:
T5 | 1
T4 | 2
T1 | 3
T3 | 3
T2 | 1
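One possible way to get that ordering, offered as a sketch: the requested list matches the order in which each tag's highest-valued line appears in the file, so a single pass can record that line number and print by it at the end. The indentation rule (two spaces per level above 1) is purely an assumption, since the original indentation was lost when the data was posted. The array names `max`, `at` and `tag` and the temporary file are illustrative only.

```shell
# Sample data from the first post, written to a temporary (hypothetical) file.
data=$(mktemp)
cat > "$data" <<'EOF'
T1 | 1
T5 | 1
T4 | 2
T1 | 3
T3 | 3
T4 | 1
T1 | 2
T3 | 2
T3 | 1
T2 | 1
EOF

result=$(awk -F'|' '
    {
        key = $1; gsub(/[ \t]/, "", key); val = $2 + 0
        # Track the highest value per tag and the line where it first occurs.
        if (!(key in max) || val > max[key]) { max[key] = val; at[key] = NR }
    }
    END {
        # Invert tag -> line number so line numbers can be walked in order.
        for (k in at) tag[at[k]] = k
        for (n = 1; n <= NR; n++) {
            if (!(n in tag)) continue
            k = tag[n]
            # Assumption: indent two spaces for each level above 1.
            pad = ""
            for (j = 1; j < max[k]; j++) pad = pad "  "
            print pad k " | " max[k]
        }
    }
' "$data")

printf '%s\n' "$result"
rm -f "$data"
```

With the sample data this prints T5, T4, T1, T3, T2 in that order, with T4 indented by two spaces and T1 and T3 by four. If the real file uses a different indent, only the `pad` loop should need changing.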