grep variable

system · September 22, 2012, 1:12am

I've got a file that I'm trying to grep through that looks like this:

alpha1
alpha2
alpha3
beta1
beta2
gamma5
gamma6
gamma7
gamma8
gamma9

I want the output to contain only the line with the highest number for each name, so the output I want is:

alpha3
beta2
gamma9

It also needs to cope with new names such as delta and epsilon being added down the track, and still extract just the highest-numbered line for each.

How do I do this? I'm really pulling my hair out; I've googled for hours. Thanks so much!

elixir_sinari · September 22, 2012, 3:54am

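awk '/^[[:alpha:]]+[[:digit:]]+/{
# w = leading name, n = the number that follows it
match($0,/[[:alpha:]]+/);w=substr($0,RSTART,RLENGTH)
match($0,/[[:digit:]]+/);n=substr($0,RSTART,RLENGTH)
# keep the line with the highest number seen so far for each name
if(n+0>=maxi[w]+0) {maxi[w]=n;rec[w]=$0}
}END{
for(i in rec) print rec[i]
}' file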

system · September 22, 2012, 4:11am

Thanks for your help.

I'm quite new at this, and the command I'm using is a single bash command line that pipes its output to the next stage. I'm not sure what to do with the line breaks, since I'm just adding this to my existing command and entering it in the terminal.

RudiC · September 22, 2012, 3:11pm

To use it in a chain of pipes, you can put the awk program (the text between the two ' characters, but without them) into a file (say, "awkfile") and then use it like this:

cmd|cmd|awk -f awkfile|cmd ...
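
A rough sketch of that suggestion, under a few assumptions: "awkfile" is only an example name, printf and sort are stand-ins for the real stages before and after awk, and the character classes are written as plain ranges as a simplification.

```shell
# Save the awk program (the text between the single quotes) in a file.
# "awkfile" is only an example name.
cat > awkfile <<'EOF'
/^[A-Za-z]+[0-9]+/ {
    match($0, /[A-Za-z]+/); w = substr($0, RSTART, RLENGTH)   # name
    match($0, /[0-9]+/);    n = substr($0, RSTART, RLENGTH)   # number
    if (n + 0 >= maxi[w] + 0) { maxi[w] = n; rec[w] = $0 }
}
END { for (i in rec) print rec[i] }
EOF

# printf stands in for the earlier stages of the pipeline,
# and sort stands in for whatever stage comes next.
result=$(printf '%s\n' alpha1 alpha2 alpha3 beta1 beta2 gamma5 gamma9 |
    awk -f awkfile | sort)
printf '%s\n' "$result"
```

The program can also stay on the command line. awk does not need the line breaks as long as the statements are separated by semicolons.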

pamu · September 22, 2012, 4:10pm

sort file | awk '{if($0 !~ a){print k;max=0};{s=$0;a=$0;gsub("[a-z]","",s);gsub("[0-9]","",a);if(s+0 > max){max=s+0;k=$0}}}END{print k}'

system · September 22, 2012, 5:30pm

Thank you so much, guys, that's looking close to what I need.

The only issue is that, after I made some changes to another section of the command, the list now looks like this:

alpha1,1_1
alpha1,1_2
alpha1,1_3.2
alpha1,1_3.5
alpha1,2_2
alpha1,2_7
alpha2,9_1.8
alpha2,9_3
beta7,1_1
beta7,1_3.5

So I need the output to look like:

alpha1,1_3.5
alpha1,2_7
alpha2,9_3
beta7,1_3.5

Does that make sense?

elixir_sinari · September 22, 2012, 5:42pm

This will work for your sample input:

awk -F, '{split($2,a,"_");if(a[2]>=max[$1,a[1]]){max[$1,a[1]]=a[2];rec[$1,a[1]]=$0}}
END{for(i in rec) print rec[i]|"sort"}' file

system · September 22, 2012, 5:49pm

Haha, you are amazing, that works like a charm.

The source file is about 30,000 lines and it churned through it in just a few seconds.

Kudos to you

pamu · September 22, 2012, 11:12pm

sort file | awk -F "_" '{if($0 !~ a){print k;max=0};{s=$2;a=$1;if(s > max){max=s;k=$0}}}END{print k}'

Scrutinizer · September 22, 2012, 11:34pm

sort -t_ -k1,1 -k2,2rn file | awk -F_ '!A[$1]++'

system · September 23, 2012, 7:18am

OK, let's make things more complicated.

The source is now going to contain what is essentially a list of old database files that are stored in a hierarchical format, for example (yes, this is keyboard bashing to indicate random miscellaneous words):

siubgsuafoua/orugaourga/aerga/alpha1,1_3*.dbcache
sfsudbgs/asfs/sfgsgsegrg/rgegsg/eegegaaregaeg/alpha1,1_4*.dbcache
ffunafrua/awfanfa/awefa/aaawefawf/beta2,5_9*.dbcache
rfserg/awerawer/awr/aeawrea/weraw/erawer/awer/beta2,5_1*.dbcache
sd/zszs/gamma8,6_7*.dbcache

So the output I would then need is:

sfsudbgs/asfs/sfgsgsegrg/rgegsg/eegegaaregaeg/alpha1,1_4*.dbcache
ffunafrua/awfanfa/awefa/aaawefawf/beta2,5_9*.dbcache
sd/zszs/gamma8,6_7*.dbcache

The commands you guys have posted above seem to get muddled because they try to sort the lines without understanding that they need to ignore everything before the final "/".

Scrutinizer · September 23, 2012, 9:42am

Hi tiberione,

It helps if the data sample is an accurate representation from the start... Data samples also need code tags around them so that they are easy to read...

Try:

sort -t_ -k1,1 -k2,2rn file | awk -F'[_*/]' '!A[$(NF-2)]++'
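
One thing to watch with the pipeline above: with -t_, the first sort key is the whole path up to the underscore, so two copies of the same name in different directories are ordered by directory rather than by number, and awk may then keep the lower-numbered line on other inputs. A minimal awk-only sketch that does not depend on the sort order follows. The function name highest_per_name is made up for illustration, and it assumes the name is everything between the last "/" and the first "_" after it.

```shell
# For each name (the text between the last "/" and the "_"), keep the
# line with the largest number after the "_". Reads files or stdin.
highest_per_name() {
    awk '{
        base = $0
        sub(/.*\//, "", base)            # drop everything up to the last "/"
        us = index(base, "_")
        if (us == 0) next                # skip lines that have no "_"
        key = substr(base, 1, us - 1)    # e.g. "alpha1,1"
        num = substr(base, us + 1) + 0   # leading number, e.g. "4*.dbcache" gives 4
        if (!(key in max) || num > max[key]) { max[key] = num; rec[key] = $0 }
    }
    END { for (k in rec) print rec[k] }' "$@"
}
```

Something like `highest_per_name file | sort` (or `cmd | highest_per_name | cmd`) should then work. The output order of the for-in loop is unspecified, hence the final sort.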