How to pick a group of data using awk/ksh

calredd · April 20, 2012, 12:37pm

Hi gurus,
I have data coming in as shown below. And in each case, I need to pick the data in the last group as shown below.

Data Set 1:

DC | 18161621
LA | 15730880
NY | 16143237
DC | 18161621
LA | 17316397
NY | 17915905
DC | 18161621
LA | 17993534
NY | 18161621
DC | 18161621
LA | 17993534
NY | 18161621

in this above case (a group is defined as the bottom most unique rows), I need to pick the last group of data which is :

DC | 18161621
LA | 17993534
NY | 18161621

Data Set 2:

DC | 18161621
LA | 15730880
NY | 16143237
SF | 10000000
DC | 18161621
LA | 17316397
NY | 17915905
SF | 10000011
DC | 18161621
LA | 17993534
NY | 18161621
SF | 10000111
DC | 18161621
LA | 17993534
NY | 18161621
SF | 10009090

in this above case (a group is defined as the bottom most unique rows), I need to pick the last group of data which is :

DC | 18161621
LA | 17993534
NY | 18161621
SF | 10009090

So, the number of lines/records in a group can be just one or two or three or more but always I need to pick the last group and echo it to a file.

I was hoping to achieve it with an awk but no go so far. Any help into this is appreciated. Let me know if I am not clear.
Thanks,
Carl.

birei · April 20, 2012, 2:16pm

Hi calredd,

It can be an easy task or a bit more difficult. Can you identify each group? I mean, does the group always begins in DC or can vary?

calredd · April 20, 2012, 2:38pm

The names in the group can be anything, not necessarily start with a DC or NY. But the data will always come in groups :

In this above case, I need to pick the last group:

FIRST_NUMBER|300
SECOND_NUMBER|399

Let me know if you need more clarification.

---------- Post updated at 02:38 PM ---------- Previous update was at 02:31 PM ----------

FYI... one solution that I am trying to implement is to first determine number of unique lines/rows based on the first field only (as pipe is always a delimiter) :

In the above case, that would be 2.

then just do a tail -2 of the above data would hopefully give me the last two lines but my solution isnt working yet.

47shailesh · April 20, 2012, 2:47pm

one way can be:

tail -n  $(sort -u -t"|" -k1,1 infile | wc -l) infile

calredd · April 20, 2012, 3:19pm

I get your point but the syntax isnt working. How do I pass a negative number via variable to tail commad...

tail -2 infile works fine but
tail -${n} infile doesnt work assuming n=2

Corona688 · April 20, 2012, 3:22pm

Certainly it does, if n=2. If it doesn't work, I suspect you've got spaces or something as well as that digit inside n, messing up your statement.

47shailesh · April 20, 2012, 3:24pm

which part is not working. can you post error here.

tail -2 is equivalent to tail -n 2 so try tail -n ${n} infile and tail -${n} works for me

Scrutinizer · April 20, 2012, 3:27pm

Try:

awk -F \| 'NR==1{s=$1} s==$1{p=$0;next} {p=p RS $0} END{print p}' infile

On Solaris use /usr/xpg4/bin/awk instead of awk

DC | 18161621
LA | 17993534
NY | 18161621

DC | 18161621
LA | 17993534
NY | 18161621
SF | 10009090

FIRST_NUMBER|300
SECOND_NUMBER|399

calredd · April 20, 2012, 3:46pm

I use sun os and may be thats why its not working..

>tail -n 2 infile
usage: tail [+/-[n][lbc][f]] [file]
       tail [+/-[n][l][r|f]] [file]

>tail -2 infile
FIRST_NUMBER|300
SECOND_NUMBER|399

Corona688 · April 20, 2012, 3:55pm

No reason I know of that tail -$n shouldn't work in sunos. That's processed before tail is run, so it really shouldn't notice any difference. Again, check the contents of $n to make sure there's no spaces or other junk -- that could turn it from tail -2 into tail - 2, which isn't the same...

47shailesh · April 20, 2012, 4:00pm

I agree with Corona, and for Solaris use this instead

tail -$(sort -u -t"|" -k1,1 infile | wc -l) infile

or

/usr/xpg4/bin/tail -n $(sort -u -t"|" -k1,1 infile | wc -l) infile

I would recommend Scrutinizer's code, it's better as it uses only one external program, mine uses 3

calredd · April 20, 2012, 4:29pm

Thank you all for multiple solutions. I will try to use the awk method.