CDR manipulation

EAGL · July 27, 2012, 7:41am

Hello Friends,

I need to examine a huge CDR file against a condition that is complex (at least for me), described below, and I couldn't come up with anything that works.

The CDR file has hundreds of fields. I need to print the rows that match the condition below:

When $13 of subsequent CDRs is equal and $NF is 202 or 500 ($NF == 202 || $NF == 500), compare $3 of those lines and sort them from the smallest $3 to the largest. Data:

F1 | F2 | 2011111902634918288 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5437776699 | F14 | ...... | 202

F1 | F2 | 2011111902634918289 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5447776644 | F14 | ...... | 501

F1 | F2 | 2011111902634918255 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 202

F1 | F2 | 2011111902634918200 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 500

I'd like to get this output:

F1 | F2 | 2011111902634918200 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 500
F1 | F2 | 2011111902634918255 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 202

because the two rows above have the same $13 (GSM number) and their $NF values are 202 and 500 (it could be the other way round too: 500 and 202), so the row with the smaller $3 comes first in the output.
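
A minimal sketch of that condition, assuming the '|'-separated, blank-padded layout of the sample and that all $3 values have the same number of digits (so a plain byte-wise sort orders them numerically). It does not assume matching rows are adjacent: it caches every 202/500 record, keeps those whose $13 occurs at least twice, then sorts by $13 and $3. The function name cdr_pairs is hypothetical.

```shell
# cdr_pairs (hypothetical name): print every 202/500 record whose field 13
# (GSM number) occurs in at least two such records, grouped by field 13 and
# ordered by field 3. Reads the files given as arguments, or stdin.
cdr_pairs() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        {
            st = trim($NF)
            if (st != "202" && st != "500") next   # keep only 202/500 records
            key = trim($13)
            count[key]++                            # occurrences per GSM number
            rec[NR] = $0; k[NR] = key               # cache record and its key
        }
        END {
            for (i = 1; i <= NR; i++)               # replay cached records in input order
                if ((i in rec) && count[k[i]] >= 2) print rec[i]
        }
    ' "$@" | LC_ALL=C sort -t'|' -k13,13 -k3,3      # group by $13, then order by $3
}
```

Usage: cdr_pairs infile >outfile. Caching in awk and sorting afterwards keeps the awk part simple; the price is that all 202/500 records are held in memory.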

Thanks in advance
Best Regards

raj_saini20 · July 28, 2012, 8:21am

Will these two lines be consecutive in the file, or can they be anywhere in the file?

agama · July 28, 2012, 12:01pm

I made a couple of assumptions:

1) only records that should be printed are the ones with matching field 13. If a record has no match it isn't printed.

2) matching records would be consecutive in the input.

awk -F \| '
    # asort() and length(array) are gawk extensions
    function print_set(     i, n )
    {
        if( iidx < 2)           # print only if there were more than 1
            return;

        n = asort( idx3 );      # sort the cached $3 values; n = number of entries
        for( i = 1; i <= n; i++ )
            print buf[idx3[i]];
    }

    !( $NF == 202 || $NF == 500 ) { next; }

    p13 != $13 {                # not the same; print previous set if there
        print_set( );

        iidx = 0;               # reset cache
        delete idx3;
        delete buf;
    }

    {                           # cache information until we have a $13 that does not match
        p13 = $13;
        idx3[iidx++] = $3;
        buf[$3] = $0;
    }

    END {
        print_set( );
    }
' input-file >output-file

If all 202/500 records are to be printed, regardless of whether or not there was a matching record based on field 13, then comment out the first two lines (if and return) in the print_set function.

RudiC · July 28, 2012, 12:48pm

Not sure if this is easier/faster:

grep -E " (500|202)$" infile | sort -t"|" -k13,13 -k3,3 | uniq -f24 -w11 -D

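After sorting on field 13 and then field 3, uniq -D prints only the lines whose field 13 occurs more than once and drops the unique ones. With the sample layout, -f24 skips the first 24 blank-separated fields (each "|" counts as a field) and -w11 compares the next 11 characters, i.e. the blank plus the 10-digit number in field 13. The result is what you cited as the desired output: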
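
Since -w and -D are GNU extensions to uniq, here is a rough portable stand-in for the "keep only repeated field 13" step, written as a small awk filter. It is a sketch, assuming the input is already sorted on field 13 (so equal keys are adjacent) and '|' is the separator; the function name dup_runs_by_f13 is hypothetical.

```shell
# dup_runs_by_f13 (hypothetical name): print every line that belongs to a run
# of two or more adjacent lines with the same field 13; drop singletons.
dup_runs_by_f13() {
    awk -F'|' '
        function flush(   i) {
            if (n >= 2) for (i = 1; i <= n; i++) print run[i]   # emit repeated runs only
            n = 0
        }
        ($13 "") != prev { flush() }       # key changed: handle the previous run
        { run[++n] = $0; prev = $13 "" }   # compare keys as strings
        END { flush() }
    ' "$@"
}
```

Usage: grep -E ' (500|202)$' infile | LC_ALL=C sort -t'|' -k13,13 -k3,3 | dup_runs_by_f13. Unlike -f24 -w11, it selects field 13 by the "|" separator, so it does not depend on how many blanks the other fields contain.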

F1 | F2 | 2011111902634918200 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 500
F1 | F2 | 2011111902634918255 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 202

Scrutinizer · July 28, 2012, 4:31pm

Another option maybe:

awk -F\| '/ 500$| 202$/{if($13 in A)print A[$13] RS $0; else A[$13]=$0}' infile | sort -t\| -k3,3n

--
@agama:

awk: calling undefined function asort
 input record number 7, file infile
 source line number 7

@RudiC

uniq: illegal option -- w
usage: uniq [-c | -d | -u] [-i] [-f fields] [-s chars] [input [output]]

RudiC · July 29, 2012, 5:03am

Sorry, I should have stated uniq's version:

uniq (GNU coreutils) 8.13

-w N compares no more than N characters in each line, i.e. something like a "field length".

EAGL · July 30, 2012, 4:07am

Thank you all for your efforts,

Scrutinizer, if the "500" and "202" records weren't consecutive in the CDR files, would it be much more difficult to get the same output? I guess it would be a long process to check the 500 and 202 records of the same gsm_no in one file, as there are thousands of CDR rows in a file and the matching row could come several hundred rows after the other one. Correct?

Scrutinizer · July 30, 2012, 4:44am

Hi,

It should not matter where they are located in the file, as long as both records appear only once.
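
To check that precondition on a real file, a small sketch could count how many 202/500 records each GSM number (field 13) has and report those with more than two, which is where the one-liner above would print a remembered record more than once. The helper name count_f13 is hypothetical, and it assumes the same '|'-separated layout as the sample.

```shell
# count_f13 (hypothetical name): report field-13 values that occur in more
# than two 202/500 records. Report order is unspecified (awk "for (k in c)").
count_f13() {
    awk -F'|' '
        / (500|202)$/ { c[$13]++ }   # count 202/500 records per field 13
        END { for (k in c) if (c[k] > 2) print "field 13 =" k "occurs in " c[k] " records" }
    ' "$@"
}
```

Usage: count_f13 infile. Empty output means no GSM number has more than two 202/500 records.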

S.

agama · July 30, 2012, 8:56pm

scrutinizer:

Can always count on you for the absolute smallest programme!

Bloody asort(): it's what I get for assuming gawk. I'd post something with a simple sort in it, but it still wouldn't be as short and sweet, nor is it needed given your code, so I'll let it ride.

Thanks.
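
For awks without asort(), the earlier gawk script could be reworked along the lines of the "simple sort" mentioned above. This is a sketch, not agama's actual code: it uses an insertion sort on $3 and keeps the original assumption that records with the same $13 are consecutive. $3 is compared as a string, which orders correctly only if all $3 values have the same width (true in the sample). The function name agama_portable is hypothetical.

```shell
# agama_portable (hypothetical name): POSIX-awk variant of the gawk script,
# with an insertion sort instead of asort(). Assumes equal $13 are adjacent.
agama_portable() {
    awk -F'|' '
        function print_set(   i, j, t, r) {
            if (n < 2) return                        # print only sets with 2+ records
            for (i = 2; i <= n; i++) {               # insertion sort of rec[] by key3[]
                t = key3[i] ""; r = rec[i]
                for (j = i - 1; j >= 1 && t < (key3[j] ""); j--) {
                    key3[j + 1] = key3[j]; rec[j + 1] = rec[j]
                }
                key3[j + 1] = t; rec[j + 1] = r
            }
            for (i = 1; i <= n; i++) print rec[i]
        }
        !($NF + 0 == 202 || $NF + 0 == 500) { next }  # keep only 202/500 records
        p13 != ($13 "") { print_set(); n = 0 }       # $13 changed: flush previous set
        { p13 = $13 ""; n++; key3[n] = $3; rec[n] = $0 }
        END { print_set() }
    ' "$@"
}
```

Usage: agama_portable input-file >output-file. An insertion sort is quadratic, but each set here holds only a handful of records, so that should not matter.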

RudiC · July 31, 2012, 2:51am

I'm a bit concerned with scrutinizer's sort - it would mess up adjacency of lines with identical field 13 - methinks it should sort -k13,13 first ...
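
A sketch of Scrutinizer's one-liner with the sort keys changed as suggested: sort on field 13 first, then numerically on field 3, so each pair stays together. The function name pairs_sorted and the LC_ALL=C setting (for a byte-wise comparison of field 13) are additions for illustration.

```shell
# pairs_sorted (hypothetical name): Scrutinizer's awk, followed by a sort on
# field 13 first and field 3 second, so pairs with the same GSM number stay adjacent.
pairs_sorted() {
    awk -F'|' '/ 500$| 202$/ { if ($13 in A) print A[$13] RS $0; else A[$13] = $0 }' "$@" |
        LC_ALL=C sort -t'|' -k13,13 -k3,3n
}
```

With -k3,3n alone, two pairs whose $3 ranges overlap would be interleaved in the output; with -k13,13 first, each pair is printed as a block, ordered by $3 within the block.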