CDR manipulation

Hello Friends,

I need to examine a huge CDR file according to a condition that is complex (at least for me), described below, and I couldn't write anything :(

The CDR file has hundreds of fields. I need to print the rows which match the condition below:

While field $13 of subsequent CDRs is equal and $NF == 202 || $NF == 500, compare field $3 of those lines and print them sorted from smaller to bigger. Data:

F1 | F2 | 2011111902634918288 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5437776699 | F14 | ...... | 202

F1 | F2 | 2011111902634918289 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5447776644 | F14 | ...... | 501

F1 | F2 | 2011111902634918255 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 202

F1 | F2 | 2011111902634918200 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 500

I'd like to have this output:

F1 | F2 | 2011111902634918200 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 500
F1 | F2 | 2011111902634918255 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 202

because the two rows above have the same $13 (GSM no) and their $NF values are 202 and 500 in turn (it could be the opposite too: 500 and 202), so the row with the smaller 3rd field comes first in the output.

Thanks in advance
Best Regards

Will these two lines be consecutive in the file, or can they be anywhere in the file?

I made a couple of assumptions:

1) only records that should be printed are the ones with matching field 13. If a record has no match it isn't printed.

2) matching records would be consecutive in the input.

awk -F \| '
    function print_set(     i )
    {
        if( iidx < 2)           # print only if there were more than 1
            return;

        asort( idx3 );          # gawk extension; renumbers idx3 from 1
        for( i = 1; i <= length( idx3 ); i++ )
            print buf[idx3[i]];
    }

    !( $NF == 202 || $NF == 500 ) { next; }

    p13 != $13 {                # not the same; print previous set if there
        print_set( );

        iidx = 0;               # reset cache
        delete idx3;
        delete buf;
    }

    {                           # cache information until we have a $13 that does not match
        p13 = $13;
        idx3[iidx++] = $3;
        buf[$3] = $0;
    }

    END {
        print_set( );
    }
' input-file >output-file

If all 202/500 records are to be printed, regardless of whether or not there was a matching record based on field 13, then comment out the first two lines (if and return) in the print_set function.
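
For an awk without asort(), the same idea can be sketched with a small insertion sort instead. This is only a sketch under assumptions: the sample file name cdr_sample.txt and its rows are made up (shortened to 15 fields), matching records are consecutive as assumed above, and the $3 keys are compared as strings, which only orders them correctly when they all have the same length (19-digit values do not fit exactly in awk's floating-point numbers).

```shell
# Sample input, made up for this sketch; real CDRs have many more fields.
cat > cdr_sample.txt <<'EOF'
F1 | F2 | 2011111902634918288 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5437776699 | F14 | 202
F1 | F2 | 2011111902634918289 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5447776644 | F14 | 501
F1 | F2 | 2011111902634918255 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | 202
F1 | F2 | 2011111902634918200 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | 500
EOF

result=$(awk -F'|' '
    function print_set(    i, j, k, l) {
        if (n < 2) return               # print only sets with more than 1 record
        for (i = 2; i <= n; i++) {      # insertion sort on the cached $3 keys
            k = key[i]; l = line[i]
            # appending "" forces a string comparison of the keys
            for (j = i - 1; j >= 1 && (key[j] "") > (k ""); j--) {
                key[j + 1] = key[j]; line[j + 1] = line[j]
            }
            key[j + 1] = k; line[j + 1] = l
        }
        for (i = 1; i <= n; i++) print line[i]
    }
    !($NF == 202 || $NF == 500) { next }
    p13 != $13 { print_set(); n = 0 }   # $13 changed: flush the previous set
    { p13 = $13; n++; key[n] = $3; line[n] = $0 }
    END { print_set() }
' cdr_sample.txt)

printf '%s\n' "$result"
```

On the sample rows this should print the two 5427776655 records, the one with the smaller $3 first.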

Not sure if this is easier/faster:

sort -t"|" -k13,13 -k3,3 infile|uniq -f24 -w11 -D|grep -E "500$|202$"

after sorting the infile on fields 13 and 3, uniq -D drops the lines whose field 13 is unique, resulting in what you cited as desired output:

F1 | F2 | 2011111902634918200 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 500
F1 | F2 | 2011111902634918255 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | ...... | 202

Another option maybe:

awk -F\| '/ 500$| 202$/{if($13 in A)print A[$13] RS $0; else A[$13]=$0}' infile | sort -t\| -k3,3n

--
@agama:

awk: calling undefined function asort
 input record number 7, file infile
 source line number 7

@RudiC

uniq: illegal option -- w
usage: uniq [-c | -d | -u] [-i] [-f fields] [-s chars] [input [output]]

Sorry, should have stated uniq's version:

uniq (GNU coreutils) 8.13

-w sets the number of characters to compare, i.e. something like a "field length"
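
Where uniq has no -w or -D (as in the usage message above), a two-pass awk can stand in for the duplicate filter. This is a rough sketch, not a drop-in replacement: the file name cdr_sample.txt and its shortened rows are made up, and it keeps only the 202/500 records whose field 13 occurs more than once among them, then sorts the survivors on field 3 (sort -n on 19-digit values is assumed to compare them exactly, as GNU sort does).

```shell
# Sample input, made up for this sketch; real CDRs have many more fields.
cat > cdr_sample.txt <<'EOF'
F1 | F2 | 2011111902634918288 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5437776699 | F14 | 202
F1 | F2 | 2011111902634918289 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5447776644 | F14 | 501
F1 | F2 | 2011111902634918255 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | 202
F1 | F2 | 2011111902634918200 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | 500
EOF

# The file is read twice: pass 1 counts each $13 among the 202/500 records,
# pass 2 prints the 202/500 records whose $13 was counted more than once.
result=$(awk -F'|' '
    !/ (500|202)$/ { next }             # ignore every other record type
    NR == FNR      { count[$13]++; next }
    count[$13] > 1
' cdr_sample.txt cdr_sample.txt | sort -t'|' -k3,3n)

printf '%s\n' "$result"
```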

Thank you all for your efforts,

Scrutinizer, if the "500" and "202" records were not consecutive in the CDR file, would it be much more difficult to get the same output? I guess it would be a long process to check the 500 or 202 records of the same gsm_no, as there would be thousands of CDR rows in a file and the desired rows could come several hundred rows after one another. Correct?

Hi,

It should not matter where they are located in the file, as long as both records appear only once.

S.
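
A quick way to check that on a small case, as a sketch: the file cdr_scattered.txt and its shortened rows are made up, with the two 5427776655 records separated by unrelated rows, and the one-liner is the one posted earlier in the thread.

```shell
# Made-up sample: the matching 202 and 500 records are NOT adjacent.
cat > cdr_scattered.txt <<'EOF'
F1 | F2 | 2011111902634918255 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | 202
F1 | F2 | 2011111902634918288 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5437776699 | F14 | 202
F1 | F2 | 2011111902634918289 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5447776644 | F14 | 501
F1 | F2 | 2011111902634918200 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | 500
EOF

# The first 202/500 record per $13 is remembered in A; when a second record
# with the same $13 shows up, both are printed, however far apart they were.
result=$(awk -F'|' '/ 500$| 202$/{if($13 in A)print A[$13] RS $0; else A[$13]=$0}' cdr_scattered.txt |
         sort -t'|' -k3,3n)

printf '%s\n' "$result"
```

The array A holds one line per distinct $13 among the 202/500 records, so memory grows with the number of distinct numbers rather than with the distance between the two matching rows.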

Can always count on you for the absolute smallest programme!

Bloody asort(); It's what I get for assuming gawk. I'd post something with a simple sort in it, but it's still not as short and sweet, nor needed given your code, so I'll let it ride.

Thanks.

I'm a bit concerned with scrutinizer's sort - it would mess up adjacency of lines with identical field 13 - methinks it should sort -k13,13 first ...
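
To illustrate the concern, here is a sketch with two made-up pairs whose $3 values interleave (file name cdr_pairs.txt and the shortened rows are assumptions, and sort -n is assumed to compare 19-digit values exactly, as GNU sort does). Sorting on field 3 alone mixes the pairs; sorting on field 13 first and field 3 second keeps each pair together.

```shell
# Two made-up pairs; their $3 values interleave (200, 210, 255, 260).
cat > cdr_pairs.txt <<'EOF'
F1 | F2 | 2011111902634918255 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | 202
F1 | F2 | 2011111902634918260 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5437776699 | F14 | 500
F1 | F2 | 2011111902634918200 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5427776655 | F14 | 500
F1 | F2 | 2011111902634918210 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | 5437776699 | F14 | 202
EOF

pairs=$(awk -F'|' '/ 500$| 202$/{if($13 in A)print A[$13] RS $0; else A[$13]=$0}' cdr_pairs.txt)

# Field 3 only: rows of the two pairs end up interleaved.
by3=$(printf '%s\n' "$pairs" | sort -t'|' -k3,3n)

# Field 13 first, then field 3: each pair stays adjacent, smaller $3 first.
grouped=$(printf '%s\n' "$pairs" | LC_ALL=C sort -t'|' -k13,13 -k3,3n)

printf 'sorted on $3 only:\n%s\n\nsorted on $13, then $3:\n%s\n' "$by3" "$grouped"
```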