Help with appending random sequence to huge CDR file

Hi,

I am in a terrible emergency. I have multiple CDR files, each with a line count > 6000.
I need to append |0| | | | | | | |random| to the end of each line. The random number should never repeat.

Please help with a shell script to process all CDRs in a directory with the above requirement.

What have you tried?

What should be the range of the random numbers?

Any range would be fine; all that is needed is that it be unique across 3 million CDRs.


For the random number, I have figured that date +%N can be used. I am not able to come up with the final working script.

---------- Post updated at 04:14 PM ---------- Previous update was at 04:13 PM ----------

This is what I am trying now:

cntLoop=$date +%N
INP_CSV_FILE="/data101/rating/cs5_upload/med_dir/postpaid/dupgprs/data/testcdr"
OUT_CSV_FILE="/data101/rating/cs5_upload/med_dir/postpaid/dupgprs/data/outfile.csv"
pattern='|0| | | | | | | |'
rm -f $OUT_CSV_FILE
for line in `cat $INP_CSV_FILE`
do
        cntloop=$date +%N
        echo $line|0| | | | | | | |$cntloop| >> $OUT_CSV_FILE
done

---------- Post updated at 04:15 PM ---------- Previous update was at 04:14 PM ----------

I am getting the errors below:

./test.sh: line 1: +%N: command not found
./test.sh: line 9: syntax error near unexpected token `|'
./test.sh: line 9: `        echo "$line|0| | | | | | | |$cntloop| " >> $OUT_CSV_FILE'
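Both errors come from shell syntax: `cntLoop=$date +%N` expands the (empty) variable `$date` and then tries to run `+%N` as a command, and the unquoted `|` characters in the echo are parsed as pipeline operators. A minimal sketch of the two fixes (the sample line is made up, and note %N is a GNU date extension that Solaris date may not support):

```shell
#!/bin/sh
# $(...) is command substitution: it runs date and captures its output.
# (%N is a GNU date extension -- on Solaris it may print a literal %N.)
cntLoop=$(date +%N)

# The pipes must be inside quotes, otherwise the shell reads them
# as pipeline operators -- that is the "syntax error near `|'".
line="0|1|0|20140406020532"
echo "${line}|0| | | | | | | |${cntLoop}|"
```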

How about a number like 0000(number of file)0000(line number)? That's going to be unique. A truly random one runs the risk of not being.

That would be perfect

How does this work on one file:

awk 'FNR==1 {
        FNUM++
        if(LF) close(LF);
        LF=FILENAME".out"
}
{    printf("%s|0| | | | | | | |%08d%08d|\n", FNUM, FNR) > LF; }' input.cdr

awk: cmd. line:5: { printf("%s|0| | | | | | | |%08d%08d|\n", FNUM, FNR);> LF }
awk: cmd. line:5: ^ syntax error

---------- Post updated at 04:34 PM ---------- Previous update was at 04:31 PM ----------

Sample CDR format: I need to append the extra columns mentioned below to the end of each line, plus a random number followed by |, so as to make each CDR undoubtedly unique.

0|1|0|20140406020532| |205| |5|0|620| |502| | |999933992| |3| | | |0|V:11:620:74043720:74043100|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0|0:0:0:0:0| |550|internet|502|0|0| |3333| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |

There were some mistakes in my original try; how about this:

awk 'FNR==1 {
        FNUM++
        if(LF) close(LF);
        LF=FILENAME".out"
}
{    printf("%s|0| | | | | | | |%08d%08d|\n", $0, FNUM, FNR) > LF; }' input.cdr
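To sanity-check the format string on a single record (the file number is hard-coded to 1 here, purely for illustration):

```shell
# One made-up record; the real CDRs are pipe-delimited like this.
printf 'a|b|c| |\n' > input.cdr

# Same printf as the script, with FNUM fixed at 1 for the demo.
awk '{ printf("%s|0| | | | | | | |%08d%08d|\n", $0, 1, FNR) }' input.cdr
# -> a|b|c| ||0| | | | | | | |0000000100000001|

rm -f input.cdr
```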

Looks perfect, friend. Just please tell me how to pick up all the files in a directory and do the above appending. I am asking because I have to deal with exactly 45-50 files, with 60k CDRs in each file.

Do these files have anything in common with each other? What do they look like?

After %s, a space was needed to provide column width between the last | and the first | we added. I made that change to the script you provided.

---------- Post updated at 04:45 PM ---------- Previous update was at 04:44 PM ----------

In post #10, I provided a sample CDR; these files will have n number of lines in the same CDR format.

Their names, I mean. How can I tell the files you want?

For getting all the files in a folder, can't we use

cd /dupgprs/data/
for file in `ls -ltr |head -6000 |awk '{print $9}'`

It's not my own idea; it's some existing code for merging files. Instead of 6000, if we use 1, will it work? I tried, but the entire script doesn't give the needed result.

---------- Post updated at 04:51 PM ---------- Previous update was at 04:50 PM ----------

For the filenames, I can use something like consolidated00[n], where n is 1, 2, 3, etc. I have flexibility on file names.

Can we? That's the question. Do the folders contain only files you want changed? Or do they have anything you want left alone?

And why stop at 6000?

The folder will only contain what I need to change, nothing else.

I wish you had showed me, rather than described, the changes you made... Now I have to guess where you put the space.

find /dupgprs/data/ -type f |
        grep -v "\.out" | # Ignore previously parsed files
        xargs awk 'BEGIN { getline FNUM < "/tmp/FNUM"; close("/tmp/FNUM"); }
END   { printf("%d\n", FNUM) > "/tmp/FNUM";     }
FNR==1{ FNUM++; }
{ printf("%s |0| | | | | | | |%08d%08d|\n", $0, FNUM, FNR) > FILENAME".out"; }'

The BEGIN { } and END { } code is there to load/store the file number in /tmp/FNUM, so it gets saved and restored between different calls of awk (of which there will likely be several, to accommodate several thousand files).
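The handoff is easy to see with two separate awk invocations sharing a counter file (a mktemp path stands in for /tmp/FNUM here):

```shell
state=$(mktemp)

# First call: the state file is empty, so FNUM starts unset (0).
echo x | awk -v SF="$state" 'BEGIN { getline FNUM < SF; close(SF) }
FNR==1 { FNUM++ }
END    { printf("%d\n", FNUM) > SF }'

# Second call picks up the saved count and increments it again.
echo y | awk -v SF="$state" 'BEGIN { getline FNUM < SF; close(SF) }
FNR==1 { FNUM++ }
END    { printf("%d\n", FNUM) > SF }'

cat "$state"    # -> 2
rm -f "$state"
```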

Use nawk on solaris.

Let me try.