Remove the trailing comma (and what follows it) from the end of awk output

Hello friends,

I have a file consisting of many rows, and I use a couple of commands to convert it so I can use it in a database query for filtering. I need the first column (MSISDNs) of each row, separated by commas:

9855162267,4,5,2010-11-03 17:02:07.627
9594567938f,5,5,2010-11-02 12:47:08.047
9855155486,4,5,2010-11-01 12:26:37.640
9233453445f,5,5,2010-11-02 11:20:43.327
9434326423,5,5,2010-11-01 11:02:02.217
9592416210f,4,5,2010-11-02 10:20:52.063
nawk -F, '{print $1}' FILE | sed -e 's/f$//g' -e 's/\([0-9]\{10\}\)/91\1/g' | nawk '!_[$0]++' | nawk -v RS="\n" -v ORS=","  '{}'1

The code works well, but the output ends with a comma followed by the shell prompt, so I can't save it to a file. I know it should be easy to get rid of using the print options, but I couldn't manage it. I'd appreciate any suggestion to remove the colored part. Is it also possible without adding another piped command?

919855162267,919594567938,919855155486,919233453445,919434326423,919592416210,server{root}/a/b/c>

Regards

This should do it in one nawk command:

nawk -F, '{ printf (NR==1?"":",")$1} END {printf "\n"}' FILE
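To see the no-trailing-comma trick in action, here is a runnable sketch (the file name sample.csv and its contents are made up for illustration; plain awk is used here, and the thread's nawk behaves the same for this):

```shell
# Create a small sample file (name and contents are illustrative only)
cat > sample.csv <<'EOF'
9855162267,4,5,2010-11-03 17:02:07.627
9594567938f,5,5,2010-11-02 12:47:08.047
EOF

# NR==1 prints the first field bare; every later record is prefixed
# with a comma, so no trailing comma is ever produced
awk -F, '{printf (NR==1?"":",")$1} END{printf "\n"}' sample.csv
```

The ternary chooses the separator *before* each field rather than after it, which is why the last field is never followed by a comma.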

To drop the f, force $1 into numeric context:

nawk -F, '{printf (NR>1?FS:x)91$1+0} END{print x}' infile
awk -F, '{gsub(/f/,"",$1)}NR==1{a="91" $1;b[$1]++;next}!b[$1]++ {a=a ",91" $1}END{print a}' FILE
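To see how this one-pass version builds the joined string while skipping duplicates, here is a runnable sketch with made-up sample data (sample.csv is illustrative only):

```shell
# Sample data with a trailing "f" on one row and a duplicate MSISDN
cat > sample.csv <<'EOF'
9855162267,4,5,2010-11-03
9594567938f,5,5,2010-11-02
9855162267,4,5,2010-11-01
EOF

# gsub strips the "f"; the first record seeds the string a, and each
# later record is appended only if its field has not been seen before
awk -F, '{gsub(/f/,"",$1)}NR==1{a="91" $1;b[$1]++;next}!b[$1]++{a=a ",91" $1}END{print a}' sample.csv
```

The duplicate third row is silently dropped, and the result is printed once at END.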

@Scruti

With the +0 I get a result in scientific notation:

# awk -F, '{printf (NR>1?FS:x)91$1+0} END{print x}' infile
919.85516e+09,919.59457e+09,919.85516e+09,919.23345e+09,919.43433e+09,919.59242e+09

That is probably because your awk cannot handle large integers. Try:

nawk -F, '{printf (NR>1?FS:x)"91%.0f",$1} END{print x}' infile
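The reason this helps: when a number is concatenated into a string, awk converts it using CONVFMT (default %.6g), which produces scientific notation for large values in some awks; passing the field to printf with an explicit %.0f sidesteps that conversion entirely. A runnable sketch (sample.csv is made up for illustration):

```shell
cat > sample.csv <<'EOF'
9855162267,4,5,2010-11-03 17:02:07.627
9594567938f,5,5,2010-11-02 12:47:08.047
EOF

# %.0f formats the field as a plain integer, so the trailing "f" is
# dropped by numeric conversion and no scientific notation appears
awk -F, '{printf (NR>1?FS:x)"91%.0f",$1} END{print x}' sample.csv
```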

(which is better practice anyway :wink:. Another advantage is that there is no need for +0.)


Yep! That one is better :wink:

x stands for an empty string, correct?

Correct: the variable x is uninitialized, and since it is used in string context it evaluates to the empty string.
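This is easy to verify: an unset awk variable is "" in string context and 0 in numeric context, so it works as a free empty-string placeholder in these one-liners.

```shell
# x is never assigned: it prints as "" in a string, 0 in arithmetic
awk 'BEGIN{print "[" x "]"; print x+0}'
```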

Thanks all for your responses. I have a question:

rdcwayx, your code simply covers all of mine :slight_smile:. It is good that you added "!b[$1]++" to remove duplicates. When the first field of a later line matches one seen before, it skips directly to the next line without applying the rest of the code, doesn't it?
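That is indeed how the pattern works: b[$1]++ returns the count *before* incrementing, so !b[$1]++ is true only on the first occurrence, and on later occurrences the action attached to it is simply not executed for that line. The classic standalone form dedupes whole lines:

```shell
# Each line passes the !seen[$0]++ test only the first time it appears
printf '%s\n' a b a c b | awk '!seen[$0]++'
```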

Besides, Scrutinizer, it is good to learn how "x" works there :slight_smile:

And Scrutinizer, I added the

!b[$1]++

part to your code, and it now removes duplicates as well:

nawk -F, '!b[$1]++{printf (NR>1?FS:x)91$1+0} END{print x}' FILE

Regards

Hi Eagle, I did not know that was a requirement. In that case I would add this (to exclude duplicates that differ only by a trailing "f"):

awk -F, '!b[$1+0]++{printf (NR>1?FS:x)91$1+0} END{print x}'  infile
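The point of indexing the seen-array on $1+0 is that numeric conversion strips the trailing "f" before the duplicate check, so "9855162267" and "9855162267f" count as one key. A runnable sketch (sample.csv is made up; the %.0f form from earlier in the thread is used here to keep the output portable):

```shell
cat > sample.csv <<'EOF'
9855162267,4,5,2010-11-03
9855162267f,5,5,2010-11-02
9594567938f,4,5,2010-11-01
EOF

# b[$1+0] makes "9855162267" and "9855162267f" hash to the same key,
# so the second row is treated as a duplicate and skipped
awk -F, '!b[$1+0]++{printf (NR>1?FS:x)"91%.0f",$1} END{print x}' sample.csv
```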