How to format file into comma separated text file?

Hi Guys,

I have text file which is tab/space separated but I want it to re-format into a comma separated and trim the spaces in between. Can someone spare me a perl or sed script that can do the job?

INPUT FILE:

500010245623                500 21-APR-11 05.58.21 PM                                                                1
500010245637                500 21-APR-11 05.58.21 PM                                                                0
500010245644                500 21-APR-11 05.58.21 PM                                                                1
500010245645                500 21-APR-11 05.58.21 PM                                                                1
500010245654                500 21-APR-11 05.58.21 PM                                                                1
500010245662                500 21-APR-11 05.58.21 PM                                                                1
500010245664                500 21-APR-11 05.58.21 PM                                                                1
500010245666                500 21-APR-11 05.58.21 PM                                                                1
500010245667                500 21-APR-11 05.58.21 PM                                                                1
500010245677                500 21-APR-11 05.58.21 PM                                                                1
500010245679                500 21-APR-11 05.58.21 PM                                                                1

DESIRED OUTPUT:

500010245623,500,21-APR-11 05.58.21 PM,1
500010245637,500,21-APR-11 05.58.21 PM,0
500010245644,500,21-APR-11 05.58.21 PM,1
500010245645,500,21-APR-11 05.58.21 PM,1
500010245654,500,21-APR-11 05.58.21 PM,1
500010245662,500,21-APR-11 05.58.21 PM,1
500010245664,500,21-APR-11 05.58.21 PM,1
500010245666,500,21-APR-11 05.58.21 PM,1
500010245667,500,21-APR-11 05.58.21 PM,1
500010245677,500,21-APR-11 05.58.21 PM,1
500010245679,500,21-APR-11 05.58.21 PM,1

Thanks in advance.

Br,
Pinpe

 
perl -lane 'BEGIN{$,=","}print @F' input
sed -i 's/[ |\t]\+/,/g' input-file

---------- Post updated at 06:32 AM ---------- Previous update was at 06:28 AM ----------

But I see from your example that you want single spaces to not be replaced with commas, so you should use:

sed 's/ \{2,\}/,/g' input-file | sed 's/\t\+/,/g' > output-file

---------- Post updated at 06:33 AM ---------- Previous update was at 06:32 AM ----------

the output will be:

500010245623,500 21-APR-11 05.58.21 PM,1
500010245637,500 21-APR-11 05.58.21 PM,0
500010245644,500 21-APR-11 05.58.21 PM,1
500010245645,500 21-APR-11 05.58.21 PM,1
500010245654,500 21-APR-11 05.58.21 PM,1
500010245662,500 21-APR-11 05.58.21 PM,1
500010245664,500 21-APR-11 05.58.21 PM,1
500010245666,500 21-APR-11 05.58.21 PM,1
500010245667,500 21-APR-11 05.58.21 PM,1
500010245677,500 21-APR-11 05.58.21 PM,1
500010245679,500 21-APR-11 05.58.21 PM,1

Hi getmmg/ciupinet,

Thanks for your prompt response. I forgot something, I want to put some headers on the output file. Because I want them to be separated into 4, I need to have also 4 headers as shown below...

DESIRED OUTPUT:

Header1,Header2,Header3,Header4
500010245623,500,21-APR-11 05.58.21 PM,1
500010245637,500,21-APR-11 05.58.21 PM,0
500010245644,500,21-APR-11 05.58.21 PM,1
500010245645,500,21-APR-11 05.58.21 PM,1
500010245654,500,21-APR-11 05.58.21 PM,1
500010245662,500,21-APR-11 05.58.21 PM,1
500010245664,500,21-APR-11 05.58.21 PM,1
500010245666,500,21-APR-11 05.58.21 PM,1
500010245667,500,21-APR-11 05.58.21 PM,1
500010245677,500,21-APR-11 05.58.21 PM,1
500010245679,500,21-APR-11 05.58.21 PM,1

Thanks in advance.

Br,
Pinpe

 
perl -lane 'BEGIN{print "Header1,Header2,Header3,Header4"}print "$F[0],$F[1],$F[2] $F[3],$F[4]"' input

--ignore the post--

Hi getmmg,

This is the output of your perl script...

100116144521,100,15-APR-11,10.41.58,AM,1
100116144524,100,15-APR-11,10.41.58,AM,1
100116144549,100,15-APR-11,10.41.58,AM,1
100116145555,100,15-APR-11,10.41.58,AM,1
100116145559,100,15-APR-11,10.41.58,AM,1
100116145562,100,15-APR-11,10.41.58,AM,1
100116145563,100,15-APR-11,10.41.58,AM,1
100116145603,100,15-APR-11,10.41.58,AM,1
100116146597,100,15-APR-11,10.41.58,AM,1
100116146601,100,15-APR-11,10.41.58,AM,1
100116146604,100,15-APR-11,10.41.58,AM,1
100116146606,100,15-APR-11,10.41.58,AM,1
100116146609,100,15-APR-11,10.41.58,AM,1
100116146613,100,15-APR-11,10.41.58,AM,1
100116146615,100,15-APR-11,10.41.58,AM,1
100116146616,100,15-APR-11,10.41.58,AM,1
100116146617,100,15-APR-11,10.41.58,AM,1

If you will notice the date '21-APR-11 05.58.21 PM' become also in comma separated. I don't want the date to be in comma separated. I want to treat them as a whole field.

Can you treat the date as one field as shown below?

Header1,Header2,Header3,Header4
500010245623,500,21-APR-11 05.58.21 PM,1       
500010245637,500,21-APR-11 05.58.21 PM,0       
500010245644,500,21-APR-11 05.58.21 PM,1       
500010245645,500,21-APR-11 05.58.21 PM,1       
500010245654,500,21-APR-11 05.58.21 PM,1       
500010245662,500,21-APR-11 05.58.21 PM,1       
500010245664,500,21-APR-11 05.58.21 PM,1       
500010245666,500,21-APR-11 05.58.21 PM,1       
500010245667,500,21-APR-11 05.58.21 PM,1       
500010245677,500,21-APR-11 05.58.21 PM,1       
500010245679,500,21-APR-11 05.58.21 PM,1       

Thanks in advance mate!

Br,
Pinpe

I've corrected that in my next post. Try this one

 
perl -lane 'BEGIN{print "Header1,Header2,Header3,Header4"}print "$F[0],$F[1],$F[2] $F[3],$F[4]"' input

Hi getmmg,

Yes this is fine with me. Thanks man!.

Now, I want to divide all the 2nd fields by 100. So if my 2nd field is 500 and divide it to 100 it will become 5. Or I want to remove the 2 zeros from the 2nd field.

INPUT:

Header1,Header2,Header3,Header4
500010245623,500,21-APR-11 05.58.21 PM,1       
500010245637,500,21-APR-11 05.58.21 PM,0       
500010245644,500,21-APR-11 05.58.21 PM,1       
500010245645,500,21-APR-11 05.58.21 PM,1       
500010245654,500,21-APR-11 05.58.21 PM,1       
500010245662,500,21-APR-11 05.58.21 PM,1       
500010245664,500,21-APR-11 05.58.21 PM,1       
500010245666,500,21-APR-11 05.58.21 PM,1       
500010245667,500,21-APR-11 05.58.21 PM,1       
500010245677,500,21-APR-11 05.58.21 PM,1       
500010245679,500,21-APR-11 05.58.21 PM,1

OUTPUT:

Header1,Header2,Header3,Header4
500010245623,5,21-APR-11 05.58.21 PM,1       
500010245637,5,21-APR-11 05.58.21 PM,0       
500010245644,5,21-APR-11 05.58.21 PM,1       
500010245645,5,21-APR-11 05.58.21 PM,1       
500010245654,5,21-APR-11 05.58.21 PM,1       
500010245662,5,21-APR-11 05.58.21 PM,1       
500010245664,5,21-APR-11 05.58.21 PM,1       
500010245666,5,21-APR-11 05.58.21 PM,1       
500010245667,5,21-APR-11 05.58.21 PM,1       
500010245677,5,21-APR-11 05.58.21 PM,1       
500010245679,5,21-APR-11 05.58.21 PM,1

Thanks in advance mate!

Br,
Pinpe

Could you please post all your requirements at once an not change it every time someone provide you an answer ?

Thanks

1 Like

Sorry. This is my final requirement for this task. Thanks so much in advance.

Br,
Pete

$ cat tst
500010245623                500 21-APR-11 05.58.21 PM                                                                1
500010245637                500 21-APR-11 05.58.21 PM                                                                0
500010245644                500 21-APR-11 05.58.21 PM                                                                1
500010245645                500 21-APR-11 05.58.21 PM                                                                1
500010245654                500 21-APR-11 05.58.21 PM                                                                1
500010245662                500 21-APR-11 05.58.21 PM                                                                1
500010245664                500 21-APR-11 05.58.21 PM                                                                1
500010245666                500 21-APR-11 05.58.21 PM                                                                1
500010245667                500 21-APR-11 05.58.21 PM                                                                1
500010245677                500 21-APR-11 05.58.21 PM                                                                1
500010245679                500 21-APR-11 05.58.21 PM                                                                1
$ sed 's/ [     ][      ]*/,/g;s/00 /,/' tst
500010245623,5,21-APR-11 05.58.21 PM,1
500010245637,5,21-APR-11 05.58.21 PM,0
500010245644,5,21-APR-11 05.58.21 PM,1
500010245645,5,21-APR-11 05.58.21 PM,1
500010245654,5,21-APR-11 05.58.21 PM,1
500010245662,5,21-APR-11 05.58.21 PM,1
500010245664,5,21-APR-11 05.58.21 PM,1
500010245666,5,21-APR-11 05.58.21 PM,1
500010245667,5,21-APR-11 05.58.21 PM,1
500010245677,5,21-APR-11 05.58.21 PM,1
500010245679,5,21-APR-11 05.58.21 PM,1
$

Note that inside by bracket i have [<SPACE><TAB>]

1 Like

Pete,

Please post all your requirement at once. It would be easy that way.

perl -lane 'BEGIN{print "Header1,Header2,Header3,Header4"}print "$F[0],".$F[1]/100.",$F[2] $F[3],$F[4]"' input

Hi ctsgnb,

This is more elegant code/script of yours!! Thanks mate! But you forgot to put the headers. I have included in this request the headers as shown below...

Header1,Header2,Header3,Header4
500010245623,5,21-APR-11 05.58.21 PM,1
500010245637,5,21-APR-11 05.58.21 PM,0
500010245644,5,21-APR-11 05.58.21 PM,1
500010245645,5,21-APR-11 05.58.21 PM,1

500010245654,5,21-APR-11 05.58.21 PM,1
500010245662,5,21-APR-11 05.58.21 PM,1
500010245664,5,21-APR-11 05.58.21 PM,1

500010245666,5,21-APR-11 05.58.21 PM,1
500010245667,5,21-APR-11 05.58.21 PM,1
500010245677,5,21-APR-11 05.58.21 PM,1
500010245679,5,21-APR-11 05.58.21 PM,1

Appreciated so much your expertise. =)

Also if you can spare me one last request to remove the blank rows form the file. Thanks bro!

Br,
Pinpe

Regarding your Header matter : just use echo !!!!!

echo "Header1,Header2,Header3,Header4"
sed 's/ [     ][      ]*/,/g;s/00 /,/' tst

or

( echo "Header1,Header2,Header3,Header4" && sed 's/ [     ][      ]*/,/g;s/00 /,/' tst )>outputfile

or

echo "Header1,Header2,Header3,Header4" >output
sed 's/ [     ][      ]*/,/g;s/00 /,/' tst >>output

or

echo "Header1,Header2,Header3,Header4" | tee output
sed 's/ [     ][      ]*/,/g;s/00 /,/' tst | tee -a output