Using awk to convert a text file to a CSV file with some string manipulation

FreddyDaKing · August 17, 2012, 8:21am

Hi,

I have a simple text file with contents as below:

12345678900    971,76    4234560890
22345678900   5971,72    5234560990
32345678900     71,12    6234560190

The new CSV file should look like this:

Column1;Column2;Column3;Column4;Column5
123456;78900;971,76;423456;0890
223456;78900;5971,72;523456;0990
323456;78900;71,12;623456;0190

The requirements are:

Column1 contains the first six characters of the first number in the text file
Column2 contains the last five characters of the first number in the text file
Column3 contains the complete middle number of the text file
Column4 contains the first six characters of the last number in the text file
Column5 contains the last four characters of the last number in the text file
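
The five rules above map directly onto awk's substr() and length(). This is a minimal sketch, assuming every line holds exactly three whitespace-separated fields; the function name to_csv is hypothetical. Taking the trailing characters via length() keeps Column2 and Column5 correct even if the numbers were not exactly 11 and 10 digits long.

```shell
#!/bin/sh
# Hypothetical helper: turn the three-field text file into the five CSV columns.
# Assumes exactly three whitespace-separated fields per line.
to_csv() {
  awk 'BEGIN { OFS = ";" }
  {
    n1 = length($1)
    n3 = length($3)
    # Column1 = first six chars of field 1, Column2 = its last five chars,
    # Column3 = middle number unchanged,
    # Column4 = first six chars of field 3, Column5 = its last four chars.
    print substr($1, 1, 6), substr($1, n1 - 4), $2, substr($3, 1, 6), substr($3, n3 - 3)
  }' "$@"
}

printf '%s\n' \
  '12345678900    971,76    4234560890' \
  '22345678900   5971,72    5234560990' \
  '32345678900     71,12    6234560190' | to_csv
```

On the sample data this prints the three data lines of the expected CSV (without the header line).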

Unfortunately I'm something of a newbie in awk. I've tried writing some scripts myself, but without much success.

Any help and hints would be appreciated.

Thanks in advance.

elixir_sinari · August 17, 2012, 8:26am

awk '{sub(/.{6}/,"&"OFS,$1);sub(/.{6}/,"&"OFS,$3)}1' OFS=";" file
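
How the sub() one-liner works: in the replacement string, & stands for the matched text, so sub(/.{6}/,"&"OFS,$1) keeps the first six characters and puts the separator right after them. Because sub() modifies a field, awk rebuilds $0 with OFS between all fields, which is what turns the runs of spaces into semicolons as well; the trailing 1 is an always-true pattern whose default action prints the rebuilt line. The sketch below spells this out. Two deliberate differences from the original: /.{6}/ is written as six dots, because older awks (mawk 1.3.3, or gawk before 4.0 without --re-interval) do not support interval expressions, and OFS is set with -v instead of the trailing OFS=";" operand. The function name split_sub is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper mirroring the sub() answer, expanded for readability.
split_sub() {
  awk -v OFS=';' '{
    # "&" re-inserts the matched six characters; OFS is appended after them.
    sub(/....../, "&" OFS, $1)   # 12345678900 -> 123456;78900
    # Modifying a field makes awk rebuild $0, joining all fields with OFS.
    sub(/....../, "&" OFS, $3)   # 4234560890 -> 423456;0890
    print                        # same effect as the trailing "1" pattern
  }' "$@"
}

printf '12345678900    971,76    4234560890\n' | split_sub
```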

pamu · August 17, 2012, 8:30am

awk '{ print substr($1, 1, 6), substr($1, 7, 11) , $2 , substr($3, 1, 6), substr($3, 7, 11) }' OFS=\; test_temp

bartus11 · August 17, 2012, 8:30am

Does it have to be AWK?

perl -ple 's/\b\d{6}/$&;/g;s/\s+/;/g' file

FreddyDaKing · August 17, 2012, 9:09am

Thx for the quick answer. Works like a charm

---------- Post updated at 08:06 AM ---------- Previous update was at 07:34 AM ----------

Hi again,

Probably a silly question, but I also need a static first line in my CSV file as a header:

Column1;Column2;Column3;Column4;Column5
...

How can I add this to the script:

awk '{ print substr($1, 1, 6), substr($1, 7, 11) , $2 , substr($3, 1, 6), substr($3, 7, 11) }' OFS=\; test_temp

Thx a lot!
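
One way to add the header is a BEGIN block, which runs once before the first input line is read. A minimal sketch, assuming the header is a fixed string. One pitfall: an OFS=\; operand placed after the program is only applied when awk reaches the input file, which is after BEGIN has already run, so printing the header as one literal string (rather than as comma-separated print arguments) avoids a space-separated header. The function name add_header and the temporary input file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the substr() answer plus a BEGIN block for the header.
# The header is one literal string because OFS=\; is not yet set in BEGIN.
add_header() {
  awk 'BEGIN { print "Column1;Column2;Column3;Column4;Column5" }
       { print substr($1, 1, 6), substr($1, 7, 11), $2, substr($3, 1, 6), substr($3, 7, 11) }' OFS=\; "$@"
}

# Stand-in for test_temp from the thread.
tmp=$(mktemp)
printf '%s\n' '12345678900    971,76    4234560890' '22345678900   5971,72    5234560990' > "$tmp"
add_header "$tmp"
rm -f "$tmp"
```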


complex.invoke · August 17, 2012, 9:17am

sed 's|\(.\{6\}\)\(.[^ ]*\) \{1,\}\(.[^ ]*\) \{1,\}\(.\{6\}\)\(.*\)|\1;\2;\3;\4;\5|g' infile
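
The sed command converts the lines but does not print the header. A portable sketch, assuming a POSIX shell: print the header first and then run sed, so both write to the same output and the header lands on line 1. The same grouping works for any of the one-liners in this thread. The function name sed_csv is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: header first, then the sed conversion of each line.
# Both commands write to the same stdout, so the header ends up on line 1.
sed_csv() {
  printf '%s\n' 'Column1;Column2;Column3;Column4;Column5'
  sed 's|\(.\{6\}\)\(.[^ ]*\) \{1,\}\(.[^ ]*\) \{1,\}\(.\{6\}\)\(.*\)|\1;\2;\3;\4;\5|g' "$@"
}

# Outside a function the same idea is a command group:
#   { printf '%s\n' 'Column1;...'; sed '...' infile; } > outfile
printf '22345678900   5971,72    5234560990\n' | sed_csv
```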

mjf · August 17, 2012, 10:16am

Using awk, try

cat input.file | awk ' BEGIN {print "Column1;Column2;Column3;Column4;Column5" } { print  substr($0,1,6) ";" substr($0,7,5) ";" $2 ";" substr($0,26,6) ";" substr($0,32,4) } '

vgersh99 · August 17, 2012, 10:22am

mjf:

Using awk, try

cat input.file | awk ' BEGIN {print "Column1;Column2;Column3;Column4;Column5" } { print  substr($0,1,6) ";" substr($0,7,5) ";" $2 ";" substr($0,26,6) ";" substr($0,32,4) } '

Why exactly do you need 'cat'?

mjf · August 17, 2012, 12:02pm

Certainly, there is no need to pipe the input file to awk with 'cat'; the file can be given to the awk command itself, as in the earlier examples. But can you tell me whether there is any difference or advantage between the two methods (e.g. performance, memory use, number of records that can be read)?

vgersh99 · August 17, 2012, 12:19pm

Look at this discussion for starters.
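
For this script, the three usual ways of feeding the file to awk produce the same output. The practical difference is that cat file | awk ... starts an extra process and copies every byte through a pipe, while naming the file (or redirecting it with <) lets awk read it directly. awk reads its input one record at a time in all three cases, so the choice of method does not by itself limit how many records can be processed. A small sketch to compare them, using a hypothetical temporary file and a field-based version of the conversion:

```shell
#!/bin/sh
# Compare three ways of giving the same input file to the same awk program.
prog='BEGIN { OFS = ";" } { print substr($1, 1, 6), substr($1, 7), $2, substr($3, 1, 6), substr($3, 7) }'

tmp=$(mktemp)
printf '%s\n' '12345678900    971,76    4234560890' '32345678900     71,12    6234560190' > "$tmp"

via_cat=$(cat "$tmp" | awk "$prog")    # extra cat process plus a pipe
via_arg=$(awk "$prog" "$tmp")          # awk opens the file itself
via_redir=$(awk "$prog" < "$tmp")      # the shell opens it; no extra process

[ "$via_cat" = "$via_arg" ] && [ "$via_arg" = "$via_redir" ] && echo "identical output"
rm -f "$tmp"
```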