Awk to convert a text file to CSV file with some string manipulation

Hi,

I have a simple text file with contents as below:

12345678900    971,76    4234560890
22345678900   5971,72    5234560990
32345678900     71,12    6234560190

The new CSV file should look like this:

Column1;Column2;Column3;Column4;Column5
123456;78900;971,76;423456;0890
223456;78900;5971,72;523456;0990
323456;78900;71,12;623456;0190

The requirements are:

  • Column1 contains the first six characters of the first number in the text file
  • Column2 contains the last five characters of the first number in the text file
  • Column3 contains the complete comma-separated number (the second field) of the text file
  • Column4 contains the first six characters of the last number in the text file
  • Column5 contains the last four characters of the last number in the text file

Unfortunately I'm something of a newbie in awk. I've tried writing some scripts myself, but without much success.

Any help and hints would be appreciated.

Thanks in advance.

awk '{sub(/.{6}/,"&"OFS,$1);sub(/.{6}/,"&"OFS,$3)}1' OFS=";" file
awk '{ print substr($1, 1, 6), substr($1, 7, 11) , $2 , substr($3, 1, 6), substr($3, 7, 11) }' OFS=\; test_temp

Does it have to be AWK?

perl -ple 's/\b\d{6}/$&;/g;s/\s+/;/g' file

Thx for the quick answer. Works like a charm :)

---------- Post updated at 08:06 AM ---------- Previous update was at 07:34 AM ----------

Hi again,

Probably a silly question, but I also need a static first line in my CSV file as a header:

Column1;Column2;Column3;Column4;Column5
...

How can I combine this requirement with the script:

awk '{ print substr($1, 1, 6), substr($1, 7, 11) , $2 , substr($3, 1, 6), substr($3, 7, 11) }' OFS=\; test_temp

Thx a lot!
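
One way to do this, as a minimal sketch: a BEGIN block runs once before awk reads any input, so it can print the static header. The sketch assumes the three sample lines from the first post are stored in test_temp; the output file name result.csv is only an illustration. It also sets OFS inside BEGIN rather than on the command line, and uses substr() without a length, which returns everything from the given position to the end of the field.

```shell
# Recreate the sample input from the first post (file name test_temp as used above).
printf '%s\n' \
  '12345678900    971,76    4234560890' \
  '22345678900   5971,72    5234560990' \
  '32345678900     71,12    6234560190' > test_temp

# BEGIN runs before the first input line: set the separator and print the header.
# The main block then splits field 1 and field 3 at character 6.
awk 'BEGIN { OFS = ";"; print "Column1", "Column2", "Column3", "Column4", "Column5" }
     { print substr($1, 1, 6), substr($1, 7), $2, substr($3, 1, 6), substr($3, 7) }' \
  test_temp > result.csv   # result.csv is a hypothetical output name

cat result.csv
```

Because the fields are taken from $1, $2 and $3, this variant does not depend on how many spaces separate the columns.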

sed 's|\(.\{6\}\)\(.[^ ]*\) \{1,\}\(.[^ ]*\) \{1,\}\(.\{6\}\)\(.*\)|\1;\2;\3;\4;\5|g' infile

Using awk, try

cat input.file | awk ' BEGIN {print "Column1;Column2;Column3;Column4;Column5" } { print  substr($0,1,6) ";" substr($0,7,5) ";" $2 ";" substr($0,26,6) ";" substr($0,32,4) } '

Why exactly do you need 'cat'?

You certainly don't need to pipe the input file to awk with 'cat'; the file can be named on the awk command line itself, as in the earlier examples. But can you tell me whether one method has any difference or advantage over the other (e.g. performance, memory use, number of records that can be read)?

Look at this discussion for starters.
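
As a quick illustration of the 'cat' question, here is a minimal sketch comparing the three usual ways of feeding a file to awk. The two-line sample file and the short awk program are my own stand-ins, not taken from the posts above. All three forms produce the same output; the 'cat' form just starts one extra process and copies the data through a pipe.

```shell
# Hypothetical two-line sample, recreated so the snippet runs on its own.
printf '%s\n' \
  '12345678900    971,76    4234560890' \
  '22345678900   5971,72    5234560990' > input.file

# A small illustrative program: split the first field at character 6.
prog='{ print substr($1, 1, 6) ";" substr($1, 7) }'

a=$(cat input.file | awk "$prog")   # extra cat process plus a pipe
b=$(awk "$prog" input.file)         # awk opens the file itself
c=$(awk "$prog" < input.file)       # the shell opens the file, no extra process

# Compare the three results.
if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
  echo "identical output"
fi
```

When the file is named as an argument, awk also knows its name (FILENAME) and can take several files in one run, which is not possible when the data arrives anonymously through a pipe.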