Modify CSV file

Hi, I would like to change my CSV file by adding " and : and moving some of the information around. the CSV file looks as follows:

501254424;500440257;PE PACKS;300467279;PREP;;276476070;655031001867176;Two Block;Olga;25/12/2015 00:00:00;Olga

I would like to move the field 7 to the front " and followed by the : and putting the rest of the CSV information in "". How can I do this with a shell script?

The desired output should look like this:

"276476070": "501254424;500440257;PE PACKS;300467279;PREP;;276476070;655031001867176;Two Block;Olga;25/12/2015 00:00:00;Olga"

Any attempts/ideas/thoughts from your side?

I tried the following:

awk 'BEGIN { FS=";"; OFS=";"; } {print $7,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13}' file

however this does not work and how to add the " and : I have not found a solution.

How about

awk -F";" -vDQ='"' '{print DQ $7 DQ ": " DQ $0 DQ}' file4
"276476070": "501254424;500440257;PE PACKS;300467279;PREP;;276476070;655031001867176;Two Block;Olga;25/12/2015 00:00:00;Olga"

Thank you very much that is amazing, it works.

When I echo the line and run the filter all is working as expected, however if I run the statement against a file, the last Double Quote is not present, any idea why?

Here is what I use:

 awk -F";" -vDQ='"' '{print DQ "264" $7 DQ ": " DQ $0 DQ}' /archive/ftp/SUBSDATA_001_20161118235001.txt

---------- Post updated at 09:53 PM ---------- Previous update was at 09:35 PM ----------

I have it now like this in the script:

#!/bin/bash
file="/archive/ftp/TNM_SUBSDATA_001_20161118235001.txt"
while IFS= read -r line
do
 awk -F";" -vDQ='"' '{print DQ "264" $7 DQ ": " DQ $0 DQ }' >> /tmp/output.txt
done <"$file"

However the last " behind the text is not present in the output file?

Hi,

When i try the command RudiC ( including your addition of 264 in latest post ) it works as expected.

cat file
501254424;500440257;PE PACKS;300467279;PREP;;276476070;655031001867176;Two Block;Olga;25/12/2015 00:00:00;Olga
501254424;500440257;PE PACKS;300467279;PREP;;27647xxxx;655031001867176;Two Block;Olga;25/12/2015 00:00:00;awk
awk -F";" -vDQ='"' '{print DQ "264" $7 DQ ": " DQ $0 DQ}' file > output 

Gives output as follows:

I notice that you have "ftp" in your path and my wild guess is that you are doing ftp from one OS to another and that is messing up your end-of-line characters, due to which you don't see the last character.

Do you observe the missing character after an ftp of "output.txt" from Unix/Linux to Windows?

First thing to check is that the "output.txt" is formed correctly in your Unix/Linux box. Run your script on a small data file (maybe 3 or 4 lines) and once "output.txt" is created, run the following command:

od -bc output.txt

You should see "\n" as the end-of-line character in Unix/Linux.
If you see "\r\n" as the end-of-line character, then that file is in DOS (or Windows) format.
If you are doing an ftp of the file, ensure that you do a binary ftp and not ascii. The ftp program usually takes care of end-of-line character unless you override it.

If the assumptions above are not true, then your problem might lie somewhere else.
You could start with showing us the output of "od -bc output.txt".
Also, what exactly do you do after the "output.txt" is created to find that the last character is missing?
Do you open it in vim/emacs/nano etc? Do you transfer it somewhere and then open it in some editor?

@durden_tyler thank you for the response. You are correct the file is being FTPed to the location however in ASCII. The challenge is that the FTP transfer cannot be changed as this is a 3rd party system, how can this be overcome? Here is the output of the:

od -bc output.txt
0000000 063 062 070 061 070 066 073 061 060 060 063 062 061 067 070 066
          3   2   8   1   8   6   ;   1   0   0   3   2   1   7   8   6
0000020 073 102 151 040 141 156 144 145 162 073 063 062 070 061 070 066
          ;   B   i       a   n   d   e   r   ;   3   2   8   1   8   6
0000040 073 120 117 123 124 073 102 151 040 141 156 144 145 162 073 062
          ;   P   O   S   T   ;   B   i       a   n   d   e   r   ;   2
0000060 067 066 063 065 064 066 071 071 073 066 065 065 060 063 061 060
          7   6   3   5   4   6   9   9   ;   6   5   5   0   3   1   0
0000100 060 061 070 060 061 071 067 070 073 101 143 164 151 166 145 073
          0   1   8   0   1   9   7   8   ;   A   c   t   i   v   e   ;
0000120 124 145 162 040 120 141 143 153 073 061 065 057 060 071 057 062
          T   e   r       P   a   c   k   ;   1   5   /   0   9   /   2
0000140 060 061 066 040 060 060 072 060 060 072 060 060 073 123 164 141
          0   1   6       0   0   :   0   0   :   0   0   ;   S   t   a
0000160 156 144 141 162 144 040 120 154 141 156 050 160 157 051 015 012
          n   d   a   r   d       P   l   a   n   (   p   o   )  \r  \n
0000200 063 062 070 061 071 060 073 062 060 060 070 067 061 061 067 062
          3   2   8   1   9   0   ;   2   0   0   8   7   1   1   7   2
0000220 073 112 141 040 157 156 141 073 063 062 070 061 071 060 073 120
          ;   J   a       o   n   a   ;   3   2   8   1   9   0   ;   P
0000240 117 123 124 073 112 141 040 157 156 141 073 062 067 065 064 066
          O   S   T   ;   J   a       o   n   a   ;   2   7   5   4   6
0000260 060 060 060 060 073 066 065 065 060 063 061 060 060 061 067 063
          0   0   0   0   ;   6   5   5   0   3   1   0   0   1   7   3
0000300 067 071 071 060 073 101 143 164 151 166 145 073 123 120 040 120
          7   9   9   0   ;   A   c   t   i   v   e   ;   S   P       P
0000320 154 165 163 040 114 151 073 063 061 057 060 065 057 062 060 061
          l   u   s       L   i   ;   3   1   /   0   5   /   2   0   1
0000340 066 040 060 060 072 060 060 072 060 060 073 123 164 141 156 144
          6       0   0   :   0   0   :   0   0   ;   S   t   a   n   d
0000360 141 162 144 040 120 154 141 156 050 160 157 051 015 012 063 062
          a   r   d       P   l   a   n   (   p   o   )  \r  \n   3   2
0000400 070 061 071 062 073 061 060 060 063 062 070 071 062 063 073 116
          8   1   9   2   ;   1   0   0   3   2   8   9   2   3   ;   N
0000420 040 110 157 154 144 151 156 147 040 050 120 124 131 051 114 124
              H   o   l   d   i   n   g       (   P   T   Y   )   L   T
0000440 104 073 063 062 070 061 071 062 073 120 117 123 124 073 116 040
          D   ;   3   2   8   1   9   2   ;   P   O   S   T   ;   N
0000460 110 157 154 144 151 156 147 040 050 120 124 131 051 114 124 104
          H   o   l   d   i   n   g       (   P   T   Y   )   L   T   D
0000500 073 062 067 065 065 060 067 066 070 060 073 066 065 065 060 063
          ;   2   7   5   5   0   7   6   8   0   ;   6   5   5   0   3
0000520 061 060 060 061 067 061 061 060 060 067 073 101 143 164 151 166
          1   0   0   1   7   1   1   0   0   7   ;   A   c   t   i   v
0000540 145 073 111 156 040 123 111 040 124 145 155 040 107 160 073 062
          e   ;   I   n       S   I       T   e   m       G   p   ;   2
0000560 071 057 060 066 057 062 060 061 062 040 060 060 072 060 060 072
          9   /   0   6   /   2   0   1   2       0   0   :   0   0   :
0000600 060 060 073 123 164 141 156 144 141 162 144 040 120 154 141 156
          0   0   ;   S   t   a   n   d   a   r   d       P   l   a   n
0000620 050 160 157 051 015 012 063 062 070 061 071 063 073 061 060 060
          (   p   o   )  \r  \n   3   2   8   1   9   3   ;   1   0   0
0000640 063 062 070 071 062 063 073 116 040 110 157 154 144 151 156 147
          3   2   8   9   2   3   ;   N       H   o   l   d   i   n   g
0000660 040 050 120 124 131 051 114 124 104 073 063 062 070 061 071 063
              (   P   T   Y   )   L   T   D   ;   3   2   8   1   9   3
0000700 073 120 117 123 124 073 116 040 110 157 154 144 151 156 147 040
          ;   P   O   S   T   ;   N       H   o   l   d   i   n   g
0000720 050 120 124 131 051 114 124 104 073 062 067 065 065 060 067 066
          (   P   T   Y   )   L   T   D   ;   2   7   5   5   0   7   6
0000740 070 061 073 066 065 065 060 063 061 060 060 061 067 061 061 060
          8   1   ;   6   5   5   0   3   1   0   0   1   7   1   1   0
0000760 060 070 073 101 143 164 151 166 145 073 111 156 040 123 111 040
          0   8   ;   A   c   t   i   v   e   ;   I   n       S   I
0001000 124 145 155 040 107 160 073 062 071 057 060 066 057 062 060 061
          T   e   m       G   p   ;   2   9   /   0   6   /   2   0   1
0001020 062 040 060 060 072 060 060 072 060 060 073 123 164 141 156 144
          2       0   0   :   0   0   :   0   0   ;   S   t   a   n   d
0001040 141 162 144 040 120 154 141 156 050 160 157 051 015 012 063 062
          a   r   d       P   l   a   n   (   p   o   )  \r  \n   3   2
0001060 070 061 071 064 073 061 060 060 063 062 070 071 062 063 073 116
          8   1   9   4   ;   1   0   0   3   2   8   9   2   3   ;   N
0001100 040 110 157 154 144 151 156 147 040 050 120 124 131 051 114 124
              H   o   l   d   i   n   g       (   P   T   Y   )   L   T
0001120 104 073 063 062 070 061 071 064 073 120 117 123 124 073 116 040
          D   ;   3   2   8   1   9   4   ;   P   O   S   T   ;   N
0001140 110 157 154 144 151 156 147 040 050 120 124 131 051 114 124 104
          H   o   l   d   i   n   g       (   P   T   Y   )   L   T   D
0001160 073 062 067 065 065 060 067 066 070 062 073 066 065 065 060 063
          ;   2   7   5   5   0   7   6   8   2   ;   6   5   5   0   3
0001200 061 060 060 061 067 061 061 060 060 071 073 101 143 164 151 166
          1   0   0   1   7   1   1   0   0   9   ;   A   c   t   i   v
0001220 145 073 111 156 040 123 111 040 124 145 155 040 107 160 073 062
          e   ;   I   n       S   I       T   e   m       G   p   ;   2
0001240 071 057 060 066 057 062 060 061 062 040 060 060 072 060 060 072
          9   /   0   6   /   2   0   1   2       0   0   :   0   0   :
0001260 060 060 073 123 164 141 156 144 141 162 144 040 120 154 141 156
          0   0   ;   S   t   a   n   d   a   r   d       P   l   a   n
0001300 050 160 157 051 015 012
          (   p   o   )  \r  \n
0001306

This looks like your "subsdata" file.
The "output" file is supposed to have a number and a colon character ( : ) at the beginning.

(1) Does your "subsdata" file have "\r\n" end-of-line characters to begin with?
(2) Are you able to use RudiC's awk script to transform your "subsdata" file and write new content to "output.txt" file?
(3) Does your "output" file have "\r\n" end-of-line characters as well?
(4) The ftp transfers your "output" file from a source OS to a target OS. Is your source OS Unix/Linux and the target OS Windows?
(4) Do you notice the missing last character in your source OS or target OS or both?

Change RudiC's suggestion from:

awk -F";" -vDQ='"' '{print DQ $7 DQ ": " DQ $0 DQ}' file

to:

awk -F";" -vDQ='"' '{sub(/\r$/,"");print DQ $7 DQ ": " DQ $0 DQ}' file

to get rid of the <carriage-return> at the end of each line in your input file instead of copying it to your output before the closing <double-quote> character.

Thank you for the suggestion, I am currently getting an error:

awk -F";" -vDQ='"' '{sub(/\r$/,""};print DQ $7 DQ ": " DQ $0 DQ}' output.txt
Error:
awk: cmd. line:1: {sub(/\r$/,""};print DQ $7 DQ ": " DQ $0 DQ}
awk: cmd. line:1:              ^ syntax error

Sorry,
That should have been:

awk -F";" -vDQ='"' '{sub(/\r$/,"");print DQ $7 DQ ": " DQ $0 DQ}' file

Got it thank you very much Don Cragun it is working now:

awk -F";" -vDQ='"' '{sub (/\r$/,"");print DQ $7 DQ ": " DQ $0 DQ}' output.txt

---------- Post updated at 12:54 PM ---------- Previous update was at 12:08 PM ----------

Hi, thank you very much. Have now run the awk filter against the entire file and found as there is free text fields some of them contain information that is not working or causing problems with post processing. How can I remove the following before running the awk statement? Here are the characters I would like to remove from the file before formatting the file:

"
/
\

---------- Post updated at 02:14 PM ---------- Previous update was at 12:54 PM ----------

Got it, thank you very much for the assistance, it is truly appreciated:

awk -F";" -vDQ='"' '{sub(/\r$/,"");gsub("\"","");gsub("/","");gsub(/\\/,"");print DQ $7 DQ ": " DQ $0 DQ}' output.txt

Or

awk -F";" -vDQ='"' '{gsub (/["\/\\]|\r$/,""); print DQ $7 DQ ": " DQ $0 DQ }' file