Setting the number of fields using awk

Hi Guys,

I've obviously had a senior moment here. What I'm trying to do is set the number of fields to 35 in a CSV file, with the extra empty fields appended to the end of each line. But what I'm getting is:

Source Data

[davem@deneb data]$ head out_file_01.txt
N1000,024,2809003,,,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,,5,2,V,50003414,,,,,,,,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,1T,5,2,V,50003414,,20090602,,,,,,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,14,50003414,,20091118,,,,,,20110930,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,V,50003414,,20091118,,,,,,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,V,50003414,,20091118,,,,,,20110930,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,1T,5,2,25,50003414,,20130711,,,,,,20140430,01032549,,,A0070C,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,1T,5,2,V,50003414,,20130711,,,,,,20140430,01032549,,,A0070C,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20110411,20090520,32,5,2,V,50003414,,20091118,,,,,,20140430,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20110411,20090520,32,5,2,V,50003414,,20091118,,,,,,20140430,01032549,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,41,5,2,V,50003414,,20111201,,,,,,20140430,01032549,

What I tried was a nice simple awk command - I'm not great at awk but this was the attempt.

[davem@deneb data]$ head out_file_01.txt | awk -F"," '{NF=35}1' OFS=","
,,,,,,024,2809003,,,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,,5,2,V,50003414,,,,,,,,
,,,,,,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,1T,5,2,V,50003414,,20090602,,,,,,
,,,,0,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,14,50003414,,20091118,,,,,,20110930,
,,,,,,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,V,50003414,,20091118,,,,,,
,,,,0,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,V,50003414,,20091118,,,,,,20110930,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,1T,5,2,25,50003414,,20130711,,,,,,20140430,01032549,,,A0070C,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,1T,5,2,V,50003414,,20130711,,,,,,20140430,01032549,,,A0070C,
,,,,0,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20110411,20090520,32,5,2,V,50003414,,20091118,,,,,,20140430,
,,,00,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20110411,20090520,32,5,2,V,50003414,,20091118,,,,,,20140430,01032549,
,,,00,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,41,5,2,V,50003414,,20111201,,,,,,20140430,01032549,
[davem@deneb data]$ 

I know that this is my stupidity - but I'm unsure how I've got it so wrong in such a simple statement.

Regards

Gull04

Hello gull04,

Could you please try the following and let me know if it helps.

awk -F, '{for(i=1;i<=35;i++){printf("%s%s",$i,i==35?"":",")};print ""}' Input_file

Thanks,
R. Singh

Hi Ravinder,

What we get now is;

[davem@deneb data]$ awk -F, '{for(i=1;i<=35;i++){printf("%s%s",$i,i==35?"":",")};print ""}' out_file_01.txt | head -10
,,,,,,024,2809003,,,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,,5,2,V,50003414,,,,,,,,
,,,,,,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,1T,5,2,V,50003414,,20090602,,,,,,
,,,,0,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,14,50003414,,20091118,,,,,,20110930,
,,,,,,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,V,50003414,,20091118,,,,,,
,,,,0,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,V,50003414,,20091118,,,,,,20110930,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,1T,5,2,25,50003414,,20130711,,,,,,20140430,01032549,,,A0070C,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,1T,5,2,V,50003414,,20130711,,,,,,20140430,01032549,,,A0070C,
,,,,0,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20110411,20090520,32,5,2,V,50003414,,20091118,,,,,,20140430,
,,,00,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20110411,20090520,32,5,2,V,50003414,,20091118,,,,,,20140430,01032549,
,,,00,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,41,5,2,V,50003414,,20111201,,,,,,20140430,01032549,
[davem@deneb data]$ 

I have done this before with a very simple awk command; the old grey matter just doesn't seem to have the same recall any more.

Regards

Gull04

Hello gull04,

If I understood your question correctly, you need 35 fields in each line? If so, I think the code above does the trick; see the following.

awk -F, '{for(i=1;i<=35;i++){printf("%s%s",$i,i==35?"":",")};print ""}' Input_file > Input_file_temp

awk -F, '{print NF}' Input_file_temp
35
35
35
35
35
35
35
35
35
35

Please let me know if you have any queries; I will try my best to help and learn.

Thanks,
R. Singh

Hi Ravinder,

You are absolutely right that the code is generating 35 fields, but it does so by removing data at the beginning of the line and replacing each character removed with the separator. What I would actually like is for the data to remain intact and the additional commas to be added to the end of the line.

So the first line;

N1000,024,2809003,,,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,,5,2,V,50003414,,,,,,,,

Would become;

N1000,024,2809003,,,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,,5,2,V,50003414,,,,,,,,,,,,,,

With the new fields at the end of the line, instead of at the start of the line;

,,,,,,024,2809003,,,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,,5,2,V,50003414,,,,,,,,

Regards

Gull04

Hello gull04,

Could you please try the following and let me know if it helps.

awk -F, '{q=35-NF;val=substr($0,q+1);if(q){while(i<q){val=","val;i++}};i="";print val}'   Input_file

Thanks,
R. Singh

Hi Ravinder,

No change, I'm afraid:

[davem@deneb data]$ awk -F, '{q=35-NF;val=substr($0,q+1);if(q){while(i<q){val=","val;i++}};i="";print val}' out_file_01.txt | head -10
,,,,,,024,2809003,,,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,,5,2,V,50003414,,,,,,,,
,,,,,,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,1T,5,2,V,50003414,,20090602,,,,,,
,,,,0,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,14,50003414,,20091118,,,,,,20110930,
,,,,,,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,V,50003414,,20091118,,,,,,
,,,,0,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,V,50003414,,20091118,,,,,,20110930,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,1T,5,2,25,50003414,,20130711,,,,,,20140430,01032549,,,A0070C,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,1T,5,2,V,50003414,,20130711,,,,,,20140430,01032549,,,A0070C,
,,,,0,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20110411,20090520,32,5,2,V,50003414,,20091118,,,,,,20140430,
,,,00,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20110411,20090520,32,5,2,V,50003414,,20091118,,,,,,20140430,01032549,
,,,00,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,41,5,2,V,50003414,,20111201,,,,,,20140430,01032549,

Regards

Gull04

I bet you have a DOS line terminator (<CR>, ^M, \r, 0x0D) in there ...
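
One way to check that guess is to look at the raw bytes rather than at the terminal. The sketch below uses a made-up three-field record (not a line from out_file_01.txt) and pads it to 6 fields instead of 35 to keep the dump short. If a \r shows up before the \n in the od -c output, the file has DOS line endings. The second dump suggests why the display looks as if data were being eaten: the carriage return stays inside the last field, the new commas are appended after it, and the terminal jumps back to column one and overprints the start of the line.

```shell
# Hypothetical sample record with a DOS line ending (not taken from the real file).
sample=$(printf 'N1000,024,2809003\r')

# Dump the raw bytes: a \r immediately before \n means CRLF line endings.
printf '%s\n' "$sample" | od -c

# Pad to 6 fields. The \r is still part of field 3, so the extra commas follow it.
padded=$(printf '%s\n' "$sample" | awk -F, -v OFS=, '{NF=6}1')
printf '%s\n' "$padded" | od -c
```

Nothing is actually removed from the record; only the on-screen rendering is misleading.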

Hi Rudi,

You're almost correct. Hidden away in the file was the following:

[davem@deneb data]$ dos2unix < out_file_01.txt > out_file_02.txt
dos2unix: Binary symbol 0x00 found at line 5560351
dos2unix: Skipping binary file stdin
[davem@deneb data]$ vi out_file_01.txt
[davem@deneb data]$ dos2unix < out_file_01.txt > out_file_02.txt
[davem@deneb data]$ head out_file_02.txt | awk -F"," '{NF=35}1' OFS=","
N1000,024,2809003,,,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,,5,2,V,50003414,,,,,,,,,,,,,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,1T,5,2,V,50003414,,20090602,,,,,,,,,,,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,14,50003414,,20091118,,,,,,20110930,,,,,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,V,50003414,,20091118,,,,,,,,,,,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20090520,20090520,32,5,2,V,50003414,,20091118,,,,,,20110930,,,,,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,1T,5,2,25,50003414,,20130711,,,,,,20140430,01032549,,,A0070C,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,1T,5,2,V,50003414,,20130711,,,,,,20140430,01032549,,,A0070C,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20110411,20090520,32,5,2,V,50003414,,20091118,,,,,,20140430,,,,,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,22812-0007,1,165,US,20110411,20090520,32,5,2,V,50003414,,20091118,,,,,,20140430,01032549,,,,
N1000,024,2809003,52215,1985,3,DYNAMIC AVLEASE INC,PO BOX 7,,BRIDGEWATER,VA,228120007,1,165,US,20110411,20090520,41,5,2,V,50003414,,20111201,,,,,,20140430,01032549,,,,
[davem@deneb data]$ 

Such is life. I was sure that I'd already run dos2unix on it, but it would seem not. I hate the way that vi now suppresses the control characters.

Anyway many thanks to everyone for the assistance.

Regards

Gull04
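
If converting the file first is not convenient, the carriage return can probably be dropped inside awk itself before the field count is set. This is only a sketch: pad_fields is a hypothetical helper name, the two-line sample stands in for the real data, and it assumes an awk (such as gawk or mawk) that rebuilds the record when NF is assigned.

```shell
# Strip an optional trailing carriage return, then force each record to 35 fields.
# Assigning NF rebuilds $0 using OFS in gawk and mawk; other awks may behave differently.
pad_fields() {
    awk -F, -v OFS=, '{ sub(/\r$/, ""); NF = 35 } 1' "$@"
}

# Hypothetical two-line sample with CRLF endings, standing in for the real data.
result=$(printf 'a,b,c\r\nd,e\r\n' | pad_fields)
printf '%s\n' "$result"
```

Note that NF=35 also truncates any record that has more than 35 fields, so it is worth checking the maximum field count in the input before relying on it.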

In most versions of vi, even if it doesn't usually show the <carriage-return> characters, you might notice a note in the status line just after you open a file for editing, similar to the following:

"file" [dos] 2L, 8C

when the file you just opened has DOS <carriage-return><newline> line separators instead of UNIX <newline> line terminators.

You may also want to run the file command:

file fi*
file1:  ASCII text, with CRLF line terminators
file1~: ASCII text, with CRLF, CR, LF line terminators
file2:  ASCII text
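
For a large file where only some lines may be affected, a count can be more telling than the one-line summary from file. The following is a minimal sketch; count_crlf is a hypothetical helper name, and the sample input is invented for the demonstration.

```shell
# Count the lines that end in a carriage return; 0 means no CRLF endings were found.
count_crlf() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$@"
}

# Hypothetical mixed input: two CRLF lines and one plain LF line.
printf 'a\r\nb\nc\r\n' | count_crlf
```

It can be given a file name instead of standard input, for example count_crlf out_file_01.txt.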