Need to remove the first character from every third line (or a revised nawk script).

VPREATR · October 6, 2008, 7:32pm

Here's the data I'm starting with (example output of two combined queries; file size: 284 KB). Each record is one line of 120 pipe-delimited fields:

3000877|555-55-1111|2|7/30/2008|TEST|P.O. BOX 1111|PALM DESERT|CA|92211||5555555555||||||||48|||1||1|3|||2|||||||||||||1|3||2|2||||2||9||3|1|2|2|2|1|3|0||5|2|||||||||88||3|2||3|2||||2|1|||6|5/31/2008|2||||9|AD|||42|42||||||Y|555-55-1111|SMITH|JOHN|||12/23/1960|2|WH|||||||||Y
3000178|555-55-1112|2|7/23/2008|TEST|P.O. BOX 1112|TEMECULA|CA|92591||5555555555||||||||33|||1||1|3|||2|||||||||||||3|3||2|2||||2||9||2|1|2|2|2|2|3|0||5|2|||||||||88|||2||3|2||||2|9|||||2||||9|A|||42|42||||||Y|555-55-1112|SMITH|JACK|||12/8/1975|2|BL|||||||||Y
3000317|555-55-1113|2|7/29/2008|TEST|P.O. BOX 1113|MORENO VALLEY|CA|92556||5555555555||||||||55|||1||4|1|||2|||||||||||||1|3||2|2||||2||9||1|0|2|2|2|2|3|0||5|2|||||||||88|||2||3|2||||2|9|||||2||||9|A|||42|42||||||Y|555-55-1113|SMITH|JOE|||11/28/1953|2|AO|||||||||Y

Then I run the following nawk script, because a newline is needed after fields 103 and 120 (nawk -f scriptname > filename1):

BEGIN {
   FS="|"
       }
 
{
  OFS="|"
}
 
{
print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,
$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,
$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,
$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,
$81,$82,$83,$84,$85,$86,$87,$88,$89,$90,$91,$92,$93,$94,$95,$96,$97,$98,$99,
$100,$101,$102,$103,"\n",
$104,$105,$106,$107,$108,$109,$110,$111,$112,$113,$114,$115,$116,
$117,$118,$119,$120,"\n"
}
END {}

The nawk script output (i.e. filename1) is:

3000877|555-55-1111|2|7/30/2008|TEST|P.O. BOX 1111|PALM DESERT|CA|92211||5555555555||||||||48|||1||1|3|||2|||||||||||||1|3||2|2||||2||9||3|1|2|2|2|1|3|0||5|2|||||||||88||3|2||3|2||||2|1|||6|5/31/2008|2||||9|AD|||42|42||||||Y|
|555-55-1111|SMITH|JOHN|||12/23/1960|2|WH|||||||||Y|

3000178|555-55-1112|2|7/23/2008|TEST|P.O. BOX 1112|TEMECULA|CA|92591||5555555555||||||||33|||1||1|3|||2|||||||||||||3|3||2|2||||2||9||2|1|2|2|2|2|3|0||5|2|||||||||88|||2||3|2||||2|9|||||2||||9|A|||42|42||||||Y|
|555-55-1112|SMITH|JACK|||12/8/1975|2|BL|||||||||Y|

3000317|555-55-1113|2|7/29/2008|TEST|P.O. BOX 1113|MORENO VALLEY|CA|92556||5555555555||||||||55|||1||4|1|||2|||||||||||||1|3||2|2||||2||9||1|0|2|2|2|2|3|0||5|2|||||||||88|||2||3|2||||2|9|||||2||||9|A|||42|42||||||Y|
|555-55-1113|SMITH|JOE|||11/28/1953|2|AO|||||||||Y|

The problem is on the last line of each entry: the nawk script has inserted an extra pipe (|) as the first character, and that causes a great deal of havoc with my import requirements. I've tried various sed methods to remove the first character of every third line throughout the file, sadly without success. I've also revised the nawk script in an attempt to drop the extra pipe, again without success.

|555-55-1113|SMITH|JOE|||11/28/1953|2|AO|||||||||Y|

It needs to be:

555-55-1113|SMITH|JOE|||11/28/1953|2|AO|||||||||Y|
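Where the stray pipe most likely comes from: print writes OFS between every pair of its arguments, and the "\n" string is just another argument, so a "|" lands both before and after each newline. A minimal sketch on a hypothetical 5-field record (a stand-in for the real 120 fields) reproduces the effect:

```shell
# Hypothetical 5-field record standing in for the real 120-field one.
# "\n" is passed to print as an ordinary argument, so awk puts OFS ("|")
# on both sides of it.
out=$(printf 'a|b|c|d|e\n' |
  awk 'BEGIN { FS = OFS = "|" } { print $1, $2, "\n", $3, $4, $5 }')
printf '%s\n' "$out"
# a|b|
# |c|d|e
```

The first half ends with a pipe and the second half starts with one, which matches the output above.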

Reviews, thoughts, and suggestions are truly welcome.

Thanks!

vidyadhar85 · October 6, 2008, 7:47pm

Try using "\b" after "\n".

VPREATR · October 6, 2008, 7:55pm

Here's the output:

3000877|555-55-1111|2|7/30/2008|TEST|P.O. BOX 1111|PALM DESERT|CA|92211||5555555555||||||||48|||1||1|3|||2|||||||||||||1|3||2|2||||2||9||3|1|2|2|2|1|3|0||5|2|||||||||88||3|2||3|2||||2|1|||6|5/31/2008|2||||9|AD|||42|42||||||Y|
^H|555-55-1111|SMITH|JOHN|||12/23/1960|2|WH|||||||||Y|
^H
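In an awk string, "\b" is the backspace control character. It is written to the file as one more byte (the ^H above) rather than erasing the pipe in front of it; only a terminal displaying it moves the cursor back. A quick way to look at the raw bytes, using od:

```shell
# awk converts "\b" into the backspace byte (decimal 8); the pipe before it
# (decimal 124) is still there in the output.
bytes=$(awk 'BEGIN { printf "|\b" }' | od -A n -t u1)
echo $bytes
# 124 8
```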

cfajohnson · October 6, 2008, 8:17pm

BEGIN { FS = OFS = "|" }
{
 print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,
     $16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,
     $29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,
     $42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,
     $55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,
     $68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,
     $81,$82,$83,$84,$85,$86,$87,$88,$89,$90,$91,$92,$93,
     $94,$95,$96,$97,$98,$99,$100,$101,$102,$103
 print ""
 print $104,$105,$106,$107,$108,$109,$110,$111,$112,$113,
       $114,$115,$116,$117,$118,$119,$120
 print ""
}

VPREATR · October 6, 2008, 8:40pm

That did the trick, with one change:

BEGIN { FS = OFS = "|" }
{
 print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,
     $16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,
     $29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,
     $42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,
     $55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,
     $68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,
     $81,$82,$83,$84,$85,$86,$87,$88,$89,$90,$91,$92,$93,
     $94,$95,$96,$97,$98,$99,$100,$101,$102,$103
 print $104,$105,$106,$107,$108,$109,$110,$111,$112,$113,
       $114,$115,$116,$117,$118,$119,$120
 print ""
}

I removed the first print "" statement because it was adding an extra, unnecessary blank line between the two halves of each record.

Thank you!
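The same split can be sketched without listing every field by hand, by looping over the field numbers and joining them with OFS. split_at is a hypothetical helper name, and the sketch assumes one record per input line. Unlike the script above, which prints exactly $104 through $120, it sends every field after the cut (up to NF) to the second line.

```shell
# Hypothetical helper: print fields 1..cut on one line, the remaining
# fields on the next line, then one empty separator line.
split_at() {
  awk -v cut="$1" 'BEGIN { FS = OFS = "|" }
  {
    # Build the first half as a single string so no OFS touches a newline.
    line = $1
    for (i = 2; i <= cut; i++) line = line OFS $i
    print line
    # Second half: field cut+1 through the last field on the line.
    line = $(cut + 1)
    for (i = cut + 2; i <= NF; i++) line = line OFS $i
    print line
    print ""
  }'
}
```

For the data in this thread that would be split_at 103 < datafile > filename1, where datafile is a placeholder for the combined query output.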

vijay_0209 · October 6, 2008, 11:43pm

Try this, and tell me if it works:

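awk '{
    if (NR % 3 != 0) {
        # lines 1 and 2 of each group of three: pass through unchanged
        print
    } else {
        # every third line: drop the first character of the first field,
        # then print the remaining fields separated by single spaces
        for (i = 1; i <= NF; i++) {
            if (i == 1)
                printf("%s", substr($1, 2))
            else
                printf(" %s", $i)
        }
        printf "\n"
    }
}' file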
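This assumes the line to fix is every third line. In the file the original script produces, each record is actually two physical lines (fields 1-103 ending in a pipe, then the pipe-prefixed fields 104-120) followed by the empty line left by the final "\n" plus the normal output record separator. The three-line look in the post comes from the forum wrapping the long first line. So NR % 3 == 0 most likely selects the blank lines, which would also explain why the earlier sed attempts on every third line had no visible effect. A small check on two sample records shaped that way (a, b, c and so on are placeholders):

```shell
# Two records in the shape the original script writes:
# "first half|", "|second half|", then an empty line.
third=$(printf 'a|b|\n|c|d|\n\nx|y|\n|z|w|\n\n' |
  awk 'NR % 3 == 0 { printf "%d:[%s]\n", NR, $0 }')
printf '%s\n' "$third"
# 3:[]
# 6:[]
```

Testing for a leading pipe instead of for the line number avoids depending on the layout at all.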

RahulJoshi · October 7, 2008, 12:51am

Just pipe the output of your old script through sed 's/^|//g'.
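Because ^ anchors the match to the start of the line, only lines that begin with the stray pipe are changed; the first-half lines begin with the record ID and pass through untouched. The trailing g makes no difference here and is omitted below. Applied to the line quoted in the question:

```shell
# Sample line from the question; sed removes one "|" only when it is
# the first character of the line. The trailing pipe is kept.
fixed=$(printf '%s\n' '|555-55-1113|SMITH|JOE|||11/28/1953|2|AO|||||||||Y|' |
  sed 's/^|//')
printf '%s\n' "$fixed"
# 555-55-1113|SMITH|JOE|||11/28/1953|2|AO|||||||||Y|
```

In the original pipeline that would be nawk -f scriptname | sed 's/^|//' > filename1.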