Fill the Key fields : Please help us

charandevu · March 30, 2008, 8:10am

Hi ....

we are having the below file .Column 1, Column 2 ,column 3 are key fields...
In the below ...for 2 nd , 3 rd row the repeated key column is missing ....

i want the new file to be populated with all missing key columns.

E100,0,5/29/1993,0,E001,E000,A,500000,41666.667,240.384615
_____0,1/23/1994,0,E001,E003,A,125400,10450,60.288462
_____0,6/4/1994,0,E001,E003,A,95000,7916.667,45.673077
E101,0,7/30/1993,0,E001,E003,A,87000,7250,41.826923
______ 1/9/1993,0,E001,E003,A,45200,3766.667,21.730769
E103,0,2/3/1995,0,E001,E003,A,15000,1250,7.211538
E104,0,2/15/1995,0,E001,E001,A,78000,6500,37.5

I want the result to be :

E100,0,5/29/1993,0,E001,E000,A,500000,41666.667,240.384615
E100, 0,1/23/1994,0,E001,E003,A,125400,10450,60.288462
E100, 0,6/4/1994,0,E001,E003,A,95000,7916.667,45.673077
E101,0,7/30/1993,0,E001,E003,A,87000,7250,41.826923
E101,0,1/9/1993,0,E001,E003,A,45200,3766.667,21.730769
E103,0,2/3/1995,0,E001,E003,A,15000,1250,7.211538
E104,0,2/15/1995,0,E001,E001,A,78000,6500,37.5

Please help me ........

era · March 30, 2008, 10:56am

Write an awk script which remembers the fields from the previous line, and adds as many to the front as are missing to form a full line. Assume the first line indicates how many fields a full line should have. Read up on array handling in the awk documentation.

charandevu · March 31, 2008, 2:07am

I am new to this awk command ...........can you give a sample code on this requirment....that would be a helpful.....

Thanks your response
charan

Klashxx · March 31, 2008, 2:30am

>cat file
E100,0,5/29/1993,0,E001,E000,A,500000,41666.667,240.384615
0,1/23/1994,0,E001,E003,A,125400,10450,60.288462
0,6/4/1994,0,E001,E003,A,95000,7916.667,45.673077
E101,0,7/30/1993,0,E001,E003,A,87000,7250,41.826923
1/9/1993,0,E001,E003,A,45200,3766.667,21.730769
E103,0,2/3/1995,0,E001,E003,A,15000,1250,7.211538
E104,0,2/15/1995,0,E001,E001,A,78000,6500,37.5

>awk 'NF!=10{for(i=1;i<=(10-NF);i++)printf("%s,",k);print;next}{for(i=1;i<=3;i++)k=$i}1' FS=',' f1
E100,0,5/29/1993,0,E001,E000,A,500000,41666.667,240.384615
E100,0,1/23/1994,0,E001,E003,A,125400,10450,60.288462
E100,0,6/4/1994,0,E001,E003,A,95000,7916.667,45.673077
E101,0,7/30/1993,0,E001,E003,A,87000,7250,41.826923
E101,0,1/9/1993,0,E001,E003,A,45200,3766.667,21.730769
E103,0,2/3/1995,0,E001,E003,A,15000,1250,7.211538
E104,0,2/15/1995,0,E001,E001,A,78000,6500,37.5

ashish4422 · March 31, 2008, 5:33am

Hi Klashxx,

can you explain the logic used in for loop?

awk 'NF!=10{for(i=1;i<=(10-NF);i++)printf("%s,",k[i]);print;next}{for(i=1;i<=3;i++)k[i]=$i}1' FS=',' f1

Thanks
Ashish

era · March 31, 2008, 9:11am

It does pretty much what I suggested, except it hard-codes the number of expected fields to 10. You could count the number of fields on the first line and use that on subsequent lines for a slightly more general approach. It remembers $1 in k[1], $2 in k[2], and $3 in k[3]; and if the line has too few fields, prints what's in k[1], k[2], etc from the previous line as many as is required. This explanation is backwards relative to the script because it prints first, and then remembers for the next line.

era · March 31, 2008, 2:14pm

Here's an attempt at generaliziing this.

awk 'NR==1{ for (i=1; i<=NF; ++i) { k=$i } keep=NF; print; next}
NF!=keep{ for(i=1;i<=keep-NF;++i)printf("%s,",k); print;
                   for(j=i;j<=keep;++j) { n=j-i+1;k[j]=$n}; next; }
{ for (i=1;i<=NF;i++) k=$i; print}' FS=',' f1

Completely untested but might even work. Note the addition of the code to copy down stuff from a line which was too short, so if you have lines with successively more information missing, that too should work now. But again, I didn't test this code properly.

satyam_sat · April 3, 2008, 8:54am

awk 'NF!=10{for(i=1;i<=(10-NF);i++)printf("%s,",k[i]);print;next}{for(i=1;i<=3;i++)k[i]=$i}1' FS=',' f1

This code is working for some what .

it is not working in case of ',' or '|' seperated

E100|0|5/29/1993|0|E001|E000|A|500000|41666.667|240.384615
|0|1/23/1994|0|E001|E003|A|125400|10450|60.288462
| |0|6/4/1994|0|E001|E003|A|95000|7916.667|45.673077
E101|0|7/30/1993|0|E001|E003|A|87000|7250|41.826923
| |1/9/1993|0|E001|E003|A|45200|3766.667|21.730769
E103|0|2/3/1995|0|E001|E003|A|15000|1250|7.211538
E104|0|2/15/1995|0|E001|E001|A|78000|6500|37.5

can u help me ..........

Klashxx · April 3, 2008, 12:48pm

Hi, u need to improve your research efforts, ..:

>cat file
E100|0|5/29/1993|0|E001|E000|A|500000|41666.667|240.384615
|0|1/23/1994|0|E001|E003|A|125400|10450|60.288462
| |6/4/1994|0|E001|E003|A|95000|7916.667|45.673077
E101|0|7/30/1993|0|E001|E003|A|87000|7250|41.826923
| |1/9/1993|0|E001|E003|A|45200|3766.667|21.730769
E103|0|2/3/1995|0|E001|E003|A|15000|1250|7.211538
E104|0|2/15/1995|0|E001|E001|A|78000|6500|37.5

>awk '{gsub(/^\|| +\|/,"")                                                                                                         
 if(NF!=10){for(i=1;i<=(10-NF);i++)printf("%s|",k);print;next}
{for(i=1;i<=3;i++)k=$i}print}' FS='|' file
E100|0|5/29/1993|0|E001|E000|A|500000|41666.667|240.384615
E100|0|1/23/1994|0|E001|E003|A|125400|10450|60.288462
E100|0|6/4/1994|0|E001|E003|A|95000|7916.667|45.673077
E101|0|7/30/1993|0|E001|E003|A|87000|7250|41.826923
E101|0|1/9/1993|0|E001|E003|A|45200|3766.667|21.730769
E103|0|2/3/1995|0|E001|E003|A|15000|1250|7.211538
E104|0|2/15/1995|0|E001|E001|A|78000|6500|37.5

charandevu · April 7, 2008, 8:50am

awk '{gsub(/^\|| +\|/,"") if(NF!=10){for(i=1;i<=(10-NF);i++)printf("%s|",k[i]);print;next} {for(i=1;i<=3;i++)k[i]=$i}print}' FS='|' key.txt

when i am using this entire comand in a single line it is throw ing syntax error........but when i am using the above code in a two line it is working fine..............

i wann to store this entire into a script and i wann to call this dynamiclly...

thanks charan

charandevu · April 8, 2008, 7:47am

hi friend,
file1.
12345|DZ012|20071220|FG456|234567|IND
| |20080101|DE345|456789|USA
|DZ012| | |500001|AUS
45678|ED456|20011203|ED345|456789|CAN
|SD345| | |456790|MEX
| | |DF232|567890|ARG
56789|WQ456|20071230|WD456|789012|WES
56789|AQ234|20080120|WS456|890123|JAM

when we are running the script we are getting error
awk '{gsub(/^\|| +\|/,"")
if(NF!=6){for(i=1;i<=(6-NF);i++)printf("%s|",k[i]);print;next} {for(i=1;i<=4;i++)k[i]=$i}print}' FS='|' key

out put
12345|DZ012|20071220|FG456|234567|IND
12345|DZ012|20080101|DE345|456789|USA
12345|DZ012|20071220|DZ012|500001|AUS
45678|ED456|20011203|ED345|456789|CAN
45678|ED456|20011203|SD345|456790|MEX
45678|ED456|20011203|DF232|567890|ARG
56789|WQ456|20071230|WD456|789012|WES
56789|AQ234|20080120|WS456|890123|JAM

45678|ED456|20011203|SD345|456790|MEX here the 2n column is repated and the 2nd column moved to 4column.......and the 4th column is missing....some more buggs are there

the above key vlaues should populate as it is...........very urgent

era · April 8, 2008, 9:40am

 vnix$ awk -F'|' 'BEGIN { OFS="|"; }
> { for(i=1;i<=NF;++i) { if ($i=="" || $i==" ") $i=k; k=$i; } print; }' <<HERE
> 12345|DZ012|20071220|FG456|234567|IND
> | |20080101|DE345|456789|USA
> |DZ012| | |500001|AUS
> 45678|ED456|20011203|ED345|456789|CAN
> |SD345| | |456790|MEX
> | | |DF232|567890|ARG
> 56789|WQ456|20071230|WD456|789012|WES
> 56789|AQ234|20080120|WS456|890123|JAM
> HERE
12345|DZ012|20071220|FG456|234567|IND
12345|DZ012|20080101|DE345|456789|USA
12345|DZ012|20080101|DE345|500001|AUS
45678|ED456|20011203|ED345|456789|CAN
45678|SD345|20011203|ED345|456790|MEX
45678|SD345|20011203|DF232|567890|ARG
56789|WQ456|20071230|WD456|789012|WES
56789|AQ234|20080120|WS456|890123|JAM

This is pretty much the code I posted a week ago, adapted for the changed field separator and the changed convention for a skipped field (missing vs. empty).