awk: saving a field of the first file into an array

RozenKristal · December 10, 2012, 10:33pm

Hello guys, I have just started trying out AWK and ran into a problem. I have thought about it for a while, but my approach seems to be wrong.
I have two input files: the first has only one field per record, the second has three. I am supposed to process them with an awk program and sort them out. Since I am not supposed to know how many records the first file has, I can't just use a fixed for loop. I would like to save each record of the first file into an array so I can then compare it against the second file. But I am not sure how. Can you give me an idea?

Sorry for my English; I am not a native speaker.
I made a silly attempt with:

{ while (NR == FNR)}

but it obviously gives me an infinite loop. What I want to know is: how can I tell when to stop assigning fields to the array, that is, when awk begins to read the second file?

What I want is:
File1:

CoCo
Hiel
Euda
SJHF
Euda
CoCo

File2:

CoCo   $39.4   paid
CoCo   $34   due
Euda   45   paid

A line is invalid if the amount has no $ or does not have two digits after the decimal point.
The first field can appear several times.
If a name in file 1 does not appear in file 2, print a balance of 0; invalid lines are treated the same way.
Desired output:

CoCo balance:$5.4
Hiel balance:$0.00
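
On the question of how to tell when the second file starts: a common idiom is to compare NR (records read in total) with FNR (records read in the current file). The two are equal only while awk is still in the first file. The sketch below is a minimal illustration under assumptions: the file names names.txt and ledger.txt, the array name known, and the extra name Zed are made up for the example.

```shell
# Hypothetical sample data; names.txt and ledger.txt are placeholder file names.
tmp=$(mktemp -d)
printf '%s\n' CoCo Hiel Euda SJHF > "$tmp/names.txt"
printf '%s\n' 'CoCo $39.40 paid' 'Zed $10.00 due' 'Euda $12.45 due' > "$tmp/ledger.txt"

out=$(awk '
  # NR == FNR holds only while reading the first file named on the command line.
  NR == FNR { known[$1]; next }   # remember the name, then skip the rule below
  # This rule is reached only for records of the second file.
  { print $1, (($1 in known) ? "listed" : "not listed") }
' "$tmp/names.txt" "$tmp/ledger.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```

No loop is needed: awk already runs the rules once per record, and next stops the first-file records from reaching the second rule. One caveat: if the first file is empty, NR == FNR is also true for the second file.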

rdcwayx · December 10, 2012, 11:05pm

How about this?

awk '{sub(/\$/,"",$2)}/paid/{a[$1]+=$2}/due/{a[$1]-=$2}
    END{for (i in a) printf "%s balance:$%.2f \n",i,a[i]}' infile

CoCo balance:$5.40
Euda balance:$45.00

RozenKristal · December 10, 2012, 11:27pm

That almost gives me the right output. Sorry for the confusion: in file 1 the field does not repeat, so each name appears only once. Only in file 2 can it repeat. I also think the calculation is a bit off, since lines without a $ or without two digits after the decimal point are treated as invalid, so nothing should be done with them. Here is what I tested the code with:

File1:
CoCo
Hiel
Euda
SJHF

File 2:
CoCo	$39.40	paid
CoCo	$34.00	due
Euda	45	due
CoCo	$16.05	due
Hiel	$50	paid
Euda	$12.45	due

Output:
CoCo balance:$-10.65 
Euda balance:$-57.45 
Hiel balance:$50.00

The output I want is:

CoCo balance: $-10.65
Euda balance: $-12.45
Hiel balance: $50.00
SJHF balance: $0.00

pamu · December 11, 2012, 12:54am

awk 'NR==FNR{sub("\\$","");s=$2;if($3 == "due"){s=0-$2};A[$1]=A[$1]?A[$1]+s:s;next}{print $0,"balance: $",A[$0]?A[$0]:"0.00"}' file2 file1

Also with printf:

awk 'NR==FNR{sub("\\$","");s=$2;if($3 == "due"){s=0-$2};A[$1]=A[$1]?A[$1]+s:s;next}
    {printf "%s balance:$%.2f \n", $0,A[$0]+0}' file2 file1

CoCo balance:$-10.65
Hiel balance:$50.00
Euda balance:$-57.45
SJHF balance:$0.00

RozenKristal · December 11, 2012, 12:58am

Thanks pamu, but the output looks like this:

CoCo	$39.40	paid balance: $ 0.00
CoCo	$34.00	due balance: $ 0.00
Euda	45	due balance: $ 0.00
CoCo	$16.05	due balance: $ 0.00
Hiel	$50	paid balance: $ 0.00
Euda	$12.45	due balance: $ 0.00

It would be great if the due amounts could be subtracted from the paid amounts and the resulting balance printed. Is there a way to do so?

With the same test input files as above, I hope to get something like this:

CoCo balance: $-10.65
Euda balance: $-12.45
Hiel balance: $50.00
SJHF balance: $0.00

I think what rdc did was correct, but I would need an exception for amounts that have no $ or do not have two digits after the decimal point.

pamu · December 11, 2012, 1:01am

Please check the file order: file2 should come first, then file1.
Please also check my previous post; I have just added to it.

RozenKristal · December 11, 2012, 1:02am

Thank you, sir. How do I flip the code so that file 1 goes before file 2? The order of the files is pretty important for me. Also, how do I make lines be ignored when the amount column doesn't have a dollar sign or two digits after the decimal point? That would give Euda $-12.45, not $-57.45.

rdcwayx · December 11, 2012, 1:09am

awk 'NR==FNR{ if ($2~/\$/) {sub(/\$/,"",$2); if (/paid/) {a[$1]+=$2} ;if (/due/) a[$1]-=$2}}
    NR>FNR{printf "%s balance:$%.2f \n",$1,a[$1]+0}' file2 file1

CoCo balance:$-10.65
Hiel balance:$50.00
Euda balance:$-12.45
SJHF balance:$0.00
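
The check above only tests for a $ somewhere in the amount. If the rule is meant strictly (a leading $, digits, a dot, then exactly two digits), one anchored regular expression can express it. This is a sketch of that reading of the requirement, not the only possible one; the sample amounts $12.4 and $7.125 are made up to show the edge cases.

```shell
# Feed a few candidate amounts to awk, one per line, and classify each.
out=$(printf '%s\n' '$39.40' '$34' '45' '$16.05' '$50' '$12.4' '$7.125' |
awk '
  # Valid: a literal $, one or more digits, a literal dot, exactly two digits.
  /^\$[0-9]+\.[0-9][0-9]$/ { print $0, "valid"; next }
  { print $0, "invalid" }
')

printf '%s\n' "$out"
```

Under this stricter rule only $39.40 and $16.05 pass. $34 and $50 are rejected because they have no decimal part, so a balance computed with it will differ from one that only looks for the $.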

RozenKristal · December 11, 2012, 1:16am

Is there a way to make the input order file1 file2?

rdcwayx · December 11, 2012, 1:19am

But what's the reason to put file1 first?

RozenKristal · December 11, 2012, 1:22am

It is a requirement :\
Oh, I also managed to make any line fail whose amount has no decimal point or does not have exactly two digits after it.

pamu · December 11, 2012, 1:40am

I am not clear what kind of requirement this is.

The order should not matter here, since we are not modifying the files.

Still, for your requirement, try:

awk 'NR==FNR{A[$0]=0;next}
    {if($1 in A){if($2 ~ /\$/){sub("\\$","");s=$2;if($3 == "due"){s=0-$2};A[$1]=A[$1]+s}}}END{
    for(i in A){printf "%s balance:$%.2f \n", i,A[i]}}' file1 file2

SJHF balance:$0.00
CoCo balance:$-10.65
Euda balance:$-12.45
Hiel balance:$50.00

And I assume you want $2 to be a number like $25.35 only.

Then try:

awk 'NR==FNR{A[$0]=0;next}
    {if($1 in A){if($2 ~ /^\$[0-9]+\.[0-9][0-9]$/){sub("\\$","");s=$2;if($3 == "due"){s=0-$2};A[$1]=A[$1]+s}}}END{
    for(i in A){printf "%s balance:$%.2f \n", i,A[i]}}' file1 file2

SJHF balance:$0.00
CoCo balance:$-10.65
Euda balance:$-12.45
Hiel balance:$0.00

RozenKristal · December 11, 2012, 1:42am

Yep, that is exactly the change I made to it. Thank you, I really appreciate your help.

Scrutinizer · December 11, 2012, 2:22am

awk '{A[$1]} $2~/\./ && sub(/\$/,x,$2){A[$1]+=($3=="paid"?$2:-$2)}  END{for(i in A)printf "%s balance: $%.2f\n",i,A[i]}' file1 file2

RozenKristal · December 11, 2012, 2:37am

Heh, when my winter break comes I would love to learn more AWK. This language is so neat, and seeing you guys do it in one line really motivates me.

Scrutinizer · December 11, 2012, 2:44am

Yes, awk is a well thought out, compact, yet powerful language. A script this long should usually not be written on one line, though; this would normally be easier to read:

awk '
  {
    A[$1]
  }
  $2~/\./ && sub(/\$/,x,$2){
    A[$1]+=($3=="paid"?$2:-$2)
  }
  END{
    for(i in A)printf "%s balance: $%.2f\n",i,A[i]
  }
' file1 file2
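
One thing the for (i in A) loops in this thread do not guarantee is output order: awk does not specify the traversal order, which is why SJHF came out first in some of the results above. If the names should print in file1 order, a second array indexed by position can record it. The sketch below assumes the strict amount rule (leading $, exactly two decimals) and uses the thread's test data; the array names order and bal are made up for the example.

```shell
# Recreate the test files from the thread in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' CoCo Hiel Euda SJHF > "$tmp/file1"
printf '%s\t%s\t%s\n' \
  CoCo '$39.40' paid \
  CoCo '$34.00' due \
  Euda 45 due \
  CoCo '$16.05' due \
  Hiel '$50' paid \
  Euda '$12.45' due > "$tmp/file2"

out=$(awk '
  # file1: remember each name and the position it appeared in.
  NR == FNR { order[++n] = $1; bal[$1] = 0; next }
  # file2: only known names with a strictly valid amount are counted.
  ($1 in bal) && $2 ~ /^\$[0-9]+\.[0-9][0-9]$/ {
    amt = substr($2, 2) + 0          # drop the leading $ and force a number
    if ($3 == "paid") bal[$1] += amt
    else if ($3 == "due") bal[$1] -= amt
  }
  # Print in file1 order instead of relying on for (i in bal).
  END { for (i = 1; i <= n; i++) printf "%s balance: $%.2f\n", order[i], bal[order[i]] }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$out"
rm -rf "$tmp"
```

With the strict rule Hiel stays at $0.00, because $50 has no decimal part. Relaxing the regular expression to accept whole-dollar amounts would give $50.00 instead.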