help sum columns by break in first column with awk or sed or something.

syadnom · January 31, 2009, 11:21am

I have some data that is something like this?

item: onhand counted location
ITEM0001 1 0 a1
ITEM0001 0 1 a2
ITEM0002 5 0 b5
ITEM0002 0 6 c1

I want to sum up the on hand and counded lines by item number for an output like:
item: onhand counted location difference
ITEM0001 1 1 a1,a2 0
ITEM0002 5 6 b5,c1 1

I would then filter this through awk with an if $2 <> $3 print $0 as I only want those lines with some difference between the onhand and counted numbers

Unfortunately, I dont have access to the program that creates the original data so I cant manipulate that and am stuck with multiple lines of output per item number.

I would very much appreciate any help. I dont know how to handle multiple input lines like this...

I prefer awk and sed and similar 'standard' utilities. if this must be done in perl or something then I will have much more learning to understand the help!

Thanks

otheus · January 31, 2009, 12:02pm

This is almost a FAQ. Search around here for more threads on "column sum". Meanwhile, it's easy enough to use awk to filter and then add:

awk '$2 != $3 { sum+=$2; } END { print sum }'  input.txt

vgersh99 · January 31, 2009, 4:48pm

otheus:

This is almost a FAQ. Search around here for more threads on "column sum". Meanwhile, it's easy enough to use awk to filter and then add:
awk '$2 <> $3 { sum+=$2; } END { print sum }'  input.txt

what's '<>' ?

awk '$2 != $3 { sum+=$2; } END { print sum }'  input.txt

otheus · January 31, 2009, 7:16pm

Um SQL ? The poster originally had that and I copied him. Doh. I corrected my post.
Thanks vgersh

syadnom · February 1, 2009, 3:09am

here is some actual data since I dont understand how this works.

this is the actual data I am working with. lets call it file 'data'. the numbers line up perfectly on the command line.

here is my script

I set fieldwidths as there are random spaces that mess up the process.

so $1 = all the detail up to the first number, $2 is on number and $3 is the other. $4 is the junk on the end.

I run this like
./script data

my output

here is the output. I see that it is adding up column $2 and printing it after the line but it does not reset the number on part number changes. I took the END out as I wanted the numbers after each part number.

How do I have the sum reset after each change in part number? (change in $1)

also, what is the appropriate way to sum both $2 and $3?

thanks!

syadnom · February 1, 2009, 3:39am

If awk cant do this directly, does it make and sense to first take the part number which is the first 15 characters and dump it to a temp file, then run uniq on it. afterwards do something like:

#filter_sum.awk
/PARTNUM/ { sum+=$2}
END { print sum }
except instead of /PARTNUM/ so some sort of
for x in `cat sum.awk.tempfile`;do gawk '{ /$x/ { sum+=$2} END { print sum }'

i havent gotten it to work yet but need some opinions on whether this is the right way..

otheus · February 1, 2009, 4:23am

I don't get this. awk (by default) separates fields by whitespace. If there are "random" spaces, this would adjust the column width, no?

But whatever, you have your fields:

($1 != lastpartnum) { print lastpartnum, sum1,sum2; sum2=sum1=0; } 
($2 != $3) { sum1+=$2; sum2+=$3 }
{ lastpartnum=$1 }