I need your help to discover missing elements for each box.
In theory each box should have 4 items: ITEM01, ITEM02, ITEM08, and ITEM10.
Some boxes either have a missing item (BOX02 lacks ITEM08) or have a duplicate item (BOX03 ITEM02) while missing another one (BOX03 ITEM01).
file01.txt
BOX01 ITEM01
BOX01 ITEM10
BOX01 ITEM08
BOX01 ITEM02
BOX02 ITEM01
BOX02 ITEM02
BOX02 ITEM10
BOX03 ITEM02
BOX03 ITEM10
BOX03 ITEM02
BOX03 ITEM08
Desired output:
Missing:
BOX02 ITEM08
BOX03 ITEM01
Duplicate items:
BOX03 ITEM02
To solve this, I presume I first need to create an item array, I[$2]++, and then a box array, B[$1]=$2.
I don't know how to write the code to check whether the items from array I exist in B.
To get the I array elements I have used:
awk '{I[$2]++}END{for (v in I) print v}' file01.txt
If you don't mind the two sections of the output appearing in reverse order, and don't mind the boxes and the items missing from them being reported in random order, you could try something like:
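awk '
BEGIN { print "Duplicate items:"
}
{
    b[$1]                   # remember every box seen
    i[$2]                   # remember every item seen
    if(($1, $2) in c)
        print               # this box/item pair was seen before: a duplicate
    else c[$1, $2]          # first sighting of this box/item pair
}
END { printf("\nMissing:\n")
    for(box in b)
        for(item in i)
            if(!((box, item) in c))
                print box, item
}' file01.txt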
which, with your sample input file produced the output:
Duplicate items:
BOX03 ITEM02

Missing:
BOX02 ITEM08
BOX03 ITEM01
This assumes that each box you want to process contains at least one item and that each item that is supposed to appear in all of your boxes appears in at least one of them. If either of these assumptions is incorrect, you could always create another file or two that contain(s) the boxes and items you want to process.
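As a rough sketch of that last suggestion, the script can read the expected items and boxes from two extra files before it reads the data. The names items.txt and boxes.txt, and the empty box BOX04, are made up for this illustration; they are not part of the original question.

```shell
# Sample data from the question, recreated so the sketch runs standalone.
cat > file01.txt <<'EOF'
BOX01 ITEM01
BOX01 ITEM10
BOX01 ITEM08
BOX01 ITEM02
BOX02 ITEM01
BOX02 ITEM02
BOX02 ITEM10
BOX03 ITEM02
BOX03 ITEM10
BOX03 ITEM02
BOX03 ITEM08
EOF

# Hypothetical reference files: the items every box must hold and the boxes
# to check. BOX04 is invented here to show a box with no items at all.
printf '%s\n' ITEM01 ITEM02 ITEM08 ITEM10 > items.txt
printf '%s\n' BOX01 BOX02 BOX03 BOX04 > boxes.txt

out=$(awk '
BEGIN { print "Duplicate items:" }
FILENAME == ARGV[1] { i[$1]; next }    # items.txt: expected items
FILENAME == ARGV[2] { b[$1]; next }    # boxes.txt: expected boxes
{
    if (($1, $2) in c) print           # pair seen before: a duplicate
    else c[$1, $2]                     # first sighting of this pair
}
END {
    printf("\nMissing:\n")
    for (box in b)
        for (item in i)
            if (!((box, item) in c))
                print box, item
}' items.txt boxes.txt file01.txt)
printf '%s\n' "$out"
```

With these reference files, BOX04 is reported as missing all four items even though it never appears in file01.txt. Boxes and items are still reported in random order.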
As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
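If the section order from the question matters, one possible variation (a sketch, not the only way to do it) is to hold the duplicates until the end and to remember boxes and items in the order they are first seen. The helper arrays box, item and dup and the counters nb, ni and nd are names chosen for this illustration.

```shell
# Sample data from the question, recreated so the sketch runs standalone.
cat > file01.txt <<'EOF'
BOX01 ITEM01
BOX01 ITEM10
BOX01 ITEM08
BOX01 ITEM02
BOX02 ITEM01
BOX02 ITEM02
BOX02 ITEM10
BOX03 ITEM02
BOX03 ITEM10
BOX03 ITEM02
BOX03 ITEM08
EOF

out=$(awk '
{
    if (!($1 in b)) { b[$1]; box[++nb] = $1 }     # boxes in first-seen order
    if (!($2 in i)) { i[$2]; item[++ni] = $2 }    # items in first-seen order
    if (($1, $2) in c) dup[++nd] = $0             # keep duplicates for later
    else c[$1, $2]                                # first sighting of this pair
}
END {
    print "Missing:"
    for (x = 1; x <= nb; x++)
        for (y = 1; y <= ni; y++)
            if (!((box[x], item[y]) in c))
                print box[x], item[y]
    print "Duplicate items:"
    for (k = 1; k <= nd; k++)
        print dup[k]
}' file01.txt)
printf '%s\n' "$out"
```

With the sample file this prints the Missing section first and the Duplicate items section second, as requested in the question. Boxes and items are listed in the order they first appear in the input rather than sorted, and a pair that occurs three times would be listed twice under Duplicate items.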
Thank you, that's much appreciated. The solution you provided works well.
Best Regards