Compare elements of multiple arrays using awk

I need your help finding the missing items in each box.
In theory, each box should have 4 items: ITEM01, ITEM02, ITEM08, and ITEM10.
Some boxes have a missing item (BOX02 lacks ITEM08); others may have a duplicate item (BOX03 has ITEM02 twice) and be missing another one (BOX03 lacks ITEM01).

file01.txt

BOX01 ITEM01
BOX01 ITEM10
BOX01 ITEM08
BOX01 ITEM02
BOX02 ITEM01
BOX02 ITEM02
BOX02 ITEM10
BOX03 ITEM02
BOX03 ITEM10
BOX03 ITEM02
BOX03 ITEM08

Desired output:

Missing:
BOX02 ITEM08
BOX03 ITEM01

Duplicate items:
BOX03 ITEM02

To solve this, I presume I first need to create an item array, I[$2]++, and then a box array, B[$1]=$2.
I don't know how to write the code that checks whether the items in array I exist in B.

To get the elements of array I, I used:

awk '{I[$2]++}END{for (v in I) print v}' file01.txt
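Note that B[$1]=$2 keeps only the last item seen for each box, because every new line for that box overwrites the previous value. An array subscripted by both fields keeps every box/item pair instead. As a small sketch (the function name count_pairs is hypothetical), this counts how often each pair occurs; any pair with a count above 1 is a duplicate:

```shell
# Hypothetical helper: count how often each (box, item) pair occurs.
# Reads the files named as arguments, or standard input if none are given.
count_pairs() {
	awk '
	{ n[$1, $2]++ }                  # two-part subscript: box and item
	END {
		for (k in n) {               # iteration order is unspecified
			split(k, part, SUBSEP)   # recover box and item from the key
			print part[1], part[2], n[k]
		}
	}' "$@"
}
```

For example, `count_pairs file01.txt | sort` lists every pair with its count; with the sample file, BOX03 ITEM02 is the only pair with a count of 2.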

If you don't mind the two sections of your output appearing in reverse order, and don't mind boxes and missing items being reported in random order, you could try something like:

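awk '
BEGIN {	print "Duplicate items:"
}
{
	b[$1]			# record every box name seen
	i[$2]			# record every item name seen
	if(($1, $2) in c)	# this box/item pair was already seen:
		print		# report the line as a duplicate
	else 	c[$1, $2]	# otherwise remember the pair
}
END {	printf("\nMissing:\n")
	# check every known item against every known box
	for(box in b)
		for(item in i)
			if(!((box, item) in c))
				print box, item
}' file01.txt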

which, with your sample input file, produced the output:

Duplicate items:
BOX03 ITEM02

Missing:
BOX02 ITEM08
BOX03 ITEM01

This assumes that each box you want to process contains at least one item, and that each item that is supposed to appear in all of your boxes appears in at least one of them. If either assumption is wrong, you could create one or two additional files listing the boxes and/or items you want to process.
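Following the idea of keeping the expected items in a separate file, here is a sketch that also prints the two sections in the order the question asked for, with boxes listed in the order they first appear. The function name check_boxes and the items file (one expected item per line, called items.txt in the usage line) are assumptions for illustration. It relies on the common NR == FNR idiom, so it misbehaves if the items file is empty, and it still only reports boxes that appear in the data file.

```shell
# Hypothetical helper. First argument: file of expected items, one per line.
# Remaining arguments: the box/item data file(s).
check_boxes() {
	awk '
	NR == FNR {                  # first file: the expected item list
		if (NF) want[++nwant] = $1   # skip blank lines
		next
	}
	{
		if (!($1 in seen)) {     # remember boxes in first-seen order
			seen[$1]
			box[++nbox] = $1
		}
		if (($1, $2) in c)       # pair already seen: buffer as duplicate
			dup[++ndup] = $1 " " $2
		else
			c[$1, $2]
	}
	END {
		print "Missing:"
		for (b = 1; b <= nbox; b++)
			for (w = 1; w <= nwant; w++)
				if (!((box[b], want[w]) in c))
					print box[b], want[w]
		printf("\nDuplicate items:\n")
		for (d = 1; d <= ndup; d++)
			print dup[d]
	}' "$@"
}
```

With items.txt containing ITEM01, ITEM02, ITEM08 and ITEM10, `check_boxes items.txt file01.txt` should print the "Missing:" section first and the "Duplicate items:" section second, matching the desired output in the question. Because boxes and items are stored in numerically indexed arrays, the report order no longer depends on the unspecified order of `for (key in array)`.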

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.


Thank you, that's much appreciated. The solution you provided works well.
Best Regards