AWK, sorting-if and print help.

I have a little awk experience and need some of the expert help on here. I browsed around and got a little further, but I am still missing some pieces. Can you help me fill in my missing awk cells?

Sample data file (commas left out):

Column 1       Column 2       Column 3        Column 4              etc.
apple            banana          ice cream       11:11 p/iz/za        etc.
attack          banana          etc                12:34 l/em/on       etc.
big               tomato          etc                etc.                    etc.
blowfish        tomato          etc                etc.                    etc.
cindy            tomato          etc               etc.                     etc.

What I have got thus far (pseudo-code):

awk -F","
{
$1=var1
if var1 ~ '/a/' var1=$1-fruit; else
if var1 ~ '/b/' var1=$1-animal; else
etc.
print "this is called" $var "thank you "

$2=var2
print $var2

$3=var3

$4=hmmm need it to just output "pizza" from that format and set to var4

print var3 " " var4

$5=var5
$6=var6

if count ($)2 appears more than once; then
for each instance
{
print $5 $6
}

--
The columns may be alpha-numeric, so I would like to script it as such.

Thank you for any help.

Can you post your data file in the original format and the desired output?

Input:

Col1	Col2	Col3	Col4	Col5	Col6	Col7	Col8	Col9
apple1-123856	banana12	shaws	fruit	food	4:20:00 PM 8/18/2008	type2	 12.122.122.122	00:00:00:00:00:00
apple1-123856	banana12	shaws	fruit	food	4:21:00 PM 8/18/2008	type3	 12.122.122.123	00:00:00:00:00:01
apple3-123656	banana34	shaws	fruit	food	4:24:00 PM 8/18/2008	type5	 12.122.122.125	00:00:00:00:00:09
apple3-123656	banana34	shaws	fruit	food	4:21:00 PM 8/18/2008	type6	 12.122.122.126	00:00:00:00:00:08
banana1-127456	banana77	shaws	fruit	food	4:23:00 PM 8/18/2008	type2	 12.122.122.122	00:00:00:00:00:10
banana2-133456	banana88	shaws	fruit	food	10:24:00 PM 8/18/2008	type2	 12.122.122.122	00:00:00:00:00:11

Output:

<name="apple1-123856.forapples" id="banana12" type="shaws" type1="fruit food" date="20080818">	
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:00" />
<subdata="type3" ip="12.122.122.123" mac="00:00:00:00:00:01" />
</name>
<name="apple3-123656.forapples" id="banana34" type="shaws" type1="fruit food" date="20080818">	
<subdata="type5" ip="12.122.122.125" mac="00:00:00:00:00:09" />
<subdata="type6" ip="12.122.122.126" mac="00:00:00:00:00:08" />
</name>
<name="banana1-127456.forbananas" id="banana77" type="shaws" type1="fruit food" date="20080818">	
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:10" />
</name>
<name="banana2-133456.forbananas" id="banana88" type="shaws" type1="fruit food" date="20080818">	
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:11" />
</name>

extra notes:
-if apple = .forapples
-if banana = .forbananas
-if id's are the same, the subdata (last 3 columns) are used

Now what is the desired output?
... and don't forget the [CODE] tags please !

Finished editing.
How about now?

Just kidding, now I am finished!

Awk script:

BEGIN {
# init previous match
        p_match=""
# init block open flag
        f_block=0
}

# check if new block
(p_match != $1) {

# if in block then close it
        if (f_block != 0)
        {
                print "</name>"
                f_block = 0
        }
# set the previous match thing
        p_match = $1

# set the .for type
        if ( index($1,"apple") != 0)
                ext = ".forapples"
        else
                ext = ".forbananas"
# make the date thing: split M/D/YYYY into d_thing[1], d_thing[2], d_thing[3]
        split($8, d_thing, "/")

# print the initial line
        printf("<name=\"%s%s\" id=\"%s\" type=\"%s\" type1=\"%s %s\" date=\"%04d%02d%02d\">\n", $1, ext, $2, $3, $4, $5, int(d_thing[3]), int(d_thing[1]), int(d_thing[2]))

# print the subdata line
        printf("<subdata=\"%s\" ip=\"%s\" mac=\"%s\" />\n", $9, $10, $11)

# show we are in a block
        f_block = 1
# next line
        next
}

# do this if id is same as previous
(p_match == $1) {
# print the subdata line
        printf("<subdata=\"%s\" ip=\"%s\" mac=\"%s\" />\n", $9, $10, $11)
}
END {
        if (f_block != 0)
                print "</name>"
}

This is the output from your input with the header line removed. To consume the header line in the script itself, add a rule such as NR == 1 { next } before the other rules (next is not allowed inside the BEGIN clause).

# awk -f fruit.awk f.test
<name="apple1-123856.forapples" id="banana12" type="shaws" type1="fruit food" date="20080818">
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:00" />
<subdata="type3" ip="12.122.122.123" mac="00:00:00:00:00:01" />
</name>
<name="apple3-123656.forapples" id="banana34" type="shaws" type1="fruit food" date="20080818">
<subdata="type5" ip="12.122.122.125" mac="00:00:00:00:00:09" />
<subdata="type6" ip="12.122.122.126" mac="00:00:00:00:00:08" />
</name>
<name="banana1-127456.forbananas" id="banana77" type="shaws" type1="fruit food" date="20080818">
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:10" />
</name>
<name="banana2-133456.forbananas" id="banana88" type="shaws" type1="fruit food" date="20080818">
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:11" />
</name>
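
A minimal sketch of one way to skip the header line, assuming the header is always the first line of the input. In awk, next is only valid in a pattern-action rule, not in BEGIN, so the skip goes in a rule of its own. The two-line sample input is made up for illustration.

```shell
# Sample input: a header line followed by one data row (tab-separated).
input=$(printf 'Col1\tCol2\napple1-123856\tbanana12\n')

# NR == 1 matches only the first input line; next drops it before
# any later rule sees it.
out=$(printf '%s\n' "$input" | awk 'NR == 1 { next } { print $1, $2 }')

printf '%s\n' "$out"
```

In the script above, the same NR == 1 { next } rule would sit right after the BEGIN block.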

I just realized my mistake: the column names above were NOT correctly aligned with the data, so this script (which is awesome) doesn't work for me.
The columns of the input data should have read:

Col1	        Col2	        Col3	Col4	Col5	Col6	                Col7	 Col8	        Col9
apple1-123856	banana12	shaws	fruit	food	4:20:00 PM 8/18/2008	type2	 12.122.122.122	00:00:00:00:00:00
apple1-123856	banana12	shaws	fruit	food	4:21:00 PM 8/18/2008	type3	 12.122.122.123	00:00:00:00:00:01
apple3-123656	banana34	shaws	fruit	food	4:24:00 PM 8/18/2008	type5	 12.122.122.125	00:00:00:00:00:09
apple3-123656	banana34	shaws	fruit	food	4:21:00 PM 8/18/2008	type6	 12.122.122.126	00:00:00:00:00:08
banana1-127456	banana77	shaws	fruit	food	4:23:00 PM 8/18/2008	type2	 12.122.122.122	00:00:00:00:00:10
banana2-133456	banana88	shaws	fruit	food	10:24:00 PM 8/18/2008	type2	 12.122.122.122	00:00:00:00:00:11

This was totally my fault. Could anyone help me fix the script now?
I assume I just need to change the $X references to the correct columns, but since I don't completely understand the date parsing, I am not sure which column it should use.

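Just add a rule such as NR == 1 { next } above the other rules to skip the header line (next cannot go inside the BEGIN clause). I'm guessing you actually have the header line in your file, because you didn't show the failing data... The script counts "columns" as things with whitespace between them, so the timestamp takes up three fields.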
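
To make the field numbering concrete, here is a small sketch that runs one row of the posted sample through awk twice: once with the default whitespace splitting the script relies on, and once with tab splitting (an assumption about how the real file is delimited). It also repeats the date split from the script on its own.

```shell
# One data row from the sample input (tab-separated, as posted).
row=$(printf 'apple1-123856\tbanana12\tshaws\tfruit\tfood\t4:20:00 PM 8/18/2008\ttype2\t 12.122.122.122\t00:00:00:00:00:00\n')

# Default splitting: any run of blanks or tabs separates fields, so the
# timestamp "4:20:00 PM 8/18/2008" occupies $6, $7 and $8.
out_ws=$(printf '%s\n' "$row" | awk '{
    split($8, d, "/")                 # d[1]=month, d[2]=day, d[3]=year
    printf "%d %s %04d%02d%02d\n", NF, $8, d[3], d[1], d[2]
}')

# Tab splitting: the whole timestamp stays in $6, matching the 9 headers.
out_tab=$(printf '%s\n' "$row" | awk -F'\t' '{ print NF; print $6 }')

printf '%s\n' "$out_ws"
printf '%s\n' "$out_tab"
```

With whitespace splitting the date is $8 and the subdata fields are $9, $10 and $11, which is what the script uses. With -F'\t' the date would have to be cut out of $6 instead, and the subdata would be $7, $8 and $9.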

Got it and fixed my column mistake.
Thank you.
Very very helpful.

Hope the Perl below helps you some.

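use strict;
use warnings;

my %hash;
while(<DATA>){
	next if $.==1;	# skip the header line
	my @tmp = split;	# whitespace split: the timestamp becomes three fields
	# reformat the date from M/D/YYYY to YYYYMMDD
	my ($m,$d,$y) = split /\//, $tmp[7];
	my $date = sprintf("%04d%02d%02d", $y, $m, $d);
	my $key = $tmp[0]." ".$tmp[1]." ".$tmp[2]." ".$tmp[3]." ".$tmp[4]." ".$date;
	push @{$hash{$key}}, $tmp[8]." ".$tmp[9]." ".$tmp[10];
}
# sort the keys so the output order is stable (hash order is not)
foreach my $item (sort keys %hash){
	# $category is the leading non-digit part of the name (apple, banana)
	my ($key,$category,$id,$type,$type1,$date) = $item =~ /(([^0-9]+)[^ ]*) ([^ ]*) ([^ ]*) ([^ ]* [^ ]*) ([^ ]*)/;
	print '<name="'.$key.'.for'.$category.'s" id="'.$id.'" type="'.$type.'" type1="'.$type1.'" date="'.$date.'">'."\n";
	foreach  my $s_item (@{$hash{$item}}){
		my ($s_type,$ip,$mac) = $s_item =~ /([^ ]*) ([^ ]*) ([^ ]*)/;
		print '<subdata="'.$s_type.'" ip="'.$ip.'" mac="'.$mac.'" />'."\n";
	}
	print "</name>\n";
}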
__DATA__
Col1	Col2	Col3	Col4	Col5	Col6	Col7	Col8	Col9
apple1-123856	banana12	shaws	fruit	food	4:20:00 PM 8/18/2008	type2	 12.122.122.122	00:00:00:00:00:00
apple1-123856	banana12	shaws	fruit	food	4:21:00 PM 8/18/2008	type3	 12.122.122.123	00:00:00:00:00:01
apple3-123656	banana34	shaws	fruit	food	4:24:00 PM 8/18/2008	type5	 12.122.122.125	00:00:00:00:00:09
apple3-123656	banana34	shaws	fruit	food	4:21:00 PM 8/18/2008	type6	 12.122.122.126	00:00:00:00:00:08
banana1-127456	banana77	shaws	fruit	food	4:23:00 PM 8/18/2008	type2	 12.122.122.122	00:00:00:00:00:10
banana2-133456	banana88	shaws	fruit	food	10:24:00 PM 8/18/2008	type2	 12.122.122.122	00:00:00:00:00:11
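
The same group-by-key idea can be sketched in awk with arrays, so it does not depend on rows with the same name being adjacent in the input. This is a reduced illustration rather than a replacement for either script: it prints only the name and the subdata type, and the three-row input and the array names (subs, order) are invented for the example.

```shell
# Rows with the same first field are deliberately NOT adjacent here.
input=$(printf 'apple1 type2\nbanana1 type5\napple1 type3\n')

out=$(printf '%s\n' "$input" | awk '
{
    if (!($1 in subs)) order[++n] = $1   # remember first-seen order of keys
    subs[$1] = subs[$1] "<subdata=\"" $2 "\" />\n"
}
END {
    # emit one block per key, in the order the keys first appeared
    for (i = 1; i <= n; i++) {
        k = order[i]
        printf "<name=\"%s\">\n%s</name>\n", k, subs[k]
    }
}')

printf '%s\n' "$out"
```

Keeping a separate order array is a design choice: awk's for (k in subs) loop visits keys in an unspecified order, much like keys %hash in Perl.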

Very good, now I know how to do it with perl!
Thanks.