I have a little awk experience and need some expert help. I browsed around here and got a little further, but I am still missing some pieces. Can you help me fill in the missing parts of my awk script?
Sample data file (commas left out):
Column 1 Column 2 Column 3 Column 4 etc.
apple banana ice cream 11:11 p/iz/za etc.
attack banana etc 12:34 l/em/on etc.
big tomato etc etc. etc.
blowfish tomato etc etc. etc.
cindy tomato etc etc. etc.
What I have got thus far (pseudo-code):
awk -F","
{
$1=var1
if var1 ~ '/a/' var1=$1-fruit; else
if var1 ~ '/b/' var1=$1-animal; else
etc.
print "this is called" $var "thank you "
$2=var2
print $var2
$3=var3
$4=hmmm need it to just output "pizza" from that format and set to var4
print var3 " " var4
$5=var5
$6=var6
if count ($)2 appears more than once; then
for each instance
{
print $5 $6
}
--
The columns may be alpha-numeric, so I would like to script it as such.
Thank you for any help.
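Two pieces of the pseudo-code above can be sketched in real awk: tagging the first column by a pattern match, and stripping the slashes so "p/iz/za" becomes "pizza". This is only a sketch under assumptions: the comma-separated input line, the function name classify, and the anchored patterns /^a/ and /^b/ are made up for illustration and are not taken from the real data file.

```shell
# classify is a hypothetical helper; the field layout is an assumption.
classify() {
  printf '%s\n' "$1" | awk -F',' '{
      name = $1
      # tag the name by its first letter (assumed rule)
      if (name ~ /^a/)      name = name "-fruit"
      else if (name ~ /^b/) name = name "-animal"
      # remove every "/" from the fourth field
      food = $4
      gsub("/", "", food)
      print name, food
  }'
}

classify "apple,banana,ice cream,p/iz/za"
```

Note that fields are copied into variables (name = $1), not the other way round; writing $1=var1 would overwrite the field.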
Can you post your data file in the original format and the desired output?
Input:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
apple1-123856 banana12 shaws fruit food 4:20:00 PM 8/18/2008 type2 12.122.122.122 00:00:00:00:00:00
apple1-123856 banana12 shaws fruit food 4:21:00 PM 8/18/2008 type3 12.122.122.123 00:00:00:00:00:01
apple3-123656 banana34 shaws fruit food 4:24:00 PM 8/18/2008 type5 12.122.122.125 00:00:00:00:00:09
apple3-123656 banana34 shaws fruit food 4:21:00 PM 8/18/2008 type6 12.122.122.126 00:00:00:00:00:08
banana1-127456 banana77 shaws fruit food 4:23:00 PM 8/18/2008 type2 12.122.122.122 00:00:00:00:00:10
banana2-133456 banana88 shaws fruit food 10:24:00 PM 8/18/2008 type2 12.122.122.122 00:00:00:00:00:11
Output:
<name="apple1-123856.forapples" id="banana12" type="shaws" type1="fruit food" date="20080818">
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:00" />
<subdata="type3" ip="12.122.122.123" mac="00:00:00:00:00:01" />
</name>
<name="apple3-123656.forapples" id="banana34" type="shaws" type1="fruit food" date="20080818">
<subdata="type5" ip="12.122.122.125" mac="00:00:00:00:00:09" />
<subdata="type6" ip="12.122.122.126" mac="00:00:00:00:00:08" />
</name>
<name="banana1-127456.forbananas" id="banana77" type="shaws" type1="fruit food" date="20080818">
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:10" />
</name>
<name="banana2-133456.forbananas" id="banana88" type="shaws" type1="fruit food" date="20080818">
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:11" />
</name>
extra notes:
-if apple = .forapples
-if banana = .forbananas
-if id's are the same, the subdata (last 3 columns) are used
Now what is the desired output?
... and don't forget the [CODE] tags please !
Finished editing.
How about now?
Just kidding, now I am finished!
Awk script:
BEGIN {
    # init previous match
    p_match = ""
    # init block open flag
    f_block = 0
}
# check if new block
(p_match != $1) {
    # if in block then close it
    if (f_block != 0)
    {
        print "</name>"
        f_block = 0
    }
    # remember the name that opened this block
    p_match = $1
    # set the .for type
    if (index($1, "apple") != 0)
        ext = ".forapples"
    else
        ext = ".forbananas"
    # split the m/d/yyyy date into month, day, year
    split($8, d_thing, "/")
    # print the initial line
    printf("<name=\"%s%s\" id=\"%s\" type=\"%s\" type1=\"%s %s\" date=\"%04d%02d%02d\">\n", $1, ext, $2, $3, $4, $5, int(d_thing[3]), int(d_thing[1]), int(d_thing[2]))
    # print the subdata line
    printf("<subdata=\"%s\" ip=\"%s\" mac=\"%s\" />\n", $9, $10, $11)
    # show we are in a block
    f_block = 1
    # next line
    next
}
# do this if the name is the same as the previous one
(p_match == $1) {
    # print the subdata line
    printf("<subdata=\"%s\" ip=\"%s\" mac=\"%s\" />\n", $9, $10, $11)
}
END {
    if (f_block != 0)
        print "</name>"
}
Here is the output from your input without the header line. To consume the header line, add a rule such as NR == 1 { next } ahead of the first pattern ('next' is not allowed inside the BEGIN clause).
# awk -f fruit.awk f.test
<name="apple1-123856.forapples" id="banana12" type="shaws" type1="fruit food" date="20080818">
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:00" />
<subdata="type3" ip="12.122.122.123" mac="00:00:00:00:00:01" />
</name>
<name="apple3-123656.forapples" id="banana34" type="shaws" type1="fruit food" date="20080818">
<subdata="type5" ip="12.122.122.125" mac="00:00:00:00:00:09" />
<subdata="type6" ip="12.122.122.126" mac="00:00:00:00:00:08" />
</name>
<name="banana1-127456.forbananas" id="banana77" type="shaws" type1="fruit food" date="20080818">
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:10" />
</name>
<name="banana2-133456.forbananas" id="banana88" type="shaws" type1="fruit food" date="20080818">
<subdata="type2" ip="12.122.122.122" mac="00:00:00:00:00:11" />
</name>
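For a file that does start with a header line, one common way to consume it is a rule that skips the first record. A minimal sketch, assuming a made-up two-line input and a hypothetical wrapper named skip_header (in the real script the rule would sit ahead of the other patterns):

```shell
# skip_header is a hypothetical wrapper used only for this illustration.
skip_header() {
  # NR == 1 is true only for the first input line; next drops it
  awk 'NR == 1 { next } { print $1 }'
}

printf 'Col1 Col2\napple1-123856 banana12\n' | skip_header
```

With several input files, FNR == 1 would skip the first line of each file instead of only the first line overall.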
Just realized my mistake: the column names in my earlier post were NOT correctly aligned with the data, so this script (which is awesome) doesn't work.
The columns of the input data should have read:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
apple1-123856 banana12 shaws fruit food 4:20:00 PM 8/18/2008 type2 12.122.122.122 00:00:00:00:00:00
apple1-123856 banana12 shaws fruit food 4:21:00 PM 8/18/2008 type3 12.122.122.123 00:00:00:00:00:01
apple3-123656 banana34 shaws fruit food 4:24:00 PM 8/18/2008 type5 12.122.122.125 00:00:00:00:00:09
apple3-123656 banana34 shaws fruit food 4:21:00 PM 8/18/2008 type6 12.122.122.126 00:00:00:00:00:08
banana1-127456 banana77 shaws fruit food 4:23:00 PM 8/18/2008 type2 12.122.122.122 00:00:00:00:00:10
banana2-133456 banana88 shaws fruit food 10:24:00 PM 8/18/2008 type2 12.122.122.122 00:00:00:00:00:11
This was totally my fault. Could anyone help me fix the script now?
I could just change the $X references to the correct columns, but since I don't completely understand the date parse, I am not sure which column it should use.
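The date parse in the script can be looked at on its own. As a rough sketch, assuming the date arrives as month/day/year like 8/18/2008 (the wrapper name reformat_date is hypothetical), split() cuts the field at each "/" and printf pads the pieces back together as yyyymmdd:

```shell
# reformat_date is a hypothetical wrapper used only for this illustration.
reformat_date() {
  printf '%s\n' "$1" | awk '{
      # d[1] = month, d[2] = day, d[3] = year
      split($1, d, "/")
      # zero-pad to 4, 2 and 2 digits
      printf("%04d%02d%02d\n", d[3], d[1], d[2])
  }'
}

reformat_date "8/18/2008"
```

In the script, the first argument to split() is whichever field holds the date, so that is the only column number the date parse depends on.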
Just add a rule such as NR == 1 { next } ahead of the other patterns ('next' is not allowed inside the BEGIN clause). I'm guessing you actually have the header line in your file, because you didn't show the failing data... The script counts "columns" as runs of text with whitespace between them.
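To see how awk numbers the columns, it can help to print NF (the number of whitespace-separated fields) for a line. A small sketch using one data line and the header line from the posted input; the wrapper name count_fields is made up:

```shell
# count_fields is a hypothetical wrapper used only for this illustration.
count_fields() {
  # with the default field separator, any run of blanks splits fields
  printf '%s\n' "$1" | awk '{ print NF }'
}

count_fields "apple1-123856 banana12 shaws fruit food 4:20:00 PM 8/18/2008 type2 12.122.122.122 00:00:00:00:00:00"
count_fields "Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9"
```

The data line splits into 11 fields even though the header names only 9 columns, because "fruit food" and "4:20:00 PM" each contain a space. That is why the date is $8 and the subdata values are $9, $10 and $11.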
Got it and fixed my column mistake.
Thank you.
Very very helpful.
Hope the Perl below helps.
my %hash;
while(<DATA>){
next if $.==1;
my @tmp = split;
my $key = $tmp[0]." ".$tmp[1]." ".$tmp[2]." ".$tmp[3]." ".$tmp[4]." ".$tmp[7];
push @{$hash{$key}}, $tmp[8]." ".$tmp[9]." ".$tmp[10];
}
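# sort the keys so the blocks come out in a stable order
foreach my $item (sort keys %hash){
my ($key,$category,$id,$type,$type1,$date) = $item =~ /(([^0-9]+)[^ ]*) ([^ ]*) ([^ ]*) ([^ ]* [^ ]*) ([^ ]*)/;
# turn m/d/yyyy into yyyymmdd
my ($mon,$day,$year) = split m{/}, $date;
$date = sprintf("%04d%02d%02d", $year, $mon, $day);
print '<name="'.$key.'.for'.$category.'s" id="'.$id.'" type="'.$type.'" type1="'.$type1.'" date="'.$date.'">'."\n";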
foreach my $s_item (@{$hash{$item}}){
my ($s_type,$ip,$mac) = $s_item =~ /([^ ]*) ([^ ]*) ([^ ]*)/;
print '<subdata="'.$s_type.'" ip="'.$ip.'" mac="'.$mac.'" />'."\n";
}
print "</name>\n";
}
__DATA__
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
apple1-123856 banana12 shaws fruit food 4:20:00 PM 8/18/2008 type2 12.122.122.122 00:00:00:00:00:00
apple1-123856 banana12 shaws fruit food 4:21:00 PM 8/18/2008 type3 12.122.122.123 00:00:00:00:00:01
apple3-123656 banana34 shaws fruit food 4:24:00 PM 8/18/2008 type5 12.122.122.125 00:00:00:00:00:09
apple3-123656 banana34 shaws fruit food 4:21:00 PM 8/18/2008 type6 12.122.122.126 00:00:00:00:00:08
banana1-127456 banana77 shaws fruit food 4:23:00 PM 8/18/2008 type2 12.122.122.122 00:00:00:00:00:10
banana2-133456 banana88 shaws fruit food 10:24:00 PM 8/18/2008 type2 12.122.122.122 00:00:00:00:00:11
Very good, now I know how to do it with perl!
Thanks.