awk problem with syntax

SkySmart · October 29, 2013, 12:45pm

awk -v sw="lemons|dogs" 'NR>100 && NR<200 BEGIN { c=split(sw,a,"[|]"); } { for (w in a) { if ($0 ~ a[w]) d[a[w]]++; } }
END { for (i in a) { o=o (a[i]"="(d[a[i]]?d[a[i]]:0)","); }
  sub(",*$","",o); print o;
}' /home/jahitt/data.txt

What am I doing wrong with the above code? I'm pretty sure the issue is in the NR>100 && NR<200 part at the start. How can this be fixed?

Akshay_Hegde · October 29, 2013, 12:52pm

Change 'NR>100 && NR<200 BEGIN { c=split(sw,a,"[|]"); } to BEGIN { c=split(sw,a,"[|]"); }

and put the condition on the counting rule instead:

NR>100 && NR<200{ for (w in a) { if ($0 ~ a[w]) d[a[w]]++; } }

In { o=o (a[i]"="(d[a[i]]?d[a[i]]:0)","); }, what is o? Where did you define it?
sub(",*$","",o); print o;

If you want the line, use $0.

What is the purpose of the code? If you show us sample input and the expected output, it will help us.

Yoda · October 29, 2013, 1:05pm

BEGIN and END special rules can be intermixed with other rules, but you cannot combine them with another pattern in the same rule. So the following is wrong:

'NR>100 && NR<200 BEGIN ..

Correction:

awk -v sw="lemons|dogs" '
        NR > 100 && NR < 200 {
                for (w in a)
                {
                        if ($0 ~ a[w])
                                d[a[w]]++
                }
        }
        BEGIN {
                c = split(sw,a,"[|]")
        }
        END {
                for (i in a)
                {
                        o = o (a[i]"="(d[a[i]]?d[a[i]]:0)",")
                }
                sub(",*$","",o)
                print o
        }
' /home/jahitt/data.txt
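With this rule order, the counting itself only happens on lines 101 through 199. A minimal sketch to check that, assuming a POSIX shell and awk; the function name count_in_range and the generated input are hypothetical, and the END loop uses a[i] and d[a[i]]:

```shell
#!/bin/sh
# Hypothetical wrapper around the corrected script from Yoda; reads stdin or files.
count_in_range() {
    awk -v sw="lemons|dogs" '
        NR > 100 && NR < 200 {
            # count matches only on lines 101 through 199
            for (w in a)
                if ($0 ~ a[w])
                    d[a[w]]++
        }
        BEGIN {
            c = split(sw, a, "[|]")
        }
        END {
            for (i in a)
                o = o a[i] "=" (d[a[i]] ? d[a[i]] : 0) ","
            sub(",*$", "", o)
            print o
        }' "$@"
}

# 300 generated lines that all contain "dogs"; only 99 of them are counted.
awk 'BEGIN { for (n = 1; n <= 300; n++) print n " dogs" }' | count_in_range
```

This prints dogs=99 and lemons=0, in an unspecified order.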

Don_Cragun · October 29, 2013, 1:26pm

Yoda's fix will give you a working program that counts the lines from line number 101 through line number 199 that contain "lemons" and those that contain "dogs", and prints the counts at the end. But you didn't tell us what this script is supposed to do.

Another way to read what you were trying to do would be to print lines 101 through 199 of your input file and, at the end, print the number of lines in the entire file that contained "dogs" and the number of lines in the entire file that contained "lemons". If that was your intent, the one-character change below (a semicolon after NR>100 && NR<200) to your original script should work:

awk -v sw="lemons|dogs" 'NR>100 && NR<200;BEGIN { c=split(sw,a,"[|]"); } { for (w in a) { if ($0 ~ a[w]) d[a[w]]++; } }
END { for (i in a) { o=o (a[i]"="(d[a[i]]?d[a[i]]:0)","); }
  sub(",*$","",o); print o;
}' /home/jahitt/data.txt
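The semicolon works because it ends the NR>100 && NR<200 pattern and makes it a rule of its own, and a pattern with no action uses the default action { print }. A quick check of that behavior, assuming a POSIX shell and awk (print_range is a hypothetical name):

```shell
#!/bin/sh
# A pattern with no action prints every record it matches.
print_range() {
    awk 'NR>100 && NR<200' "$@"
}

# Feed 300 numbered lines and show the first and last line that get printed.
awk 'BEGIN { for (n = 1; n <= 300; n++) print n }' | print_range | sed -n '1p;$p'
# prints:
# 101
# 199
```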

Although I prefer more readable code like:

awk -v sw="lemons|dogs" '
NR>100 && NR<200
BEGIN { c=split(sw,a,"[|]")
}
{       for (w in a) {
                if ($0 ~ a[w])
                        d[a[w]]++
        }
}
END {   for (i in a) {
                o=o (a[i]"="(d[a[i]]?d[a[i]]:0)",")
        }
        sub(",*$","",o)
        print o
}' /home/jahitt/data.txt

If your input file contained:

lemons and dogs
lemons only
cats and dogs
dogs only
cats only
lemons and cats and dogs

the above scripts produce:

dogs=4,lemons=3

but the output order is unspecified, because awk does not define the order in which for (i in a) visits the elements of an array.
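If a fixed order matters, one option (a sketch, not part of the scripts above) is to use the count c that split() returns and walk a[1] through a[c], which follows the order of the words in sw. It assumes a POSIX shell and awk; the function name count_words is hypothetical:

```shell
#!/bin/sh
count_words() {
    awk -v sw="lemons|dogs" '
        BEGIN { c = split(sw, a, "[|]") }
        # Pattern with no action: print lines 101 through 199.
        NR > 100 && NR < 200
        # Action with no pattern: count matches on every line.
        {
            for (w = 1; w <= c; w++)
                if ($0 ~ a[w])
                    d[a[w]]++
        }
        END {
            # a[1]..a[c] keeps the order of the words in sw.
            for (i = 1; i <= c; i++)
                o = o a[i] "=" (d[a[i]] ? d[a[i]] : 0) ","
            sub(",*$", "", o)
            print o
        }' "$@"
}

printf '%s\n' "lemons and dogs" "lemons only" "cats and dogs" \
    "dogs only" "cats only" "lemons and cats and dogs" | count_words
# prints: lemons=3,dogs=4
```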

SkySmart · October 29, 2013, 2:03pm

Your assumption is right on target! Thank you.

I just thought of a different possibility: what happens if I want to exclude (for the string 'lemons') all lines that contain the word 'only'?

So in your example, if the lemons lines containing 'only' are excluded, the count should be:

dogs=4,lemons=2

Don_Cragun · October 29, 2013, 2:38pm

The trivial way is to special-case "lemons" and "only" by changing the line:

                if ($0 ~ a[w])

to:

                if ($0 ~ a[w] && (a[w] != "lemons" || $0 !~ "only"))
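Applied to the readable version of the script, that change might look like the sketch below. It assumes a POSIX shell and awk; the function name count_with_exclusion is hypothetical, and the END loop here walks a[1] through a[c] so the output follows the order of sw (lemons first) instead of the unspecified for (i in a) order:

```shell
#!/bin/sh
# Hypothetical wrapper: the readable script with the lemons/only special case.
count_with_exclusion() {
    awk -v sw="lemons|dogs" '
        BEGIN { c = split(sw, a, "[|]") }
        # Pattern with no action: print lines 101 through 199.
        NR > 100 && NR < 200
        # Count matches on every line, but skip "lemons" lines that
        # also contain "only".
        {
            for (w = 1; w <= c; w++)
                if ($0 ~ a[w] && (a[w] != "lemons" || $0 !~ "only"))
                    d[a[w]]++
        }
        END {
            # Walk a[1]..a[c] so the output follows the order of sw.
            for (i = 1; i <= c; i++)
                o = o a[i] "=" (d[a[i]] ? d[a[i]] : 0) ","
            sub(",*$", "", o)
            print o
        }' "$@"
}

printf '%s\n' "lemons and dogs" "lemons only" "cats and dogs" \
    "dogs only" "cats only" "lemons and cats and dogs" | count_with_exclusion
# prints: lemons=2,dogs=4
```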

But, if you decide to change basic logic in your original requirements, you should consider whether you need to redesign everything so that each counted pattern has a list of zero or more exclusions that should be considered. (And, I'm not going to try to guess at your new requirements and propose a new syntax for your sw variable to make that happen.)

Quick prototyping works well sometimes. But, sitting down and clearly defining your requirements before you start programming will usually give you a much more coherent, maintainable piece of software that works better and does what you want.

greet_sed · October 29, 2013, 2:56pm

In perl:

#!/usr/bin/perl -w
use strict;

my (@lem_lines, @dogs_lines);

while (<>) {
    # Only look at lines 101 through 199.
    if (($. > 100) && ($. < 200)) {
        # Lines starting with "lemons " that are not followed by "only".
        push @lem_lines, $_ if /^lemons (?!only)/;
        push @dogs_lines, $_ if /dogs/;
    }
}
my $lem  = scalar @lem_lines;
my $dogs = scalar @dogs_lines;
print "Number of lemons excluding ( lemons only ) in lines from 101 to 199 is : $lem\n";
print "count of dogs in lines from 101 to 199 is: $dogs\n";

Run as

perl try.pl input_file

I tried it with the input given by Don Cragun and it works well.
Thanks, Don C, for your explanation of the problem.

Of course, the regular expressions (for dogs, "lemons only", etc.) have to be updated if skysmart is looking for the words at specific positions, for example.
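For comparison, the same counts can be done in awk. awk regular expressions are POSIX EREs, which have no (?!...) lookahead, so this sketch replaces /^lemons (?!only)/ with two tests. It assumes a POSIX shell and awk; count_like_perl and the generated input are hypothetical:

```shell
#!/bin/sh
count_like_perl() {
    awk '
        NR > 100 && NR < 200 {
            # Same lines as Perl /^lemons (?!only)/: starts with "lemons "
            # and is not immediately followed by "only".
            if ($0 ~ /^lemons / && $0 !~ /^lemons only/)
                lem++
            if ($0 ~ /dogs/)
                dogs++
        }
        END {
            printf "Number of lemons excluding ( lemons only ) in lines from 101 to 199 is : %d\n", lem
            printf "count of dogs in lines from 101 to 199 is: %d\n", dogs
        }' "$@"
}

# Lines 1..300 alternate between "lemons and dogs" (odd) and "lemons only" (even).
awk 'BEGIN { for (n = 1; n <= 300; n++) { if (n % 2) print "lemons and dogs"; else print "lemons only" } }' | count_like_perl
```

With this generated input both counts come out as 50, since only the 50 odd-numbered lines between 101 and 199 qualify.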