awk problem with syntax

awk -v sw="lemons|dogs" 'NR>100 && NR<200 BEGIN { c=split(sw,a,"[|]"); } { for (w in a) { if ($0 ~ a[w]) d[a[w]]++; } }
END { for (i in a) { o=o (a[i]"="(d[a[i]]?d[a[i]]:0)","); }
  sub(",*$","",o); print o;
}' /home/jahitt/data.txt

What am I doing wrong with the above code? I'm pretty sure the issue is in the bolded part. How can this be fixed?

Change

'NR>100 && NR<200 BEGIN { c=split(sw,a,"[|]"); }

to

BEGIN { c=split(sw,a,"[|]"); }

and put the line-number condition on the main rule instead:

NR>100 && NR<200{ for (w in a) { if ($0 ~ a[w]) d[a[w]]++; } }

In the END block:

{ o=o (a[i]"="(d[a[i]]?d[a[i]]:0)","); }
sub(",*$","",o); print o;

What is o? Where is it defined? If you want the whole line, use $0.

What is the purpose of the code? If you show the input and the expected output, it will be easier for us to help.

BEGIN and END are special patterns. Rules that use them can be intermixed with other rules, but you cannot combine BEGIN or END with another pattern in the same rule. So the following is wrong:

'NR>100 && NR<200 BEGIN ..

Correction:

awk -v sw="lemons|dogs" '
        NR > 100 && NR < 200 {
                for (w in a)
                {
                        if ($0 ~ a[w])
                                d[a[w]]++
                }
        }
        BEGIN {
                c = split(sw,a,"[|]")
        }
        END {
                for (i in a)
                {
                        o = o (a[i]"="(d[a[i]]?d[a[i]]:0)",")
                }
                sub(",*$","",o)
                print o
        }
' /home/jahitt/data.txt
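
To try the corrected structure without a 200-line file, here is a minimal sketch that assumes a scaled-down range (lines 2 through 4 instead of 101 through 199) and made-up sample input. The function name count_in_range is hypothetical and is only there so the demo is easy to rerun:

```shell
# Hypothetical demo: same rule layout as the correction above, with the
# range shrunk to lines 2-4 so that five sample lines show the effect.
count_in_range() {
    printf '%s\n' 'dogs' 'lemons' 'dogs and lemons' 'cats' 'dogs' |
    awk -v sw="lemons|dogs" '
        BEGIN { c = split(sw, a, "[|]") }   # runs once, before any input
        NR > 1 && NR < 5 {                  # runs only for lines 2, 3 and 4
            for (w in a)
                if ($0 ~ a[w])
                    d[a[w]]++
        }
        END {                               # runs once, after all input
            for (i in a)
                o = o a[i] "=" (d[a[i]] ? d[a[i]] : 0) ","
            sub(",*$", "", o)
            print o
        }'
}
count_in_range
```

Only lines 2 through 4 are counted, so this reports lemons=2 and dogs=1 (in either order), even though "dogs" appears on three lines of the sample.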

Yoda's fix will give you a working program that counts the number of lines from line 101 through line 199 that contain "lemons" and the number that contain "dogs", and prints those counts at the end. But you didn't tell us what this script is supposed to do.

Another way to read what you were trying to do would be: print lines 101 through 199 from your input file, and at the end print the number of lines in the entire file that contained "dogs" and the number of lines in the entire file that contained "lemons". If that was your intent, the one-character change below to your original script (the semicolon added after NR<200) should work:

awk -v sw="lemons|dogs" 'NR>100 && NR<200;BEGIN { c=split(sw,a,"[|]"); } { for (w in a) { if ($0 ~ a[w]) d[a[w]]++; } }
END { for (i in a) { o=o (a[i]"="(d[a[i]]?d[a[i]]:0)","); }
  sub(",*$","",o); print o;
}' /home/jahitt/data.txt

Although I prefer more readable code like:

awk -v sw="lemons|dogs" '
NR>100 && NR<200
BEGIN { c=split(sw,a,"[|]")
}
{       for (w in a) {
                if ($0 ~ a[w])
                        d[a[w]]++
        }
}
END {   for (i in a) {
                o=o (a[i]"="(d[a[i]]?d[a[i]]:0)",")
        }
        sub(",*$","",o)
        print o
}' /home/jahitt/data.txt

If your input file contained:

lemons and dogs
lemons only
cats and dogs
dogs only
cats only
lemons and cats and dogs

the above scripts produce:

dogs=4,lemons=3

but the output order is unspecified, since awk does not define the order in which for (i in a) visits the array elements.
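
If a fixed order matters, one option is to loop over the numeric indices 1 through c that split() returns instead of using for (i in a). This is a minimal sketch, not part of the scripts above: it feeds in the six sample lines with printf, leaves out the rule that prints lines 101 through 199, and uses a hypothetical function name, ordered_counts:

```shell
# Hypothetical demo: report the counts in the order the words appear in sw.
ordered_counts() {
    printf '%s\n' 'lemons and dogs' 'lemons only' 'cats and dogs' \
        'dogs only' 'cats only' 'lemons and cats and dogs' |
    awk -v sw="lemons|dogs" '
        BEGIN { c = split(sw, a, "[|]") }   # a[1]="lemons", a[2]="dogs", c=2
        {
            for (i = 1; i <= c; i++)
                if ($0 ~ a[i])
                    d[a[i]]++
        }
        END {
            # Walk the indices 1..c so the output follows the order in sw.
            for (i = 1; i <= c; i++)
                o = o a[i] "=" (d[a[i]] ? d[a[i]] : 0) ","
            sub(",*$", "", o)
            print o
        }'
}
ordered_counts
```

With the sample input this prints lemons=3,dogs=4, matching the order given in sw.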

Your assumption is right on target! Thank you.

I just thought of a different possibility: what happens if I want to exclude (for the string 'lemons') all lines that contain the word 'only'?

So in your output, if the lemon lines containing 'only' are excluded, the count should be:

dogs=4,lemons=2

The trivial way is to special case "lemons" and "only" by changing the line:

                if ($0 ~ a[w])

to:

                if ($0 ~ a[w] && (a[w] != "lemons" || $0 !~ "only"))
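
As a quick check of that condition, here is a sketch that runs it against the six sample lines from earlier. It assumes the input is supplied with printf rather than read from data.txt, leaves out the rule that prints lines 101 through 199, and wraps everything in a hypothetical function, lemons_without_only:

```shell
# Hypothetical demo of the special-cased condition on the sample input.
lemons_without_only() {
    printf '%s\n' 'lemons and dogs' 'lemons only' 'cats and dogs' \
        'dogs only' 'cats only' 'lemons and cats and dogs' |
    awk -v sw="lemons|dogs" '
        BEGIN { c = split(sw, a, "[|]") }
        {
            # Count the line for a word unless that word is "lemons"
            # and the line also contains "only".
            for (w in a)
                if ($0 ~ a[w] && (a[w] != "lemons" || $0 !~ "only"))
                    d[a[w]]++
        }
        END {
            for (i in a)
                o = o a[i] "=" (d[a[i]] ? d[a[i]] : 0) ","
            sub(",*$", "", o)
            print o
        }'
}
lemons_without_only
```

The "lemons only" line is skipped for lemons, while "dogs only" still counts for dogs, so the result is dogs=4 and lemons=2 (in either order).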

But, if you decide to change basic logic in your original requirements, you should consider whether you need to redesign everything so that each counted pattern has a list of zero or more exclusions that should be considered. (And, I'm not going to try to guess at your new requirements and propose a new syntax for your sw variable to make that happen.)

Quick prototyping works well sometimes. But, sitting down and clearly defining your requirements before you start programming will usually give you a much more coherent, maintainable piece of software that works better and does what you want.

In Perl:

#!/usr/bin/perl -w

my (@lem_lines,@dogs_lines);

while (<>) {
    if (($. > 100) && ($. < 200)) {
        push @lem_lines, $_ if /^lemons (?!only)/;
        push @dogs_lines, $_ if /dogs/;
    }
}
my $lem=scalar @lem_lines;
my $dogs=scalar @dogs_lines;
print "Number of lemons excluding ( lemons only ) in lines from 101 to 199 is : $lem\n";
print "count of dogs in lines from 101 to 199 is: $dogs\n";

Run as

perl try.pl input_file

I tried it with the input given by Don Cragun and it works well.
Thanks, Don C, for your explanation of the problem.

Of course, the regular expressions have to be updated (for dogs, lemons, only, etc.) if skysmart needs to match them at specific locations in the line, for example.