Script explanation

I have the following script

awk '$1 ~ /^[A-Z]*[0-9]+/ {
        s += $NF
        m++
    }
    END {
        print NR, m, s
    }'

and I use it to get results from the following file

A4792 4
COMP9021 5
K9 7
ABC 8
924 1
R2D2 3
6
JQL-636 2

I was expecting to get 8 3 16 as the result,
but it gives me 9 6 26.

Can somebody explain to me, first, why NR gives 9 instead of 8 when there are 8 records in the file? What does the red + mean, and more generally, why do I get these results?

Please help me. I am new to the art of scripting and I want to understand this so I can go a step further. Cheers

awk '$1 ~ /^[A-Z]*[0-9]+/ ... select lines whose first field starts with zero or more uppercase letters followed by one or more digits (so either letters then digits, or digits only)
s += $NF; ...add the value of the last field to variable s
m++ ... post-increment counter m
}
END {
print NR, m, s ...print the number of records read, the value of the counter m, and the value of s (the sum of the last fields of the matching records)
}

Why do you get 9 and not 8 records? Most likely because the file ends with a blank line.
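
To see the effect in isolation, here is a small sketch. The two-line input is made up for illustration (it is not your file); the only difference between the two runs is the extra empty line at the end.

```shell
# Hypothetical input: two data lines, with and without a trailing empty line.
with_blank=$(printf 'A4792 4\nK9 7\n\n' | awk 'END { print NR }')
without_blank=$(printf 'A4792 4\nK9 7\n' | awk 'END { print NR }')

echo "with trailing empty line:    NR = $with_blank"     # NR = 3
echo "without trailing empty line: NR = $without_blank"  # NR = 2
```

NR counts every record awk reads, including empty ones, so one stray empty line is enough to turn 8 into 9.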

The + means one or more occurrences of the previous term in the regular expression.

Can you explain to me which symbol implies "or only numbers"? I think that is what led me to the wrong results.


[A-Z]* , the * means zero or more occurrences.
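
A quick way to check this yourself is to feed single values through the same pattern. This is only a sketch; the helper name check is made up for the example.

```shell
# check is a hypothetical helper: it tests one value against the pattern
# from the question and prints "match" or "no match".
check() {
    printf '%s\n' "$1" |
        awk '{ print ($1 ~ /^[A-Z]*[0-9]+/ ? "match" : "no match") }'
}

echo "K9:      $(check K9)"       # match: one letter, then a digit
echo "924:     $(check 924)"      # match: zero letters, then digits
echo "ABC:     $(check ABC)"      # no match: no digit after the letters
echo "JQL-636: $(check JQL-636)"  # no match: "-" comes before any digit
```

So 924 is accepted because [A-Z]* is allowed to match nothing at all, and the [0-9]+ part then matches the digits.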

Thanks a lot for your help

Sorry, but I still have a question.
With your explanation it shouldn't count the line with the empty field $1 (it was mistyped and the <tab> couldn't be seen, so the line was <tab>6) or the line R2D2<tab>3. But it counts them. Why?

Because the search pattern does not affect the number of records; it is only used to decide which records the actions are performed on. The number of records is the number of lines in the input file.

You misunderstood me. I meant the m and s values.

Sorry, I did misunderstand.

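With the default field separator, awk treats any run of spaces and tabs as a single separator and ignores leading and trailing blanks. Since the tab is in the first position, it is silently skipped: $1 starts at the first non-separator character, so for that line $1 is 6, which matches. R2D2 matches because the pattern is anchored only at the start: ^[A-Z]*[0-9]+ is satisfied by the leading R2, and whatever follows is not checked.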
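
Here is a sketch that reproduces the numbers. It assumes the file looks as described in this thread (a leading tab before the 6 and one empty line at the end). The helper name sample and the stricter pattern in the last command are my own additions, a guess at what you intended rather than part of your script.

```shell
# sample is a hypothetical helper that prints the data from the question,
# with a leading tab on line 7 and an assumed trailing empty line.
sample() {
    printf 'A4792 4\nCOMP9021 5\nK9 7\nABC 8\n924 1\nR2D2 3\n\t6\nJQL-636 2\n\n'
}

# Default field splitting skips the leading tab, so $1 is 6.
default_fs=$(printf '\t6\n' | awk '{ print NF ":" $1 }')
# With tab as the only separator, $1 is the empty string before the tab.
tab_fs=$(printf '\t6\n' | awk -F'\t' '{ print NF ":" $1 }')

# The pattern from the question: anchored at the start only.
original=$(sample | awk '$1 ~ /^[A-Z]*[0-9]+/ { s += $NF; m++ } END { print NR, m, s }')
# A stricter pattern (assumption): at least one letter, then digits, then end of field.
strict=$(sample | awk '$1 ~ /^[A-Z]+[0-9]+$/ { s += $NF; m++ } END { print NR, m, s }')

echo "default FS: $default_fs"  # 1:6
echo "tab FS:     $tab_fs"      # 2:
echo "original:   $original"    # 9 6 26
echo "strict:     $strict"      # 9 3 16
```

With the stricter pattern, m and s come out as the 3 and 16 you expected, because 924, R2D2 and the lone 6 no longer match. NR stays at 9 until the empty line at the end of the file is removed.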