Script explanation

sickboy · June 11, 2005, 1:31pm

I have the following script

awk '$1 ~ /^[A-Z]*[0-9]+/ {
        s += $NF
        m++
    }
    END {
        print NR, m, s
    }'

and I use it to get results from the following file

A4792 4
COMP9021 5
K9 7
ABC 8
924 1
R2D2 3
6
JQL-636 2

I was expecting the result to be 8 3 16,
but it gives me 9 6 26.

Can somebody explain why NR gives 9 instead of 8 when there are 8 records in the file? What does the + mean, and more generally, why do I get these results?

Please help me. I am new to scripting and I want to understand this so I can go a step further. Cheers

reborg · June 11, 2005, 2:05pm

awk '$1 ~ /^[A-Z]*[0-9]+/ ... search for lines where the first field starts with either uppercase letter(s) followed by number(s) or only number(s)
s += $NF; ...add the value of the last field to variable s
m++ ... post increment counter m
}
END {
print NR, m, s ...print the number of records (lines read), the value of the counter and the value of variable s (which is the sum of the last fields of the matching lines)
}

Why do you get 9 and not 8 records? Most likely because the file ends with a blank line.

The + means one or more occurrences of the previous term in the regular expression.
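To see where the numbers come from, here is a small sketch that traces which lines the pattern selects. It rests on two assumptions about the input: that the file ends with a blank line, and that the line shown as a lone 6 really starts with a tab. The `sample` function and the extra `print "match:"` line are only there for illustration.

```shell
# Recreate the sample file on stdout (assumed: tab before the lone 6,
# and one blank line at the end).
sample() {
    printf 'A4792 4\nCOMP9021 5\nK9 7\nABC 8\n924 1\nR2D2 3\n\t6\nJQL-636 2\n\n'
}

# Same pattern and action as the original script, plus a trace line
# for every record the pattern selects.
trace=$(sample | awk '$1 ~ /^[A-Z]*[0-9]+/ {
        s += $NF
        m++
        print "match:", $1
    }
    END {
        print NR, m, s
    }')

echo "$trace"
```

Under those assumptions the trace lists A4792, COMP9021, K9, 924, R2D2 and 6, and the last line is 9 6 26.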

sickboy · June 11, 2005, 2:16pm

Can you explain which symbol implies "or only number"? I think that is what led me to the wrong results.

reborg · June 11, 2005, 2:28pm

[A-Z]* , the * means zero or more occurrences.
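A quick way to check this is to compare `*` with `+` on the letter class. The `[A-Z]+` variant below is not from the original script; it is only there as a contrast.

```shell
# [A-Z]* may match no letters at all, so a field of digits only is selected.
star=$(printf '924\nABC\n' | awk '$1 ~ /^[A-Z]*[0-9]+/ { print $1 }')

# [A-Z]+ demands at least one leading letter, so 924 is no longer selected.
plus=$(printf '924\nK9\n' | awk '$1 ~ /^[A-Z]+[0-9]+/ { print $1 }')

echo "with *: $star"   # with *: 924
echo "with +: $plus"   # with +: K9
```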

sickboy · June 12, 2005, 3:40pm

Thanks a lot for your help

sickboy · June 13, 2005, 6:05am

Sorry, but I still have a question.
With your explanation it shouldn't count the line with the empty field $1 (it was mistyped and the <tab> couldn't be seen, so the line was <tab> 6) or the line R2D2<tab>3. But it counts them. Why?

reborg · June 13, 2005, 7:44am

Because the search pattern does not affect the number of records; it is only used to decide which records the action is performed on. The number of records is the number of lines in the input file.
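A minimal illustration, using three made-up input lines of which only one matches the pattern:

```shell
# NR counts every line read (including the blank one); m counts only
# the lines that the pattern selected.
counts=$(printf 'K9 7\nABC 8\n\n' | awk '$1 ~ /^[A-Z]*[0-9]+/ { m++ } END { print NR, m }')

echo "$counts"   # 3 1
```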

sickboy · June 13, 2005, 11:34am

You misunderstood me. I meant the m and s values.

reborg · June 13, 2005, 2:25pm

Sorry, I did misunderstand.

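By default awk splits fields on runs of whitespace (spaces and tabs), so consecutive separators are treated as one and leading whitespace is silently ignored. Since the tab is in the first position, $1 starts at the first non-separator character, which here is the 6. R2D2 is counted because the pattern is anchored only at the start of the field: ^[A-Z]*[0-9]+ matches the leading R2, and nothing requires the match to reach the end of the field.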
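Both points can be checked from the shell. The first two commands show how the leading tab is handled. The last one is a hypothetical stricter pattern (not from the thread) that requires at least one letter and anchors the match at the end of the field with `$`. The sample data again assumes a tab before the lone 6 and a trailing blank line.

```shell
# Default splitting: the leading tab is skipped, so $1 is 6 and NF is 1.
default_split=$(printf '\t6\n' | awk '{ print NF, $1 }')

# With an explicit tab separator the empty first field is preserved.
tab_split=$(printf '\t6\n' | awk -F '\t' '{ print NF, "[" $1 "]" }')

# Stricter pattern: one or more letters, then one or more digits, and
# nothing else. R2D2, 924 and the lone 6 no longer match.
strict=$(printf 'A4792 4\nCOMP9021 5\nK9 7\nABC 8\n924 1\nR2D2 3\n\t6\nJQL-636 2\n\n' |
    awk '$1 ~ /^[A-Z]+[0-9]+$/ { s += $NF; m++ } END { print NR, m, s }')

echo "$default_split"   # 1 6
echo "$tab_split"       # 2 []
echo "$strict"          # 9 3 16
```

NR stays at 9 because of the blank last line; only m and s change with the pattern.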