When are curly braces needed?

Hello, I was trying to find a command to list duplicated files, so I tried

ls dir1 dir2 | awk '{x[$0]++}'

and it didn't work (it printed nothing).

After a bit of searching online, I found that it works without the curly braces:

ls dir1 dir2 | awk 'x[$0]++'  

I thought curly braces were required in awk, so I don't understand why it only works without them.

Can anyone explain?

Thanks

Curly braces in awk enclose the action part of a pattern {action} pair. As you don't supply any pattern, the result is FALSE and no action is performed. Without the braces, x[$0]++ is a pattern, which is TRUE from the second time a line is encountered, so the default action (print $0) is executed.

If you do not supply a pattern, the default pattern is TRUE (it matches every line):

'{x[$0]++}'

The action is performed on every line; however, it only increments x[$0] and never prints anything.
You can see what it collected once awk has read every record with:

ls dir1 dir2 | awk '{x[$0]++} END {for (i in x) print i, "seen", x[i], "time(s)"}'
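To check that an action-only program really runs on every line even though it prints nothing, here is a small sketch that feeds awk a fixed list through printf instead of ls. The file names a.txt and b.txt are invented sample input, not real files.

```sh
#!/bin/sh
# Hypothetical sample input standing in for the output of `ls dir1 dir2`.
input='a.txt
b.txt
a.txt'

# Action only, no pattern: runs for every line but prints nothing.
printf '%s\n' "$input" | awk '{x[$0]++}'

# Same action plus an END rule that reports the counts collected.
# The order of "for (i in x)" is unspecified, so output lines may come in any order.
printf '%s\n' "$input" | awk '{x[$0]++} END {for (i in x) print i, "seen", x[i], "time(s)"}'
```

The END rule has to print x[i], the count stored under key i; x on its own names the whole array, and awk rejects an array used where a single value is expected. Because print separates its arguments with a space (OFS), the words need no extra padding.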

'x[$0]++'

without braces is a pattern. Its value is x[$0] as it was before the post-increment ++ adds one. The first time a line is read, x[$0] is still empty, which counts as 0, so the pattern is FALSE; on every later occurrence the stored count is nonzero, so the pattern is TRUE, and because no action is declared, the default action print $0 runs.
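One way to watch the post-increment at work is to print, next to each line, the value that x[$0]++ hands to the pattern test. A minimal sketch with made-up input:

```sh
#!/bin/sh
# Print each line followed by the value x[$0]++ returns *before* incrementing.
# 0 means "pattern FALSE" (first sighting); anything else means "pattern TRUE".
printf 'a\nb\na\na\n' | awk '{print $0, x[$0]++}'
# -> a 0
#    b 0
#    a 1
#    a 2
```

The lines shown with a nonzero value are exactly the lines that awk 'x[$0]++' would print.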

To summarize the principle:

Every awk rule has the following form, whether written out explicitly or implied:

pattern {action}

You can omit either the pattern or the action, but not both.
If you omit the pattern, the action is performed on every line read.
If you omit the action, print $0 is performed every time the pattern evaluates to something nonzero or non-empty (i.e. TRUE).
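The three combinations in the summary can be tried side by side. This is a sketch using an invented three-line input; the comparison $1 > 1 is just an arbitrary example pattern.

```sh
#!/bin/sh
# Pattern only: the default action (print $0) runs where the pattern is true.
printf '1\n2\n3\n' | awk '$1 > 1'
# -> 2
#    3

# Action only: the default pattern is true, so the action runs on every line.
printf '1\n2\n3\n' | awk '{print "line", NR}'
# -> line 1
#    line 2
#    line 3

# Both: the action runs only on lines where the pattern is true.
printf '1\n2\n3\n' | awk '$1 > 1 {print "big:", $1}'
# -> big: 2
#    big: 3
```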


Aia, your explanation is correct: the {action} is performed, just without printing!
Rather than the lazy

ls dir1 dir2 | awk 'x[$0]++'

I suggest

ls dir1 dir2 | awk '{if ($0 in x) print; else x[$0]}'

This makes it explicit that the action needs a print statement (a bare print is short for print $0). It also does not assign values to the x array, which saves some memory: the lookup ($0 in x) only needs the keys, not the values.
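A quick sanity check that the explicit form and the short x[$0]++ form print the same thing is to run both on identical input and compare. This is a sketch with invented input; both commands should print a twice, once for each repeat after the first sighting.

```sh
#!/bin/sh
input='a
b
a
a'
# Explicit form: print a line if it was seen before, otherwise just remember it.
explicit=$(printf '%s\n' "$input" | awk '{if ($0 in x) print; else x[$0]}')
# Short form: the pattern x[$0]++ is nonzero from the second sighting on.
short=$(printf '%s\n' "$input" | awk 'x[$0]++')

[ "$explicit" = "$short" ] && echo "same output"
printf '%s\n' "$explicit"
```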

And, of course, the single awk statement in:

ls dir1 dir2 | awk '{if ($0 in x) print; else x[$0]}'

can be split into two awk rules (one with the default action and one with the default pattern) that produce the same output with less code:

ls dir1 dir2 | awk '$0 in x; {x[$0]}'
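Both rules run, in order, for every line, so the test $0 in x looks at the array before {x[$0]} adds the current line. The sketch below (invented input) shows that swapping the two rules defeats the idea: each line is stored first, so each line then passes the test.

```sh
#!/bin/sh
# Test first, then remember: prints only repeated lines.
printf 'a\nb\na\n' | awk '$0 in x; {x[$0]}'
# -> a

# Remember first, then test: every line is already in x, so every line prints.
printf 'a\nb\na\n' | awk '{x[$0]}; $0 in x'
# -> a
#    b
#    a
```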

But, of course, if instead of printing every repeated occurrence of a line you want to print each duplicated line only once, you need to keep a count, but the code is still short:

ls dir1 dir2 dir3 ...| awk 'x[$0]++ == 1'
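Because x[$0]++ returns the old count, comparing it with 1 is true only at a line's second occurrence. A sketch with made-up input, where a appears three times, b twice and c once:

```sh
#!/bin/sh
# Each duplicated line is printed exactly once, at its second occurrence.
printf 'a\nb\na\na\nb\nc\n' | awk 'x[$0]++ == 1'
# -> a
#    b
```

A caveat about the input, beyond what the thread covers: when ls is given several directories it prints a "dir:" header before each listing and separates the listings with blank lines, so with three or more directories the empty line itself is reported as a duplicate. Listing the directories one at a time, for example { ls dir1; ls dir2; } | awk 'x[$0]++ == 1', avoids the headers and separators.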

... I should have thought before posting; I rely on a missing pattern being TRUE every day. Sorry for that.

It's not that awk demands { } sometimes and doesn't demand them other times; it's that they mean different things.

awk's syntax can be difficult to grasp at first because a lot is implied. When you put a single expression on a line by itself without braces, e.g. X[$0]++, awk understands it as:

if(X[$0]++) then print_entire_line
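The same rewriting works for any bare expression, not just array counters. As an illustration (the NR % 2 expression is an arbitrary choice, not from the thread), both commands below print the odd-numbered lines:

```sh
#!/bin/sh
# A bare expression is a pattern: NR % 2 is nonzero on odd-numbered lines.
printf 'one\ntwo\nthree\n' | awk 'NR % 2'
# -> one
#    three

# Spelled out with braces, the same program needs an explicit if and print.
printf 'one\ntwo\nthree\n' | awk '{ if (NR % 2) print $0 }'
# -> one
#    three
```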

When you use braces, you can optionally put an expression before and after them: EXPR1 { STATEMENTS } EXPR2, which is really two rules and turns out to mean:

if(EXPR1) then STATEMENTS
if(EXPR2) then print_entire_line

EXPR1 is treated as always true when omitted, and EXPR2 as always false when omitted (there is simply no second rule).
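A small sketch of the EXPR1 { STATEMENTS } EXPR2 shape, with EXPR1 chosen arbitrarily as $0 == "a" and EXPR2 as the constant 1 (always true), on invented two-line input:

```sh
#!/bin/sh
# EXPR1 { STATEMENTS } EXPR2: the action runs where EXPR1 is true,
# then the trailing 1 prints every line.
printf 'a\nb\n' | awk '$0 == "a" { print "first rule" } 1'
# -> first rule
#    a
#    b

# Without EXPR2 there is no second rule, so the input lines are not echoed.
printf 'a\nb\n' | awk '$0 == "a" { print "first rule" }'
# -> first rule
```

The trailing 1 is the same trick as x[$0]++: a pattern with no action, except that it is always true.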