awk logic

dagamier · August 27, 2013, 2:43pm

I'm trying to check the logic of a long awk script I'm using. I have about 30 checks built into it, and I "believe" I did this right, but I could be wrong.

awk -F\| '
$9 !~ /\/*[0-9]{1,}*/
$9 ~ /\([A-Za-z]-[a-zA-Z0-9]{4}, [a-zA-Z0-9]{2,3}/
$9 ~ /\([A-Za-z0-9]{6}, [a-zA-Z0-9]{2,3}\)/
$9 ~ /\(\+[A-Za-z0-9]{5}, [a-zA-Z0-9]{2,3}\)/
$9 ~ /\(\+\+[A-Za-z0-9]{4}, [a-zA-Z0-9]{2,3}\)/
$9 ~ /\([A-Za-z0-9]{4}\+[A-Za-z0-9], [a-zA-Z0-9]{2,3}\)/
' file1.csv >> file2.csv

This is obviously just a subset to show the idea. My question is, when awk reads through my file, will it bail after the first match or will it continue down through each of my checks? Thanks!!

Corona688 · August 27, 2013, 2:48pm

It will run through all your checks, potentially printing the same line several times, and the first condition with the "not" will not stop it.
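A small illustration of this, with made-up patterns and sample data (the function name every_check is hypothetical): awk tests every rule against every line, so a line matching both patterns comes out twice.

```shell
# Sketch with placeholder patterns: each rule with no { action }
# prints the line, and every rule is tried on every line.
every_check() {
  awk -F'|' '
  $9 ~ /abc/      # no action, so awk does { print }
  $9 ~ /123/      # still tested, even if the rule above already printed
  '
}

printf '%s\n' '1|2|3|4|5|6|7|8|abc123' '1|2|3|4|5|6|7|8|abc' | every_check
# prints the abc123 line twice, then the abc line once
```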

If you want a line printed when any one of the checks matches, combine them into a single or-condition, like

($9 ~ /regex1/) || ($9 ~ /regex2/) || ($9 ~ /regex3/)

to check multiple conditions but print only once.
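A sketch of the same two made-up checks joined with || (any_check is a hypothetical name). awk short-circuits ||, so it stops at the first true operand and the whole rule prints each line at most once.

```shell
# Sketch with placeholder patterns: one rule, one default { print }.
any_check() {
  awk -F'|' '($9 ~ /abc/) || ($9 ~ /123/)'
}

printf '%s\n' '1|2|3|4|5|6|7|8|abc123' '1|2|3|4|5|6|7|8|xyz' | any_check
# prints the abc123 line once; the xyz line matches neither check
```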

How about this?

awk -F\| '
$9 ~ /\/*[0-9]{1,}*/ { next }    # Skip lines containing /91...
($9 ~ /regex1/) || ($9 ~ /regex2/) ...' inputfile > outputfile

dagamier · August 27, 2013, 2:54pm

So if I understand you correctly, if I want it to bail, I can just add

 {next}

to the end of each search string?

Corona688 · August 27, 2013, 2:59pm

By default, if you don't put { ... } after an expression, awk assumes { print }.

If you use { next } instead, awk skips to the next line without printing the current one and starts checking your expressions from the beginning again.
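Putting the two ideas together, a sketch with placeholder patterns (skip, abc and 123 stand in for the real checks; skip_then_check is a hypothetical name): unwanted records are dropped first, and the remaining ones go through a single or-condition.

```shell
# Sketch: the { next } rule runs first, so a skipped line never
# reaches the checks below it.
skip_then_check() {
  awk -F'|' '
  $9 ~ /skip/ { next }            # drop this line and start over with the next one
  ($9 ~ /abc/) || ($9 ~ /123/)    # default action: print once
  '
}

printf '%s\n' \
  '1|2|3|4|5|6|7|8|abc' \
  '1|2|3|4|5|6|7|8|skip abc' \
  '1|2|3|4|5|6|7|8|xyz' \
  '1|2|3|4|5|6|7|8|123' | skip_then_check
# prints the abc line and the 123 line
```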

dagamier · August 27, 2013, 3:02pm

OK, now what you said makes sense: I can skip the records I don't want first and then run all my checks afterward. Is there a limit on how many "||" I can use? Like I said, I have about 40 checks per line.

Thank you so much for your helpful insights.

Jotne · August 27, 2013, 3:06pm

Use alternation (or) inside the regex:

$9~/regex1|regex2|regex3|.../
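With placeholder patterns, the alternation form looks like this (alt_check is a hypothetical name). The | inside the regex is unrelated to the -F'|' field separator, which only controls how the line is split into fields.

```shell
# Sketch: one regex with three alternatives replaces three || checks.
alt_check() {
  awk -F'|' '$9 ~ /abc|123|xyz/'
}

printf '%s\n' '1|2|3|4|5|6|7|8|abc123' '1|2|3|4|5|6|7|8|xyz' '1|2|3|4|5|6|7|8|nope' | alt_check
# prints the abc123 and xyz lines
```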

Corona688 · August 27, 2013, 3:10pm

Not really, but when you start piling them on like that it may be time to rethink your logic.

I forgot that awk supports | inside a single regex, as Jotne suggests; that will help.

dagamier · August 27, 2013, 5:11pm

I forgot that one too. I can't change my logic because I'm dealing with human input, and for some reason people can't seem to follow simple guidelines, so I have to search for all the weird things they do. Thanks to both of you; you got me back on the right track.

RudiC · August 27, 2013, 5:14pm

And, on top of what Jotne said, try subgroups. All of your regexes end with the same two-or-three-character pattern, so that part only needs to be written once...
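A hypothetical sketch of the subgroup idea applied to the five patterns from the first post: the distinct openings become alternatives inside one group, and the shared ending is written once. It builds the regex as a string instead of using {n} intervals, on the assumption that your awk might lack interval support (some older mawk builds do); code_check, c, c4, head and tail are made-up names. The combined regex requires the closing ) for every alternative, so it is slightly stricter than the first original check, which has no \).

```shell
# Hypothetical sketch: one regex, alternatives in a group, shared suffix.
# c c c c stands in for {4}; c c c? stands in for {2,3}.
code_check() {
  awk -F'|' '
  BEGIN {
    c  = "[A-Za-z0-9]"    # one alphanumeric character
    c4 = c c c c          # exactly four of them
    # the five distinct openings, as alternatives inside one group
    head = "\\(([A-Za-z]-" c4 "|" c4 c c "|\\+" c4 c "|\\+\\+" c4 "|" c4 "\\+" c ")"
    # the shared ending: comma, space, two or three characters, closing paren
    tail = ", " c c c "?\\)"
    re = head tail        # a string used as a dynamic regex
  }
  $9 ~ re
  '
}

printf '%s\n' '1|2|3|4|5|6|7|8|(AB1234, XY)' '1|2|3|4|5|6|7|8|(AB12, XY)' | code_check
# prints only the (AB1234, XY) line
```

Once the pieces are named, another opening is one more alternative in head rather than another full line.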