awk logic

I am trying to check my logic on a long awk script I'm using. I have about 30 checks that I built into it and I believe I did this right, but I could be wrong.

awk -F\| '
$9 !~ /\/*[0-9]{1,}*/
$9 ~ /\([A-Za-z]-[a-zA-Z0-9]{4}, [a-zA-Z0-9]{2,3}/
$9 ~ /\([A-Za-z0-9]{6}, [a-zA-Z0-9]{2,3}\)/
$9 ~ /\(\+[A-Za-z0-9]{5}, [a-zA-Z0-9]{2,3}\)/
$9 ~ /\(\+\+[A-Za-z0-9]{4}, [a-zA-Z0-9]{2,3}\)/
$9 ~ /\([A-Za-z0-9]{4}\+[A-Za-z0-9], [a-zA-Z0-9]{2,3}\)/
' file1.csv >> file2.csv

This is obviously just a subset to show the idea. My question is, when awk reads through my file, will it bail after the first match or will it continue down through each of my checks? Thanks!!

It will run through all your checks, potentially printing the same line several times, and the first condition with the "not" will not stop it.
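To see the repeated printing for yourself, here is a small sketch on made-up data. The two sample lines, the use of $2 instead of $9, and both patterns are assumptions for illustration only, not taken from your file:

```shell
# Hypothetical sample: two records, two fields each, "|" as separator.
# Both rules are pattern-only, so each one that matches prints the line.
out=$(printf '%s\n' 'a|foo12' 'b|bar' | awk -F'|' '
$2 ~ /foo/
$2 ~ /[0-9]/
')
printf '%s\n' "$out"
# prints "a|foo12" twice: once for each rule that matched it
```

The second record matches neither rule, so it is not printed at all.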

If you want these checks to be alternatives, you can join them with ||, like

($9 ~ /regex1/) || ($9 ~ /regex2/) || ($9 ~ /regex3/)

to check multiple conditions but print only once.
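A quick sketch of that, again on invented data (the sample lines, $2 standing in for $9, and the patterns are all assumptions):

```shell
# One rule, several conditions joined with ||.
# A record that satisfies more than one condition is still printed once.
out=$(printf '%s\n' 'a|foo12' 'b|bar' 'c|7' | awk -F'|' '
($2 ~ /foo/) || ($2 ~ /[0-9]/)
')
printf '%s\n' "$out"
# prints "a|foo12" and "c|7", each exactly once
```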

How about this?

$9 !~ /\/*[0-9]{1,}*/ { next } # Skip lines whose $9 does not match this pattern (e.g. /91...)
($9 ~ /regex1/) || ($9 ~ /regex2/) ...' inputfile > outputfile

So if I understand you correctly, if I want it to bail, I can just add

 {next} 

to the end of each search string?

By default, if you don't put { ... } after an expression, it assumes it should do { print }.

If you do { next } instead, awk will skip to the next line without printing and start checking your expressions again from the beginning.
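A minimal sketch of the two behaviours side by side. The data, the "skip" prefix and the use of $2 are hypothetical, chosen only to make the effect visible:

```shell
# First rule: records to throw away. { next } stops processing this record,
# so none of the rules below it are evaluated for that line.
# Second rule: no action block, so awk falls back to { print }.
out=$(printf '%s\n' 'a|skip1' 'b|keep1' 'c|other' | awk -F'|' '
$2 ~ /^skip/ { next }
$2 ~ /[0-9]/
')
printf '%s\n' "$out"
# prints only "b|keep1"
```

Note that "a|skip1" also contains a digit and would have matched the second rule, but next prevents that rule from ever being tried for it.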


Ok. Now what you said makes sense. I can skip the checks on records I don't want and then run all my checks after. Is there a limit on how many "||" I can do? Like I said, I have about 40 checks I'm doing per line.

Thank you so much for your helpful insights.

Using or in regex

$9~/regex1|regex2|regex3|.../

Not really, but when you start piling them on like that it may be time to rethink your logic.

I forgot that awk supports | inside one regex, like Jotne suggests. That will help.
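With dozens of alternatives, one way to keep the regex readable is to build it as a string and match against that. This is only a sketch: the variable name re, the sample data and the patterns are made up, and $2 stands in for $9:

```shell
# awk accepts a string on the right-hand side of ~ and treats it as a regex,
# so the alternatives can be assembled piece by piece in a BEGIN block.
out=$(printf '%s\n' 'a|foo12' 'b|bar' 'c|7' | awk -F'|' '
BEGIN {
    re = "foo"            # first alternative
    re = re "|" "[0-9]"   # append further alternatives with |
}
$2 ~ re
')
printf '%s\n' "$out"
# prints "a|foo12" and "c|7"
```

Because it is a single match, a line is printed at most once no matter how many alternatives it satisfies.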

I forgot that one too. The reason I can't change my logic is that I'm dealing with human input, and for some reason people can't seem to follow simple guidelines, so I have to go search for all the weird things that they do. Thanks to both of you, you got me back on the right track.

And, on top of what Jotne said, try subgroups. I can see that all regexes end with the same two or three char pattern...
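Something along these lines, as a rough sketch. The patterns here are simplified stand-ins rather than your real ones, the sample lines are invented, and $2 is used in place of $9:

```shell
# The part that differs goes inside ( ... | ... ); the shared ", XX)" tail
# is written once after the group instead of once per alternative.
out=$(printf '%s\n' 'a|(abc, XY)' 'b|(+ab, XY)' 'c|(abc; XY)' | awk -F'|' '
$2 ~ /\((abc|[+]ab), [A-Z][A-Z]\)/
')
printf '%s\n' "$out"
# prints "a|(abc, XY)" and "b|(+ab, XY)"; the third line has ";" instead of ","
```

The escaped \( and \) match literal parentheses in the data, while the unescaped ( and ) form the group.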