Grepping a file and returning the passed variable if the value does not exist in the file at all

personalt · April 26, 2011, 10:53am

I have a list of fields that I want to check a file for, returning each field that is not found at all in the file. Is there a way to do a grep -lc and return the passed variable too, rather than just the count?

I am using a crappy work-around for now, but I am not sure how to re-grep its output for :0 so that it only displays the fields that are not in the file at all.

there must be a simpler way... :wall:

    a=OPEN
    echo "$a:\c"
    grep -lc "'$a'" /data/data.xml
    a=CLOSE
    echo "$a:\c"
    grep -lc "'$a'" /data/data.xml

cambridge · April 26, 2011, 11:01am

Well you could do something like this:

    awk '/OPEN:/ {o++} /CLOSE:/ {c++} END {print "OPEN:" o "\nCLOSE:" c}' infile

Depends how many different fields you want to check for. Also, if we knew your exact file format it might be easier to give you something more optimal...
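If all you need is the names of the fields that never appear, a plain shell loop around grep -q is another option. This is only a minimal sketch, assuming the fields are wrapped in single quotes in the file. The function name report_missing is made up for illustration, and the loop rescans the file once per field, so it will get slow with a long field list.

```shell
# Hypothetical helper: print each field name that is absent from the file.
# Usage: report_missing FILE FIELD...
report_missing() {
    file=$1
    shift
    for a in "$@"; do
        # -q: no output, exit status only; -F: treat the pattern as a fixed string
        grep -qF -- "'$a'" "$file" || printf '%s\n' "$a"
    done
}
```

You would call it as: report_missing /data/data.xml OPEN CLOSE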

personalt · April 26, 2011, 12:17pm

Thank you very much...

I actually have a few hundred fields to check. Since I am scripting the scripts, I ended up converting this to the following. I also made a change to only search for instances where my field is wrapped in single quotes, and then to re-grep so that only the fields that were never found are printed.

I would assume it would be more efficient to open the file once, but I only need to run this script once a day prior to file check-in, and for my source file it runs in 15 seconds. The infile format is a pseudo-XML file.

    awk '/'\''OPEN'\''/ {o++} END {print "OPEN:" o }' infile | grep -v '[0-9]'
    awk '/'\''CLOSE'\''/ {o++} END {print "CLOSE:" o }' infile | grep -v '[0-9]'
    awk '/'\''HIGH'\''/ {o++} END {print "HIGH:" o }' infile | grep -v '[0-9]'
..........
..........
.........

cambridge · April 28, 2011, 3:24am

Your solution can be optimised significantly. You don't need to invoke AWK more than once, and piping the output of AWK to grep is a faux pas I witness far too many times ... it's simply unnecessary.

The following will achieve the same result and will be much faster:

    awk -v sq="'" -v fields="OPEN CLOSE HIGH" '
        # fs[1..fn] holds the field names; fc[i] counts the lines matching fs[i]
        BEGIN { fn=split(fields, fs) }
        { for (i=1; i<=fn; i++) if ($0 ~ sq fs[i] sq) fc[i]++ }
        END { for (i=1; i<=fn; i++) print fs[i] ":" (fc[i] ? fc[i] : 0) }' infile

The fields are passed in via the fields variable and are separated by whitespace (spaces, tabs, newlines). I don't know how you're getting your list of a few hundred fields. If they're already in a file, separated by spaces or newlines, then you could do:

    -v fields="$(<fieldfile)"

If you only want a list of fields where the count is zero (which is what I believe your intent was behind using grep in your earlier example), then amend the last line as follows:

    END { for (i=1; i<=fn; i++) if (!fc[i]) print fs[i] }' infile
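Putting the pieces together, here is a rough sketch of a variant that reads the field list straight from a file instead of passing it with -v. It is not identical to the code above: it uses index() for a literal substring match rather than a regex, so field names containing regex characters are taken as-is. The function name missing_fields is hypothetical, and the sketch assumes the field file is not empty (the NR == FNR test would otherwise misread the data file as the field list).

```shell
# Hypothetical helper: print every field listed in FIELDFILE that never
# appears wrapped in single quotes in INFILE.
# Usage: missing_fields FIELDFILE INFILE
missing_fields() {
    awk -v sq="'" '
        # First file: collect the whitespace-separated field names.
        NR == FNR { for (i = 1; i <= NF; i++) fs[++fn] = $i; next }
        # Second file: count the lines that contain each quoted field.
        { for (i = 1; i <= fn; i++) if (index($0, sq fs[i] sq)) fc[i]++ }
        # Report only the fields that were never seen.
        END { for (i = 1; i <= fn; i++) if (!fc[i]) print fs[i] }
    ' "$1" "$2"
}
```

You would call it as: missing_fields fieldfile infile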

Don't forget if you're on Solaris to use /bin/nawk or /usr/xpg4/bin/awk instead of /bin/awk.
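If the same script has to run on both Solaris and other systems, one way to handle this is to pick the awk binary up front. A small sketch; the variable name AWK is just my choice, and the fallback assumes that whatever awk is on the PATH supports -v:

```shell
# Prefer the Solaris POSIX-capable awks if present, else use awk from PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif [ -x /bin/nawk ]; then
    AWK=/bin/nawk
else
    AWK=awk
fi

# Quick check that -v assignments work with the chosen awk.
"$AWK" -v sq="'" 'BEGIN { print sq "OPEN" sq }'
```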

Best regards,
Mark.