Grepping for one variable while using awk to parse an associated variable

ncwxpanther · January 3, 2020, 4:08pm

I'm trying to search for a single variable in the first field and, from that output, use awk to extract the lines that contain a value less than a value stored in another variable. The two variables are associated with each other.

Any guidance is appreciated.

File that contains the variables:
variables.txt

File to search:
file-to-search.txt

01001 02  1900 65.6
01001 02  1901 60.6
01001 02  1902 62.6
....
01003  02  1900 65.1
01003  02  1901 65.3
01003  02  1902 67.3
....
01005  02  1900 64.7
01005  02  1901 65.2
01005  02  1902 65.2
...
01007  02  1900 64.8
01007  02  1901 63.8
01007  02  1902 62.7
....

I was using a for loop but realized that the resulting output was not correct.

county=$(awk '{ print $1 }' variables.txt)
val=$(awk '{ print $2 }' variables.txt)

for id in $county
do
grep "$id" file-to-search.txt | awk '$4 < $val { print $0 }' | tail -1 | awk '{print $1" " $3" " $4}'
done

What I'm hoping for is a file that looks like this:

01001 1902 62.6
01003 1901 65.3
01005 1900 64.7
01007 1902 62.7
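A likely reason the loop misbehaves: inside the single-quoted awk program, the shell never expands `$val`. awk sees its own uninitialized variable `val`, which is 0 in a numeric context, so `$val` turns into `$0` (the whole line) and the test compares field 4 against the entire record. Passing the value in with `awk -v val="$val" '$4 < val'` avoids that. The logic the loop seems to be aiming for can be sketched in Python as below. The thresholds are made up because the contents of variables.txt weren't posted, and the sketch assumes the rows for each ID are in ascending year order (as in the sample), so keeping the last match plays the role of `tail -1`.

```python
# Sample rows from file-to-search.txt (ID, constant column, year, value).
ROWS = [
    "01001 02  1900 65.6",
    "01001 02  1901 60.6",
    "01001 02  1902 62.6",
    "01003  02  1900 65.1",
    "01003  02  1901 65.3",
    "01003  02  1902 67.3",
    "01005  02  1900 64.7",
    "01005  02  1901 65.2",
    "01005  02  1902 65.2",
    "01007  02  1900 64.8",
    "01007  02  1901 63.8",
    "01007  02  1902 62.7",
]

# Hypothetical thresholds standing in for variables.txt (ID -> value).
THRESHOLDS = {"01001": 63.0, "01003": 66.0, "01005": 65.0, "01007": 63.0}


def last_below(rows, thresholds):
    """For each ID, return 'ID year value' from the last row (in file order)
    whose value is below that ID's threshold, like grep | awk | tail -1."""
    result = []
    for county, limit in thresholds.items():
        last = None
        for line in rows:
            fields = line.split()  # split() copes with the uneven spacing
            if len(fields) >= 4 and fields[0] == county and float(fields[3]) < limit:
                last = fields  # later matches overwrite earlier ones
        if last is not None:
            result.append(f"{last[0]} {last[2]} {last[3]}")
    return result


for out in last_below(ROWS, THRESHOLDS):
    print(out)
```

With these hypothetical thresholds it prints the four lines of the desired output above.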

vgersh99 · January 3, 2020, 4:30pm

I can't really correlate your explanation with your sample output, but to start with:

awk '
FNR==NR { f1[$1]=$2; next}
$1 in f1 && $4 < f1[$1]' variables.txt file-to-search.txt
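The awk above reads variables.txt first (while `FNR==NR`, which is only true for the first file), stores each ID's column 2 value in the array `f1`, and then prints every line of file-to-search.txt whose ID is in `f1` and whose column 4 is below the stored value. A rough Python equivalent of that two-pass lookup, with hypothetical variables.txt contents since the real file wasn't posted:

```python
def load_thresholds(lines):
    """First pass: map column 1 (ID) to column 2 (threshold), like f1[$1]=$2."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0]] = float(fields[1])
    return table


def filter_rows(lines, table):
    """Second pass: keep whole lines whose ID is in the table and whose
    column 4 is below that ID's threshold, like '$1 in f1 && $4 < f1[$1]'."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 4 and fields[0] in table and float(fields[3]) < table[fields[0]]:
            kept.append(line)
    return kept


# Hypothetical contents for variables.txt (no entry for 01005).
variables = ["01001 63.0", "01003 66.0"]
# A few sample rows from file-to-search.txt.
rows = [
    "01001 02  1900 65.6",
    "01001 02  1901 60.6",
    "01001 02  1902 62.6",
    "01003  02  1900 65.1",
    "01003  02  1902 67.3",
    "01005  02  1900 64.7",  # skipped: its ID has no threshold
]

for line in filter_rows(rows, load_thresholds(variables)):
    print(line)
```

Like the awk version, this prints every qualifying line rather than one line per ID.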

RudiC · January 3, 2020, 4:49pm

Just guessing - do you want to print the line with the greatest $4 value less than the one in variables.txt (why else would 01001 02 1901 60.6 and 01003 02 1900 65.1 be missing in your desired output)? Try

sort -k1,1 -k4,4nr file-to-search.txt |
awk '
FNR==NR         {T[$1] = $2
                 next
                }
($1 in T) &&
($4 <= T[$1])   {print $1, $3, $4
                 delete T[$1]
                }
' variables.txt -
01001 1902 62.6
01003 1901 65.3
01005 1900 64.7
01007 1902 62.7

If that's not the case, PLEASE be much clearer and more precise in your specifications!
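RudiC's pipeline sorts by ID and then by column 4 in descending order, so the first line of each ID that is at or below its threshold carries the greatest such value; `delete T[$1]` then stops any later lines for that ID from printing. The same idea sketched in Python, again with hypothetical thresholds in place of the unposted variables.txt:

```python
# Sample rows from file-to-search.txt (ID, constant column, year, value).
ROWS = [
    "01001 02  1900 65.6",
    "01001 02  1901 60.6",
    "01001 02  1902 62.6",
    "01003  02  1900 65.1",
    "01003  02  1901 65.3",
    "01003  02  1902 67.3",
    "01005  02  1900 64.7",
    "01005  02  1901 65.2",
    "01005  02  1902 65.2",
    "01007  02  1900 64.8",
    "01007  02  1901 63.8",
    "01007  02  1902 62.7",
]

# Hypothetical thresholds standing in for variables.txt (ID -> value).
THRESHOLDS = {"01001": 63.0, "01003": 66.0, "01005": 65.0, "01007": 63.0}


def greatest_at_or_below(rows, thresholds):
    """Order rows by ID, then value descending (like sort -k1,1 -k4,4nr),
    and emit the first row per ID whose value is <= that ID's threshold."""
    parsed = [line.split() for line in rows]
    parsed.sort(key=lambda f: (f[0], -float(f[3])))  # stable sort
    pending = dict(thresholds)  # IDs still waiting for a match (the T array)
    result = []
    for fields in parsed:
        limit = pending.get(fields[0])
        if limit is not None and float(fields[3]) <= limit:
            result.append(f"{fields[0]} {fields[2]} {fields[3]}")
            del pending[fields[0]]  # like delete T[$1]: one line per ID
    return result


for line in greatest_at_or_below(ROWS, THRESHOLDS):
    print(line)
```

Because the sort is stable, ties (such as the two 65.2 values for 01005) keep their original file order.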

ncwxpanther · January 3, 2020, 8:28pm

For each ID in column 1 of the avgs.txt file, I need to match the IDs in file-to-search.txt and find the line whose column 4 value is the next lowest compared with the value in column 2 of avgs.txt, when the lines are sorted by column 3. Column 3 contains the year, and I need the year closest to 2019 whose column 4 value is the next lowest. I need to print all columns, and the output should be only the line with that next lowest value (in column 4) compared with column 2 of the first file, assuming the years are sorted properly.

Aia · January 4, 2020, 2:26am

cat p.py
#!/usr/bin/env python3

# data[ID] maps each value (column 4, as a float) to the output line "ID year value".
data = {}
with open("file-to-search.txt") as f2s:
    for line in f2s:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or elided lines
        fields.pop(1)  # drop the constant second column
        data.setdefault(fields[0], {})[float(fields[2])] = " ".join(fields)

# For each ID, its values from highest to lowest (compared as numbers, not strings).
points = {entry: sorted(data[entry], reverse=True) for entry in data}

with open("variables.txt") as s2f:
    for line in s2f:
        target, score = line.split()
        score = float(score)
        # The first value below the threshold is the greatest such value.
        for s in points.get(target, []):
            if s < score:
                print(data[target][s])
                break

Run:

python3 p.py
01001 1902 62.6
01003 1901 65.3
01005 1900 64.7
01007 1902 62.7

ncwxpanther · January 5, 2020, 8:54am

Rudi

I'd like to print the greatest $4 value less than the one in variables.txt, but searching from the 2019 value in $3 in decreasing order, i.e., 2019, 2018, 2017...
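Under this reading (start at 2019 and walk the years downward until a column 4 value falls below the ID's threshold), the year order can be made explicit instead of relying on the file already being sorted. A sketch under that assumption: the 2019 start comes from the post, the thresholds are hypothetical, only the 1900 to 1902 sample rows are used, and all columns are kept as requested earlier.

```python
# Sample rows from file-to-search.txt (ID, constant column, year, value).
ROWS = [
    "01001 02  1900 65.6",
    "01001 02  1901 60.6",
    "01001 02  1902 62.6",
    "01003  02  1900 65.1",
    "01003  02  1901 65.3",
    "01003  02  1902 67.3",
    "01005  02  1900 64.7",
    "01005  02  1901 65.2",
    "01005  02  1902 65.2",
    "01007  02  1900 64.8",
    "01007  02  1901 63.8",
    "01007  02  1902 62.7",
]

# Hypothetical thresholds standing in for column 2 of avgs.txt / variables.txt.
THRESHOLDS = {"01001": 63.0, "01003": 66.0, "01005": 65.0, "01007": 63.0}


def latest_below(rows, thresholds, start_year=2019):
    """For each ID, walk its years from start_year downward and keep the first
    row whose value is below that ID's threshold (all columns, single-spaced)."""
    by_id = {}
    for line in rows:
        fields = line.split()
        if len(fields) >= 4 and int(fields[2]) <= start_year:
            by_id.setdefault(fields[0], []).append(fields)
    result = {}
    for county, limit in thresholds.items():
        newest_first = sorted(by_id.get(county, []), key=lambda r: int(r[2]), reverse=True)
        for fields in newest_first:
            if float(fields[3]) < limit:
                result[county] = " ".join(fields)
                break
    return result


for county, line in latest_below(ROWS, THRESHOLDS).items():
    print(line)
```

IDs with no year below their threshold are simply left out of the result.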

ncwxpanther · January 5, 2020, 10:51am

I found a solution that works for me. In my previous attempts I was not using an awk variable with -v. Thanks.

for id in $(cat counties.txt); do
    grep "^$id[[:space:]]" county-annual-average.txt > "$id.txt"
    val=$(grep "^$id[[:space:]]" avgs.txt | awk '{print $2}')
    awk -v val="$val" '$4 < val' "$id.txt" | tail -1 | awk '{print $1, $3, $4}'
done