awk read input files

tgooper · September 9, 2009, 5:51am

Hi,

I'm a beginner at writing awk scripts and I have a problem with reading input files.
Requirement for my program:
Compare file1 to file2 and find the lines where the value in column1 is equal and the value in column5 is different.
File 1:

180    P    01.01.2008    30.06.2008    2
180    P    01.07.2008    30.09.2008    1
180    P    01.10.2008    31.12.2008    1
1101    P    01.01.2008    30.11.2008    1
1101    P    01.12.2008    31.12.2008    1
1101    P    01.12.2008    31.12.2008    2
1101    P    01.12.2008    31.12.2008    3

File 2:

180    P    01.01.2008    30.06.2008    1
180    P    01.07.2008    30.09.2008    1
180    P    01.10.2008    31.12.2008    1
1101    P    01.01.2008    30.11.2008    1
1101    P    01.12.2008    31.12.2008    2

My Code:

BEGIN {
        SUBSEP=" "
        if (ARGC < 3) {
             print "gawk -f L16.awk ZZT_T5A71.txt T5A71.txt"
            exit
        } else {
            t5a71 = ARGV[2]
            zzt_t5a71 = ARGV[1]
        }
    
    }
{#MAIN
    if (FILENAME == zzt_t5a71) {
       split($0, record2, SUBSEP)    

    }
    if (FILENAME == t5a71) {
       
        split($0, record1, SUBSEP)
        
        Pernr = match(record1[1],record2[1])
        if (Pernr != 0) {
      
            Zeit = match(record1[3], record2[3])
            if (Zeit != 0){
              
                if (record1[5] > record2[5]){
                        arrGES[FNR] = $0
                    
                }
            }
        }
    }
}
END{
    for (x in arrGES)
        print arrGES[x]
    
}

My output is just
1101 P 01.12.2008 31.12.2008 3

and not
180 P 01.01.2008 30.06.2008 2
1101 P 01.12.2008 31.12.2008 3

Why??

ripat · September 9, 2009, 6:37am

Hello,

Try this:

awk 'NR<=FNR{_f1[$1 $5]=1;next}!_f1[$1 $5]' file2 file1

tgooper · September 9, 2009, 10:59am

Thanks for the fast answer. I didn't expect that this program could be written in a single line of code. Can somebody explain this line to me?

awk 'NR<=FNR{_f1[$1 $5]=1;next}!_f1[$1 $5]' file2 file1

Thanx
Tgooper

ripat · September 10, 2009, 3:32am

awk is known for being terse.

NR<=FNR is a condition. It is true while the total number of records processed so far (NR) is less than or equal to the record number in the current file (FNR). NR can never be smaller than FNR, so this is effectively NR==FNR, which holds only for the records of the first file on the command line (file2).

{_f1[$1 $5]=1;next} If the condition is met, we fill an associative array whose index is the concatenation of fields 1 and 5 from file2 and give it a value of 1 (true). Then next skips to the next input record without executing the remaining awk instructions. You can see this as a loop over the first file. Once the first file is exhausted, the condition above becomes false and awk continues with the second instruction block, starting with the first record of the second file (file1 in this case).

!_f1[$1 $5] is shorthand for !_f1[$1 $5]{print}. Again, it is <condition>{action}: if the concatenation of fields 1 and 5 from the second file (file1) was seen in the first file (file2), the array _f1 holds the value 1 (true) for it, so the record is skipped. Otherwise, it is printed.
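
For anyone who prefers it spelled out, here is a minimal commented sketch of the same idea, run against the sample data from the first post. The array name seen and the output file result.txt are made up for illustration. Using ($1, $5) as a two-part subscript instead of plain concatenation is a small variation, not what was posted above: it avoids accidental collisions such as "11" "01" versus "110" "1".

```shell
# Recreate the sample data from the first post (whitespace-separated columns).
cat > file1 <<'EOF'
180    P    01.01.2008    30.06.2008    2
180    P    01.07.2008    30.09.2008    1
180    P    01.10.2008    31.12.2008    1
1101    P    01.01.2008    30.11.2008    1
1101    P    01.12.2008    31.12.2008    1
1101    P    01.12.2008    31.12.2008    2
1101    P    01.12.2008    31.12.2008    3
EOF
cat > file2 <<'EOF'
180    P    01.01.2008    30.06.2008    1
180    P    01.07.2008    30.09.2008    1
180    P    01.10.2008    31.12.2008    1
1101    P    01.01.2008    30.11.2008    1
1101    P    01.12.2008    31.12.2008    2
EOF

awk '
    # NR == FNR is only true while reading the first file named (file2).
    NR == FNR {
        seen[$1, $5] = 1   # remember each (column1, column5) pair
        next               # do not run the rule below for this record
    }
    # Remaining records come from file1: a pattern without an action
    # prints the record when the pattern is true.
    !(($1, $5) in seen)
' file2 file1 > result.txt

cat result.txt
```

With the sample data this should leave the two records ending in 2 (for 180) and 3 (for 1101) in result.txt, the same lines the asker expected.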

tgooper · September 10, 2009, 8:29am

Thanks for the explanation of the code.
The output of the program should be those lines where column5 is different. With your code I get file1 as output.

panyam · September 10, 2009, 8:46am

TEXTBOX>awk 'NR<=FNR{_f1[$1 $5]=1;next}!_f1[$1 $5]' file2 file1
180    P    01.01.2008    30.06.2008    2
1101    P    01.12.2008    31.12.2008    3

I don't know what you're expecting.

tgooper · September 10, 2009, 8:51am

Sorry, I had the wrong file for my test!

danmero · September 10, 2009, 8:56am

# awk 'NR==FNR{_[$1$5]++}!_[$1$5]' file2 file1
180    P    01.01.2008    30.06.2008    2
1101    P    01.12.2008    31.12.2008    3

Use GNU awk (gawk), New awk (nawk) or POSIX awk (/usr/xpg4/bin/awk).

tgooper · September 10, 2009, 9:52am

I've got one more question:
How can I convert this awk code into a script which I can start like

gawk -f skript.awk file2 file1 outputfile

danmero · September 10, 2009, 11:13am

# cat skript.awk
NR==FNR{_[$1$5]++}!_[$1$5]

# gawk -f skript.awk file2 file1 > file3

# cat file3
180    P    01.01.2008    30.06.2008    2
1101    P    01.12.2008    31.12.2008    3
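
If the output file really has to be a third argument rather than a shell redirection, one possible sketch is to take it out of ARGV in a BEGIN block so that awk does not try to read it as input. The script name skript_out.awk and the variable out are hypothetical, chosen only for this example, and this is a guess at what was wanted rather than a drop-in for the original setup.

```shell
# Sample data from the first post.
cat > file1 <<'EOF'
180    P    01.01.2008    30.06.2008    2
180    P    01.07.2008    30.09.2008    1
180    P    01.10.2008    31.12.2008    1
1101    P    01.01.2008    30.11.2008    1
1101    P    01.12.2008    31.12.2008    1
1101    P    01.12.2008    31.12.2008    2
1101    P    01.12.2008    31.12.2008    3
EOF
cat > file2 <<'EOF'
180    P    01.01.2008    30.06.2008    1
180    P    01.07.2008    30.09.2008    1
180    P    01.10.2008    31.12.2008    1
1101    P    01.01.2008    30.11.2008    1
1101    P    01.12.2008    31.12.2008    2
EOF

# Hypothetical script: the third argument is treated as the output file.
cat > skript_out.awk <<'EOF'
BEGIN {
    # ARGV[0] is the awk program name, so three arguments give ARGC == 4.
    if (ARGC != 4) {
        print "usage: awk -f skript_out.awk file2 file1 outputfile" > "/dev/stderr"
        exit 1
    }
    out = ARGV[3]     # remember the output file name
    ARGV[3] = ""      # an empty ARGV element is skipped as an input file
    printf "" > out   # create (or truncate) the output file up front
}
NR == FNR { seen[$1 $5]++; next }   # first file: record column1/column5 pairs
!seen[$1 $5] { print > out }        # second file: write unseen pairs to out
EOF

awk -f skript_out.awk file2 file1 file3
cat file3
```

The redirection shown above (> file3) is simpler and works with any script; the ARGV approach only helps when the calling convention with three file names is fixed.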

tgooper · October 30, 2009, 8:58am

hi,

I want to include a check whether column5 in file2 is bigger than column5 in file1.
My code:

awk 'NR<=FNR{_f1[$1 $5]=1;next} f1[$1 $5] > f1[$1 $5]' test_z.txt test_t.txt

It's not working. Why?

Thanks
Tgooper
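
A possible reason, offered as a hedged note since the thread has no reply: the first block fills the array _f1, but the second expression reads f1, which is a different (empty) array, and it compares a value with itself, so it can never be true. The array also stores only 1, not the value of column5. The sketch below is one way to do the comparison, under the assumption that records are paired by column1 and column3 (as the original script in the first post attempted). The array name val and the output file bigger.txt are made up, and the sample data from the first post stands in for test_z.txt and test_t.txt, whose contents are not shown.

```shell
# Sample data from the first post, standing in for the real input files.
cat > file1 <<'EOF'
180    P    01.01.2008    30.06.2008    2
180    P    01.07.2008    30.09.2008    1
180    P    01.10.2008    31.12.2008    1
1101    P    01.01.2008    30.11.2008    1
1101    P    01.12.2008    31.12.2008    1
1101    P    01.12.2008    31.12.2008    2
1101    P    01.12.2008    31.12.2008    3
EOF
cat > file2 <<'EOF'
180    P    01.01.2008    30.06.2008    1
180    P    01.07.2008    30.09.2008    1
180    P    01.10.2008    31.12.2008    1
1101    P    01.01.2008    30.11.2008    1
1101    P    01.12.2008    31.12.2008    2
EOF

awk '
    # First file (file2): store column5 under the assumed key (column1, column3).
    # If a key occurs more than once in file2, the last value wins.
    NR == FNR {
        val[$1, $3] = $5
        next
    }
    # Second file (file1): print records whose column5 is smaller than the
    # value stored from file2. Adding 0 forces a numeric comparison.
    (($1, $3) in val) && (val[$1, $3] + 0 > $5 + 0)
' file2 file1 > bigger.txt

cat bigger.txt
```

To test the opposite direction (column5 in the second file bigger than in the first), flip the > to <.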