I'm a beginner at writing awk scripts and I have a problem with reading input files.
Requirement for my program:
Compare file1 to file2 and check whether the value in column1 is equal and the value in column5 is different.
File 1:
180 P 01.01.2008 30.06.2008 2
180 P 01.07.2008 30.09.2008 1
180 P 01.10.2008 31.12.2008 1
1101 P 01.01.2008 30.11.2008 1
1101 P 01.12.2008 31.12.2008 1
1101 P 01.12.2008 31.12.2008 2
1101 P 01.12.2008 31.12.2008 3
File 2:
180 P 01.01.2008 30.06.2008 1
180 P 01.07.2008 30.09.2008 1
180 P 01.10.2008 31.12.2008 1
1101 P 01.01.2008 30.11.2008 1
1101 P 01.12.2008 31.12.2008 2
My Code:
BEGIN {
    SUBSEP=" "
    if (ARGC < 3) {
        print "gawk -f L16.awk ZZT_T5A71.txt T5A71.txt"
        exit
    } else {
        t5a71 = ARGV[2]
        zzt_t5a71 = ARGV[1]
    }
}
{#MAIN
    if (FILENAME == zzt_t5a71) {
        split($0, record2, SUBSEP)
    }
    if (FILENAME == t5a71) {
        split($0, record1, SUBSEP)
        Pernr = match(record1[1],record2[1])
        if (Pernr != 0) {
            Zeit = match(record1[3], record2[3])
            if (Zeit != 0){
                if (record1[5] > record2[5]){
                    arrGES[FNR] = $0
                }
            }
        }
    }
}
END{
    for (x in arrGES)
        print arrGES[x]
}
My output is just
1101 P 01.12.2008 31.12.2008 3
and not
180 P 01.01.2008 30.06.2008 2
1101 P 01.12.2008 31.12.2008 3
NR<=FNR is a condition: it is true while the total number of records read so far (NR) is less than or equal to the record number within the current file (FNR). Since NR never drops below FNR, that only holds while awk is reading the first file on the command line (file2).
{_f1[$1 $5]=1;next} runs when the condition is met. It fills an associative array whose index is the concatenation of fields 1 and 5 from file2 and sets the value to 1 (true). next then skips to the next record without executing the remaining awk instructions, so you can see this as a loop over the first file. Once the first file is exhausted, the condition is false and awk continues with the second instruction block, starting at the first record of the second file (file1 in this case).
!_f1[$1 $5] is shorthand for !_f1[$1 $5]{print}; again the form is <condition>{action}. If the concatenation of fields 1 and 5 from the second file (file1) was seen in the first file (file2), the array _f1 holds 1 (true) for it and the record is skipped. Otherwise it is printed.
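Putting the three pieces together, the command being explained presumably looks like the sketch below. The full command line is not shown in this thread, so the file names file1 and file2, their order on the command line, and the temporary directory are assumptions made for illustration. The sample data is copied from the question.

```sh
# Recreate the question's sample data in a scratch directory
# (the names file1/file2 are placeholders, not the asker's real file names).
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
180 P 01.01.2008 30.06.2008 2
180 P 01.07.2008 30.09.2008 1
180 P 01.10.2008 31.12.2008 1
1101 P 01.01.2008 30.11.2008 1
1101 P 01.12.2008 31.12.2008 1
1101 P 01.12.2008 31.12.2008 2
1101 P 01.12.2008 31.12.2008 3
EOF
cat > "$tmp/file2" <<'EOF'
180 P 01.01.2008 30.06.2008 1
180 P 01.07.2008 30.09.2008 1
180 P 01.10.2008 31.12.2008 1
1101 P 01.01.2008 30.11.2008 1
1101 P 01.12.2008 31.12.2008 2
EOF

# First block: while reading file2, remember each field1+field5 combination.
# Second block: while reading file1, print records whose combination is unknown.
# file2 must come first on the command line, because it fills the array.
result=$(awk 'NR<=FNR{_f1[$1 $5]=1;next} !_f1[$1 $5]' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$result"

rm -rf "$tmp"
```

With this sample data and this argument order, the command prints the two lines the question asks for (the 180 record ending in 2 and the 1101 record ending in 3). Note that $1 $5 joins the fields with no separator, so in other data sets two different pairs could produce the same index. Writing the index as $1,$5 would avoid that.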
Thanks for the explanation of the code.
The output of the program should be those lines where column5 is different. With your code I get file1 as output.