Gents
I have this input file
file1 (uniq records)
54503207851 170211240
54503207911 170210837
54503208111 170215105
54503208112 170215210
54655210011 170223140
54655210091 170223738
54655210172 170224355
54655210251 170224741
54655210331 170225039
54655210411 170225505
54655210492 170230224
54655210571 170230632
54655210651 170231004
54655210731 170231601
file2 (duplicate records)
54503207851 1 1 11 2600 54503207851 170211240
54503207911 1 1 7 2600 54503207911 170210837
54503208111 1 7 1301482 2600 54503208111 170215105
54503208112 2 1 8 2600 54503208112 170215210
54655210011 1 1 1 2597 54655210011 170223140
54655210091 1 1 4 2597 54655210091 170223738
54655210172 2 1 2 2597 54655210172 170224355
54655210251 1 1 2 2597 54655210251 170224748
54655210251 1 1 2 2597 54655210251 170224742
54655210331 1 1 1 2597 54655210331 170225040
54655210411 1 1 1 2597 54655210411 170225505
54655210492 2 1 46 2597 54655210492 170230024
54655210492 2 1 1 2597 54655210492 170230225
54655210571 1 1 3 2597 54655210571 170230632
54655210651 1 1 1 2597 54655210651 170231004
54655210731 1 1 2 2597 54655210731 170231601
desired output
54503207851 170211240 1 1 11 2600 54503207851 170211240
54503207911 170210837 1 1 7 2600 54503207911 170210837
54503208111 170215105 1 7 1301482 2600 54503208111 170215105
54503208112 170215210 2 1 8 2600 54503208112 170215210
54655210011 170223140 1 1 1 2597 54655210011 170223140
54655210091 170223738 1 1 4 2597 54655210091 170223738
54655210172 170224355 2 1 2 2597 54655210172 170224355
54655210251 170224741 1 1 2 2597 54655210251 170224742
54655210331 170225039 1 1 1 2597 54655210331 170225040
54655210411 170225505 1 1 1 2597 54655210411 170225505
54655210492 170230224 2 1 1 2597 54655210492 170230225
54655210571 170230632 1 1 3 2597 54655210571 170230632
54655210651 170231004 1 1 1 2597 54655210651 170231004
54655210731 170231601 1 1 2 2597 54655210731 170231601
i am using the following code to get the output
awk '{x=substr($0,1,22)} FNR==NR{a[substr($0,1,11)]=$0;next} {print x, a[substr($0,1,11)]? OFS a[substr($0,1,11)] : OFS "9999" }' file2 file1
The point is to match the column 1 in both files but to grep from file2 the nearest value comparing columns 2 in file1 and 7 in file2..
With my code I got any of them,, the purpose is to get the close value matching column 1 and comparing last column.
Thanks for your help
RudiC
June 22, 2015, 4:41pm
2
Nearest to what? Sorry to be that thick.
Hi RudiC
I say close value to column 2 in file1 comparing with value column 7 in file2 the last 6 dig correspond to time, Then as the duplicate value in column 1 file2 has 2 different time i would like to Keep the time which is more close to time in file 1.
Hope u can help me.
You could try this (assumption is that file2 is sorted, as in your demo data):
awk '
FNR==NR{a[$1]=$2;next}
key && key != $1 { best=key=x; print ln }
$1 in a {
if(a[$1]<$7) diff=$7-a[$1]
else diff=a[$1]-$7
if(!diff) {
print
key=best=x
delete a[$1]
} else if(!best || diff<best) {
key=$1
best=diff
ln=$0
}
}
END { if(key) print ln } ' file1 file2
1 Like
Hi Xl
Thanks for your answer I will check it and I let you know
---------- Post updated at 08:28 AM ---------- Previous update was at 03:04 AM ----------
Hi XL
it works perfect, thanks a lot
but there is something that is not correct, per example if in the file2 there is 2 duplicate values and one of them have the same time of file1, so the script should keep the time like file1..
other thing please, how to ajust the code if we have also duplicate values in file1 column 1..
54503207851 170211240
54503207911 170210837
54503208111 170215105
54503208112 170215210
54655210011 170223140
54655210091 170223738
54655210172 170224355
54655210251 170224741
54655210251 170224741
54655210331 170225039
54655210411 170225505
54655210492 170230224
54655210492 170230224
54655210571 170230632
54655210651 170231004
54655210731 170231601
,
Then the time from file2 should adapt to the 2 duplicate values in file1.
Regards...
RudiC
June 23, 2015, 10:44am
6
Your desired output from post#1 is matched neither by your own code snippet's result nor by Chubler_XL's. To introduce file1's second column as column 2 in the output, try
awk '
FNR==NR {a[$1]=$2
next
}
key != $1 &&
key {best=key=x
print ln
}
$1 in a {T=$1
if ((diff=$7-a[T]) < 0) diff=-diff
sub ($1, $1 OFS a[$1])
if (!diff) {print
key=best=x
# delete a[$1]
}
else if(!best || diff<best) {key=$1
best=diff
ln=$0
}
}
END {if (key) print ln }
' file1 file2
54503207851 170211240 1 1 11 2600 54503207851 170211240
54503207911 170210837 1 1 7 2600 54503207911 170210837
54503208111 170215105 1 7 1301482 2600 54503208111 170215105
54503208112 170215210 2 1 8 2600 54503208112 170215210
54655210011 170223140 1 1 1 2597 54655210011 170223140
54655210091 170223738 1 1 4 2597 54655210091 170223738
54655210172 170224355 2 1 2 2597 54655210172 170224355
54655210251 170224741 1 1 2 2597 54655210251 170224742
54655210331 170225039 1 1 1 2597 54655210331 170225040
(Chubler_XL's code adapted)
To your last question: If in file1 there's exact duplicates, nothing will happen except that a[$1]
will be overwritten with the identical value. If the values are not identical, the last one will apply.
Dear Rudi..
Thanks a lot
Just i was going to say something else
I get something like this:
file1
54655210492 170230224
file2
54655210492 2 1 1 2597 54655210492170230225 170230225
54655210492 2 1 46 2597 54655210492170230024 170230024
output
54655210492 2 1 1 2597 54655210492170230225 170230225
output desired
54655210492 2 1 46 2597 54655210492170230024 170230024
I will try your script, and check if give me the same result..
Please try to help me
---------- Post updated at 09:56 AM ---------- Previous update was at 09:47 AM ----------
Dear RudiC,
Just I try your code and it gives the same output.
Please try to help me to solve it..
As I say previously it needs to grep the rows where the time is exactly or the more closets.
RudiC
June 23, 2015, 12:04pm
8
From my point of view, 170230224 is closer to 170230225 than to 170230024. So - where's the problem?
1 Like
Dear RudiC
Sorry it is my mistake.
I am getting crazy
Works Fine
RudiC or XL
Can you please modify your script to be able to use it with csh instead of bash...
I want to merge it with other script I have in csh.
Many thanks
RudiC
June 24, 2015, 4:09pm
11
Neither script has anything to do with bash
nor csh
. Both are pure awk
.
1 Like
Dear RudiC,
When i run it using
#!/bin/bash
it works fine..
but when i put it inside of the csh
script
give a error
Unmatched '
Why this error.
RudiC
June 24, 2015, 4:17pm
13
Because, I'd guess, a '
is missing. But, I can't help with csh
.
1 Like
Probably be better of avoiding having the awk program on the command line try
putting the program in a file like match_val.awk
and change your command line to awk -f match_val.awk file1 file2
match_val.awk should contain the code you have between '
and '
eg:
FNR==NR {a[$1]=$2
next
}
...
END { if(key) print ln }
1 Like