Hi,
I would like to know how can I get the number of rows in file1 that:
- the 1st and 2nd field should be the same (text)
- the 3rd field should be less or equal (numeric)
when comparing to file2.
So for each row of file1, I would like to have the number of rows in file2 that follow the above 2 rules. This is just a small example. In reality, my files have millions of rows and more columns (fields, tab separated)
Example of file1
A AB 3.7
B AB 2.5
Example of file2
A AB 3.5
A AB 3.7
Desired output file
A AB 3.7 2
B AB 2.5 0
Many thanks!
Could this help you ?
awk 'NR==FNR{a[$1$2]++;next} {if (a[$1$2]){print $0,a[$1$2]}else {print $0,"0"}}' File2 File1
It would we preferable to use some form of field separation to prevent unexpected mixing of the the two index fields.
awk -F '\t' '
NR==FNR{
A[$1 OFS $2]=$3
C[$1 OFS $2]=0
next
}
$3<=A[$1 OFS $2]{
C[$1 OFS $2]++
}
END{
for(i in A)print i,A,C
}
' file1 file2
awk 'BEGIN{SUBSEP=FS} FNR==NR{a[$1,$2]=$3;next} {
for(i in a){
if($1SUBSEP$2==i) {
if($3<=a) {
b++;break}}}
} END{for(i in a) printf("%s %s %d\n",i,a,b)}' file1 file2
Hi Scrut,
Can you please explain that in for loop what is the value of variable i?
As there are two array:
A[A AB]=3.7 etc
C[A AB]=0
Now here index is field 1 and 2.
When printing in for loop how system decide the value of var i whether it should be variable or string [A AB].
Please explain?
Hi, the content of the i
variable in the loop is determined by for(i in A)
, which enumerates the indexes... I am not sure if I understood your question correctly...