Compare fields in files

Hi,

I need the most efficient way to compare the following and arrive at the result.

I have a file which has entries like,

File1:

1|2|5|7|8|2|3|6|3|1

File2:

1|2|3|1|2|7|9|2

I need to compare the entries in these two files with those of a general file,

1|2|3|5|2|5|6|9|3|1 ---> constant file

I need to count the matching entries between file1 and the constant file, and likewise for file2 and so on.
Only matches should be counted.

In my case,
comparing file1 with the constant file should give: "2"
similarly, for file2 it should give: "4"

Please help me implement this in an easy and effective way.

thanks.

Why this result?

Regards,
Birei

The result seems to be the number of fields (pipe separated) that match the constant file. The question I have is: is there just one record in the constant file, or multiple records?

Assuming agama's assumption is correct, the constant file looks like this:

1|2|4|6|5|7|2|2|9|8

Here is the code:

paste constant file1 file2 |
awk '
BEGIN {
        FS="\t"
        sep="|"
}
{
        nc=split($1, c, sep)
        nf1=split($2, f1, sep)
        nf2=split($3, f2, sep)

        k=0
        for (i=1; i<=nf1; ++i ) {
                if ( f1[i] == c[i] ) { k+=1 }
        }
        print "File1:", k

        k=0
        for (i=1; i<=nf2; ++i ) {
                if ( f2[i] == c[i] ) { k+=1 }
        }
        print "File2:", k

}'

Are there multiple files like file1, file2, file3 and so on?
Is there a way to do this in Python?

Thanks in advance,
Uma
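
A minimal Python sketch of the same positional comparison, assuming each file holds a single pipe-separated line and the first file named is the constant file. The names count_matching_fields and count_for_files are hypothetical, chosen only for illustration:

```python
def count_matching_fields(constant_line, line, sep="|"):
    """Count the positions where both lines hold the same field value."""
    constant_fields = constant_line.strip().split(sep)
    fields = line.strip().split(sep)
    # zip() stops at the shorter list, so positions that exist in only
    # one of the two lines are never counted as matches.
    return sum(1 for a, b in zip(constant_fields, fields) if a == b)


def count_for_files(constant_path, paths, sep="|"):
    """Compare the first line of each file with the constant file's first line.

    Assumption: every file contains exactly one record.
    """
    with open(constant_path) as handle:
        constant_line = handle.readline()
    counts = {}
    for path in paths:
        with open(path) as handle:
            counts[path] = count_matching_fields(constant_line, handle.readline(), sep)
    return counts
```

For example, count_matching_fields("1|2|3|4", "1|2|3|1") returns 3 and count_matching_fields("1|2|3|4", "1|3|3|2") returns 2. If the files can hold several records, the per-line function could be applied to each pair of lines instead.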

Assuming, as per your example, that all your files contain one line:

$ cat f1
1|2|5|7|8|2|3|6|3|1
$ cat f2
1|2|3|1|2|7|9|6
$ cat f3
1|0|5|0|8|2|1|1|1
$ nawk -F\| 'NR==FNR{n=NF;split($0,A,"\|");next}FNR==1{d=0}{for(i=0;++i<=n;)if($i==A[i]) ++d;print d}' f*
3
4
$ nawk -F\| 'NR==FNR{n=NF;split($0,A,"\|");next}FNR==1{d=0}{for(i=0;++i<=n;)if($i==A[i]) ++d;print d}' f1 f2 f3
3
4
$

Is f1 the constant file here?

I think the need is to compare f1 with f2, f3, f4 and so on, and get the matched counts in R2, R3 and so on.

f1 -> 1|2|3|4
f2 -> 1|2|3|1
f3 -> 1|3|3|2

so the result must be,

R2 -> 3 (as 1, 2 and 3 match)
R3 -> 2 (as 1 and 3 match)

Am I correct here?

nawk -F\| 'NR==FNR{n=NF;split($0,A,"\|");next}FNR==1{d=0}{for(i=0;++i<=n;)if($i==A[i]) ++d;print FILENAME ":" d}' f1 f2 f3