Compare multiple fields in file1 to file2 and print line and next line

gillesc_mac · March 13, 2009, 2:12pm

Hello,

I have two files that I need to compare, printing each line from file2 whose first 6 fields match the first 6 fields of a line in file1. Complicating this are the following restrictions:

file1 is only a few thousand lines at most, while file2 has more than 2 million lines.
I need to match the first 6 fields (in order) of each line in file1 to the first 6 fields (in order) of a line in file2, and print the matched line from file2 along with the line that follows it in file2.

Example files

file1:

...
0.54 3.2 0.45 32.9 4 0.02 9.0 4.0 (line 364)
0.6 4.0 3.99 2.0 0.85 7.0 3.84 0.05 (line 365)
...

file2:

93 28 04 73 95 11 0.4 7.9 2.30 4.05 (100(f18.3)) (line 30046)
70.1 99.4 0.35 9.943 6.1 0.27 0.654 (line 30047)
0.54 3.2 0.45 32.9 4 0.02 9.0 4.0 (54(f18.3) (line 628450)
44.8 33.2 90.3 45.2 66.3 (line 628451)

The needed result matches line 364 of file1 to line 628450 of file2 and prints lines 628450 and 628451 of file2. It then moves on to line 365 of file1 and searches file2 for a match, again printing the matching line and the line after it.

Example partial output matching file1 with file2

0.54 3.2 0.45 32.9 4 0.02 9.0 4.0 (54(f18.3)
44.8 33.2 90.3 45.2 66.3

I don't really care what I use, awk, sed, perl, etc. I just need it to work.

Hopefully this makes sense.

Thanks

Chris

cfajohnson · March 13, 2009, 2:44pm

You really need GNU grep for this.

Put the fields you want to search for from file1 in another file, and use the -f and -A options to grep:

cut -d ' ' -f1-6 file1 > file3
grep -f file3 -A1 file2
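
For anyone trying this out, here is a minimal sketch of the cut and grep steps run end to end. The sample data is adapted from the example in the question, and the temporary directory is only there for illustration. It adds -F (an assumption on my part, not in the post above) so that the dots in the numbers are matched literally instead of as regex wildcards. Note that the patterns are not anchored, so a pattern could also match in the middle of a file2 line.

```shell
#!/bin/sh
# Sample data adapted from the question; the paths are illustrative only.
dir=$(mktemp -d)
printf '%s\n' \
  '0.54 3.2 0.45 32.9 4 0.02 9.0 4.0' \
  '0.6 4.0 3.99 2.0 0.85 7.0 3.84 0.05' > "$dir/file1"
printf '%s\n' \
  '93 28 04 73 95 11 0.4 7.9 2.30 4.05' \
  '70.1 99.4 0.35 9.943 6.1 0.27 0.654' \
  '0.54 3.2 0.45 32.9 4 0.02 9.0 4.0 (54(f18.3)' \
  '44.8 33.2 90.3 45.2 66.3' > "$dir/file2"

# Keep only the first six space-separated fields of each file1 line.
cut -d ' ' -f1-6 "$dir/file1" > "$dir/file3"

# -F: patterns are fixed strings; -f: read patterns from file3;
# -A1: also print the line after each match (not part of POSIX grep).
result=$(grep -F -f "$dir/file3" -A1 "$dir/file2")
printf '%s\n' "$result"

rm -rf "$dir"
```

When several non-adjacent matches are found, GNU grep separates the groups with a line containing "--", which may need to be filtered out afterwards.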

vgersh99 · March 13, 2009, 3:08pm

Something along these lines:

nawk -f gil.awk file1 file2

gil.awk:

function buildIDX(   i, idx) {
    for(i=1; i<=6;i++) idx=(i==1) ? $i : idx SUBSEP $i
    return idx
}
FNR==NR {
    f1[buildIDX()]
    next
}
found && found--
{
   if (buildIDX() in f1) {
      print
      found=1
   }
}
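
To see the idea in action on a small input, here is a hedged sketch of the same technique (key on the first six fields, remember file1's keys, then scan file2 once). It is a variation rather than the script above: it uses plain awk instead of nawk, the names key, seen, hit and pending are made up for this example, and it prints a line only once even when that line both follows a match and is itself a match.

```shell
#!/bin/sh
# Sample data adapted from the question; the paths are illustrative only.
dir=$(mktemp -d)
printf '%s\n' \
  '0.54 3.2 0.45 32.9 4 0.02 9.0 4.0' \
  '0.6 4.0 3.99 2.0 0.85 7.0 3.84 0.05' > "$dir/file1"
printf '%s\n' \
  '93 28 04 73 95 11 0.4 7.9 2.30 4.05' \
  '70.1 99.4 0.35 9.943 6.1 0.27 0.654' \
  '0.54 3.2 0.45 32.9 4 0.02 9.0 4.0 (54(f18.3)' \
  '44.8 33.2 90.3 45.2 66.3' > "$dir/file2"

result=$(awk '
# Join fields 1..6 with SUBSEP so that field boundaries stay unambiguous.
function key(   i, k) {
    k = $1
    for (i = 2; i <= 6; i++) k = k SUBSEP $i
    return k
}
FNR == NR      { seen[key()]; next }     # first file: remember every key
               { hit = (key() in seen) } # second file: is this line a match?
hit || pending { print }                 # print a match or the line after one
               { pending = hit }         # the next line follows a match
' "$dir/file1" "$dir/file2")
printf '%s\n' "$result"

rm -rf "$dir"
```

Since only the keys from file1 are held in memory, the size of file2 should not matter much beyond the time of a single pass over it.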

gillesc_mac · March 13, 2009, 4:21pm

Thank you, that was helpful...

Now I have another somewhat similar scenario

I need to match field 8 of file1 to field 1 of file2 and print the file2 line along with the next line in file2. I was thinking of generating a file that contains the matched file2 lines, then using the grep recommendation above to get both lines from file2.

I am unsure how to compare different fields in different files. Note that these are floating point numbers, so the strings are not necessarily identical even when the numerical values are, e.g. 8.54 in file1 and 8.54000 in file2.

Thanks again

vgersh99 · March 13, 2009, 4:35pm

Assuming the floating point precision is 2 - not tested:

nawk 'FNR==NR { f1[$8]; next } sprintf("%.2f", $1) in f1' file1 file2
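
A possible variation, sketched under the same 2-decimal assumption: the one-liner formats only the file2 field, so a file1 value written as 8.540 would not be found. The sketch below formats both sides and also prints the line after each match. The sample data and the names want, hit and pending are made up for illustration, and it uses plain awk.

```shell
#!/bin/sh
# Made-up data: field 8 of file1 and field 1 of file2 differ in precision.
dir=$(mktemp -d)
printf '%s\n' 'a b c d e f g 8.540' 'a b c d e f g 1.25' > "$dir/file1"
printf '%s\n' '8.54000 x y' 'next-line 1' '3.14 q r' 'next-line 2' > "$dir/file2"

result=$(awk '
# Store file1 field 8 rounded to two decimals (assumed precision).
FNR == NR      { want[sprintf("%.2f", $8)]; next }
# Round file2 field 1 the same way before the lookup.
               { hit = (sprintf("%.2f", $1) in want) }
hit || pending { print }
               { pending = hit }
' "$dir/file1" "$dir/file2")
printf '%s\n' "$result"

rm -rf "$dir"
```

With this data only the 8.54000 line and the line after it are printed. Values that differ beyond the second decimal are treated as equal, which may or may not be acceptable for the real files.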

gillesc_mac · March 13, 2009, 6:00pm

Thank you again, but I neglected to mention another restriction. I need to match multiple fields, for example:

File1 File2
$9 = $1
$1 = $3
$2 = $4
$3 = $5
$4 = $6
$5 = $7
$6 = $8
$7 = $9

But again, the fields do not necessarily have the same precision. I tried extending your script, but I am just beginning to learn.

Thank you

vgersh99 · March 13, 2009, 6:16pm

how many requirements DO you have?
Not tested.

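BEGIN {
   # field numbers to compare: file1 field fld1A[i] pairs with file2 field fld2A[i]
   fld1="9 1 2 3 4 5 6 7"
   fld2="1 3 4 5 6 7 8 9"

   split(fld1, fld1A)
   split(fld2, fld2A)
}
function buildIDX(fldA,   i, idx) {
    # format each selected field to 2 decimals so 8.54 and 8.54000 compare equal
    for(i=1; i in fldA ;i++) idx=(i==1) ? sprintf("%.2f",$(fldA[i])) : idx SUBSEP sprintf("%.2f",$(fldA[i]))
    return idx
}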
FNR==NR {
    f1[buildIDX(fld1A)]
    next
}
found && found--

{
   if (buildIDX(fld2A) in f1) {
      print
      found=1
   }
}
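
Since the script is posted as not tested, here is a small hedged check of the field-mapping idea on made-up data. It is a sketch, not the script above: the mapping is passed in through the hypothetical variables map1 and map2, it uses plain awk, and it keeps the 2-decimal assumption.

```shell
#!/bin/sh
# Made-up data: file1 fields 9,1..7 should equal file2 fields 1,3..9.
dir=$(mktemp -d)
printf '%s\n' '1.0 2.0 3.0 4.0 5.0 6.0 7.0 X 9.5' > "$dir/file1"
printf '%s\n' \
  '9.50 Y 1.00 2.00 3.00 4.00 5.00 6.00 7.00' \
  'following line' \
  '9.50 Y 1.00 2.00 3.00 4.00 5.00 6.00 7.01' \
  'not printed' > "$dir/file2"

result=$(awk -v map1="9 1 2 3 4 5 6 7" -v map2="1 3 4 5 6 7 8 9" '
BEGIN { n = split(map1, m1, " "); split(map2, m2, " ") }
# Build a key from the fields listed in m, each formatted to 2 decimals.
function key(m,   i, k) {
    for (i = 1; i <= n; i++) k = k sprintf("%.2f", $(m[i])) SUBSEP
    return k
}
FNR == NR      { want[key(m1)]; next }
               { hit = (key(m2) in want) }
hit || pending { print }
               { pending = hit }
' "$dir/file1" "$dir/file2")
printf '%s\n' "$result"

rm -rf "$dir"
```

Only the first file2 line and the line after it are printed; the third line differs in its last compared field (7.01 against 7.00) and is skipped.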

summer_cherry · March 16, 2009, 6:26am

nawk '{
	if(NR==FNR)
		_[$1,$2,$3,$4,$5,$6]=1
	else
		if(($1,$2,$3,$4,$5,$6) in _)
		{
			print
			if((getline) > 0)
				print
		}
}' file1 file2