Compare files with fields separated by a colon

Alkass · July 16, 2011, 9:27pm

Dear experts

I have files like

ABD : 5869 events, relative ratio : 1.173800E-01 , sum of ratios : 1.173800E-01 
VBD : 12147 events, relative ratio : 2.429400E-01 , sum of ratios : 3.603200E-01 
SDF : 17000 events, relative ratio : 3.400000E-01 , sum of ratios : 7.003200E-01 
OIP: 14984 events, relative ratio : 2.996800E-01 , sum of ratios : 1.000000E+00

So the values I want to compare are separated by a ':' followed by a space. I want to use file1 as the reference and compare file2 against it value by value and line by line, but only for columns 2, 3, and so on to the end of each line. If a given column on a given line mismatches by a certain ratio (valueA/valueB), do something (actionA); otherwise do actionB.

Thanks in advance

matrixmadhan · July 16, 2011, 11:10pm

Hint

awk and associative arrays
or try in perl, you will have greater control

Alkass · July 16, 2011, 11:11pm

an example would be greatly appreciated..

thanks!

neutronscott · July 16, 2011, 11:37pm

It sure would! Your input is rather vague.

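awk 'NR==FNR { a[$0]++; next }   # first file: remember every line
($0 in a) { print }              # second file: placeholder action for lines also present in file1
' file1 file2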

This puts file1 into an array, then looks for each line of file2 inside the array. I am not sure which columns you mean to compare. All of them, it sounds like, individually? And is that first column the one that should be used to match data between the two files? Like, compare the numbers for ABD in file1 only to the numbers for ABD in file2? ...

Edit: Best way to show us is a small file1 and file2 example, with desired output.

Alkass · July 16, 2011, 11:41pm

thanks for the reply

Sorry if I was not clear -- I want to compare the first field after the first ':' in file1 with the first field after the first ':' in file2, then the same for the second fields, and so on, on a line-by-line basis.

neutronscott · July 16, 2011, 11:52pm

So, you want to compare line1 of file1 to line1 of file2. You calculate a delta ratio between the two and perform an action whether they're within range or not?

Alkass · July 16, 2011, 11:54pm

yep, exactly this! -- in each line, the value I want to compare starts after the ": " ,so is a 1-to-1 comparison

Can you provide a script?

thanks!

neutronscott · July 17, 2011, 12:11am

Here is an example to start

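awk -F":" 'NR==FNR { a[FNR]=$0; next }   # first file: store each line by line number
{
	split(a[FNR],b,FS);                  # split the matching file1 line on ":"
	if ($3 + 0 == 0) next;               # avoid dividing by zero
	printf("compare %f / %f = %f\n", b[3], $3, b[3]/$3)
}' file1 file2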
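
The example above compares only the third colon-separated field. A minimal sketch of how it might be extended to every field after the first ':' follows; the function name compare_ratios, the symmetric tolerance (default 0.05) and the output wording are assumptions for illustration, not something given in the thread. It relies on awk converting a string such as " 5869 events, relative ratio " to its leading number.

```shell
# compare_ratios REF NEW [TOL]  (hypothetical helper, not from the thread)
# Prints one line per compared field; a field is a MISMATCH when the ratio
# REF/NEW falls outside [1 - TOL, 1 + TOL].
compare_ratios() {
    awk -F':' -v tol="${3:-0.05}" '
    NR == FNR { ref[FNR] = $0; next }        # first file: store each line by line number
    {
        n = split(ref[FNR], b, FS)           # fields of the matching reference line
        for (i = 2; i <= NF && i <= n; i++) {
            if ($i + 0 == 0) continue        # skip zero or non-numeric values
            r = b[i] / $i                    # awk uses the leading number of each field
            status = (r < 1 - tol || r > 1 + tol) ? "MISMATCH" : "ok"
            printf("line %d field %d: %s (ratio %.4f)\n", FNR, i, status, r)
        }
    }' "$1" "$2"
}
```

It could be called as compare_ratios file1 file2 0.05, with the "ok" and "MISMATCH" branches standing in for actionB and actionA.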

Alkass · July 17, 2011, 12:33pm

Hello

The script works fine, I am just having problems doing something like

if ( b[3]/$3 < 0.96 )
    printf(" \n WARNING!! below tolerance %f for file %s\n", b[3]/$3 , $1)

Inside the awk block it does not recognize the $1 input, i.e. the first file... So how can I do something if the ratio is below 0.95, for example? Like removing the file, etc.

Thanks a lot
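
On the $1 question: inside awk, $1 is the first field of the current input line (here "ABD " and so on), not a shell argument or a file name. The built-in variable FILENAME holds the name of the file currently being read. A minimal sketch under those facts follows; the function name check_tolerance and the idea of signalling the result through the exit status are assumptions for illustration, and what to do with an out-of-tolerance file is left to the calling shell.

```shell
# check_tolerance REF NEW [MIN]  (hypothetical helper, not from the thread)
# Warns and exits non-zero if any third-field ratio REF/NEW is below MIN.
check_tolerance() {
    awk -F':' -v min="${3:-0.96}" '
    NR == FNR { ref[FNR] = $0; next }        # first file: store each line by line number
    {
        split(ref[FNR], b, FS)
        if ($3 + 0 == 0) next                # avoid dividing by zero
        if (b[3] / $3 < min) {
            # FILENAME is the file being read now, i.e. the second file
            printf("WARNING!! below tolerance %f for file %s\n", b[3] / $3, FILENAME)
            bad = 1
        }
    }
    END { exit bad + 0 }' "$1" "$2"
}

# Possible use: react in the shell instead of inside awk, for example
# check_tolerance file1 file2 || echo "file2 is out of tolerance"
```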

Jinoshan · November 26, 2011, 8:17pm

Hi, I have two files and I want to compare them
file 1
jane
jin
kiru
file 2
group1:x:1000:jane,jin
group2:x:111:kiru,jin

I want a file like this
jane group1
jin group1,group2
kiru group2

I have to submit this assignment before 12 am today, please help me