Need help in column comparison & adding extra line to files

b11 · November 22, 2012, 2:17pm

Hi,

I want to check whether the x, y, z coordinates in two files are equal. When a file is converted to another format, the data can sometimes be altered during the conversion. To catch such mismatches, I would like to compare the coordinates in both files and check whether they are equal.

For example, following are the sample files :

File1 (additional columns are intentionally left out here; the real file has around 10 columns)

x_coord   y_coord     z_coord     ID            

1.01     2.56       8.32         1  
1.06     2.50       7.36         2
1.08     2.69       4.25         1

File2 (additional columns are intentionally left out here; the real file has around 10 columns)

ID      x_coord        y_coord     z_coord                        

1          1.01        2.56      8.32
2          1.06        2.50      7.36
1          1.08        2.69      4.25

Expected Output:

comp

0     0     0     0
0     0     0     0
0     0     0     0

Below is the code I tried.

awk '{print $2,$3,$4,$5}' file1 > xyz
awk '{print $2,$3,$4,$5}' file2 > data
paste xyz data > inn
awk '{print $0,$1-$6,$2-$7,$3-$8,$4-$5}' inn > core
awk '{print $9,$10,$11,$12}' core > comp
rm xyz data inn core

The above code gives the desired result, as shown in the expected output. But is there a more efficient way to do this with shell commands?

Don_Cragun · November 22, 2012, 2:30pm

This sounds like a homework item; if it is, it shouldn't be posted here.

If this isn't a homework item, what is supposed to happen if the values are equal and what is supposed to happen if they are not equal?

Yes, it is possible to add lines to the end of a file without using cat.

If you want help figuring out how to do something, you will be much more likely to get useful results if you provide sample input files and show us the output that should be produced (all using code tags).

b11 · November 22, 2012, 3:36pm

@Don Cragun: As you can see, I have amended the post with the code I currently use.

Could you give your expert opinion on how the script could be made more efficient?

Don_Cragun · November 22, 2012, 3:43pm

b@l@ji,
Please post sample contents for file1 and file2 and show us the output you want when processing those sample files.

b11 · November 22, 2012, 3:56pm

@Don Cragun:

As suggested, I have updated the post with sample contents of file1 and file2 and the expected output, comp.

Don_Cragun · November 22, 2012, 8:41pm

The following should work the same as your six-line script, but it reads the data from each input file once (instead of four times), calls awk once (instead of four times), and does not call paste or rm at all:

awk -v f2=file2 'BEGIN {OFS = "\t"}
{       o1 = $2; o2 = $3; o3 = $4;  o4 = $5
        getline < f2
        print o1 - $3, o2 - $4, o3 - $5, o4 - $2
}' file1 > comp

As with your script, surprising things may happen if file1 and file2 don't contain the same number of lines.
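One way to make the unequal-length case fail loudly instead is to test the return value of getline (1 on success, 0 at end of file, -1 on error). The following is only a sketch under assumptions: compare_cols is a hypothetical helper name, the field positions are copied from the script above, and the demo files are made-up two-line inputs with a leading label field so that the data sits in the same fields the script reads.

```shell
# Hypothetical helper: compare_cols FILE1 FILE2
# Field positions follow the script above (file1: $2-$5, file2: $3,$4,$5,$2).
compare_cols() {
    awk -v f2="$2" 'BEGIN { OFS = "\t" }
    {
        o1 = $2; o2 = $3; o3 = $4; o4 = $5
        # getline returns 1 on success, 0 at end of file, -1 on error
        if ((getline < f2) <= 0) {
            print "error: " f2 " has fewer lines than " FILENAME | "cat 1>&2"
            failed = 1
            exit 1
        }
        print o1 - $3, o2 - $4, o3 - $5, o4 - $2
    }
    END {
        # A line still unread in f2 means it is longer than the first file
        if (!failed && (getline < f2) > 0) {
            print "error: " f2 " has more lines than the first file" | "cat 1>&2"
            exit 1
        }
    }' "$1"
}

# Demo with made-up files (not the poster's data)
tmp=$(mktemp -d)
printf 'a 1.01 2.56 8.32 1\nb 1.06 2.50 7.36 2\n' > "$tmp/file1"
printf 'a 1 1.01 2.56 8.32\nb 2 1.06 2.50 7.36\n' > "$tmp/file2"
compare_cols "$tmp/file1" "$tmp/file2"
```

With real data the call would be compare_cols file1 file2 > comp. Note that awk processes backslash escapes in -v assignments, so this sketch assumes the second file name contains no backslashes.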
------------------
PS for the 2nd part of your original posting (which has been lost in all of your edits), two portable ways to add:

Numbers
5678

to the end of a file without using cat include:

printf "Numbers\n5678\n" >> file

and

echo Numbers >> file;echo 5678 >> file

Note that printf is a built-in in ksh and echo is a built-in in most shells. Some versions of echo always recognize backslash escape sequences, some versions of echo never recognize backslash escape sequences, and some versions of echo violate the POSIX standard and Single UNIX Specification requirements by accepting an option that determines whether or not backslash escape sequences are recognized. The uses of echo shown above should do what you want here with any version of echo.
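When the text to append is not fixed in advance, a '%s\n' format avoids these echo differences, because printf copies each %s argument verbatim and reuses the format for every remaining argument. A minimal sketch, where append_lines is a hypothetical helper name and the demo file is a throwaway temporary file:

```shell
# Hypothetical helper: append_lines FILE LINE...
append_lines() {
    target=$1
    shift
    # With no lines given, do nothing (printf would otherwise emit a blank line)
    [ "$#" -gt 0 ] || return 0
    # One output line per argument; backslashes in the arguments are not interpreted
    printf '%s\n' "$@" >> "$target"
}

demo=$(mktemp)
append_lines "$demo" 'Numbers' '5678'
```

Unlike the printf example above, where the \n sequences live in the format string, any backslash in an argument here ends up in the file unchanged.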

Scrutinizer · November 23, 2012, 2:38pm

FWIW, printf is a shell built-in not only in ksh, but also in bash and dash, for example.
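To see how a given shell resolves these two commands, the standard type utility can be asked directly. This is just a quick check, and the exact wording of the report differs from shell to shell:

```shell
# Report how the current shell resolves each name (built-in, function, or a path)
type printf
type echo
```

In shells where they are built-ins, the report typically says so rather than printing a path such as /usr/bin/printf.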