How to ignore whitespace while comparing two files?

sharsour · July 15, 2013, 1:57am

Hello Experts,

I am trying to compare two files line by line with the code below. I want to ignore whitespace while comparing; only the content should be compared.

 
hostFile="/etc/hosts"
inputFile="/home/scripts/DR/hosts.eas"
grep -E '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' $inputFile > temp1
result=$(grep -vxFf $hostFile temp1)

It looks for the lines starting with an IP address, like 127.0.0.1, and compares them against the host file.

If I put extra spaces between the IP address and the host name, it reports that the entry does not match.

Below is a sample entry:

 
10.3.242.177       mugwump mugwump.test.server
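The mismatch described above can be reproduced in isolation. This is a minimal sketch, assuming two scratch files (hypothetical stand-ins for /etc/hosts and hosts.eas) that hold the same entry with different spacing. Because -x requires a whole-line match and -F treats each pattern as a fixed string, a single extra space is enough to make the line count as unmatched.

```shell
# Hypothetical scratch files standing in for /etc/hosts and hosts.eas.
workdir=$(mktemp -d)
printf '%s\n' '10.3.242.177 mugwump mugwump.test.server' > "$workdir/hosts"
printf '%s\n' '10.3.242.177       mugwump mugwump.test.server' > "$workdir/input"

# -F: patterns are fixed strings, -x: a pattern must match the whole line,
# -v: print the lines that match no pattern, -f: read patterns from a file.
result=$(grep -vxFf "$workdir/hosts" "$workdir/input" || true)

# The entry is reported as unmatched because the runs of spaces differ.
printf 'unmatched: %s\n' "$result"
rm -rf "$workdir"
```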

millan · July 15, 2013, 2:09am

Can you post the contents of your input file? That will make it easier to see exactly what match you want.

sharsour · July 15, 2013, 2:18am

ok

Host File

 
10.3.242.170 sasquatch sasquatch.test.server
10.3.242.171 nessie nessie.atldc.test.server

Input File

 
10.3.242.170      sasquatch sasquatch.test.server
10.3.242.171 nessie nessie.atldc.test.server

In the input file, you can see there are extra spaces after the IP address. The comparison should ignore those spaces, since the content matches.

Every entry in the input file must be present in the host file; the host file may contain additional entries. I am already ignoring comments with that regular expression, but I need to ignore spaces as well.

vidyadhar85 · July 15, 2013, 2:22am

Why don't you try awk? You will find a number of examples in the forum of matching two files using awk.

millan · July 15, 2013, 2:42am

Try the code below, which will squeeze the spaces.

 
grep -E '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' $inputFile | tr -s " " > temp1

sharsour · July 15, 2013, 2:57am

Two things here. I am matching a smaller input file against a big data file and printing the lines of the input file that are not in the data file.
I am looking for a strict line match. I am ignoring comments with that regular expression, and now I need to ignore whitespace within or at the end of the line.

grep -vxFf datafile inputfile

is the one-line command that does the job for me. In awk, it might require building a lot of logic to match the lines strictly, and I would also need to see how much time it takes to compare the files.

This code works fine; I just want it to ignore whitespace as well, the same way I am ignoring comments.

---------- Post updated at 01:57 AM ---------- Previous update was at 01:44 AM ----------

Thanks Millan for your effort.

Actually, your suggestion trims the spaces in the input file, but I want to ignore any spaces while matching with this command:

 
 grep -vxFf $hostFile temp1
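One way to keep the grep one-liner is to normalize both files, not just the input file, before comparing. This is only a sketch under assumptions: the `normalize` helper, the scratch file names, and the extra `yeti` entry are hypothetical and are there purely to show a line that is reported as missing.

```shell
# Collapse each whitespace run to one space and trim both ends of the line.
normalize() {
    sed -E 's/[[:space:]]+/ /g; s/^ //; s/ $//' "$1"
}

workdir=$(mktemp -d)
printf '%s\n' \
    '10.3.242.170 sasquatch sasquatch.test.server' \
    '10.3.242.171 nessie nessie.atldc.test.server' > "$workdir/hosts"
printf '%s\n' \
    '10.3.242.170      sasquatch sasquatch.test.server' \
    '10.3.242.171 nessie nessie.atldc.test.server  ' \
    '10.3.242.199 yeti yeti.test.server' > "$workdir/input"   # yeti is made up

# Normalize both sides, keeping only input lines that start with an IP address.
normalize "$workdir/hosts" > "$workdir/hosts.norm"
normalize "$workdir/input" |
    grep -E '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' > "$workdir/input.norm"

# Strict whole-line comparison now sees identical spacing on both sides.
missing=$(grep -vxFf "$workdir/hosts.norm" "$workdir/input.norm" || true)
printf '%s\n' "$missing"
rm -rf "$workdir"
```

The `|| true` is only there because grep exits with status 1 when nothing is missing, which would otherwise stop a script running under `set -e`.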

vidyadhar85 · July 15, 2013, 2:58am

The command below stores the contents of the host file in an array and prints the lines of the input file that are not in the host file.

 
awk 'NR==FNR{A[$1$2$3]=$0;next}
/^[0-9]/{if(!A[$1$2$3]){print $0}}' hostfile inputfile

sharsour · July 15, 2013, 6:20am

Thanks Vidyadhar,

I tried your command, but it is not working, and it does not even print an error on the console.

One more question:

As I understand it, you are storing each word of a line in $1, $2, $3.
In some cases lines will contain 5 words, and in some cases only 3. Will this solution work for those?

10.3.242.170 sasquatch sasquatch.test.server
10.3.242.171 nessie nessie.atldc.test.server
10.3.242.172 nessie nessie.atldc.test.server sasquatch.test sasquatch.test.server

So there will be multiple lines like this.

MadeInGermany · July 15, 2013, 6:52am

vidyadhar85:

The command below stores the contents of the host file in an array and prints the lines of the input file that are not in the host file.
 
awk 'NR==FNR{A[$1$2$3]=$0;next}
/^[0-9]/{if(!A[$1$2$3]){print $0}}' hostfile inputfile

There should be $1 FS $2 FS $3 instead of $1$2$3; without a separator, different field combinations can concatenate to the same key. This is still a bit ugly because the number of fields compared is hard-coded (here: 3).
The following handles any number of fields:

awk '/^[^0-9]/ {next} {$1=$1} NR==FNR {A[$0]; next} !($0 in A)' $hostFile temp1

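$1=$1 is the trick: assigning to a field makes awk rebuild the input line with its fields joined by OFS (a single space by default), so differences in whitespace between fields disappear.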
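To see the rebuild in isolation, here is a small sketch. The variable names `line_a`, `line_b`, and `rebuilt` are only illustrative, and the second line is the same sample entry padded with tabs and extra spaces.

```shell
# Same entry, written with different spacing (tabs, runs of spaces, padding).
line_a='10.3.242.170 sasquatch sasquatch.test.server'
line_b=$(printf '  10.3.242.170 \t   sasquatch\tsasquatch.test.server  ')

# Assigning to any field forces awk to rebuild $0 from its fields joined by OFS.
rebuilt=$(printf '%s\n' "$line_b" | awk '{$1=$1; print}')

if [ "$rebuilt" = "$line_a" ]; then
    echo "equal after rebuild"
fi
```

Note that in the one-liner above the /^[^0-9]/ test runs before the rebuild, so a line that begins with whitespace is skipped rather than normalized.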

sharsour · July 15, 2013, 7:12am

Thanks MadeInGermany, it worked like a charm.

Thanks a lot.