write a script to compare two files

Hi all,
I am new to Unix and have never worked on scripting; I am learning now and I have to write a script to compare two files.
The requirement is: in the first file I search for a word, and once I find that word I have to select everything from there to the end of the file and redirect it to an output file. The second part is to compare that output file with my second file. The second part is feasible for me, but I am stuck on the first part. Could anyone help me? :slight_smile:

For the first part you can use sed:

sed -n '/pattern/,$p' file
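A quick illustration with a throwaway sample file (the path and contents below are just for demonstration):

```shell
# Build a small sample file.
printf '%s\n' one START two three > /tmp/demo.txt
# The address range /START/,$ selects from the first line matching
# "START" through the last line ($); -n plus p prints only that range.
sed -n '/START/,$p' /tmp/demo.txt
# prints: START, two, three
```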

Regards

Please refer to the link; this might help you.

Thanks.

Hi all, thanks for your help, but I don't think I got what I needed.
Here is a sample file, e.g. abc.txt:
<servergroup id="8">
<application value="OnlineDisputes"></application>
<minheap value="256"></minheap>
<maxheap value="512"></maxheap>
<httpport value="51058"></httpport>
<additionalcmdlineargs value="-DSLConfigFile=app_servicelocator.xml -verbosegc -Xss8m"/>
</servergroup>
<servergroup id="9">
<application value="Transfer"></application>
<minheap value="256"></minheap>
<maxheap value="512"></maxheap>
<httpport value="51059"></httpport>
<additionalcmdlineargs value="-DSLConfigFile=app_servicelocator.xml -verbosegc -Xrunheapprofile"/>
</servergroup>
<servergroup id="10">
<application value="INTLEStatement"></application>
<minheap value="256"></minheap>
<maxheap value="512"></maxheap>
<httpport value="51063"></httpport>
<additionalcmdlineargs value="-DSLConfigFile=app_servicelocator.xml -verbosegc -XX:MaxPermSize=128m"/>
</servergroup>
</servergroup-info>
<clone-info>
<clone id="1">
<application value="EnterpriseServices"></application>
<cloneindex value="1"></cloneindex>
<node value="spdwa013"></node>
<httpport value="21050"></httpport>
<workingdir value="/tmp"></workingdir>
</clone>
<clone id="2">
<application value="GlobalMR"></application>
<cloneindex value="1"></cloneindex>
<node value="spdwa013"></node>
<httpport value="21051"></httpport>
<workingdir value="/tmp"></workingdir>
</clone>
<clone id="3">
<application value="GlobalUserManagement"></application>
<cloneindex value="1"></cloneindex>
<node value="spdwa013"></node>
<httpport value="21052"></httpport>
<workingdir value="/tmp"></workingdir>
</clone>
<clone id="4">
<application value="USAccountSummary"></application>
<cloneindex value="1"></cloneindex>
<node value="spdwa013"></node>
<httpport value="21053"></httpport>
<workingdir value="/tmp"></workingdir>
</clone>

In the above file I need to select everything below <clone-info>
and redirect it to an output file.
I would be very grateful if anybody could help me in this regard.

You 'think', or are you sure?
What is it that you 'know' you want as the output?
And how does that compare to the sed solution provided?

Franklin's solution works with slight modification:

sed -n '/<clone-info>/,$p' file_in > file_out

Hey all, thanks a lot!
It's working now!

There are two files file1.txt and file2.txt
and the contents of files are given as follows:
file1.txt
{
EnterpriseServices
GlobalUserManagement
USAccountSummary
USEStatement
MYCAServices
EnterpriseServices
MYCAServices
USEStatement
}
file2.txt
{
USEStatement
MYCAServices
EnterpriseServices
MYCAServices
USEStatement
DocGen
OnlineDisputes
Transfer
INTLEStatement
}
Now the problem is how to compare these two files and redirect to an output file the contents that are present in file1.txt but not in file2.txt.

Please help in this regard.

You can use comm -13 or comm -23. Just play with it a little
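One caveat worth spelling out: comm expects both inputs to be sorted, so sort them first. A sketch with hypothetical file names and sample data mirroring the lists above:

```shell
# Two unsorted word lists (names and contents are just examples).
printf '%s\n' USEStatement MYCAServices EnterpriseServices > /tmp/f1
printf '%s\n' MYCAServices DocGen > /tmp/f2
# comm needs sorted input, so sort (and de-duplicate with -u) first.
sort -u /tmp/f1 > /tmp/f1.sorted
sort -u /tmp/f2 > /tmp/f2.sorted
# -23 suppresses column 2 (lines unique to the second file) and
# column 3 (common lines), leaving only lines unique to the first file.
comm -23 /tmp/f1.sorted /tmp/f2.sorted
# prints: EnterpriseServices, then USEStatement
```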

I am not sure exactly how I can use comm -13 or comm -23, because I need to write a generic script that compares two files and redirects to a third file the contents that are present in the first file but not in the second.

Hi all,
I tried the comm command in all the possible ways but could not get the expected result.
I will explain one more time. I have two files containing some words; the same words may appear in both files, but not necessarily in the same order. I want to compare those files and redirect the result to a third file containing all the words that are present in the first file but not in the second.

The comm command does not solve the problem, because it compares line by line: even if the same word is present on line 1 of the first file and line 4 of the second, it still appears in the output.

The result should be: if the same words are present in both files, no matter on which lines they appear, the final output should be empty.

Please help.
Thanks

Something like this???

grep -v -f file2 file1
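A note on this approach: plain grep -f treats each line of file2 as a pattern and matches substrings, so adding -F (fixed strings) and -x (whole-line match) is safer when the entries share prefixes. A sketch with sample data (the /tmp paths are just for illustration):

```shell
# Sample lists; note the shared "USEStatementClone" prefix.
printf '%s\n' USEStatementClone1 USEStatementClone2 DocGenClone1 > /tmp/f1
printf '%s\n' USEStatementClone2 DocGenClone1 > /tmp/f2
# -v inverts the match, -f reads the patterns from a file,
# -F treats them as fixed strings, -x requires the whole line to match.
grep -F -x -v -f /tmp/f2 /tmp/f1
# prints: USEStatementClone1
```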

The problem with the solution you have given is that my grep command has no -f option.
Could anyone help me out?

Can anyone tell me how to use a loop inside an awk command so that the words of the second file are compared with all the words of the first file, and those that are present in the first file but not in the second are given as output?
Thanks in advance

Try this; it should work as long as you have saved file1.txt and file2.txt separately:

diff file1.txt file2.txt | grep '^<' | sed 's/^< //'
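Keep in mind that diff marks lines unique to the first file with < and lines unique to the second with >, and it compares positionally, so this only works reliably when both files are in the same order. A quick illustration with throwaway data:

```shell
# Sample files; "Beta" appears only in the first one.
printf '%s\n' Alpha Beta Gamma > /tmp/f1
printf '%s\n' Alpha Gamma > /tmp/f2
# Lines present only in the first file are prefixed with "< ";
# strip that prefix to recover the bare words.
diff /tmp/f1 /tmp/f2 | grep '^<' | sed 's/^< //'
# prints: Beta
```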

Hope this helps:

ddl10197@GT109867 ~
$ sort foo | uniq > foo.tmp

ddl10197@GT109867 ~
$ sort bar | uniq > bar.tmp

ddl10197@GT109867 ~
$ comm -23 foo.tmp bar.tmp
GlobalUserManagement
USAccountSummary

ddl10197@GT109867 ~
$ grep GlobalUserManagement bar

ddl10197@GT109867 ~
$ grep USAccountSummary bar

Hi All,
Thanks for your responses, but the solution provided does not seem to be dynamic.
Could anyone help with a for loop that compares the strings one by one in both files, i.e. comparing each string from the first file with all the strings in the second file and, if it is not present there, writing it to a third file?

I have got a code chunk, but it is not working.
Can anyone help me modify it for the above requirement?

The first code chunk I got is:
awk 'FILENAME=="file1" { if( $0 in arr) {continue} else {print $0 } } FILENAME=="file2" {arr[$0]++ }' file2 file1

but I guess there is some syntax problem, as it is not working.

The other code chunk i got is:

#!/usr/bin/ksh
set -x
> outputfile
awk '
BEGIN {
while ( getline < "file1" ) { arr[$0]=1 }
}
{ if ( arr[$0] != 1 ) print FILENAME":" $0
else delete arr[$0];
}
END {
for( key in arr )
if ( arr[key] == 1 ) print "file1:" key
} ' file2 >> outputfile

Can anyone help me find the problem with the above two code chunks? I am not able to run them and am getting some errors.

Otherwise, if neither of the codes works, could anyone provide code that compares two files containing strings and writes to a third file the strings that are present in the first file but not in the second?

You can use arrays or any other approach.

The files will look like

file1
BasicServicesClone2
BasicServicesClone1
DocGenClone1
USEStatementClone2
USEStatementClone1

file2
BasicServicesClone2
BasicServicesClone1
DocGenClone1
USEStatementClone2

the output should be
USEStatementClone1

as it is present in file 1 but not in file2

angheloko's solution works.

Do you have a grep version without the -f option? With awk you can get the desired output with:

awk 'NR==FNR{a[$0];next}!($1 in a)' file2 file1
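For reference, here is how the idiom behaves on sample data matching your file1/file2 example (the /tmp paths are just for illustration): NR==FNR is true only while the first file argument is being read, so its lines are stored as array keys; for the second file, lines whose first field is not a stored key are printed.

```shell
# file2 (read first) holds the strings to exclude.
printf '%s\n' BasicServicesClone1 DocGenClone1 > /tmp/f2
printf '%s\n' BasicServicesClone1 DocGenClone1 USEStatementClone1 > /tmp/f1
# NR==FNR holds only for the first file argument (here /tmp/f2):
# store each of its lines as an array key, then for the second file
# print lines whose first field is not in the array.
awk 'NR==FNR{a[$0];next} !($1 in a)' /tmp/f2 /tmp/f1
# prints: USEStatementClone1
```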

Regards

Hi Franklin52,

Thanks for your update, but when I try the code you have given, it shows:

awk 'NR==FNR{a[$0];next}!($1 in a)' file2 file1
awk: syntax error near line 1
awk: bailing out near line 1

angheloko's solution was good, but there he was grepping separately for each string, which should not be the case, as I have to compare two files containing strings line by line, and it should be generic, i.e. the strings may change and I have to output the strings that are present in the first file but not in the second. I have mentioned the file format in my previous post.
Could anyone help me write a generic script to first store all the strings of both files in arrays and then compare those arrays one by one?

Because the above line of code is not working here; a syntax error?

Could anyone help??

Thanks in advance

Use nawk or /usr/xpg4/bin/awk on Solaris.

Regards