compare two files using while and sed .. debug my script please

shamushamu · September 25, 2008, 5:59pm

Hi, I'm a newbie to Linux. I have not done programming before, but I accidentally stumbled upon Linux scripts at work about 2 weeks ago. I got interested and started writing scripts to automate my job duties.

I need to write a script to compare 2 files (very long lists) side by side so it's easier to spot the differences. Here's the objective:

File1
-----
die1
die2
die3
lb_name1
lb_name2
lb_name3

File2
----
die2
lb_name1
lb_name2

Desired output
-------------
die1
die2 die2
die3
lb_name1 lb_name1
lb_name2 lb_name2
lb_name3

--------------------Here's my script--------------------
#!/bin/bash

# Compare two files and show the difference

awk -F';' 'BEGIN{while(getline<"file1") a[$1]=2};a[$1]!=2' file2 > listdifference

cksum file1 | awk '{print$1}' > x
cksum file2 | awk '{print$1}' > y
x=`cat x`
y=`cat y`

function test
{
if [ $x -le $y ]
then
echo "First file x is smaller. Add difference into smaller file x and remove the difference."

paste -d"\n" listdifference file1 | sort | uniq > lists
awk NF lists > listfinal

while read difference
do
echo "Deleting line $difference"
#sed "/$difference/d" listfinal > output
sed -e "s/$difference//g" listfinal > output
done < listdifference
else
echo "Second file y is bigger. Add line into x."
fi
}
test
--------------------End script---------------------
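
One thing worth checking in the script above: cksum prints the CRC checksum first and the byte count second, so awk '{print$1}' picks up the checksum, not the file size. If the size is what the "smaller/bigger" test is meant to compare, wc -c may be closer to the intent. A minimal sketch under that assumption (the temporary directory, the sample data, and the size_result variable are made up for illustration):

```shell
#!/bin/bash
# Sketch only: the sample files below are assumptions for illustration.
workdir=$(mktemp -d)
printf '%s\n' die1 die2 die3 lb_name1 lb_name2 lb_name3 > "$workdir/file1"
printf '%s\n' die2 lb_name1 lb_name2 > "$workdir/file2"

# wc -c reports the byte count; reading from stdin keeps the file name out of the output.
x=$(wc -c < "$workdir/file1")
y=$(wc -c < "$workdir/file2")

if [ "$x" -le "$y" ]
then
    size_result="file1 is not larger than file2"
else
    size_result="file1 is larger than file2"
fi
echo "$size_result ($x vs $y bytes)"
```

Using $(...) directly also avoids the temporary files x and y.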

Most of these commands I found here in this forum. Thanks to everyone who posted them. I also googled some of the commands and taught myself.

The above awk command compares file1 and file2 and gives me their differences in listdifference.
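
A note on direction, as a hedged sketch: the awk line loads file1 into the array and then prints the lines of file2 that are not in it. With the sample data every line of file2 also appears in file1, so listdifference would come out empty. Swapping the roles of the two files gives the lines that exist only in file1. The version below matches whole lines instead of the first ';'-separated field, and the names seen, in_file1_only and in_file2_only are made up for illustration:

```shell
#!/bin/bash
# Sketch only: sample data and the work directory are assumptions for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' die1 die2 die3 lb_name1 lb_name2 lb_name3 > file1
printf '%s\n' die2 lb_name1 lb_name2 > file2

# Same direction as the original: lines of file2 that are not in file1.
# The "> 0" test ends the loop if file1 cannot be read, instead of looping forever.
in_file2_only=$(awk 'BEGIN { while ((getline line < "file1") > 0) seen[line] = 1 }
                     !($0 in seen)' file2)

# Roles swapped: lines of file1 that are not in file2.
in_file1_only=$(awk 'BEGIN { while ((getline line < "file2") > 0) seen[line] = 1 }
                     !($0 in seen)' file1)

echo "only in file2: [$in_file2_only]"
echo "only in file1:"
echo "$in_file1_only"
```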

Then I wrote a loop to remove the differences using sed and leave blank lines in their place, saving the result to a file called output.

output (what I want to achieve)
------

die2

lb_name1
lb_name2

When I run my lengthy script, the output looks like the following, which is not what the sed loop should have produced:

output (result after the script is run... doesn't look right)
------
die1
die2
die3
lb_name1
lb_name2

The sed in the loop should have removed each individual line that is not the same (listdifference) and left a blank. But it only does this for the last line. I've tried other approaches such as sed -f, chained sed -e expressions, and cmd=$cmd, but it still doesn't look right.

Once I have the output, I can combine file1 and output using the following command to achieve the desired output.
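
A likely reason for the "only the last line" effect, offered as a sketch and not a definitive diagnosis: every pass of the loop reads the unchanged listfinal and overwrites output, so only the final pass survives. One way around it is to let each pass edit the result of the previous one. The example below assumes the differences are the lines of file1 that are missing from file2, builds that list with grep, and uses made-up sample data and a temporary directory:

```shell
#!/bin/bash
# Sketch only: sample data and the work directory are assumptions for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' die1 die2 die3 lb_name1 lb_name2 lb_name3 > file1
printf '%s\n' die2 lb_name1 lb_name2 > file2

# Lines of file1 that do not appear in file2 (-x whole line, -F fixed string).
grep -vxFf file2 file1 > listdifference

# Start from a copy of file1, then let each pass work on the previous result.
cp file1 output
while IFS= read -r difference
do
    # The ^ and $ anchors keep "die1" from also blanking part of "die10".
    sed "s/^${difference}\$//" output > output.tmp && mv output.tmp output
done < listdifference

cat output
```

The entries are still treated as regular expressions by sed, so names containing characters such as . or * would need escaping.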

paste file1 output | awk -F '\t' '{printf "%-32s%s\n", $1, $2}' > desire_output

I'm getting close. I'm debugging and learning at the same time. Please help me debug my script or change my command syntax.

I know there are experts in this forum who can write this in one or just a few lines. If it is just one line, then please break it down and explain it so I can study your script. This is the first time I have posted in any forum. I hope to be able to navigate back to this thread.

Thank you very much.

ctruhn · September 25, 2008, 7:26pm

I might be crazy but...

You show two examples of expected results in your post, and I am a bit confused about which one you are looking for. Your first expected output shows the two files side by side, and the second one shows only the lines that are the same in both files.

But reading the whole thing, it appears that your end goal is to add file2's unique contents to file1, making one file with all of the content. There are a few assumptions here that I think may hurt you, but before I get into those we need to know the goal.

I am not trying to bash you at all; if you just learned scripting and are already into sed and awk, you are way ahead of where I was at 2 weeks. But what exactly are you looking for?

shamushamu · September 25, 2008, 8:07pm

Hi ctruhn, thank you for replying so quickly. I was afraid that my post was lengthy and confusing.

My goal is to compare 2 different files and show the result side by side in this format:

Desired output of file1 and file2 separated by a tab
---------------
die1
die2 die2
die3
lb_name1 lb_name1
lb_name2 lb_name2
lb_name3

There should be a tab between the 2 columns. (This forum will not let me use tabs .. it turns them into spaces :(). So when we glance at this output, we will quickly see that the differences between the 2 files are die1, die3, and lb_name3. I've been researching in this forum and googling for the past week and have not found a solution to this problem.

Please let me know if there is an easier way than what I proposed in my previous post.

Thanks.

ctruhn · September 25, 2008, 8:21pm

This may be a bit obvious, but if you want to see the differences couldn't you just use diff?

tcomku · September 26, 2008, 12:34am

You may try using sdiff:

sdiff file1 file2
die1 <
die2 die2
die3 <
lb_name1 lb_name1
lb_name2 lb_name2
lb_name3 <

-TCOK

nullwhat · September 26, 2008, 1:14am

see the man on sdiff

man sdiff
sdiff -s -w 200 file1 file2
is good
see if you have these
which tkdiff
which xdiff

If you have tkdiff or xdiff on your system you are in luck; otherwise sdiff and diff will do most of it the old-school way...

shamushamu · September 26, 2008, 1:42am

Thank you to everyone who has replied, but ....

Sometimes this might be a long list, unalphabetized, and the words might be something like lb_nhb_5x_gihsd2_frnas, so spotting the differences between 2 columns by eye is not practical. Yes, I have tried a lot of simple commands like paste, sort, uniq, and bdiff, but we would really prefer the following format for spotting the differences more quickly and easily:

Desired Output:
File1 File2
-----------
die1
die2 die2
die3
lb_name1 lb_name1
lb_name2 lb_name2
lb_name3

Each of these columns is in a different file. I simply want to combine the 2 files, but have the same items in the same rows. The differences should be blanks. Even the paste -d \n command doesn't do this.

I've tried to type in the tabs between these 2 columns, but this forum will not let me use tabs. With the tabs, it's really easy to spot the differences. I've been struggling with this problem for a week now. It's fun scripting but I've hit a wall, so I really need your help.

shamushamu · September 26, 2008, 12:13pm

Nullwhat,

Thank you for the script. I just got to work and tried your command:

sdiff -s -w 200 file1 file2 > file3

I was able to get close. So I tried:

sdiff file1 file2 > file3

and got what I was looking for. THANK YOU!

I know someone would be able to do what I was looking for with just one command. hehe.

Bijayant_Kumar · September 27, 2008, 2:49am

You can also try it this way, if you don't want the '>' and '<' in the lines:

sdiff file1 file2 | tr -s ' <' '\t' | tr -s ' >' '\t'
and
sdiff file1 file2 | sed -e's/ <//g;s/\t/ /g;s/ >/\t/g'

Franklin52 · September 28, 2008, 8:30am

With awk:

awk 'NR==FNR{a[$0];next}$0 in a{print $0"\t"$0;next}1' file2 file1

Regards
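
For anyone reading along, the one-liner above can be read piece by piece. The following is only an annotated sketch of the same command, run against made-up sample data in a temporary directory; the aligned variable is an addition for illustration:

```shell
#!/bin/bash
# Sketch only: sample data and the work directory are assumptions for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' die1 die2 die3 lb_name1 lb_name2 lb_name3 > file1
printf '%s\n' die2 lb_name1 lb_name2 > file2

aligned=$(awk '
    # NR==FNR holds only while the first file named on the command line (file2) is being read.
    NR == FNR { a[$0]; next }             # remember each file2 line as an array key
    # From here on awk is reading file1.
    $0 in a   { print $0 "\t" $0; next }  # line also exists in file2: print it in both columns
    1                                     # any other line: print it unchanged
' file2 file1)

printf '%s\n' "$aligned"
```

Lines that exist only in file2 are not printed by this approach, since the output follows the order of file1.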

shamushamu · September 29, 2008, 12:13pm

Wow! More power users with just one-line commands. Thank you for the solutions, everyone.

The awk command above looks like Chinese to me. I need to go online and dissect the function of each portion. I've seen some powerful awk commands here in this forum and don't know how others learned them. There are so many ways to do one thing in Linux. There must be a systematic way to learn the awk syntax ...

Thanks again.