Problem in using diff

ragavhere · April 18, 2008, 8:08am

Hi,

When I use the diff command, I get output like this:

2c2
< Table Name: AAA Row Count:96 SUM(F1):  3739 MAX(F1):77 MIN(F1):  0 AVG(F1): 38.9479167 LENGTH(LINE): 2260
---
> Table Name: AAA Row Count:96 SUM(F1):  4009 MAX(F1):77 MIN(F1):  0 AVG(F1): 40.9479167 LENGTH(LINE): 2260
4a5,10
>

I need the output without the unnecessary symbols. That is, my output should look like this:

Table Name: AAA Row Count:96 SUM(F1):  3739 MAX(F1):77 MIN(F1):  0 AVG(F1): 38.9479167 LENGTH(LINE): 2260
Table Name: AAA Row Count:96 SUM(F1):  4009 MAX(F1):77 MIN(F1):  0 AVG(F1): 40.9479167 LENGTH(LINE): 2260

What could be done to get an output like this? Can anyone help?

sysgate · April 18, 2008, 9:44am

Depending on the implementation of diff you're using, you can have "--exclude=pattern" where pattern may be characters. Look at the man pages and you'll find the answer.

Sarahb29 · April 18, 2008, 10:02am

I usually use sdiff - it still has symbols in it, but I find the output to be a lot cleaner, and it shows (in full, and not a bunch of broken up lines) both of the files you're comparing side by side.

Dave_Miller · April 18, 2008, 10:34am

You may also want to try the cmp command - stands for compare.

It produces output in three columns: lines unique to file1, lines unique to file2, and lines common to both. Flags can be used to eliminate columns.

If you select the flags to eliminate column 1 and 2, all you're left with is the common lines, without any additional characters, lines, or anything. I think that will give you the output you're asking for.

ragavhere · April 19, 2008, 1:11am

Hi,

I need to display only the lines that mismatch between the two files, so I can't use cmp. It will only give me lines unique to either of the files.

So it's better to use diff and remove the unnecessary characters and symbols.

How can I remove the "<", ">" and "---" markers and the "2c2", "4a5,10" lines? These numbers can go up to 1000 or more. Is there a generic way to remove these from the output of the diff command?

era · April 19, 2008, 5:45am

I'm not sure why you can't use cmp. Lines which differ are, by definition, unique.

You can create a simple sed script to remove the diff markup, but I'd still like you to give some more thought to using cmp after all. To me, it sounds like the right tool for this job.

sed -e '/^-/d' -e '/^[1-9]/d' -e 's/^[<>] //' difffile

The character ^ indicates start of line in a regular expression. Any line with a dash at start of line is deleted. Any line with a non-zero number at start of line is deleted. On remaining lines, a wedge (either way) followed by a space at start of line is replaced with nothing.
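As a rough sketch, the same sed expressions can be fed straight from diff through a pipe instead of reading a saved difffile. The file names old.txt and new.txt and their contents are made up for illustration; only the sed expressions come from the post above. This assumes the default ("normal") diff output format, not the -u or -c formats.

```shell
# Hypothetical sample files standing in for the two reports being compared.
tmpdir=$(mktemp -d)
printf 'header\nTable Name: AAA SUM(F1): 3739\nfooter\n' > "$tmpdir/old.txt"
printf 'header\nTable Name: AAA SUM(F1): 4009\nfooter\n' > "$tmpdir/new.txt"

# diff exits with status 1 when the files differ; that is expected here.
# The sed expressions drop the "---" separator, drop the "2c2"-style
# change commands, and strip the leading "< " or "> " from what remains.
cleaned=$(diff "$tmpdir/old.txt" "$tmpdir/new.txt" |
  sed -e '/^-/d' -e '/^[1-9]/d' -e 's/^[<>] //')

printf '%s\n' "$cleaned"
rm -rf "$tmpdir"
```

Because the data lines in diff output always begin with "< " or "> ", a report line that itself starts with a digit or a dash is not deleted by the first two expressions.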

era · April 19, 2008, 9:56am

Oopsie, thinko, I meant comm of course. But I presume that's what you other guys have been meaning all along as well.
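For comparison, here is a minimal sketch of the comm approach, again with made-up file names and contents. comm expects both inputs to be sorted, so this version loses the original line order, which may or may not matter for the reports in question. The -3 flag suppresses the column of common lines; lines found only in the second file come out indented with a tab, which the sed step strips.

```shell
# Hypothetical sample files standing in for the two reports being compared.
tmpdir=$(mktemp -d)
printf 'header\nTable Name: AAA SUM(F1): 3739\nfooter\n' > "$tmpdir/old.txt"
printf 'header\nTable Name: AAA SUM(F1): 4009\nfooter\n' > "$tmpdir/new.txt"

# comm needs sorted input; use the same locale for sort and comm.
LC_ALL=C sort "$tmpdir/old.txt" > "$tmpdir/old.sorted"
LC_ALL=C sort "$tmpdir/new.txt" > "$tmpdir/new.sorted"

# -3 hides lines common to both files. Strip the leading tab that marks
# lines unique to the second file.
tab=$(printf '\t')
mismatched=$(LC_ALL=C comm -3 "$tmpdir/old.sorted" "$tmpdir/new.sorted" |
  sed "s/^$tab//")

printf '%s\n' "$mismatched"
rm -rf "$tmpdir"
```

If the order of lines in the original files has to be preserved, the diff plus sed route is probably the better fit.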