Red hat diff data files

cokedude · February 20, 2026, 3:19am

Is there a good diff program on Red Hat for data files? I just compared two mostly identical files with both diff and vimdiff. Neither diff nor vimdiff could find the differences in the couple of columns that were different. I copied the files from my Red Hat server to my computer, and Notepad++ with the ComparePlus plugin quickly found the differences.

vgersh99 · February 20, 2026, 4:49am

If diff doesn't work, you could possibly do it with [g]awk.
Post small representative samples of both files and indicate the key "diff" fields.

hicksd8 · February 20, 2026, 10:46am

Since you are talking about comparing data files, I would imagine the cause is possibly differences in carriage return/newline characters and/or differences in whitespace. You should be able to manage those with a combination of command-line switches. Refer to the man page for diff.
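As a rough illustration of that suggestion, here is a minimal sketch assuming GNU diff, which is what Red Hat ships. The file names `unix.txt` and `dos.txt` and their contents are made up for the demonstration, and `--strip-trailing-cr` is a GNU extension that other diff implementations may lack.

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)

# Same fields in both files; the second has CRLF line endings and extra spaces.
printf 'a b c\nd e f\n'         > "$tmp/unix.txt"
printf 'a  b c\r\nd e   f\r\n'  > "$tmp/dos.txt"

# Plain diff: count the lines it reports as changed (those starting with < or >).
plain=$(diff "$tmp/unix.txt" "$tmp/dos.txt" | grep -c '^[<>]')

# -b ignores changes in the amount of whitespace;
# --strip-trailing-cr (GNU diff) ignores a carriage return at end of line.
if diff -b --strip-trailing-cr "$tmp/unix.txt" "$tmp/dos.txt" >/dev/null; then
    relaxed="same"
else
    relaxed="different"
fi

echo "plain diff: $plain changed lines; relaxed diff: $relaxed"
```

With these switches the two made-up files compare as equal, while plain diff flags every line.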

And, yes, as @vgersh99 said, give us samples please.

Paul_Pedant · February 20, 2026, 4:09pm

Are these plain ASCII files, are they CSV, and what options did you use?

Files that contain non-ASCII or unprintable characters may be rejected for comparison, and I have doubts about UTF-8 characters. Setting LC_ALL=C may be helpful.
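One way to check whether a file holds such characters is sketched below. The file `sample.txt` and its contents are hypothetical; the approach relies on the C locale treating every byte as a single character, so the bracket range from space to `~` covers exactly the printable ASCII set.

```shell
tmp=$(mktemp -d)

# Hypothetical sample: line 2 hides a tab and a trailing carriage return.
printf 'plain line\nhidden\there\r\n' > "$tmp/sample.txt"

# Report the numbers of lines containing any byte outside printable ASCII.
suspect=$(LC_ALL=C grep -n '[^ -~]' "$tmp/sample.txt" | cut -d: -f1)
echo "lines with unprintable or non-ASCII bytes: $suspect"

# od -c dumps the file byte by byte, showing escapes such as \t and \r.
od -c "$tmp/sample.txt"
```

Note that this pattern also flags tabs, so tab-separated data will match on every line; in that case the `od -c` dump is the more useful view.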

There are several options to diff which ignore various differences in whitespace, upper/lower case, spacing, and tab expansion.

CSV is permitted to contain newlines in quoted fields, which can disrupt line comparisons.

diff should be bullet-proof on text files. The fall-back is cmp, which does byte-for-byte comparisons. The side effect of that is that it does not rely on newlines to resync after inserted or deleted characters, so it tends to report the rest of the bytes in the file after the first mismatch.
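The resync behaviour described above can be seen with two tiny made-up files (`a.txt` and `b.txt` are illustrative names only), where the second has a single extra byte inserted on its first line:

```shell
tmp=$(mktemp -d)
printf 'abc\ndef\nghi\n'  > "$tmp/a.txt"
printf 'abXc\ndef\nghi\n' > "$tmp/b.txt"   # one extra byte at offset 3

# cmp -l lists every differing byte. After the inserted byte nothing lines up
# again, so every remaining byte of the shorter file is reported.
# (cmp also notes on stderr that a.txt ends first; that message is hidden here.)
mismatches=$(cmp -l "$tmp/a.txt" "$tmp/b.txt" 2>/dev/null | wc -l)

# diff resynchronises at the next newline, so only one line counts as changed.
changed=$(diff "$tmp/a.txt" "$tmp/b.txt" | grep -c '^<')

echo "cmp: $mismatches differing bytes; diff: $changed changed line(s)"
```

A single inserted byte gives ten byte mismatches under cmp but only one changed line under diff.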

cokedude · February 23, 2026, 11:31pm

I hope these are good samples.

almost identical file1

identical_col_1 identical_col_2 identical_col_3 identical_col_4 identical_col_5 uygbfiupsdfpiufsdoipjfsda identical_col_7 ojndfs[oijasfdpiasdf
identical_col_1 identical_col_2 identical_col_3 identical_col_4 identical_col_5 ojinpdfsfsdklflkmnfskjlfs identical_col_7 jbifdjnpdfjfjnpsadfs

almost identical file2

identical_col_1 identical_col_2 identical_col_3 identical_col_4 identical_col_5 ibsdajnpdfsajpndfsajnpfas identical_col_7 ibsadfposdafijnpfsds
identical_col_1 identical_col_2 identical_col_3 identical_col_4 identical_col_5 hvojsdfjbpsfadjbpisdfajbp identical_col_7 ibyhfosdiudfsbhofdbj

cokedude · February 23, 2026, 11:34pm

Column 6 and Column 8 are the differences.

cokedude · February 23, 2026, 11:36pm

Yes they are ASCII text files.

vgersh99 · February 24, 2026, 3:31am

@cokedude
Please use markdown code tags when posting code/data samples...

cokedude · February 24, 2026, 3:53am

This one?

Paul_Pedant · February 24, 2026, 7:49am

I get 80 differences between the two files using cmp -l (which compares individual bytes and lists each mismatch as a byte offset followed by the two byte values in octal notation).

paul: ~ $ wc  Dude.1 Dude.2
  2  16 286 Dude.1
  2  16 286 Dude.2
  4  32 572 total
paul: ~ $ cmp -l  Dude.1 Dude.2 | wc -l
80
paul: ~ $ cmp -l  Dude.1 Dude.2 | head -n 5
 81 165 151
 82 171 142
 83 147 163
 84 142 144
 85 146 141
paul: ~ $ cmp -l Dude.1 Dude.2 | tail -n 5
280 160 150
281 163 157
282 141 146
284 146 142
285 163 152
paul: ~ $
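For anyone who finds the octal columns hard to read, the following sketch turns them back into characters. The files `one` and `two` are small stand-ins holding the first four differing characters of the posted samples; the conversion relies on the `%b` format of printf, which expands `\0NNN` octal escapes.

```shell
tmp=$(mktemp -d)
printf 'uygb\n' > "$tmp/one"
printf 'ibsd\n' > "$tmp/two"

# Each cmp -l line is: byte offset (decimal), byte in file 1 (octal), byte in file 2 (octal).
# Prefix each octal value with \0 and let printf %b print the matching character.
decoded=$(cmp -l "$tmp/one" "$tmp/two" |
    while read -r offset left right; do
        printf '%s %b %b\n' "$offset" "\\0$left" "\\0$right"
    done)
echo "$decoded"
```

The first decoded line reads `1 u i`, which agrees with the octal pair 165 151 in the output above. GNU cmp also offers `-b` (`--print-bytes`) to show the characters next to the octal values, though that option may not be present in other implementations.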

I get both lines different with a plain diff command. Of course, diff does not care about columns. If two columns are separated by one space in one file, and two spaces in the other, then the whole line is considered different, even though every field matches.

paul: ~ $ diff  Dude.1 Dude.2
1,2c1,2
< identical_col_1 identical_col_2 identical_col_3 identical_col_4 identical_col_5 uygbfiupsdfpiufsdoipjfsda identical_col_7 ojndfs[oijasfdpiasdf
< identical_col_1 identical_col_2 identical_col_3 identical_col_4 identical_col_5 ojinpdfsfsdklflkmnfskjlfs identical_col_7 jbifdjnpdfjfjnpsadfs
---
> identical_col_1 identical_col_2 identical_col_3 identical_col_4 identical_col_5 ibsdajnpdfsajpndfsajnpfas identical_col_7 ibsadfposdafijnpfsds
> identical_col_1 identical_col_2 identical_col_3 identical_col_4 identical_col_5 hvojsdfjbpsfadjbpisdfajbp identical_col_7 ibyhfosdiudfsbhofdbj
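Since diff works on whole lines, a column-aware comparison needs something like the awk approach suggested earlier in the thread. The sketch below is one possible way to do it, not a tested solution for the real data: the files are cut-down stand-ins with four columns instead of eight, and it assumes both files have the same number of lines in the same order.

```shell
tmp=$(mktemp -d)

# Stand-in data: columns 2 and 4 differ on line 1; line 2 differs only in spacing.
printf 'same_1 uygbfiup same_3 ojndfs\nsame_1 ojinpdfs same_3 jbifdj\n'     > "$tmp/file1"
printf 'same_1  ibsdajnp same_3 ibsadf\nsame_1 ojinpdfs same_3   jbifdj\n' > "$tmp/file2"

report=$(awk '
    NR == FNR { line[FNR] = $0; next }      # first file: remember each line
    {
        n = split(line[FNR], old)           # default split ignores runs of blanks
        max = (n > NF) ? n : NF
        diffs = ""
        for (i = 1; i <= max; i++)
            if ((old[i] "") != ($i ""))     # appending "" forces a string comparison
                diffs = diffs " " i
        if (diffs != "") print "line " FNR ": columns" diffs
    }
' "$tmp/file1" "$tmp/file2")
echo "$report"
```

For the stand-in data this prints `line 1: columns 2 4`, and the second line is not reported because it differs only in the amount of whitespace between fields.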

I was expecting some subtle issue involving hidden characters, CRs, etc., but these files are monstrously different. I can only assume you had some finger trouble somewhere in the file copying between the systems, and you ended up comparing a file with itself.

Incidentally, the PiXhost package you used for the screenshots with “This one?” apparently attaches some spurious advertisements which are definitely NSFW.

vgersh99 · February 24, 2026, 10:54pm

@cokedude Sorry to say, but you've been around these forums for quite a while - you should know how to post code/data samples...

MadeInGermany · February 25, 2026, 6:50am

Yes, click that symbol on a new line. Or mark full lines (the code block) then click.
It should place a ``` on separate lines before and after the code block.