File comparison problems

wbport · July 3, 2018, 12:22pm

We are running parallel reports on the legacy system and a new system and want to compare certain lines with the diff or diff -b command. When a line starts with a six-digit number (no decimal point), we read and discard the header line that follows, read the line after that, move any minus sign to the end of its number and, if that line starts with "Today", output a line made of the six-digit number, a "|" symbol and the rest of the line. This is the sed script we use:

/^[0-9][0-9][0-9][0-9][0-9][0-9][^\.]/{
     N
     s/\n.*//
     N
     s/\n/|/
     s/-\([\.0-9]*\)/\1-/g
     /Today/p
     }
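
For anyone who wants to try the script, here is a minimal sketch of a run. It assumes sed is called with -n (the script prints with p), and the sample values (count 3, amounts 1.00 and -2.50, the space after the six digits) are made up for illustration, not taken from the real reports:

```shell
# Made-up three-line sample: six digits plus a trailing space,
# a heading line, then a "Today" line with one negative amount.
out=$(printf '123456 \nColumn headings\nToday 3 1.00 -2.50\n' | sed -n '
/^[0-9][0-9][0-9][0-9][0-9][0-9][^\.]/{
    N
    s/\n.*//
    N
    s/\n/|/
    s/-\([\.0-9]*\)/\1-/g
    /Today/p
}')

# Heading dropped, lines joined with "|", minus sign moved to the end.
printf '%s\n' "$out"
# prints: 123456 |Today 3 1.00 2.50-
```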

If there was no business done, the new system replaces two of the 0.00 fields with line breaks. We need to change:

Today 0 0.00 0.00 0.00 0.00
0.00
0.00 0.00 -2236.93

----------------- to ------------------

Today 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -2236.93

Can this be done with the above script or do I need to fix it with an earlier step?

TIA

Scrutinizer · July 3, 2018, 2:06pm

Can you post a sample of your source file?

wbport · July 3, 2018, 2:45pm

When business is done, the line starting with "Today" will be followed by a count and ten numbers with two decimal places (percents or dollar amounts)--my problem is when the count is zero and certain 0.00 fields are replaced with a line break.

This is what the set of data looks like:

123456
Constant column headings are on this line and will be ignored
Today 0 0.00 0.00   ... etc

-------------- output ----------------
123456 |Today 0 0.00 0.00 0.00   etc -2236.93

I have been experimenting with this sed script as a new first step:

/^Today 0 0.00 / { N
                   N
                   s/\n/0.00 /g
                 }

Thanks for looking at it.
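
A possible variation on that first step, as a sketch only. It assumes every "Today 0 0.00 " line is broken across exactly three lines, as in the example from the first post, and it strips any blanks around each line break before putting " 0.00 " in its place. The function name join_today is just a label for this example:

```shell
# Pre-pass: for a broken "Today 0" line, pull in the next two lines and
# turn each line break (plus any blanks around it) back into a 0.00 field.
join_today() {
    sed '
/^Today 0 0\.00 /{
    N
    N
    s/[[:blank:]]*\n[[:blank:]]*/ 0.00 /g
}'
}

out=$(printf 'Today 0 0.00 0.00 0.00 0.00\n0.00\n0.00 0.00 -2236.93\n' | join_today)
printf '%s\n' "$out"
# prints: Today 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -2236.93
```

The output of this step could then be piped into the main script. If some "Today 0" lines arrive unbroken, the two N commands would swallow unrelated lines, so that case would need an extra test.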

Scrutinizer · July 3, 2018, 4:07pm

Your data sample shows a first line with 6 digits, whereas your sed script in post #1 tests for 6 digits followed by any character that is neither a . nor a \ . So the script will not match the first line of your data sample.
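
A quick way to check this, as a sketch with made-up input lines: the bracket expression has to consume a seventh character, so a line holding only six digits fails, a seven-digit line passes, and a number with a decimal point fails.

```shell
# Address from post #1, printing matching lines only.
pat='/^[0-9][0-9][0-9][0-9][0-9][0-9][^\.]/p'

bare=$(printf '123456\n' | sed -n "$pat")      # six digits, nothing after
seven=$(printf '1234567\n' | sed -n "$pat")    # a seventh digit follows
dec=$(printf '123456.78\n' | sed -n "$pat")    # a decimal point follows

printf 'bare=[%s] seven=[%s] dec=[%s]\n' "$bare" "$seven" "$dec"
# prints: bare=[] seven=[1234567] dec=[]
```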

Could you post a more accurate sample?

wbport · July 3, 2018, 6:33pm

The "\" was to escape the decimal point so I wanted six digits not followed by a decimal point. That got rid of my false positive using SCO Unix. Six digits with nothing else on the line was the only thing I was interested in.

Scrutinizer · July 3, 2018, 10:24pm

Hi, OK.

So to match 6 digits on a line only, try:

/^[0-9]\{6\}$/{

Also note that a \ inside square brackets denotes a literal \ character, not an escape.
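
A small illustration of that point, using a made-up input string that contains both a period and a backslash:

```shell
# Input is the five characters: a . b \ c
with_bs=$(printf 'a.b\\c\n' | sed 's/[\.]/X/g')   # bracket holds \ and .
dot_only=$(printf 'a.b\\c\n' | sed 's/[.]/X/g')   # bracket holds . only

printf '%s\n%s\n' "$with_bs" "$dot_only"
# prints: aXbXc
# prints: aXb\c
```

So inside brackets the period is already literal, and adding a \ only adds a second character to the set.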

wbport · July 5, 2018, 10:53am

That didn't fly the first time but then I discovered a trailing space after all the valid six digit numbers, so adding a space before the $ fixed it.

I didn't realize a period didn't need to be escaped when in a bracket. In the first code in my OP on the last s command (ends with /g), I took out the " \ " with no ill effects.

Again thanks.

Scrutinizer · July 5, 2018, 5:49pm

You're welcome.

To match 6 digits with possible trailing whitespace, you can use:

/^[0-9]\{6\}[[:blank:]]*$/{

That way any combination of zero or more trailing tabs and/or spaces is matched.
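
A short check with made-up lines, as a sketch: the first three (no trailing blanks, one space, a tab plus a space) should match, while a seven-digit number and a number with a decimal point should not.

```shell
# Count how many of the five sample lines the address accepts.
count=$(printf '123456\n123456 \n123456\t \n1234567\n123456.78\n' |
    sed -n '/^[0-9]\{6\}[[:blank:]]*$/p' | wc -l | tr -d ' ')

printf '%s\n' "$count"
# prints: 3
```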