I have a script that I am trying to apply to files that have form feeds between pages. I want to replace the last form feed with a carriage return so that converting the file to PDF does not generate a blank page.
My script looks like this:
sed '$ s/\^L/\^M/' invoice.txt >invoice.dat
My file looks like this. Keep in mind that some files have more than one page, so I have to remove only the last form feed.
^M^M^[[51t^M^M ABC COMPANY ABC SHIPPING COMPAN
Y ^M
123 BAKER STREET 123 DOCK STREET ^
M
ANYTOWN USA 99999 ANYTOWN USA 08277 ^
M
^
M
^M
PAGE 1
^M
^M
155 498 MS 07:43 AM J 08/01/13 01/23/14*C428230
^M
^M
1 PLA T237 WHITE PRIMER 10.13 6.58 6.58 T
^M
1 PS STRAINERS PAINT SUPPLYS
^M
1 SEM 39683 GRAY SELF ETC 20.57 20.57
^M
1 SEM 39693 ETCH PRIMER 20.57 20.57
^M
1 NAS QUART 421-19 SELECTPRIME 2 22.75 22.75 T
^M
1 NAS 1/2 PT 483-87 SELECTPRIME A 15.60 15.60 T
^M
1 NAS QUART 441-21 MED FB REDUCE 11.44 11.44 T
^M
1 NAS 1/2 PT 483-11 FUL-CRYL CAT. 33.86 33.86 T
^M
2 PS MC32 QT MIX CUP 0.50 1.00 T
^M
1 DUP M64 DUP. TAC RAG 1.13 1.13 T
^M
2 MMM 7447 SCOTCHBRITE 1.25 2.50 T
^M
1 PS UG18 3/4" MASKING 1.43 1.43 T
^M
1 PS PINT CANS PAINT SUPPLYS
^M
1 BAT 10U1R MOWER BATTERY 43.78 43.78
^M
1 GAT 6984 FRACT.HP BELT 63.08 37.77 37.77
^M
1 GAT A100 HEA.DUTY BELT 34.65 20.75 20.75
^M
1 WIX 42299 WIX AIR FILT. 15.26 9.87 9.87
^M
1 GAT A100 HEA.DUTY BELT 34.65 20.75 20.75
^M
1 WIX 51521 WIX OIL FILT. 14.50 8.91 8.91
^M
^M
^M
279.26 ^M
^M
6.74 ^M
^M
^M
286.00 ^M^L
^M^M^[[66t^M
My results are as follows; the last line is dropped completely.
ABC COMPANY ABC SHIPPING COMPANY
123 BAKER STREET 123 DOCK STREET
ANYTOWN USA 99999 ANYTOWN USA 08277
PAGE 1
155 498 MS 07:43 AM J 08/01/13 01/23/14*C428230
1 PLA T237 WHITE PRIMER 10.13 6.58 6.58 T
1 PS STRAINERS PAINT SUPPLYS
1 SEM 39683 GRAY SELF ETC 20.57 20.57
1 SEM 39693 ETCH PRIMER 20.57 20.57
1 NAS QUART 421-19 SELECTPRIME 2 22.75 22.75 T
1 NAS 1/2 PT 483-87 SELECTPRIME A 15.60 15.60 T
1 NAS QUART 441-21 MED FB REDUCE 11.44 11.44 T
1 NAS 1/2 PT 483-11 FUL-CRYL CAT. 33.86 33.86 T
2 PS MC32 QT MIX CUP 0.50 1.00 T
1 DUP M64 DUP. TAC RAG 1.13 1.13 T
2 MMM 7447 SCOTCHBRITE 1.25 2.50 T
1 PS UG18 3/4" MASKING 1.43 1.43 T
1 PS PINT CANS PAINT SUPPLYS
1 BAT 10U1R MOWER BATTERY 43.78 43.78
1 GAT 6984 FRACT.HP BELT 63.08 37.77 37.77
1 GAT A100 HEA.DUTY BELT 34.65 20.75 20.75
1 WIX 42299 WIX AIR FILT. 15.26 9.87 9.87
1 GAT A100 HEA.DUTY BELT 34.65 20.75 20.75
1 WIX 51521 WIX OIL FILT. 14.50 8.91 8.91
279.26
6.74
sed 's/\^M//g;s/\^L//g;s/\^\[\[66t//g;s/\^\[\[51t//g' file_name
That is, if these are not meta characters.
The output is as follows.
ABC COMPANY ABC SHIPPING COMPAN
Y
123 BAKER STREET 123 DOCK STREET ^
M
ANYTOWN USA 99999 ANYTOWN USA 08277 ^
M
^
M
PAGE 1
155 498 MS 07:43 AM J 08/01/13 01/23/14*C428230
1 PLA T237 WHITE PRIMER 10.13 6.58 6.58 T
1 PS STRAINERS PAINT SUPPLYS
1 SEM 39683 GRAY SELF ETC 20.57 20.57
1 SEM 39693 ETCH PRIMER 20.57 20.57
1 NAS QUART 421-19 SELECTPRIME 2 22.75 22.75 T
1 NAS 1/2 PT 483-87 SELECTPRIME A 15.60 15.60 T
1 NAS QUART 441-21 MED FB REDUCE 11.44 11.44 T
1 NAS 1/2 PT 483-11 FUL-CRYL CAT. 33.86 33.86 T
2 PS MC32 QT MIX CUP 0.50 1.00 T
1 DUP M64 DUP. TAC RAG 1.13 1.13 T
2 MMM 7447 SCOTCHBRITE 1.25 2.50 T
1 PS UG18 3/4" MASKING 1.43 1.43 T
1 PS PINT CANS PAINT SUPPLYS
1 BAT 10U1R MOWER BATTERY 43.78 43.78
1 GAT 6984 FRACT.HP BELT 63.08 37.77 37.77
1 GAT A100 HEA.DUTY BELT 34.65 20.75 20.75
1 WIX 42299 WIX AIR FILT. 15.26 9.87 9.87
1 GAT A100 HEA.DUTY BELT 34.65 20.75 20.75
1 WIX 51521 WIX OIL FILT. 14.50 8.91 8.91
279.26
6.74
286.00
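A note on the command above: as written, `\^M` and `\^L` match a literal caret followed by a letter, which is how the characters look when pasted as text. If the file holds the real control bytes (carriage return, form feed), those patterns match nothing. A minimal sketch of the difference, using a made-up one-line sample rather than the real invoice. Note that `tr -d` removes every form feed, not just the last one, so it only illustrates the matching:

```shell
# Hypothetical sample: "A", then a real carriage return and a real form feed.
tmp=$(mktemp)
printf 'A\r\f\n' > "$tmp"

# The pattern \^M looks for the two characters "^" and "M", so the real
# carriage return passes through untouched (still 4 bytes: A, CR, FF, LF).
literal_count=$(sed 's/\^M//g' "$tmp" | wc -c)

# tr understands the \r and \f escapes and deletes the real control bytes.
stripped=$(tr -d '\r\f' < "$tmp")

echo "bytes after sed: $literal_count"
echo "after tr: $stripped"
rm -f "$tmp"
```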
When I ran the statement as you wrote it, I still lost the last line. I did some more testing, and no matter what I tried with sed, the last line was dropped, so I think meta characters on that last line are causing the issue. As a test I used awk, and it read the entire file and wrote out the entire file, which sed was not doing. So what I need to know is how to use awk to replace the last form feed on the last line. I am not sure of the syntax, something like this
The problem is working out which line the 'last line' is. sed et al. can't look back in time to see that. awk, however, can read the file twice: once to find the line, once to substitute:
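The code posted at this point is missing from the thread as preserved here. What follows is a minimal sketch of the two-pass idea, not necessarily the original code. The file is named twice on the command line so that awk reads it twice. The sample data is made up, and the sketch assumes at most one form feed per line.

```shell
# Hypothetical sample: two pages, each ending in CR + form feed (octal 014),
# followed by a short trailer line.
tmp=$(mktemp)
printf 'page 1\r\f\npage 2\r\f\ntrailer\r\n' > "$tmp"

# Pass 1 (NR == FNR): remember the number of the last line holding a form feed.
# Pass 2: on that line only, replace the form feed with a newline.
# Every line of the second pass is printed.
awk 'NR == FNR { if (/\014/) last = FNR; next }
     FNR == last { sub(/\014/, "\n") }
     { print }' "$tmp" "$tmp" > "$tmp.out"

# Inspect the result byte by byte.
od -c "$tmp.out"
```

Reading the file twice keeps the logic to two short rules, at the cost of a second read, which is cheap for files of this size.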
Before trying to speculate about whether or not you have any particular format,
why not try a hexdump and have a deeper look inside the file, so that you know
exactly what you are up against?
hexdump -C /full/path/to/filename
Once we know exactly what we are looking at, we can give informed replies.
Hope this helps...
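A short sketch of that kind of inspection, using `od -c` (a POSIX alternative where `hexdump` is not installed) on a made-up sample modelled on the end of the invoice. It also counts the form feeds and looks at the final byte, since an unterminated last line is one thing that might make a tool treat that line differently. That is a guess, not something this thread confirms.

```shell
# Hypothetical sample: a total line ending in CR + FF, then a trailer with an
# escape sequence and no final newline.
tmp=$(mktemp)
printf ' 286.00 \r\f\n\r\r\033[66t\r' > "$tmp"

# Byte-by-byte view: form feed shows as \f, carriage return as \r, escape as 033.
od -c "$tmp"

# How many form feeds does the file contain?
ff_count=$(tr -cd '\f' < "$tmp" | wc -c)
echo "form feeds: $ff_count"

# What is the very last byte? A file ending in a newline would show \n here.
last_byte=$(tail -c 1 "$tmp" | od -An -c | tr -d ' \n')
echo "last byte: $last_byte"
rm -f "$tmp"
```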
So if I wanted to replace the FF with an LF, would this be the correct format? Also, did you mean to have the input file twice in your example? I tried it both ways, with the input file twice and once, and my result file is blank.
Pardon me, I was missing one crucial character in my code. Also, the output string should be "\012", since it's not a regular expression like the input is.
Actually, just "\n" will do.
And you forgot to list the filename twice. I even explained why it has to read the file twice -- it needs to find out what the last line is.
This is true, but looking back in time is not needed, and neither is reading the file twice.
The logic to apply in sed is: starting with a form-feed character, read lines until you encounter either another form feed (in which case dump the lines already read to output and start over) or the last line. In the case of the last line, replace the first character in the pattern space (the FF) with a newline and dump the result to output. All other lines are simply added to the pattern space ("N") until one of the two conditions above is reached.
The script for this looks like this (<FF> is a form-feed character and is supposed to be on a line of its own, that is, preceded by a newline):
sed -n '$ {
s/^./\n/p
}
/<FF>/ {
s/.$//p
s/^.*$/<FF>/
}
N' /path/to/file
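For comparison, here is a different single-pass sketch (not the script above). It buffers the whole file in the pattern space and then replaces the last form feed wherever it sits. It assumes GNU sed, which accepts `\f` in the pattern and `\n` in the replacement, and a file small enough to hold in memory. The sample data is made up.

```shell
# Hypothetical sample: two pages, each ending in CR + form feed, plus a trailer.
tmp=$(mktemp)
printf 'page 1\r\f\npage 2\r\f\ntrailer\r\n' > "$tmp"

# :a;$!{N;ba}       keep appending lines until the last one has been read
# s/\(.*\)\f/\1\n/  the greedy .* runs up to the LAST form feed in the buffer,
#                   which is then replaced by a newline
sed ':a;$!{N;ba};s/\(.*\)\f/\1\n/' "$tmp" > "$tmp.out"

# Inspect the result byte by byte.
od -c "$tmp.out"
```

The form feed does not need to be on a line of its own for this variant, because the substitution runs over the whole buffered file rather than over individual lines.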
Ah, but it does look back -- it needs to wait until the entire file has been read to decide whether to substitute or not. It has a buffer, which either gets printed or not, depending on what it finds. That kind of logic -- "read until x, buffer until y, print" -- has a lot of special cases (what if x happens without y?), which makes the code larger and more bug-prone. I found it more straightforward to read the file twice, since it appeared to be small. It was a design choice.