Trying to use sed to remove last FF from file

I have a script that I am trying to apply to files that have form feeds between pages. I want to replace the last form feed with a carriage return so that converting the file to PDF won't generate a blank page.

My script looks like this:

sed '$ s/\^L/\^M/' invoice.txt >invoice.dat

My file looks like this. Keep in mind that some files have more than one page, so I must remove only the last form feed.

^M^M^[[51t^M^M       ABC COMPANY                             ABC SHIPPING COMPAN
Y            ^M
       123 BAKER STREET                        123 DOCK STREET                 ^
M
       ANYTOWN USA 99999                       ANYTOWN USA 08277               ^
M
                                                                               ^
M
^M
                                                                        PAGE   1
^M
^M
  155 498  MS                          07:43 AM     J  08/01/13 01/23/14*C428230
^M
^M
   1       PLA T237          WHITE PRIMER            10.13    6.58     6.58 T
^M
   1       PS  STRAINERS     PAINT SUPPLYS
^M
   1       SEM 39683         GRAY SELF ETC                   20.57    20.57
^M
   1       SEM 39693         ETCH PRIMER                     20.57    20.57
^M
   1       NAS QUART 421-19  SELECTPRIME 2                   22.75    22.75 T
^M
   1       NAS 1/2 PT 483-87 SELECTPRIME A                   15.60    15.60 T
^M
   1       NAS QUART 441-21  MED FB REDUCE                   11.44    11.44 T
^M
   1       NAS 1/2 PT 483-11 FUL-CRYL CAT.                   33.86    33.86 T
^M
   2       PS  MC32          QT MIX CUP                       0.50     1.00 T
^M
   1       DUP M64           DUP. TAC RAG                     1.13     1.13 T
^M
   2       MMM 7447          SCOTCHBRITE                      1.25     2.50 T
^M
   1       PS  UG18          3/4" MASKING                     1.43     1.43 T
^M
   1       PS  PINT CANS     PAINT SUPPLYS
^M
   1       BAT 10U1R         MOWER BATTERY                   43.78    43.78
^M
   1       GAT 6984          FRACT.HP BELT           63.08   37.77    37.77
^M
   1       GAT A100          HEA.DUTY BELT           34.65   20.75    20.75
^M
   1       WIX 42299         WIX AIR FILT.           15.26    9.87     9.87
^M
   1       GAT A100          HEA.DUTY BELT           34.65   20.75    20.75
^M
   1       WIX 51521         WIX OIL FILT.           14.50    8.91     8.91
^M
^M
^M
                                                                     279.26 ^M
                                                                            ^M
                                                                       6.74 ^M
                                                                            ^M
                                                                            ^M
                                                                     286.00 ^M^L
^M^M^[[66t^M

My results are as follows; the last line is dropped completely.

       ABC COMPANY                             ABC SHIPPING COMPANY
       123 BAKER STREET                        123 DOCK STREET
       ANYTOWN USA 99999                       ANYTOWN USA 08277
 
                                                                        PAGE   1
   155 498  MS                          07:43 AM     J  08/01/13 01/23/14*C428230
    1       PLA T237          WHITE PRIMER            10.13    6.58     6.58 T
   1       PS  STRAINERS     PAINT SUPPLYS
   1       SEM 39683         GRAY SELF ETC                   20.57    20.57
   1       SEM 39693         ETCH PRIMER                     20.57    20.57
   1       NAS QUART 421-19  SELECTPRIME 2                   22.75    22.75 T
   1       NAS 1/2 PT 483-87 SELECTPRIME A                   15.60    15.60 T
   1       NAS QUART 441-21  MED FB REDUCE                   11.44    11.44 T
   1       NAS 1/2 PT 483-11 FUL-CRYL CAT.                   33.86    33.86 T
   2       PS  MC32          QT MIX CUP                       0.50     1.00 T
   1       DUP M64           DUP. TAC RAG                     1.13     1.13 T
   2       MMM 7447          SCOTCHBRITE                      1.25     2.50 T
   1       PS  UG18          3/4" MASKING                     1.43     1.43 T
   1       PS  PINT CANS     PAINT SUPPLYS
   1       BAT 10U1R         MOWER BATTERY                   43.78    43.78
   1       GAT 6984          FRACT.HP BELT           63.08   37.77    37.77
   1       GAT A100          HEA.DUTY BELT           34.65   20.75    20.75
   1       WIX 42299         WIX AIR FILT.           15.26    9.87     9.87
   1       GAT A100          HEA.DUTY BELT           34.65   20.75    20.75
   1       WIX 51521         WIX OIL FILT.           14.50    8.91     8.91
 
                                                                     279.26
                                                                        6.74
 

Your results file doesn't show the CRs - is that the actual file, or a cat (without -v) to the terminal?

(i.e. is the last line actually missing, or is it just wrapping on your display?)

My result was shown with cat. Here it is in raw form:

^M^M^[[51t^M^M       ABC COMPANY                             ABC SHIPPING COMPAN
Y            ^M
       123 BAKER STREET                        123 DOCK STREET                 ^
M
       ANYTOWN USA 99999                       ANYTOWN USA 08277               ^
M
                                                                               ^
M
^M
                                                                        PAGE   1
^M
^M
  155 498  MS                          07:43 AM     J  08/01/13 01/23/14*C428230
^M
^M
   1       PLA T237          WHITE PRIMER            10.13    6.58     6.58 T
^M
   1       PS  STRAINERS     PAINT SUPPLYS
^M
   1       SEM 39683         GRAY SELF ETC                   20.57    20.57
^M
   1       SEM 39693         ETCH PRIMER                     20.57    20.57
^M
   1       NAS QUART 421-19  SELECTPRIME 2                   22.75    22.75 T
^M
   1       NAS 1/2 PT 483-87 SELECTPRIME A                   15.60    15.60 T
^M
   1       NAS QUART 441-21  MED FB REDUCE                   11.44    11.44 T
^M
   1       NAS 1/2 PT 483-11 FUL-CRYL CAT.                   33.86    33.86 T
^M
   2       PS  MC32          QT MIX CUP                       0.50     1.00 T
^M
   1       DUP M64           DUP. TAC RAG                     1.13     1.13 T
^M
   2       MMM 7447          SCOTCHBRITE                      1.25     2.50 T
^M
   1       PS  UG18          3/4" MASKING                     1.43     1.43 T
^M
   1       PS  PINT CANS     PAINT SUPPLYS
^M
   1       BAT 10U1R         MOWER BATTERY                   43.78    43.78
^M
   1       GAT 6984          FRACT.HP BELT           63.08   37.77    37.77
^M
   1       GAT A100          HEA.DUTY BELT           34.65   20.75    20.75
^M
   1       WIX 42299         WIX AIR FILT.           15.26    9.87     9.87
^M
   1       GAT A100          HEA.DUTY BELT           34.65   20.75    20.75
^M
   1       WIX 51521         WIX OIL FILT.           14.50    8.91     8.91
^M
^M
^M
                                                                     279.26 ^M
                                                                            ^M
                                                                       6.74 ^M
                                                                            ^M
                                                                            ^M
 

Hello,

Not sure if this will help.

sed 's/\^M//g;s/\^L//g;s/\^\[\[66t//g;s/\^\[\[51t//g' file_name

This assumes the ^M, ^L and ^[ sequences are literal text, not control characters.

Output is as follows.

       ABC COMPANY                             ABC SHIPPING COMPAN
Y
       123 BAKER STREET                        123 DOCK STREET                 ^
M
       ANYTOWN USA 99999                       ANYTOWN USA 08277               ^
M
                                                                               ^
M
                                                                        PAGE   1

  155 498  MS                          07:43 AM     J  08/01/13 01/23/14*C428230

   1       PLA T237          WHITE PRIMER            10.13    6.58     6.58 T
   1       PS  STRAINERS     PAINT SUPPLYS
   1       SEM 39683         GRAY SELF ETC                   20.57    20.57
   1       SEM 39693         ETCH PRIMER                     20.57    20.57
   1       NAS QUART 421-19  SELECTPRIME 2                   22.75    22.75 T
   1       NAS 1/2 PT 483-87 SELECTPRIME A                   15.60    15.60 T
   1       NAS QUART 441-21  MED FB REDUCE                   11.44    11.44 T
   1       NAS 1/2 PT 483-11 FUL-CRYL CAT.                   33.86    33.86 T
   2       PS  MC32          QT MIX CUP                       0.50     1.00 T
   1       DUP M64           DUP. TAC RAG                     1.13     1.13 T
   2       MMM 7447          SCOTCHBRITE                      1.25     2.50 T
   1       PS  UG18          3/4" MASKING                     1.43     1.43 T
   1       PS  PINT CANS     PAINT SUPPLYS
   1       BAT 10U1R         MOWER BATTERY                   43.78    43.78
   1       GAT 6984          FRACT.HP BELT           63.08   37.77    37.77
   1       GAT A100          HEA.DUTY BELT           34.65   20.75    20.75
   1       WIX 42299         WIX AIR FILT.           15.26    9.87     9.87
   1       GAT A100          HEA.DUTY BELT           34.65   20.75    20.75
   1       WIX 51521         WIX OIL FILT.           14.50    8.91     8.91
 
                                                                     279.26
                                                                       6.74

                                                                     286.00
 

Thanks,
R. Singh

When I ran the statement as you wrote it, I still lost the last line. I did some more testing, and no matter what I tried, sed dropped the last line, so I think meta characters on that last line are causing the issue. As a test I used awk, and it read and wrote the entire file, which sed was not doing. So what I need to know is how to use awk to replace the last form feed on the last line. I'm not sure of the syntax; something like this:

awk '{sub(/$/,"\r");print}'

Thanks for the help

The problem comes from discovering what the 'last line' is. sed et al can't look back in time to see that. awk can read the file twice, however: once to determine the last line, once to substitute:

awk 'NR==FNR { LL=NR ; next }; FNR == LL { sub(/\014/, "\r"); }' inputfile inputfile > outputfile

\014 is the octal code for a form feed, \r is its substitute. Use whatever you want for it.

Before speculating about whether you have any particular format, why not try
a hexdump and take a deeper look inside the file, so that you know exactly what you
are up against?

hexdump -C /full/path/to/filename

Once we know exactly what we are looking at then we can give informed replies.
Hope this helps...
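
If a full dump is more than you need, a smaller check is to look only at the tail of the file. This is a minimal sketch, assuming a POSIX tail and od; the sample file is a hypothetical stand-in built with printf, not the real invoice.

```shell
# Hypothetical stand-in for the end of the invoice file: CR, FF, CR and
# no trailing newline.
sample=$(mktemp)
printf '   17.84 \r\014\r' > "$sample"

# Dump only the last 16 bytes, with control characters named (\r, \f, \n).
tail -c 16 "$sample" | od -c

# Look at the very last byte: 0a would mean the file ends in a newline.
last_byte=$(tail -c 1 "$sample" | od -An -tx1 | tr -d ' \n')
echo "last byte: 0x$last_byte"

rm -f "$sample"
```

A missing final newline is worth noticing, since POSIX only describes sed's behaviour for text files in which every line ends in a newline.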

So if I wanted to replace the FF with LF, would this be the correct format? Also, did you mean to have the input file twice in your example? I tried it both ways, with the input file listed twice and once, and my result file is blank.

awk 'NR==FNR { LL=NR ; next }; FNR == LL { sub(/\014/,/\012 ); }' inv >inv.test

Thanks

wisecracker, I did that when I first started working on this but didn't include my hex dump. Here it is:

0000    0d 0d 20 20 20 20 20 20  20 41 42 43 20 43 4f 4d   ..       ABC COM
0010    50 41 4e 59 20 20 20 20  20 20 20 20 20 20 20 20   PANY
0020    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
0030    20 41 42 43 20 53 48 49  50 50 49 4e 47 20 43 4f    ABC SHIPPING CO
0040    4d 50 41 4e 59 20 20 20  20 20 20 20 20 20 20 20   MPANY
0050    20 0d 0a 20 20 20 20 20  20 20 31 32 33 20 42 41    ..       123 BA
0060    4b 45 52 20 53 54 52 45  45 54 20 20 20 20 20 20   KER STREET
0070    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
0080    20 20 31 32 33 20 44 4f  43 4b 20 53 54 52 45 45     123 DOCK STREE
0090    54 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20   T
00a0    20 20 0d 0a 20 20 20 20  20 20 20 41 4e 59 54 4f     ..       ANYTO
00b0    57 4e 20 55 53 41 20 39  39 39 39 39 20 20 20 20   WN USA 99999
00c0    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
00d0    20 20 20 41 4e 59 54 4f  57 4e 20 55 53 41 20 30      ANYTOWN USA 0
00e0    38 32 37 37 20 20 20 20  20 20 20 20 20 20 20 20   8277
00f0    20 20 20 0d 0a 20 20 20  20 20 20 20 20 20 20 20      ..
0100    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
*
0140    20 20 20 20 0d 0a 0d 0a  20 20 20 20 20 20 20 20       ....
0150    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
*
0190    50 41 47 45 20 20 20 31  0d 0a 0d 0a 20 20 31 35   PAGE   1....  15
01a0    35 20 34 39 39 20 20 4c  53 20 20 20 20 20 20 20   5 499  LS
01b0    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
01c0    20 20 20 30 39 3a 31 39  20 41 4d 20 20 20 20 20      09:19 AM
01d0    4a 20 20 30 31 2f 32 37  2f 31 34 20 30 31 2f 32   J  01/27/14 01/2
01e0    37 2f 31 34 2a 43 34 32  38 32 33 31 0d 0a 0d 0a   7/14*C428231....
01f0    20 20 20 31 20 20 20 20  20 20 20 57 49 58 20 35      1       WIX 5
0200    31 35 31 35 20 20 20 20  20 20 20 20 20 57 49 58   1515         WIX
0210    20 4f 49 4c 20 46 49 4c  54 2e 20 20 20 20 20 20    OIL FILT.
0220    20 20 20 20 20 31 30 2e  34 34 20 20 20 20 36 2e        10.44    6.
0230    34 31 20 20 20 20 20 36  2e 34 31 20 20 20 20 20   41     6.41
0240    0d 0a 20 20 20 31 20 20  20 20 20 20 20 57 49 58   ..   1       WIX
0250    20 35 31 35 31 35 52 20  20 20 20 20 20 20 20 57    51515R        W
0260    49 58 20 52 41 43 49 4e  47 20 46 49 20 20 20 20   IX RACING FI
0270    20 20 20 20 20 20 20 31  37 2e 33 38 20 20 20 31          17.38   1
0280    30 2e 36 38 20 20 20 20  31 30 2e 36 38 20 54 20   0.68    10.68 T
0290    20 20 0d 0a 0d 0a 0d 0a  0d 0a 0d 0a 0d 0a 0d 0a     ..............
02a0    0d 0a 0d 0a 0d 0a 0d 0a  0d 0a 0d 0a 0d 0a 0d 0a   ................
02b0    0d 0a 0d 0a 0d 0a 0d 0a  0d 0a 20 20 20 20 20 20   ..........
02c0    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
*
0300    31 37 2e 30 39 20 0d 0a  20 20 20 20 20 20 20 20   17.09 ..
0310    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
*
0350    20 20 20 20 0d 0a 20 20  20 20 20 20 20 20 20 20       ..
0360    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
*
0390    20 20 20 20 20 20 20 20  20 20 20 20 20 30 2e 37                0.7
03a0    35 20 0d 0a 20 20 20 20  20 20 20 20 20 20 20 20   5 ..
03b0    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
*
03f0    0d 0a 20 20 20 20 20 20  20 20 20 20 20 20 20 20   ..
0400    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
*
0430    20 20 20 20 20 20 20 20  20 20 20 20 20 20 0d 0a                 ..
0440    20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20
*
0480    20 20 20 20 20 20 31 37  2e 38 34 20 0d 0c 0d            17.84 ...
048f

Pardon me, missing one crucial character from my code. Also, the output string should be "\012", since it's not a regular expression like the input is.

Actually, just "\n" will do.

And you forgot to list the filename twice. I even explained why it has to read the file twice -- it needs to find out what the last line is.

awk 'NR==FNR { LL=NR ; next }; FNR == LL { sub(/\014/,"\n" ); } 1' inv inv >inv.test
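
For anyone who wants to try the two-pass command before pointing it at real data, here is a small sketch. The two-page sample and the temporary file names are hypothetical; the sample copies the CR FF CR ending and the missing final newline seen in the hex dump.

```shell
# Hypothetical two-page sample: each page ends in CR FF CR, and the
# final line has no trailing newline.
sample=$(mktemp)
fixed=$(mktemp)
printf 'page 1\r\014\r\npage 2\r\014\r' > "$sample"

# Pass 1 (NR==FNR) only records the last line number in LL.
# Pass 2 edits that one line; the trailing 1 prints every line.
awk 'NR==FNR { LL=NR; next } FNR==LL { sub(/\014/, "\n") } 1' "$sample" "$sample" > "$fixed"

# Count the form feeds that survive; the one on page 1 should remain.
ff_left=$(tr -cd '\014' < "$fixed" | wc -c | tr -d ' ')
echo "form feeds left: $ff_left"
od -c "$fixed"

rm -f "$sample" "$fixed"
```

The file has to be a real file rather than a pipe, because awk opens it twice.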

Corona688, that worked! Thanks....

This is true, but looking back in time is not needed, and neither is reading the file twice.

The logic to apply in sed is: starting with a form-feed character read lines until you encounter another form-feed (in this case dump the lines already read into output and start over) or the last line. In case of the last line replace the first character in the pattern space (the FF) with a newline and dump the result to output. All other lines are simply added to the pattern space ("N") until one of the two conditions above is reached.

The script looks like this (<FF> stands for a form-feed character, which is supposed to be on a line of its own, that is, preceded by a newline):

sed -n '$ {
               s/^./\n/p
          }
        /<FF>/ {
               s/.$//p
               s/^.*$/<FF>/
          }
          N' /path/to/file
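
The <FF> placeholder has to become a real form-feed byte before sed sees it. The sketch below is not the buffering script above; it only shows one way to build the byte with printf and use it in a sed expression. It assumes the last form feed sits on the last line, that the file ends in a newline, and that a carriage return is an acceptable replacement. The variable names and the sample are hypothetical.

```shell
FF=$(printf '\014')   # a real form-feed byte
CR=$(printf '\r')     # a real carriage return

fixed=$(mktemp)
# The $ address limits the substitution to the last line only.
printf 'page 1\r\014\r\npage 2\r\014\r\n' | sed "\$ s/$FF/$CR/" > "$fixed"

# The form feed on page 1 should be the only one left, and no line lost.
ff_left=$(tr -cd '\014' < "$fixed" | wc -c | tr -d ' ')
lines=$(wc -l < "$fixed" | tr -d ' ')
echo "form feeds left: $ff_left, lines: $lines"

rm -f "$fixed"
```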

I hope this helps.

bakunin

Ah, but it does look back -- it needs to wait until the entire file's already read to decide whether to substitute or not. It has a buffer, which either gets printed or not depending. That kind of logic -- "read until x, buffer until y, print" -- has a lot of special cases (what if x happens without y?), which makes the code larger and more bug-prone. I found it more straightforward to read the file twice since it appeared to be small. It was a design choice.

Hi Corona, how about using tac?

$ tac file | awk 'FNR==1{sub(/\014/,"\n" )}1' | tac 
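
A sketch of the tac round trip on a hypothetical sample, assuming GNU tac and a file that ends in a newline. It substitutes "\r" rather than "\n": a newline inserted at this stage splits the line in two, and the second tac then emits the two halves in swapped order.

```shell
sample=$(mktemp)
fixed=$(mktemp)
printf 'page 1\r\014\r\npage 2\r\014\r\n' > "$sample"

# Reverse the lines so the last line arrives first, edit it, reverse back.
tac "$sample" | awk 'FNR==1 { sub(/\014/, "\r") } 1' | tac > "$fixed"

# Only the form feed on page 1 should remain, still on the first line.
ff_left=$(tr -cd '\014' < "$fixed" | wc -c | tr -d ' ')
first_line_ff=$(head -n 1 "$fixed" | tr -cd '\014' | wc -c | tr -d ' ')
echo "form feeds left: $ff_left (on first line: $first_line_ff)"

rm -f "$sample" "$fixed"
```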


Untested version

$ awk 'function last(){sub(/\014/,"\n" );return $0}{p=$0; print getline == 0 ? last() : p RS $0}' file
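
The getline version above consumes lines in pairs, so it reaches last() only when the line count is odd. A single pass can also be sketched without getline by holding each line back by one, so that the final line is still in hand when END runs. The variable prev and the sample are hypothetical names for illustration.

```shell
sample=$(mktemp)
fixed=$(mktemp)
printf 'page 1\r\014\r\npage 2\r\014\r' > "$sample"

# Print each line one step late; the last line is edited in END.
awk 'NR > 1 { print prev }
     { prev = $0 }
     END { if (NR) { sub(/\014/, "\n", prev); print prev } }' "$sample" > "$fixed"

ff_left=$(tr -cd '\014' < "$fixed" | wc -c | tr -d ' ')
first_line_ff=$(head -n 1 "$fixed" | tr -cd '\014' | wc -c | tr -d ' ')
echo "form feeds left: $ff_left (on first line: $first_line_ff)"

rm -f "$sample" "$fixed"
```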