How to remove escape sequences from a text file?

Hello friends,

Could anyone please advise on how to remove escape sequences from a text file?

$ file input.txt
input.txt:       ASCII English text, with escape sequences

I can see the escape characters when I open the file in the vi editor, as shown below:

^[^[&l0O^[(s16.66h^[&a10L^[&l64F^[&l1S^[&a1G
TEST^[&a           73CPrint Date 25/09/15^[&a          105CPage      1
TRAVEL

but not when I run the more command. This is how I want the file to look once the escape sequences are removed (a line that contains only escape sequences should be removed entirely):

TEST 			Print Date 25/09/15 	                 Page 1
TRAVEL

Please advise,

many thanks in advance!!

I currently have no file to test with and don't know how to produce one, but you can try this:

tr -d '[[:cntrl:]]' < infile > outfile

Those are PCL5 escape sequences. Search for pcl2txt.

---------- Post updated at 08:16 AM ---------- Previous update was at 07:25 AM ----------

If you want to write your own, the general rule is: delete all characters following an ESC (0x1B, decimal 27) until either a space or an upper case letter. The upper case letter should also be deleted, but not the space.
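
As a rough illustration of that rule, here is a minimal sed sketch. It assumes a slightly different stopping condition: a sequence runs from ESC up to and including the first upper case letter, or stops just before the next ESC, and spaces do not end it, because the sample in the question pads the column number with spaces inside the sequence. The function name strip_pcl is made up for this example, and the pattern has only been reasoned through against that sample, not against PCL5 in general.

```shell
# Hypothetical helper: strip PCL5-style escape sequences from stdin.
strip_pcl() {
    esc=$(printf '\033')   # the ESC byte (0x1B)
    # 1. Lines without ESC pass through untouched (blank lines survive).
    # 2. Delete ESC, any run of characters that are neither upper case
    #    nor ESC, and the optional upper case terminator.
    # 3. Drop lines that are left with nothing but whitespace.
    # Caveat: a sequence with no upper case terminator before the end of
    # the line swallows the rest of that line.
    LC_ALL=C sed \
        -e "/${esc}/!b" \
        -e "s/${esc}[^A-Z${esc}]*[A-Z]\{0,1\}//g" \
        -e '/^[[:space:]]*$/d'
}
```

It can be run as strip_pcl < input.txt > output.txt. Because whole sequences are deleted, text on either side of a sequence ends up adjacent.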

Thanks zaxxon for the reply. Unfortunately it did not work: it produced a file with a single line (vi shows 1 line and wc -l shows 0).
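
The single-line result is expected: the [:cntrl:] class includes the newline character, so tr deletes every line break along with the ESC bytes, and wc -l, which counts newlines, reports 0. The outer brackets in '[[:cntrl:]]' are also taken literally, so any [ or ] in the text is deleted as well. A minimal sketch that keeps tabs and newlines follows; strip_ctrl_keep_lines is a made-up name. Note that it only deletes the ESC byte itself, so the rest of each PCL sequence (for example &l0O) stays in the output.

```shell
# Hypothetical helper: delete control characters but keep tab (011)
# and newline (012), so the line structure of the file survives.
strip_ctrl_keep_lines() {
    LC_ALL=C tr -d '\000-\010\013-\037\177'
}
```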

Thanks jgt for your advice on pcl2txt but I couldn't get it for Linux. Could you please provide me a download link? Many thanks.

Have a look at GhostPCL or GhostPDL. Both are part of Ghostscript.

I don't think that there are any free versions.
Do you have the source code for the application that creates the file? Can you select a plain text printer? Can you run the report twice or does the report also update files?

^[^[&l0O^[(s16.66h^[&a10L^[&l64F^[&l1S^[&a1G
TEST^[&a           73CPrint Date 25/09/15^[&a          105CPage      1
TRAVEL

Whoever wrote the original program used PCL5 escape sequences to position the heading fields on the page. Not only that, the number of columns to tab is right justified in a 12 character field. If you do manage to remove the entire sequence, the fields in the heading will no longer be separated.
I.e. the '^[&a 73C' sequence is the complete PCL5 command, and it means tab to column 73. Once all the PCL5 is removed, the line will look like:

TESTPrint Date 25/09/15Page 1
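
If the layout matters, one option is to replace each horizontal-position command with padding instead of deleting it. The following is a minimal awk sketch under stated assumptions: ESC&a<n>C is taken to mean "move to column n, counted from 0", every other sequence is simply dropped, and margins such as ESC&a10L are ignored. The name pcl_to_columns is hypothetical.

```shell
# Hypothetical helper: drop PCL5-style sequences, but turn ESC&a<n>C
# (assumed: absolute column, counted from 0) into space padding.
pcl_to_columns() {
    LC_ALL=C awk -v esc="$(printf '\033')" '
    # A sequence body: anything up to and including the first upper
    # case letter, stopping early at the next ESC.
    BEGIN { seqpat = "^[^A-Z" esc "]*[A-Z]?" }
    # Lines without ESC are printed unchanged.
    index($0, esc) == 0 { print; next }
    {
        out = ""; rest = $0
        while ((i = index(rest, esc)) > 0) {
            out = out substr(rest, 1, i - 1)   # text before ESC
            rest = substr(rest, i + 1)         # skip the ESC byte
            seq = ""
            if (match(rest, seqpat) && RLENGTH > 0) {
                seq = substr(rest, 1, RLENGTH)
                rest = substr(rest, RLENGTH + 1)
            }
            if (seq ~ /^&a *[0-9]+C$/) {       # column command
                col = seq
                gsub(/[^0-9]/, "", col)
                while (length(out) < col + 0) out = out " "
            }
        }
        out = out rest
        # Skip lines that held nothing but escape sequences.
        if (out !~ /^[ \t]*$/) print out
    }'
}
```

Run it as pcl_to_columns < input.txt > output.txt. If the text already extends past the requested column, no padding is added and the fields still touch.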