system
October 30, 2012, 4:35pm
1
Hi Unix Guru,
I have a requirement to replace some special characters in a file that came from a mainframe.
Please see the example below.
When I open it with vi:
17896660|89059215|04/24/1998 00:00:00.000000| abc 123-453-1312^M<85>^M<85>|124557
If I run cat -v, I get the following:
17896660|89059215|04/24/1998 00:00:00.000000| abc 123-453-1312^MM-^E^MM-^E|124557
There are more than 10000 records.
My question is: how can I remove these special characters?
Thanks in advance
:wall:
Jotne
October 30, 2012, 4:45pm
2
Are they at a fixed location in the line?
E.g. after the number, but before the |
system
October 30, 2012, 4:47pm
3
No. They are always in the same field, but the field's content varies.
Yoda
October 30, 2012, 9:46pm
4
Try the command below on your file, redirect the output to a new file, and check whether the special characters are gone:
sed 's/'"$(printf '\015')"'//g' input_file > output_file
system
October 30, 2012, 10:13pm
5
Thanks for your reply. I tried it and it removed the ^M characters, but the file now looks like this:
17896660|89059215|04/24/1998 00:00:00.000000| abc 100-453-1312M-^EM-^E|124557
And in vi:
17896660|89059215|04/24/1998 00:00:00.000000| abc 100-453-1312<85><85>|124557
Any idea?
Thanks in advance
Yoda
October 30, 2012, 10:20pm
6
So in vi you no longer see the ^M characters, is that correct? Also, do you want to remove the < and > symbols?
system
October 30, 2012, 10:28pm
7
Yes, I need to remove <85><85> because it is an invisible character.
Try this:
awk -F\| '{gsub(/[^0-9]*$/,"",$4); print}' OFS=\| infile > outfile
system
October 30, 2012, 10:35pm
9
No, this one doesn't work. The result is exactly the same as the input.
Umm you could try this:
sed -e "s/$(echo -e '\0205')//g" -e "s/\r//g" infile > outfile
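A minimal sketch of what this command does, assuming a POSIX shell and a sed that works byte by byte under LC_ALL=C. The sample record is made up from the line quoted at the top of the thread, and the variable names are only illustrative. printf is used instead of echo -e because its octal escapes behave the same across shells.

```shell
# Sample record (hypothetical) with CR (octal 015) and byte octal 205,
# in the same position as in the line quoted at the top of the thread.
sample=$(printf '17896660|89059215| abc 123-453-1312\015\205\015\205|124557')

# Hold each unwanted byte in a variable; printf understands octal escapes.
cr=$(printf '\015')
byte205=$(printf '\205')

# LC_ALL=C asks sed to treat the input as single bytes rather than UTF-8.
cleaned=$(printf '%s\n' "$sample" | LC_ALL=C sed -e "s/$byte205//g" -e "s/$cr//g")

printf '%s\n' "$cleaned"
```

With both bytes removed, the record should print as plain text with nothing between 1312 and the following |.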
system
October 31, 2012, 10:02am
11
Thank you very much. It works.
One more question: how can I find the ASCII characters for these two octal codes?
Thanks in advance
vbe
October 31, 2012, 11:08am
12
You said your file came from a mainframe, but not which one. That matters if the mainframe did not convert the file to ASCII before the transfer (which is common). For example, a file coming from MVS would need:
dd if = <infile> of = <outfile> conv= ascii
look at the man pages of dd ...
system
October 31, 2012, 11:34am
13
My server is Linux x86_64 x86_64 x86_64 GNU/Linux.
Thanks for your reply.
When I run the command, I get the following error:
dd: unrecognized operand `if'
Try `dd --help' for more information.
Thanks in advance
vbe
October 31, 2012, 11:39am
14
Sorry, there must be no spaces around the = signs:
dd if=<infile> of=<outfile> conv=ascii
but that, you would have noticed if you had looked at the man pages of dd...
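A small sketch of the corrected syntax, assuming a dd that supports conv=ascii (GNU coreutils dd does). Instead of a real mainframe file it feeds dd five made-up bytes that spell HELLO in EBCDIC, so the effect of the conversion is easy to see. dd reads standard input when if= is omitted.

```shell
# EBCDIC bytes for "HELLO" as octal escapes: H=310 E=305 L=323 L=323 O=326.
# conv=ascii translates EBCDIC to ASCII; dd's statistics go to stderr,
# so they are discarded here.
ascii=$(printf '\310\305\323\323\326' | dd conv=ascii 2>/dev/null)

printf '%s\n' "$ascii"
```

If the data were already ASCII, the same conversion would scramble it, so it is worth trying this on a copy first.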
ctsgnb
October 31, 2012, 12:13pm
15
$ seq -f "%03g" 0 400 | while read i; do echo -e "code $i : \0$i"; done | cat -v | grep 'M-^E'
code 205 : M-^E
To list only the codes that cat -v renders as a multi-character sequence, dropping the lines that end in a digit:
$ seq -f "%03g" 0 400 | while read i; do echo -e "code $i : \0$i"; done | cat -v | awk 'length($0)>12&&$0!~/.*[0-9]+$/' | pr -5 -s' ' -t
code 000 : ^@ code 177 : ^? code 235 : M-^] code 305 : M-E code 343 : M-c
code 001 : ^A code 200 : M-^@ code 236 : M-^^ code 306 : M-F code 344 : M-d
code 002 : ^B code 201 : M-^A code 237 : M-^_ code 307 : M-G code 345 : M-e
code 003 : ^C code 202 : M-^B code 240 : M- code 310 : M-H code 346 : M-f
code 004 : ^D code 203 : M-^C code 241 : M-! code 311 : M-I code 347 : M-g
code 005 : ^E code 204 : M-^D code 242 : M-" code 312 : M-J code 350 : M-h
code 006 : ^F code 205 : M-^E code 243 : M-# code 313 : M-K code 351 : M-i
code 007 : ^G code 206 : M-^F code 244 : M-$ code 314 : M-L code 352 : M-j
code 010 : ^H code 207 : M-^G code 245 : M-% code 315 : M-M code 353 : M-k
code 013 : ^K code 210 : M-^H code 246 : M-& code 316 : M-N code 354 : M-l
code 014 : ^L code 211 : M-^I code 247 : M-' code 317 : M-O code 355 : M-m
code 015 : ^M code 212 : M-^J code 250 : M-( code 320 : M-P code 356 : M-n
code 016 : ^N code 213 : M-^K code 251 : M-) code 321 : M-Q code 357 : M-o
code 017 : ^O code 214 : M-^L code 252 : M-* code 322 : M-R code 360 : M-p
code 020 : ^P code 215 : M-^M code 253 : M-+ code 323 : M-S code 361 : M-q
code 021 : ^Q code 216 : M-^N code 254 : M-, code 324 : M-T code 362 : M-r
code 022 : ^R code 217 : M-^O code 255 : M-- code 325 : M-U code 363 : M-s
code 023 : ^S code 220 : M-^P code 256 : M-. code 326 : M-V code 364 : M-t
code 024 : ^T code 221 : M-^Q code 257 : M-/ code 327 : M-W code 365 : M-u
code 025 : ^U code 222 : M-^R code 272 : M-: code 330 : M-X code 366 : M-v
code 026 : ^V code 223 : M-^S code 273 : M-; code 331 : M-Y code 367 : M-w
code 027 : ^W code 224 : M-^T code 274 : M-< code 332 : M-Z code 370 : M-x
code 030 : ^X code 225 : M-^U code 275 : M-= code 333 : M-[ code 371 : M-y
code 031 : ^Y code 226 : M-^V code 276 : M-> code 334 : M-\ code 372 : M-z
code 032 : ^Z code 227 : M-^W code 277 : M-? code 335 : M-] code 373 : M-{
code 033 : ^[ code 230 : M-^X code 300 : M-@ code 336 : M-^ code 374 : M-|
code 034 : ^\ code 231 : M-^Y code 301 : M-A code 337 : M-_ code 375 : M-}
code 035 : ^] code 232 : M-^Z code 302 : M-B code 340 : M-` code 376 : M-~
code 036 : ^^ code 233 : M-^[ code 303 : M-C code 341 : M-a code 377 : M-^?
code 037 : ^_ code 234 : M-^\ code 304 : M-D code 342 : M-b code 400 : ^@
system
October 31, 2012, 12:48pm
16
Thanks for your reply. One thing I still need to know is how to find the ASCII character for code 205.
Thanks in advance
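On code 205: a short sketch of the arithmetic, assuming a POSIX printf. Octal 205 is decimal 133, or hex 85, which is why vi shows <85>. In cat -v notation, M- means the high bit (128) is set and ^E is 5, and 128 + 5 = 133. Since 7-bit ASCII stops at 127, there is no ASCII character for this code; in ISO-8859-1 that position is the C1 control NEL (next line).

```shell
# printf treats a numeric argument with a leading 0 as octal.
dec=$(printf '%d' 0205)
hex=$(printf '%X' 0205)
echo "octal 205 = decimal $dec = hex 0x$hex"

# 7-bit ASCII covers 0..127, so anything above that has no ASCII character.
if [ "$dec" -gt 127 ]; then
    verdict="outside 7-bit ASCII"
else
    verdict="inside 7-bit ASCII"
fi
echo "$verdict"
```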
system
November 4, 2012, 12:37am
18
Hi ctsgnb,
I added a filter on chr(13) and it removed ^M, but M-^E is still there. I then added a filter on chr(133) (I found that octal code 205 corresponds to decimal code 133), but for some reason it doesn't work. Do you have any idea which value or which character I need to filter on to remove M-^E? I need to remove these characters in another process before the file is generated.
Thanks in advance
:wall:
ctsgnb
November 4, 2012, 4:35am
19
Did you try:
tr -d '\205' <yourfile >yourfile.fixed
?
Did you try strings infile
?
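A sketch of the tr approach with a quick check afterwards, assuming tr and grep work on single bytes under LC_ALL=C. The temporary files and the two sample records are hypothetical stand-ins for yourfile and yourfile.fixed, and the CR is deleted in the same pass.

```shell
# Hypothetical temp files standing in for yourfile and yourfile.fixed.
src=$(mktemp)
dst=$(mktemp)

# Two made-up records with CR (015) and byte 205 before the last field.
printf 'a|b 123\015\205\015\205|1\nc|d 456\015\205|2\n' > "$src"

# Delete both bytes in one pass.
LC_ALL=C tr -d '\015\205' < "$src" > "$dst"

# Count lines that still contain either byte; 0 means the file is clean.
# grep exits non-zero when nothing matches, hence the "|| true".
bad=$(printf '\015\205')
left=$(LC_ALL=C grep -c "[$bad]" "$dst" || true)
fixed=$(cat "$dst")

echo "lines still affected: $left"
printf '%s\n' "$fixed"
rm -f "$src" "$dst"
```

Running cat -v on the fixed file is another way to confirm that no ^M or M-^E remains.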