Junk characters at the beginning of every line

Hi Experts,

Here is some background to my problem:

I am exporting data from Teradata using the FastExport utility, as varchar data.
This adds two extra bytes (two characters as seen in Notepad) at the start of every line of the result set.
I have found other ways of avoiding this, but they do not let me use the varchar option.
(All of this is without using a shell script.)

Solution:
I used cut -c 3- <filename> to get rid of them, and it works fine.

But I don't find it a very convincing solution. Here is some sample data:
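For reference, the same idea can be wrapped in a small function. This is only a sketch: the name strip_prefix is made up for illustration, and it assumes the prefix is always exactly two bytes. It uses cut -b with LC_ALL=C so that bytes are counted rather than locale-dependent characters, since the prefix may not be valid text.

```shell
# Hypothetical helper: drop the first two bytes of every line.
# LC_ALL=C makes cut treat the input as plain bytes.
strip_prefix() {
    LC_ALL=C cut -b 3- "$@"
}

# Made-up sample record with a two-byte prefix in front of the data.
printf 'DqHello\n' | strip_prefix    # prints: Hello
```

One caveat, stated as an assumption to check against real data: if the two bytes are a binary value, one of them could happen to be a newline byte, and any line-based tool such as cut would then split that record in the wrong place.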








Please suggest a solution.

Why do you feel this is not a very convincing solution?

:)

Good question.

I feel so because I would like to generalize this piece of code, so that it handles all unwanted characters, that is, everything except [0-9], [a-z], [A-Z], @, ., and -.

Is there any way to detect only the junk characters?

Hi, the first two bytes appear to carry a meaning other than ASCII characters. When you display them, however, they are wrongly interpreted as ASCII characters. So I don't think you can use ASCII values to decide which characters should or should not be printed. For example, if the first character were a capital D, a character-class filter would not remove it. But if a line starts with DqHello, those two characters are clearly invalid even though they belong to an allowed character class. So I would tend to think that just removing the first two characters is the best option, no?
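To illustrate the point, here is a rough sketch of the whitelist approach using tr. The function name keep_allowed is hypothetical, and the allowed set is the one from the question plus the newline so that lines stay separate. Note that it would also delete spaces and any other character outside that set.

```shell
# Hypothetical helper: delete every byte that is not in the whitelist
# 0-9, a-z, A-Z, @, ., - (newline is kept so records stay on separate lines).
keep_allowed() {
    LC_ALL=C tr -cd '0-9a-zA-Z@.\n-'
}

# Non-printable prefix bytes are removed as hoped.
printf '\001\002user@example.com\n' | keep_allowed    # prints: user@example.com

# A prefix that happens to look like letters survives the filter.
printf 'DqHello\n' | keep_allowed                     # prints: DqHello
```

The second call shows why the filter cannot replace stripping by position: the two prefix bytes pass the whitelist whenever their values happen to fall in an allowed range.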

Those two characters appear to be channel control or carriage return control bytes, of the kind that FORTRAN, MVS, or VMS can use. They probably represent the length of the line.
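One way to test the length idea is to look at the numeric values of the first two bytes and compare them with the length of the data that follows. A minimal sketch, with a made-up function name first_two_bytes; it only inspects the first record of its input and makes no assumption about byte order.

```shell
# Hypothetical helper: print the first two bytes of the input
# as unsigned decimal numbers, without the od offset column.
first_two_bytes() {
    LC_ALL=C od -An -tu1 -N 2 "$@"
}

# Made-up record: 'D' is byte 68 and 'q' is byte 113.
printf 'DqHello\n' | first_two_bytes    # prints the values 68 and 113
```

If the values track the length of each line in the real export, the length explanation is likely right. If they do not, the bytes mean something else, but removing them by position still applies.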

scrutinizer is quite correct. You could get ostensibly valid characters appearing there that are clearly garbage.

Ok.

Thanks a lot!