Identify high values "ÿ" in a text file using a Unix command

I have high values (such as ÿÿÿÿ) in a text file on a Unix AIX server. I need to identify all the records
that contain these high values, and if possible also get the position/column number within the record structure. Is there
any Unix command that can:

  1. Get the number of occurrences of high values in the file
  2. Get the position/column of it in the record structure (optional)

I tried the option of echo "ÿ" but it is not able to detect them. The ASCII equivalent of "ÿ" is 255, and I have tried
grepping by that ASCII value, but got no results.

Please let me know if there is any way to achieve this.

Thanks!

 
Can you try something like this:
 
cat > input_file
abcZ]fdsGGG

od -An -to1 input_file
 141 142 143 132 135 146 144 163 107 107 107 012
 
Now, based on the octal value (od -to1 prints octal, so 0xFF shows up as 377), you can search for the special character and work out its position as well.
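Along the same lines, od can count the 0xFF bytes directly instead of eyeballing the dump. A minimal sketch (the sample file name and contents are made up for illustration):

```shell
# Write a sample file containing three 0xFF bytes (printf '\377' emits one).
printf 'ab\377cd\377\n\377ef\n' > /tmp/hv_sample.txt

# Dump every byte as unsigned decimal, one value per line, then count the 255s.
count=$(od -An -tu1 -v /tmp/hv_sample.txt | tr -s ' ' '\n' | grep -c '^255$')
echo "$count"
```

The -tu1 format avoids octal/decimal confusion, and -v stops od from collapsing repeated lines with *, which would throw off the count.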
 

Hi Panyam,

We have huge files to work with, and we are looking for a faster and better way than converting each character to its ASCII value. Also, the record layout changes when we try your command, which would give an incorrect position. To mention, all the files consist of fixed-length records.

Is there a way to run grep using the ASCII value of this special character directly? Is there any better way to do this? We just need to locate this special character and find the number of occurrences, the number of records it impacts, and the fields impacted based on the position. Any help on this is highly appreciated.
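For what it's worth, grep can be fed the byte itself. A hedged sketch, assuming AIX ksh (which lacks the $'\377' quoting of newer shells) and a single-byte locale; the sample file is invented:

```shell
# Generate the 0xFF byte with printf, since ksh88 has no $'\377' syntax.
ff=$(printf '\377')

# Sample data: two of the three records contain the byte.
printf 'hit\377here\nclean\n\377twice\377\n' > /tmp/hv_grep.txt

# -c counts matching records; -n would list their record numbers instead.
records=$(LC_ALL=C grep -c "$ff" /tmp/hv_grep.txt)
echo "$records"
```

LC_ALL=C keeps grep working byte-by-byte, so a UTF-8 locale does not reject 0xFF as an invalid character.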

Thanks for replying.

What Operating System and version are you running?
What Shell do you use?
Do you have a high-level language such as Oracle available, or are you trying to do this with Shell tools and Unix commands?

How big are the files?
How long is each record?

Are you looking for characters 128-255 inclusive, or just character 255, or something else?

What are you going to do with the results? Are you going to try to change characters?

Methyl,
Below are the answers.
What Operating System and version are you running? AIX 5.3
What Shell do you use? Korn Shell
Do you have a high-level language such as Oracle available, or are you trying to do this with Shell tools and Unix commands? Oracle is not readily available; trying with shell and Unix commands.
How big are the files? Files fall in the range of 20MB-10GB.
How long is each record? Record lengths fall in the range of 1-1000, and some even above.
Are you looking for characters 128-255 inclusive, or just character 255, or something else? Just the 255 character, i.e. small y with diaeresis, as mentioned earlier.

What are you going to do with the results? Are you going to try to change characters? I will not be changing/replacing this character. I just need the total number of occurrences in the file, the number of records having this character, the number of occurrences per record, and the position of the character to identify which field is impacted. These results are needed for analysis.

Something to start with, assuming your LOCALE is set correctly:

nawk 'BEGIN{y=sprintf("%c", 255)} $0 ~ y {n+=gsub(y,""); r++} END{printf("totalRecords->%d totalChars->%d\n", r, n)}' myFile
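Extending the same idea to the per-record counts and positions asked for above - a sketch only, not tested on AIX nawk; LC_ALL=C forces byte semantics, and the demo file is invented for illustration:

```shell
# Demo input: records 1 and 3 contain the 0xFF byte.
printf 'ab\377cd\nplain\n\377\377x\n' > /tmp/hv_pos.txt

out=$(LC_ALL=C awk 'BEGIN { y = sprintf("%c", 255) }
{
    pos = ""
    # Scan each column; collect 1-based positions of the 0xFF byte.
    for (i = 1; i <= length($0); i++)
        if (substr($0, i, 1) == y) { n++; pos = pos " " i }
    if (pos != "") { r++; printf("record %d:%s\n", NR, pos) }
}
END { printf("totalRecords->%d totalChars->%d\n", r, n) }' /tmp/hv_pos.txt)
echo "$out"
```

Since the records are fixed-length, the reported column maps straight onto the field layout.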