Identify records having junk characters in Unix

Hi Friends,

I need a Unix command that outputs all the records in a file that contain junk characters.

I know the command cat -tv <Filename>, which displays the file so that we can check it for junk characters.

But my requirement is to fetch ONLY THOSE records having junk characters.
Please suggest.

Thanks in advance,
Suresh.

What do you mean by junk characters?

Characters within a specific ASCII range?

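#!/bin/sh

# hasjunk was left undefined in the original post; this is one possible
# definition: true if the record has a byte outside printable ASCII/whitespace.
hasjunk() {
        printf '%s\n' "$1" | LC_ALL=C grep -q '[^[:print:][:space:]]'
}

# Usage: sh thisscript < inputfile
while IFS= read -r N
do
        if hasjunk "$N"
        then
             printf '%s\n' "$N"
        fi
done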
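
The loop above tests each record one at a time. Here is a minimal sketch of the same filter as a single grep pass over the whole file, assuming "junk" means any byte that is neither printable ASCII nor ordinary whitespace. The function name junk_lines and the sample data are made up for illustration.

```shell
#!/bin/sh
# junk_lines FILE: print, with line numbers, every line of FILE that holds a
# byte outside printable ASCII and ordinary whitespace.
# (junk_lines is a hypothetical helper name, not a standard command.)
junk_lines() {
    # LC_ALL=C makes grep work on raw bytes, so any byte >= 128 and any
    # control character other than whitespace counts as junk.
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}

# Example: a three-line sample file; only line 2 holds a junk byte
# (octal 251 = decimal 169), so only that line should be reported.
sample=$(mktemp)
printf 'clean line\nbad \251 byte\nanother clean line\n' > "$sample"
junk_lines "$sample"
rm -f "$sample"
```

Drop the -n if you only want the records themselves. If GNU grep reports a binary file match instead of printing lines (for example when the file contains NUL bytes), adding -a makes it treat the input as text.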

Hi,

By junk characters I mean something like this, which I got when I ran cat on the Unix file:

|ש���ת� ע�� �ר�צ� ��שר�ת ���� ��� ���ר�� �- ש� � �ש������� ���ע� ���פ� ��ק� �� ���ר ���ש���������� ��� � �סר��� �ש� � ������� ש�� �� � �ק����. ��קש� �ס�ר����� ש� � ש�"�: 482304481-�ש��� ש� 3 �����ת ש� �סר� רק ש� ����� ש� �סר ��ע� ת� ���ר ���ש���ק��� ת�� 6 ���� ��� ע�ר� ���ש����ס�ר� �ס�פ�ת. �ע�רת� �ת��� �ת �ת �ע�� ���. ���ת|

Thanks and Regards,
Suresh

� - byte value 169 (beyond the 7-bit ASCII range)

This link should be useful to you:

Unicode/UTF-8-character table - starting from code position 0080

Something like this should do it:

#! /opt/third-party/bin/perl

use strict;
use warnings;

open(my $fh, "<", $ARGV[0]) || die ("unable to open <$!>\n");
binmode($fh);    # read raw bytes, with no encoding layer applied

# read the file one byte at a time
while( read($fh, my $data, 1) == 1 ) {
  my $ordVal = ord($data);
  if( $ordVal == 169 ) {
    # similarly for other characters as well;
    # a better option would be to build a range for that (e.g. $ordVal > 127)
    # do the processing here
  }
}

close($fh);

exit 0;
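
To build a range for a check like the one above, it helps to know which high byte values actually occur in the file. Here is a minimal shell sketch, assuming every byte above 127 is of interest. The function name high_bytes and the sample data are made up for illustration.

```shell
#!/bin/sh
# high_bytes FILE: list each distinct byte value above 127 found in FILE,
# one decimal number per line, in ascending order.
# (high_bytes is a hypothetical helper name, not a standard command.)
high_bytes() {
    # tr -d removes bytes 0..127 (octal 000-177), leaving only bytes >= 128;
    # od -An -v -tu1 prints every remaining byte as an unsigned decimal;
    # the rest puts one value per line, drops blank lines, and de-duplicates.
    LC_ALL=C tr -d '\000-\177' < "$1" \
        | od -An -v -tu1 \
        | tr -s ' ' '\n' \
        | grep . \
        | sort -n -u
}

# Example: the sample holds byte 169 twice and byte 233 once.
sample=$(mktemp)
printf 'a\251b\251c\351\n' > "$sample"
high_bytes "$sample"
rm -f "$sample"
```

The smallest and largest values printed give the bounds for a range test in place of the single == 169 comparison.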

Maybe the file command can help you?
Otherwise you must be more specific. You may be using a character set that comes out strange in the terminal but looks OK in other applications.

Example:
file *|grep text
In a random directory it would give me something like:
ecl: ASCII text
gitt: Bourne-Again shell script text executable
HELP: ASCII English text
t2s: POSIX shell script text executable
time2Long.java: ASCII Java program text

(and the lines filtered out could be lines like
Firefox_wallpaper.png: PNG image data, 1914 x 818, 8-bit/color RGB, non-interlaced
FW6AK115310.pdf: PDF document, version 1.3
itinerary-hotel-3S69Q2.RTF: Rich Text Format data, version 1, ANSI
)
Please be more specific if you can.

/Lakris

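Use a language which has the iscntrl() function, such as C (declared in <ctype.h>); in Perl, the POSIX character class [[:cntrl:]] in a regular expression does the same job.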