Find invalid character

Hi Team,

I have a script to find invalid characters in a file.

f=`pallvi\mahajan`
 n=0
 while (( $n <= ${#f} ));
 do
 c="${f:$n:1}"
 echo '$c'
 if [[ "$c" = *[^[:space:]*] ]];
 then 
 grep -sq $c valid.txt
 if  [ $? -eq  1 ];
 then
 echo "$f" >> f.txt
 break
 fi
 fi
 n=$((n+1))
 done

My valid.txt file is

a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E  F G H I J K L M N O P Q R S T U V W X Y Z ( ) { }  \t 1 2 3 4 5 6 7 8 9 0 - / :  ? + \n . , '

But my script does not catch the \ character in the string f='pallvi\mahajan'.
Please let me know how I can do this.

I ran the script in debug mode. It runs, but the value is not written to f.txt.
While running it in debug mode, I saw that grep -sq gives an error when searching for \;
the exit status is 2 in that case.

Your script is wrong in many ways.

f=`pallvi\mahajan`

This attempts to run the command pallvi\mahajan and stores its output, if any, in f. Since no such command exists, you get nothing. I believe you meant:

f='pallvi\mahajan'

Also, variables don't expand in single quotes. echo '$c' will print the literal text $c . I believe you meant echo "$c"
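
A quick way to see the difference at the prompt. This is only a sketch: the variable name c comes from the script above, and printf is used instead of echo because some shells' echo interprets backslashes itself.

```shell
c='\'

# Single quotes: no expansion takes place, so this prints the two characters $c
printf '%s\n' '$c'

# Double quotes: the variable is expanded, so this prints a single backslash
printf '%s\n' "$c"
```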

Also, some characters have a special meaning to grep. \ for example. To make grep take all characters you give it literally, tell it -F.

Also, you don't need to use $? to fit grep into an expression. Just use if ! grep ... then ...
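
The last two suggestions could be combined roughly like this. It is a minimal sketch: is_valid is a made-up helper name, and the temporary file is a shortened stand-in for valid.txt, not the real list.

```shell
# is_valid CHAR FILE: succeeds if CHAR occurs literally in FILE
# -F treats the pattern as a fixed string, -e protects a pattern that starts with "-"
is_valid() {
    grep -qF -e "$1" "$2"
}

# Shortened stand-in for valid.txt (assumption, for illustration only)
tmp=$(mktemp)
printf '%s\n' 'a b c 1 2 3 . ,' > "$tmp"

for c in a '\' 9 .; do
    # Test grep's exit status directly instead of inspecting $?
    if ! is_valid "$c" "$tmp"; then
        printf 'invalid: %s\n' "$c"
    fi
done

rm -f "$tmp"
```

With this stand-in list, the loop should report \ and 9 as invalid and stay silent for a and the dot, because -F stops the dot from acting as a wildcard.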

Also, if you know your shell has [[ string == *str* ]], you could use it instead of grep. Or you could replace valid.txt and most of this program with one case, really.
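
A rough sketch of the "one case" idea, assuming the allowed set is letters, digits, blanks and the punctuation from valid.txt, with the backslash deliberately left out. The function name has_invalid and the exact set are assumptions; adjust them to your real list.

```shell
# Allowed characters, written as the inside of a bracket expression.
# The "-" comes last so that it is taken literally.
allowed='a-zA-Z0-9[:blank:](){}/:?+.,'"'"'-'

# has_invalid STRING: succeeds if STRING contains a character outside the allowed set
has_invalid() {
    case $1 in
        *[!$allowed]*) return 0 ;;
        *)             return 1 ;;
    esac
}

for name in 'pallvi mahajan' 'lonodn #enterprise' 'kite \ pallvi'; do
    if has_invalid "$name"; then
        printf 'invalid character in: %s\n' "$name"
    fi
done
```

No external program is started per character, and nothing has to be escaped for grep.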

The #1 mistake is the grep.

  1. quote "$c"
  2. a single \ is an RE error, must be \\
    Because one can prepend a \ to a normal character, you can do
grep -sq \\"$c" valid.txt

\\ is necessary for the shell; it will pass a single \ to grep.
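
To check what the shell actually hands over, something like this can be tried at the prompt (a small sketch; the variable names are arbitrary):

```shell
# Unquoted \\ : the shell removes one level of escaping, so printf receives a single \
one=$(printf '%s' \\)

# Inside single quotes no escaping is needed; '\' is already a single backslash
two=$(printf '%s' '\')

printf '%s has length %s\n' "$one" "${#one}"
[ "$one" = "$two" ] && printf 'both forms pass the same argument\n'
```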
You had better use a non-RE grep:

fgrep -sq "$c" valid.txt

Thanks, but this did not work for another character; I am getting a back-reference error:

grep -sq \9 valid.txt

While running the script in debug mode I found this error; the backslash is also being passed along with the other characters.

Ah, even \9 has a special meaning in an RE (it is a back-reference), not to mention the special characters.
Then take my second suggestion: a non-RE grep!

fgrep does not work either.

Can you suggest something?

While running the script in debug mode with fgrep, I found this:

  echo '$c'
 $c
 + [[ \ = *[^[:space:]*] ]]
 + grep -sq '\' valid.txt
 last character is \
 + '[' 2 -eq 1 ']'
 + n=7
 + ((  7 <= 14  ))
 + c=m
 + echo '$c'
  

Moderator comments were removed during original forum migration.

Hi,

I have a file containing client names, and I need to find the client names that contain invalid characters.

So I wrote this script:

while read line
do 
    n=0
    while (( $n <= ${#f} ));
    do
        c="${f:$n:1}"
        echo '$c'
        if [[ "$c" = *[^[:space:]*] ]];
        then
            grep -sq $c valid.txt
            if  [ $? -eq  1 ];
            then
                echo "$f" >> f.txt
                break
            fi
        fi
        n=$((n+1))
    done
done < client.txt

and my valid characters are

a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E  F G H I J K L M N O P Q R S T U V W X Y Z ( ) { }  \t 1 2 3 4 5 6 7 8 9 0 - / :  ? + \n . , '

But my script does not find the \ character, and fgrep also gives an error. I don't know which characters are invalid, but I have the list of valid characters shown above.

One client name contains a \ and somehow my script does not pick up that line.

I hope this clarifies things. Please help me correct it.

Giving what error?
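
One thing that may explain why fgrep does not flag the backslash (an observation based on the valid.txt posted above, not something run against your real file): the list contains \t and \n written out as two literal characters each, so a fixed-string search for \ finds a match there. A small sketch with a shortened, made-up stand-in for valid.txt:

```shell
tmp=$(mktemp)
# As in the posted valid.txt, \t and \n are stored as literal two-character sequences
printf '%s\n' 'a b c \t 1 2 3 \n . ,' > "$tmp"

if grep -qF -e '\' "$tmp"; then
    found=yes    # the backslash that belongs to \t (or \n) matches
else
    found=no
fi
printf 'backslash found in list: %s\n' "$found"

rm -f "$tmp"
```

If that is what happens, the backslash counts as "valid" and the line is never written to f.txt.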

If you have the bash shell, may I suggest the following, which doesn't use any external programs and should run a lot quicker:

#!/bin/bash
valid=( $'\t' $'\n' \\ \( \) \{ \} \' - / : ? + . , {a..z} {A..Z} {0..9} )
v="${valid[@]}"

while read line
do
    for((n=0;n<${#line};n++))
    do
       c=${line:$n:1}
       [[ "$c" = '\' || "$c" = '?' ]] && c=\\"$c"

       if [[ "${v/$c/}" = "${valid[@]}" ]]
       then
          echo "Invalid character: ${line:$n:1} found in $line"
       else
          echo "$c" >> f.txt
       fi
    done
done < client.txt

Hi Chubler,

I ran your script with a small modification, but it is not giving the expected output. This is the script, followed by the dummy file I created:

#!/bin/bash
valid=( $'\t' $'\n' \\ \( \) \{ \} \' - / : ? + . , {a..z} {A..Z} {0..9} )
v="${valid[@]}"

while read line
do
    for((n=0;n<${#line};n++))
    do
       c=${line:$n:1}
       [[ "$c" = '\' || "$c" = '?' ]] && c=\\"$c"

       if [[ "${v/$c/}" = "${valid[@]}" ]]
       then
          echo "Invalid character: ${line:$n:1} found in $line" >> f.txt
       fi
    done
done < client.txt

client.txt

cat client.txt
pallvi mahajan
lonodn #enterprise
kite \ pallvi
sunny sapen20\45\65

After the script runs, f.txt should contain all records except the first one, but it only finds the London record:

Invalid character: # found in lonodn #enterprise

It does not find the \ records.

This is because the \ character is in the valid list. If you want \ to be invalid, change

this:

valid=( $'\t' $'\n' \\ \( \) \{ \} \' - / : ? + . , {a..z} {A..Z} {0..9} )

to this:

valid=( $'\t' $'\n' \( \) \{ \} \' - / : ? + . , {a..z} {A..Z} {0..9} )

Edit: Also, to have read process backslash characters properly, use the -r option:

while read -r line
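
The effect of -r can be checked with one of the sample records. This is a small sketch; the variable names line_plain and line_raw are made up for the comparison.

```shell
# Without -r, read treats each backslash as an escape character and drops it
read line_plain <<'EOF'
sunny sapen20\45\65
EOF

# With -r, backslashes are kept exactly as they appear in the input
read -r line_raw <<'EOF'
sunny sapen20\45\65
EOF

printf 'without -r: %s\n' "$line_plain"
printf 'with -r:    %s\n' "$line_raw"
```

Without -r the backslashes never reach the loop body, so no check inside the loop can catch them.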

Hi chubler,

I removed that character as well, but I still get the same output.

Also, please let me know what these commands are doing. Do I need to change anything here as well?

[[ "$c" = '\' || "$c" = '?' ]] && c=\\"$c"
if [[ "${v/$c/}" = "${valid[@]}" ]]

See updated post: you will need the -r option of read to stop it interpreting the backslash as an escape character on input.

To explain the two lines:

[[ "$c" = '\' || "$c" = '?' ]] && c=\\"$c"

This code escapes a couple of characters that are treated specially by the pattern substitution feature of bash: \ escapes the next character, and ? is a wildcard matching any single character.

if [[ "${v/$c/}" = "${valid[@]}" ]]

This code tests if replacing your character with nothing changes the valid characters list. If these strings are the same it means the input character is not in the valid characters list.
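
The same test can be tried on its own. Note that this sketch is a variation, not the script above: it quotes the pattern inside the substitution, which makes every character literal, so no separate escaping of \ or ? is needed. The helper name in_list and the short list are made up for illustration.

```shell
#!/bin/bash
# in_list CHAR LIST: succeeds if CHAR occurs somewhere in LIST
in_list() {
    local c=$1 list=$2
    # Quoting "$c" makes the pattern literal; deleting it changes LIST only if it was present
    [[ "${list/"$c"/}" != "$list" ]]
}

list='a b c ( ) ? +'
for c in a '?' '\' '#'; do
    if in_list "$c" "$list"; then
        printf '%s is in the list\n' "$c"
    else
        printf '%s is NOT in the list\n' "$c"
    fi
done
```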

Sorry chubler, can you please explain these briefly?

&& c=\\"$c"
"${v/$c/}"
&& c=\\"$c"

Interpretation: if the preceding command returned true, then prepend a backslash to the value of the c variable.

This line [[ "$c" = '\' || "$c" = '?' ]] && c=\\"$c"

could also be written as:

if [[ "$c" = '\' ]] || [[ "$c" = '?' ]]
then
    c='\'"$c"
fi

"${v/$c/}" replaces the first occurrence of $c within $v with nothing. Example:

$ v="the quick brown fox"
$ c=" quick"
$ echo "${v/$c/}"
the brown fox

Thanks a lot. Can you please tell me where I can learn about these topics? I want to go through them in detail.

I have run the script and it looks like it is working, but I need to check with my manager tomorrow.

Can I ping you tomorrow if I have any issues?

You should always start with the bash man page, for example the entry on pattern substitution:

       ${parameter/pattern/string}
              Pattern substitution.  The pattern is expanded to produce a
              pattern just as in pathname expansion.  Parameter is expanded
              and the longest match of pattern against its value is replaced
              with string.  If pattern begins with /, all matches of pattern
              are replaced with string.  Normally only the first match is
              replaced.  If pattern begins with #, it must match at the
              beginning of the expanded value of parameter.  If pattern
              begins with %, it must match at the end of the expanded value
              of parameter.  If string is null, matches of pattern are
              deleted and the / following pattern may be omitted.  If
              parameter is @ or *, the substitution operation is applied to
              each positional parameter in turn, and the expansion is the
              resultant list.  If parameter is an array variable subscripted
              with @ or *, the substitution operation is applied to each
              member of the array in turn, and the expansion is the resultant
              list.

If you are unsure how something will work try a small example from the command prompt as I did above.
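
For instance, the forms described in that man page entry can be tried like this (a small sketch; the sample string is made up):

```shell
#!/bin/bash
v="the quick brown fox the end"

first=${v/the/a}       # only the first match is replaced
all=${v//the/a}        # pattern begins with /: all matches are replaced
start=${v/#the/a}      # pattern begins with #: must match at the beginning
tail=${v/%end/END}     # pattern begins with %: must match at the end

printf '%s\n' "$first" "$all" "$start" "$tail"
```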

I'll be around tomorrow and will check this thread for any further questions.

Parameters Expansion Guide