Issue with sed command

I need to search for the keyword UTF-16 in a list of files. If the keyword is found, I need to convert the file to UTF-8 using the iconv command.
After that, I need to replace the UTF-16 keyword inside the file with UTF-8.
Please suggest how to do this in a shell script.

for i in `find . -type f | xagrs grep -l "UTF\-16"`
do
          # Do your iconv command
          # Do the sed command here
done

But with the grep command I am not able to locate the keyword.
Please see the code I used, following your suggestion. The directory I am searching contains files with txt, xml, edi, and sgml extensions.

for i in `find . -type f | xagrs grep -l "UTF\-16"`
do
iconv -f UTF-16 -t UTF-8 $i > $i.old
sed 's/UTF-16/UTF-8/g' $i.old > $i
rm -f $i.old
done

I am getting the error "xargs not found".

So please suggest a way to do this with the sed command: search for the keyword and, if it matches, proceed with the conversion.

Use $i instead of $fl.

Just run the command below on the file and make sure it captures something:

grep "UTF\-16" filename

for i in `find . -type f | xagrs grep -l "UTF\-16"`
do
iconv -f UTF-16 -t UTF-8 $i > $i.old
sed 's/UTF-16/UTF-8/g' $i.old > $i
rm -f $i.old
done

You may want to check the command you typed a bit more closely. This may just have been a typo here, but if you made that typo in your attempt, the error would have said "xagrs not found".


Thanks for the reply.
But grep "UTF\-16" filename is not returning any result for files with the xml extension; it works fine for text files.
Can you please help me with this?

Please post sample content from the XML file (the part that has the UTF line).


The XML file contains:
<?xml version="1.0" encoding="UTF-16"?>

Here the code should search for the UTF-16 keyword and, if it is found, convert the file's encoding to UTF-8 and then replace the UTF-16 keyword with UTF-8.
Please help.
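
One possible explanation, which nothing in this thread confirms: if the XML files really are stored as UTF-16, every ASCII character occupies two bytes, so the raw bytes never contain the contiguous string UTF-16 and a plain grep can miss it. A minimal sketch under that assumption decodes the file first and searches the decoded text. The helper name is_utf16_with_keyword is hypothetical.

```shell
#!/bin/sh
# Sketch only; is_utf16_with_keyword is a hypothetical helper name.
# Succeeds (exit status 0) if file "$1" contains the text UTF-16 once its
# bytes have been decoded as UTF-16.
is_utf16_with_keyword() {
    # Decode as UTF-16, hide iconv's error messages, and search the result.
    # grep -q prints nothing; only its exit status is used.
    iconv -f UTF-16 -t UTF-8 "$1" 2>/dev/null | grep -q 'UTF-16'
}
```

A file that is plain ASCII or UTF-8 on disk does not pass this check, because decoding its bytes as UTF-16 does not produce the keyword.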

The grep command is working for me:

$ grep "UTF\-16" test.xml
<?xml version="1.0" encoding="UTF-16"?>

Did you change xagrs to xargs (the typo in the earlier post)?

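for i in `find . -type f | xargs grep -l "UTF\-16"`
do
    # Convert to UTF-8 into a temporary copy
    iconv -f UTF-16 -t UTF-8 "$i" > "$i.old"
    # Replace the keyword and write the result back over the original
    sed 's/UTF-16/UTF-8/g' "$i.old" > "$i"
    rm -f "$i.old"
done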
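
A slightly more defensive variant of the loop above could look like the following sketch. It quotes file names, reads them one per line instead of word-splitting the output of a backtick substitution, and overwrites a file only once iconv has succeeded. The function names convert_utf16_file and convert_matching are hypothetical, and mktemp is assumed to be available (it is not part of POSIX). Like the original loop, it assumes grep can see the keyword in the files as they are stored on disk, and it does not handle file names that contain newlines.

```shell
#!/bin/sh
# Sketch only; convert_utf16_file and convert_matching are hypothetical names.

# Convert file "$1" from UTF-16 to UTF-8 in place, then update the keyword.
convert_utf16_file() {
    src=$1
    tmp=$(mktemp) || return 1
    # Convert into a temporary file so a failed iconv leaves "$src" untouched.
    if iconv -f UTF-16 -t UTF-8 "$src" > "$tmp"; then
        # No backslash before the dash: it is an ordinary character here.
        sed 's/UTF-16/UTF-8/g' "$tmp" > "$src"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Convert every regular file under directory "$1" that grep reports.
convert_matching() {
    find "$1" -type f -exec grep -l 'UTF-16' {} + |
    while IFS= read -r file; do
        convert_utf16_file "$file" || echo "skipped: $file" >&2
    done
}
```

It could be called as convert_matching . from the top of the directory tree.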

Hi,

But grep "UTF\-16" filename is not returning any result for me when I try it on the files with the xml extension.

If I try sed -n '/UTF-16/p' filename, it works for an individual file, but it does not work for a number of files.
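
For reference, the single-file sed test mentioned here can be applied to many files by running it once per file. The sketch below is only an illustration: list_files_with_keyword is a hypothetical helper name, and it assumes file names contain no newlines.

```shell
#!/bin/sh
# Sketch only; list_files_with_keyword is a hypothetical helper name.
# Print the name of every regular file under directory "$1" in which
# sed finds at least one line containing UTF-16.
list_files_with_keyword() {
    find "$1" -type f |
    while IFS= read -r file; do
        # sed -n ... p prints only matching lines, so non-empty output
        # means the keyword was found in this file.
        if [ -n "$(sed -n '/UTF-16/p' "$file")" ]; then
            printf '%s\n' "$file"
        fi
    done
}
```

The printed names can then be fed to whatever conversion step is needed, for example with a while read loop.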

Are you using Solaris? If so, you can use:

$ /usr/xpg4/bin/grep "UTF.16" test.xml
<?xml version="1.0" encoding="UTF-16"?>

Perhaps the backslash before the dash is the problem. Since the dash is not special in any way outside of a bracket expression, escaping it is technically an undefined sequence. POSIX basic and extended regular expression implementations are allowed to behave differently when such a sequence is encountered. The implementation may throw a syntax error, discard the backslash and match the dash, match a literal backslash followed by a dash, or ask your CPU to self-destruct. It's best not to escape characters which are not special.

In the POSIX extended regular expression flavor, there are no ordinary characters which may be portably preceded by a backslash. In the POSIX basic regular expression flavor, there are only a few exceptions: (, ), {, }, and the digits 1 through 9, where the backslash gives the character a special meaning. Inside a bracket expression, a backslash is itself just an ordinary character. A dash outside of a bracket expression is not one of the exceptions.

For more info, see sections 9.3.2 and 9.4.2 of the POSIX Regular Expressions chapter.
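
As an illustration of that advice, here is a minimal sketch that matches the keyword without any backslash. The helper name file_mentions_utf16 is hypothetical, and a POSIX grep that supports -F, -q, and -e is assumed.

```shell
#!/bin/sh
# Sketch only; file_mentions_utf16 is a hypothetical helper name.
# Succeeds (exit status 0) if file "$1" contains the literal text UTF-16.
file_mentions_utf16() {
    # -F treats the pattern as a fixed string, so nothing needs escaping.
    # -q suppresses output; only the exit status is used.
    grep -F -q -e 'UTF-16' "$1"
}
```

With an ordinary regular expression, grep 'UTF-16' filename (no backslash) matches the same text, because the dash is an ordinary character outside a bracket expression.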

Regards,
Alister