How to remove degree symbol from the TXT files?

I have 2000 TXT files to process, but they contain degree symbols, which make the processing program fail on these files. I want a Unix command that will remove the degree symbols from these files.

I tried using the sed command sed 's/[!@#\$%^°&*()]//g' filename, but it did not work. This issue requires immediate attention on my end; any help would be highly appreciated.

Khedu.

What did not work? The sed command looks good, but you need either:

  • to redirect the output produced by sed to a file, or
  • to change the files in place using the option '-i'. I recommend making a backup of the files before doing this, just to play it safe.
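A minimal sketch of both options, assuming the files are UTF-8 encoded (so the degree sign is the two bytes 0xC2 0xB0) and that your sed accepts the attached-suffix form -i.bak, which GNU and BSD sed both do. The function name strip_degree is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove the degree sign from each file given,
# editing in place and keeping NAME.bak as a backup of the original.
# Assumption: the files are UTF-8, where the degree sign is 0xC2 0xB0.
strip_degree() {
  deg=$(printf '\302\260')   # build the byte sequence without typing it
  for f in "$@"; do
    sed -i.bak "s/$deg//g" "$f"
  done
}

# Alternative without -i: redirect the output to a new file instead.
#   sed "s/$(printf '\302\260')//g" file.txt > file.txt.out

# Usage, e.g. for every .txt file in the current directory:
#   strip_degree *.txt
```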

Cheers,
/Lew

The sed command above doesn't remove the degree symbols from the text files, even with the -i option. Help required on this.

Try this:

perl -i -pe 's/[^\u2103]//g' file

Hi,

If you are unable to substitute, try putting \ before the symbol, because those symbols have a special meaning.

The \ tells the command to take the following symbol without its special meaning.

cheers,
Ranga:)

I tried /usr/bin/perl -i -pe 's/[^\u2103]//g' file.txt > newfile.txt, but it produced an empty newfile.txt, and the content of file.txt was changed to its name.

sed -e 's/\°//g' did not replace the degree symbol either.

PS: The degree symbol is visible only when I turn on the Show/Hide button in MS Word; otherwise it doesn't appear. Files containing this symbol are not getting processed.

Khedu.

Can you try the following?

perl -i -pe 's/\xB0//g' file

$ perl test.pl
Before remove : 23°34'N 12°25'W
After remove : 2334'N 1225'W

$ cat test.pl
#!/usr/bin/perl
my $data = "23°34'N 12°25'W";
print "Before remove : $data\n";
$data=~s/\xB0//g;
print "After remove : $data\n";

Try to find out what that "degree symbol" actually is, using hexdump or a similar tool; most likely it is not a plain single-byte character.
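For example, od (POSIX) can dump the raw bytes in hex; hexdump -C is a common alternative. The helper name show_bytes is hypothetical. For reference: a UTF-8 degree sign (U+00B0) shows up as c2 b0, a Latin-1 one as a single b0, a UTF-8 degree Celsius sign (U+2103) as e2 84 83, and a UTF-8 no-break space (U+00A0) as c2 a0. The last one may be worth checking here, since MS Word draws a no-break space as a small raised circle when formatting marks are shown.

```shell
# Hypothetical helper: print the bytes of the given files (or of stdin)
# in hex, so the mystery character can be identified by its byte values.
show_bytes() {
  od -An -tx1 "$@"
}

# Usage: look at the start of the dump for one file.
#   show_bytes file.txt | head
```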

Do you have iconv? You can use it to strip out any non-ASCII characters.

Assuming your input file is UTF-8:

iconv -c -f UTF8 -t ASCII file.txt > file.txt.out

If iconv is not available, then, given how UTF-8 is structured, the following should give an identical result:

tr -d '\200-\377' <file.txt >file.txt.out
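The tr command can be wrapped in a loop to handle all the files at once. A sketch that writes NAME.out next to each input; LC_ALL=C is set so that tr treats the octal range as raw bytes rather than as characters of the current locale (mainly a precaution for non-GNU tr). The function name strip_non_ascii is hypothetical.

```shell
# Hypothetical helper: delete every byte in the range 0x80-0xFF (i.e. every
# byte of every UTF-8 multibyte character) and write the result to NAME.out.
strip_non_ascii() {
  for f in "$@"; do
    LC_ALL=C tr -d '\200-\377' < "$f" > "$f.out"
  done
}

# Usage:
#   strip_non_ascii *.txt
```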

Regards,
Alister

Hi Corona and Alister,

Your commands, iconv -c -f UTF8 -t ASCII source.txt > output.txt.out and tr -d '\200-\377' <file.txt >file.txt.out, worked fine for removing the degree symbol. I would also like to know whether we can replace that character with a space, because the space required between the two words was lost where the degree symbol was removed. Thanks a lot for all the help so far. Looking forward to your answer on this remaining piece as well!

Khedu.

Replacing UTF8 is a lot harder than stripping it out. We don't even know how that character's represented in your data right now...

Hmm... building on Alister's solution, here's a way:

tr -s '\200-\377' ' ' <file.txt >file.txt.out

It should replace any run of bytes with the 8th bit set (that is, UTF-8 multibyte sequences) with a single space.
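One side effect worth knowing: with -s, tr squeezes repeats of the characters in the last set, which here is the space, so existing runs of ordinary spaces are collapsed as well. A sketch of the command plus a variant that leaves ordinary spaces alone by squeezing a placeholder byte (0xFF, an arbitrary choice) instead; both function names are hypothetical and LC_ALL=C is assumed for byte-wise behaviour.

```shell
# Replace each run of non-ASCII bytes with a single space.
# Caveat: existing runs of spaces are squeezed to one space too.
ascii_space() {
  LC_ALL=C tr -s '\200-\377' ' '
}

# Variant: map every non-ASCII byte to 0xFF and squeeze only those,
# then turn each remaining 0xFF into a space; ordinary spaces are untouched.
ascii_space_keep() {
  LC_ALL=C tr -s '\200-\377' '\377' | LC_ALL=C tr '\377' ' '
}

# Usage:
#   ascii_space_keep < file.txt > file.txt.out
```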

Oh, perfect! Works fine!

Cheers, Corona!!

Khedu.

sed -e 's/[\!|\@|\#|\\|\$|\%|\^|\°|\&|\*|\(|\)]//g'

Special characters must be escaped by prefixing them with "\".

In that bracketed expression, every single backslash intended as an escape sequence is absolutely unnecessary. Not only are the characters following the backslashes not special when in a bracketed expression, neither is the backslash itself.

The repeated use of | is probably intended as an alternation operation, but since the pipe symbol is not special in this context, it is unintentionally added to the list of matched characters.
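A quick way to see this with a throwaway input string: the | inside the brackets is simply one more member of the set, so a literal pipe in the data is deleted along with ! and @.

```shell
# '|' inside [...] is an ordinary set member, not alternation.
result=$(printf 'a|b!c@d\n' | sed 's/[!|@]//g')
echo "$result"   # abcd
```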

I recommend reading http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03 to learn how bracketed expressions work within basic regular expressions.

The short version: There's almost nothing special within bracketed expressions, except for ], ^ (only if it occurs as the first character), and an embedded - (in the POSIX locale, used for range expressions).
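The placement rules can be checked with a made-up example: with ] first, ^ anywhere but first, and - last, all three are matched as literal characters.

```shell
# ']' is literal as the first member, '^' is literal when not first,
# and '-' is literal as the last member.
out=$(printf 'a]b^c-d\n' | sed 's/[]^-]//g')
echo "$out"   # abcd
```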

Assuming I understood your intent (which is not a foregone conclusion), the following is a corrected version of what you suggested:

sed -e 's/[!@#\$%^°&*()]//g'
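Applied to a made-up sample line, the corrected expression behaves as follows. Because \ is not special inside the brackets, the \$ adds the backslash itself to the set (consistent with the \\ in the escaped version above), while | is not in the set and survives. This sketch assumes the data and the command use the same encoding (UTF-8 here).

```shell
# The corrected command applied to one sample line (UTF-8 assumed).
# '\' is literal inside [...], so "\$" puts both '\' and '$' in the set;
# '|' is not in the set and is therefore kept.
out=$(printf '%s\n' 'a\b$c°d|e' | sed -e 's/[!@#\$%^°&*()]//g')
echo "$out"   # abcd|e
```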

Regards,
Alister