kkb
December 7, 2009, 3:33pm
1
Hi all,
I am trying to remove all special characters (().,/\~!@#%^$*&^_- and others) from a tab-delimited file.
I am using the following code:
while read LINE
do
echo $LINE | tr -d '=;:`"<>,./?!@#$%^&(){}[]'|tr -d "-"|tr -d "'" | tr -d "_"
done < trial.txt > output.txt
Problems:
1. The output file is not tab-delimited; the whitespace is collapsed. How do I correct this?
2. When I include - and _ in the first pattern, I get errors or weird results. Can you explain why? (Let me know if you need the output.)
Thank you
pludi
December 7, 2009, 3:57pm
3
If I interpret you correctly, you want to remove all characters except A-Z (any case) and 0-9, and preserve any whitespaces, right? If so:
perl -pe 's/[^A-Za-z0-9\s]//g' trial.txt > output.txt
What about
tr -cd '[:alnum:]\ \t' < file > newfile
kkb
December 7, 2009, 6:24pm
6
Only one problem: we need to keep the newline character somewhere so that the output has the same form as the input. I was unable to fix it.
What about
tr -d '[:punct:]' < trial.txt > output.txt
Where does that differ from your desired result?
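On problem 2 from the original post: inside a tr set, a - between two characters is read as a range, and a set that starts with - may be parsed as an option, which would explain the odd results. A minimal sketch, assuming a POSIX-style tr (the sample string is made up for illustration):

```shell
# Hypothetical sample containing a dash, an underscore and parentheses.
sample='a-b_c(d)e'

# Putting the dash last in the set makes tr take it literally.
explicit=$(printf '%s\n' "$sample" | tr -d '()_-')

# [:punct:] already covers -, _ and the other ASCII punctuation.
byclass=$(printf '%s\n' "$sample" | tr -d '[:punct:]')

printf '%s\n' "$explicit" "$byclass"
```

Either form avoids the chain of separate tr calls from the first post.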
kkb
December 7, 2009, 6:46pm
8
I was about to reply to you.
I tried your code, and it looked like the spaces were still being truncated. I wanted to reconfirm that before replying.
Maybe I should put the variable in quotes, as suggested by fans. I will check that and reply back.
I am not sure I understand. You do not need the while read loop anymore, just the tr statement. If you do that, what gets truncated?
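For reference, the collapsed whitespace in the original loop most likely comes from the unquoted $LINE rather than from tr. A small sketch with a made-up sample line, showing the difference quoting makes:

```shell
# Hypothetical sample: two tab-separated fields, the second with a double space.
line=$(printf 'a!b\tc  d#')

# Unquoted: the shell splits the value on whitespace and echo rejoins the
# words with single spaces, so the tab and the double space are lost.
unquoted=$(echo $line | tr -d '[:punct:]')

# Quoted: the value reaches tr with its whitespace unchanged.
quoted=$(printf '%s\n' "$line" | tr -d '[:punct:]')

printf '%s\n' "$unquoted" "$quoted"
```

Running tr directly on the file, without the while read loop, sidesteps the problem entirely.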
danmero
December 7, 2009, 7:38pm
10
Right, try this one:
tr -cd '\ \t\n[:alnum:]' < infile > outfile
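A quick way to see why the \n matters, sketched on made-up two-line input (a literal space is used in the set here instead of the backslash-space form):

```shell
# Hypothetical two-line, tab-delimited input.
data=$(printf 'a-1\tb_2\nc!3\td 4')

# Without \n in the keep-set, the line break is deleted along with the punctuation.
joined=$(printf '%s\n' "$data" | tr -cd '[:alnum:] \t')

# With \n in the keep-set, the line structure survives.
kept=$(printf '%s\n' "$data" | tr -cd '[:alnum:] \t\n')

printf '%s\n' "$joined"
printf '%s\n' "$kept"
```

With -c the set lists what to keep, so anything left out of it, including the newline, is removed.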
kkb
December 7, 2009, 10:14pm
11
pludi:
If I interpret you correctly, you want to remove all characters except A-Z (any case) and 0-9, and preserve any whitespaces, right? If so:
perl -pe 's/[^A-Za-z0-9\s]//g' trial.txt > output.txt
This works great. Thank you.
Just curious: I tried to use the same expression with sed.
Is there any way to exclude tabs from the regular expression?
---------- Post updated at 10:08 PM ---------- Previous update was at 10:06 PM ----------
This works perfectly. Thank you very much. This forum is just awesome. I learned a lot by just posting one question.
---------- Post updated at 10:14 PM ---------- Previous update was at 10:08 PM ----------
Sorry, I used it wrong. It works perfectly. Thank you very much.
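For the sed question above, one possible equivalent is sketched below. It assumes a sed with POSIX character classes; [:space:] is used because \t inside a bracket expression is not portable across sed implementations. The sample input is made up.

```shell
# Delete everything that is neither alphanumeric nor whitespace;
# tabs and spaces fall under [:space:] and are left alone.
clean=$(printf 'a!b\tc, d\n' | sed 's/[^[:alnum:][:space:]]//g')

printf '%s\n' "$clean"
```

On a file this would be run as sed 's/[^[:alnum:][:space:]]//g' trial.txt > output.txt, mirroring the perl one-liner.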