Selecting lines with sed

allinshell · May 20, 2010, 7:08am

Hi all,
I have a file with special characters, like this:

file1

691775025 ��qJ8^Z^Y{ 2004-08-23E P 100.00
45585025 0527541139295037342008-07-25OEP 100.00
6983025 �B<9D>x<^F^Xb 2004-11-16SPP 100.00

I need a sed command to print only the lines that don't contain special characters, i.e., only line 2 of file1 should be printed.

I tried the following command, but it isn't working:

sed -e '/^[-.0-9A-Za-z\s]$/p' file1

Can anybody help with this?

albertogarcia · May 20, 2010, 7:20am

Try:

egrep -v "\^" file1

pseudocoder · May 20, 2010, 7:56am

Wow! How/why does that work? Can you explain, please?

albertogarcia · May 20, 2010, 8:00am

I'll try to explain it (in English).

The -v option shows the lines that don't match the expression.

"\^" is the expression: I treated the ^ character as the only "special" character in your examples...

This solution may not work for other lines...
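A minimal sketch of why this only works for the sample data, assuming the ^Z, ^Y, ^F and ^X sequences in the post are literal caret characters rather than control bytes: grep -E (the modern spelling of egrep) with -v drops every line that contains a literal ^, and nothing else. The function name drop_caret_lines is hypothetical.

```shell
#!/bin/sh
# drop_caret_lines (hypothetical name): print the lines that do NOT contain a
# literal '^'. Inside the ERE, '\^' is an escaped caret, not a start anchor.
drop_caret_lines() {
  grep -E -v '\^' "$@"
}
```

For example, `printf 'ab^Zc\nplain 1.0\nx{y\n' | drop_caret_lines` prints `plain 1.0` and `x{y`: the `x{y` line survives even though `{` is unwanted, because ^ is the only character being tested.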

pseudocoder · May 20, 2010, 8:19am

All right, now I get it. Shame on me for not figuring that out myself. I was staring at that code, wondering how it could filter out all the mentioned special chars.

allinshell · May 20, 2010, 10:34am

Hi,
In my case, anything other than the following should be considered a special character:

Alphanumeric
-
.
whitespace (can be anywhere in the line)

anbu23 · May 20, 2010, 10:46am

$ cat file
691775025 ��qJ8^Z^Y{ 2004-08-23E P 100.00
45585025 0527541139295037342008-07-25OEP 100.00
6983025 �B<9D>x<^F^Xb 2004-11-16SPP 100.00
$ sed -n '/^[-.0-9A-Za-z ]*$/p' file
45585025 0527541139295037342008-07-25OEP 100.00
$ sed '/[^-.0-9A-Za-z ]/d' file
45585025 0527541139295037342008-07-25OEP 100.00
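The two commands above are the positive and negative forms of the same test: print the lines made entirely of allowed characters, or delete the lines containing at least one disallowed character. A small sketch wrapping both for comparison; the function names are hypothetical, and the LC_ALL=C prefix is an added assumption (not in anbu23's commands) that makes A-Z and the [^...] negation work byte by byte, so any non-ASCII byte counts as disallowed.

```shell
#!/bin/sh
# Allowed characters (from the thread): '-', '.', digits, ASCII letters, space.
# LC_ALL=C is an assumption added here: it makes the ranges byte-based, so
# any byte outside ASCII is treated as a disallowed character.

# Positive form: with -n, print only lines made up entirely of allowed chars.
print_clean() {
  LC_ALL=C sed -n '/^[-.0-9A-Za-z ]*$/p' "$@"
}

# Negative form: delete any line that contains at least one disallowed char.
delete_dirty() {
  LC_ALL=C sed '/[^-.0-9A-Za-z ]/d' "$@"
}
```

Run against the same file, `print_clean file` and `delete_dirty file` should print the same lines.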

alister · May 20, 2010, 11:56am

\s is not valid in POSIX-compliant sed (perhaps it's a GNU sed extension, but I'm not sure). Even if it were allowed, that regular expression would only match a line consisting of exactly one matching character, because there is no * after the bracket expression. And, since sed prints every line by default, the p command in this case would cause matching lines to print twice.
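Both problems with the original command can be reproduced with a quick sketch. It uses the thread's character list without \s, so the demo doesn't depend on how a particular sed treats it; the function names are hypothetical.

```shell
#!/bin/sh
# Missing '*': the bracket expression between ^ and $ matches exactly ONE
# character, so only one-character lines are printed.
one_char_only() {
  sed -n '/^[-.0-9A-Za-z ]$/p' "$@"
}

# Missing -n: sed auto-prints every line, and 'p' prints matching lines again.
printed_twice() {
  sed '/^[-.0-9A-Za-z ]*$/p' "$@"
}
```

For example, `printf 'a\nab\n' | one_char_only` prints only `a`, and `printf 'ok\nx{\n' | printed_twice` prints `ok`, `ok`, `x{`.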

Two portable sed alternatives:

sed -n '/^[[:alnum:][:blank:].-]*$/p' file1
sed '/^[[:alnum:][:blank:].-]*$/!d' file1

The first disables default printing with the -n option, and then prints only the lines that consist of nothing but the allowed characters. The second deletes every line that does not consist entirely of allowed characters.
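The same two commands, wrapped as functions for a quick check (the function names are hypothetical). One difference from anbu23's version is worth noting: [:blank:] matches both space and tab, while the literal space in [-.0-9A-Za-z ] matches only a space, so a line containing a tab passes here but not there.

```shell
#!/bin/sh
# Print only lines made up entirely of alphanumerics, blanks (space/tab),
# '.' and '-'. The '-' is last in the bracket so it is taken literally.
keep_clean_print() {
  sed -n '/^[[:alnum:][:blank:].-]*$/p' "$@"
}

# Same result: delete (d) every line that does NOT (!) match the pattern.
keep_clean_delete() {
  sed '/^[[:alnum:][:blank:].-]*$/!d' "$@"
}
```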

Regards,
Alister


I've been having vision trouble for the past few days. Seconds after posting, I noticed anbu23's post. Mine is essentially the same, except that it uses character classes.

allinshell · May 24, 2010, 10:36am

Thanks a lot, anbu23 and Alister, that was very helpful. I have a question: how can I do this for only part of each line? For example, I want to check for special characters only in positions 10 to 20 of every line.

anbu23 · May 24, 2010, 12:29pm

$ cat file
691775025 ��qJ8^Z^Y{ 2004-08-23E P 100.00
45585025 0527541139295037342008-07-25OEP 100.00
6983025 �B<9D>x<^F^Xb 2004-11-16SPP 100.00
$ awk '{ s = substr($0, 10, 11) } !gsub(/[^-.0-9A-Za-z ]/, "", s)' file
45585025 0527541139295037342008-07-25OEP 100.00
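A variant of the same idea, as a sketch: instead of counting substitutions, test the slice with awk's !~ (does not match) operator, which modifies nothing and so sidesteps gsub() entirely (POSIX requires gsub's third argument to be a variable, field or array element, which a substr() call is not). The check_columns name and its FROM/TO arguments are hypothetical; positions are 1-based and inclusive, so 10 to 20 is an 11-character slice. LC_ALL=C is an added assumption so that positions count bytes.

```shell
#!/bin/sh
# check_columns FROM TO [FILE...] (hypothetical helper): print the lines whose
# characters FROM..TO (1-based, inclusive) are all letters, digits, space,
# '.' or '-'. Lines shorter than TO are checked on whatever part exists.
check_columns() {
  from=$1
  to=$2
  shift 2
  LC_ALL=C awk -v from="$from" -v to="$to" \
    'substr($0, from, to - from + 1) !~ /[^-.0-9A-Za-z ]/' "$@"
}
```

Usage: `check_columns 10 20 file`. A line with a special character outside positions 10 to 20 is still printed, since only the slice is tested.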