Help with regular expressions

pxalpine · August 9, 2010, 8:22pm

I have a file in which I'm trying to find all the phone number extensions and delete them. The input file looks like:

abc
x93825
def
13234
x52673
hello

The output should look like:
abc
def
13234
hello

Basically, delete lines that have 5 digits following "x". I tried: x\(4)[0-9] but it doesn't seem to work. Can any regex experts help? thx.

kurumi · August 9, 2010, 8:34pm

sed -n '/^x[0-9][0-9][0-9][0-9][0-9]/!p' file
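As a side note on why the original attempt fails: x\(4)[0-9] puts the count in front of the bracket expression and uses grouping syntax, whereas a repeat count has to follow the thing it repeats and be written as an interval. A minimal sketch of the interval form in basic regex syntax, assuming the sample data from the question (the temporary file and the variable names are only for illustration):

```shell
# Sample input from the question, written to a temporary file.
input=$(mktemp)
printf '%s\n' abc x93825 def 13234 x52673 hello > "$input"

# BRE interval: the repeat count goes AFTER the bracket expression, inside \{ \}.
# The trailing $ keeps longer lines such as x938256 from being deleted.
result=$(sed '/^x[0-9]\{5\}$/d' "$input")
printf '%s\n' "$result"

rm -f "$input"
```

This deletes matching lines with d instead of printing the non-matching ones with -n and !p; for this input the two approaches should give the same result.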

daPeach · August 9, 2010, 8:41pm

bash:

while read line; do [[ $line =~ ^x[0-9]{5}$ ]] || echo "$line"; done <<<"abc
x93825
def
13234
x52673
hello"
abc
def
13234
hello

sed:

sed -r '/^x[0-9]{5}$/d' <<<"abc
x93825
def
13234
x52673
hello"
abc
def
13234
hello

pxalpine · August 9, 2010, 8:55pm

kurumi's code worked, thank you. So I didn't try daPeach's.

Is there a way to find lines that have more than 3 capitalized letters in them?

daPeach · August 9, 2010, 9:22pm

bash:

while read line; do precount="${line//[[:lower:] ]}"; (( ${#precount} > 3 )) && echo "$line"; done <<<"a B c D e F g H
aBcDeFgH
a B c D e F g h
aBcDeFgh"
a B c D e F g H
aBcDeFgH

agama · August 9, 2010, 11:30pm

Sed one-liner to print lines with three or more capital letters (change {3,} to {4,} for strictly more than three):

sed -r -n '/([A-Z][^A-Z]*){3,}/p'

Replace -r with -E if you're using a BSD system, or AST's (AT&T) sed.
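Since the question asked for more than 3 capitals, here is a sketch of the same idea with the interval raised to {4,}, run against the sample lines from daPeach's post. It assumes a sed that accepts -E, and it swaps [A-Z] for the [[:upper:]] class, which is my substitution rather than part of the original one-liner:

```shell
# Sample lines taken from the bash example earlier in the thread.
sample='a B c D e F g H
aBcDeFgH
a B c D e F g h
aBcDeFgh'

# {4,} = at least four repetitions of "one uppercase letter followed by
# any run of non-uppercase characters", i.e. four or more capitals per line.
result=$(printf '%s\n' "$sample" | sed -E -n '/([[:upper:]][^[:upper:]]*){4,}/p')
printf '%s\n' "$result"
```

Unlike the bash loop, which strips only lowercase letters and spaces before counting, this pattern does not count digits or punctuation toward the total.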

pxalpine · August 10, 2010, 1:05pm

Thanks agama. Just to learn, what does the /p at the end of the code do?

Scott · August 10, 2010, 1:07pm

Hi.

It prints the current line whenever the expression matches (in this case). You need it because you're using the -n option, which suppresses sed's automatic printing of every line. Without the p, you wouldn't get any output.
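A small sketch of how -n and p interact, using three lines from the original sample (the variable names are just for illustration):

```shell
sample='abc
x93825
def'

# With -n, automatic printing is off, so only lines the address
# matches are printed, and only because of the p command.
with_n=$(printf '%s\n' "$sample" | sed -n '/^x/p')

# Without -n, every line is printed automatically, and p prints
# each matching line a second time.
without_n=$(printf '%s\n' "$sample" | sed '/^x/p')

printf 'with -n:\n%s\n\nwithout -n:\n%s\n' "$with_n" "$without_n"
```

The first command should print only x93825; the second should print all three lines, with x93825 appearing twice.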