Removing string between two particular strings in a line

ppatra · January 6, 2012, 3:39am

Hi,

I have a file with the following format:

1|What is you name (full name)?|Character
2|How far [in kilometers] is your school [from home]?|Numeric

Now I need to remove everything inside brackets () or []. There can be more than one pair of brackets. The output file should look like:

1|What is you name?|Character
2|How far is your school?|Numeric

Can anybody please help?

I tried with

sed 's/\[*\]//g'

but it is not working.

balajesuri · January 6, 2012, 4:41am

perl -pe 's/ [(\[].+?[)\]]//g' inputfile

mirni · January 6, 2012, 4:50am

You were close. Sed's pattern matching is greedy though, so the regex

s/\[.*\]//g

will match everything from the first opening bracket to the last closing one. What you need is to replace the '.' (which means 'any character'), with 'not closing bracket', that is [^]]:

sed 's/\[[^]]*\]//g'

should do the trick
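
To see the difference on the second sample line from the question, here is a small sketch. The variable names `line`, `greedy` and `fixed` are only for illustration.

```shell
line='2|How far [in kilometers] is your school [from home]?|Numeric'

# Greedy: .* runs from the first '[' to the last ']' on the line
greedy=$(printf '%s\n' "$line" | sed 's/\[.*\]//g')

# Negated class: [^]]* stops at the first ']' after each '['
fixed=$(printf '%s\n' "$line" | sed 's/\[[^]]*\]//g')

printf '%s\n' "$greedy" "$fixed"
# 2|How far ?|Numeric
# 2|How far  is your school ?|Numeric
```

The second result still contains the spaces that surrounded each bracketed block, since the pattern only matches the brackets and their contents.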

ppatra · January 6, 2012, 5:09am

Thanks a lot! You guys are great!
The perl one does exactly what I needed.
But with sed I am getting stuck trying to do the same for (). The following do not work:

sed 's/(^*)//g'
sed 's/\((^))*\)//g'

Do we need to handle () in a different way? I guess we don't need to negate () the way we do for [].

mirni · January 6, 2012, 5:34am

The difference is that '[' and ']' are metacharacters in regexps: they delimit a character class.
[^a] means 'any character except a'. So a literal '[' or ']' needs to be escaped as \[ or \], respectively.

With parens it's actually simpler:

sed 's/([^)]*)//g'

You don't have to escape them, because an unescaped paren is a literal character in sed's default (basic) regular expressions. But it's the same logic: you still use [^)] to match any character except ')'. If you allowed ')' to be matched, the pattern would greedily eat everything between the first ( and the last ). Not what you want here.
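
Applied to the first sample line from the question, the command above leaves the space that preceded the parenthesis, so the result reads `name ?` rather than `name?`. One possible way to absorb it is to let the pattern also match any spaces in front of the opening paren. This tweak is an addition for illustration, not something from the posts above, and the variable names are arbitrary.

```shell
line='1|What is you name (full name)?|Character'

# As posted: removes "(full name)" but keeps the space in front of it
as_posted=$(printf '%s\n' "$line" | sed 's/([^)]*)//g')

# With ' *' in front: the spaces before '(' are removed too
trimmed=$(printf '%s\n' "$line" | sed 's/ *([^)]*)//g')

printf '%s\n' "$as_posted" "$trimmed"
# 1|What is you name ?|Character
# 1|What is you name?|Character
```

Note that this applies to sed's default basic regular expressions. With extended regular expressions (`sed -E`), parentheses are grouping operators and a literal paren has to be written as `\(` or `\)`.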

ppatra · January 6, 2012, 6:52am

Awesome! Thank you!

ctsgnb · January 6, 2012, 10:05am

mirni:

You were close. Sed's pattern matching is greedy though, so the regex
s/\[.*\]//g
will match everything from the first opening bracket to the last closing one. What you need is to replace the '.' (which means 'any character'), with 'not closing bracket', that is [^]]:
sed 's/\[[^]]*\]//g'
should do the trick

In fact the last closing bracket does not need to be escaped; it will already be taken as a literal:

...  sed 's/\[[^]]*]//g'


The expression to do it in one shot is:

... sed 's/[[(][^])]*[])]//g'

$ echo 'this (is) a [test]!!!'
this (is) a [test]!!!
$ echo 'this (is) a [test]!!!' | sed 's/[[(][^])]*[])]//g'
this  a !!!

Note that this one-shot notation will also match mismatched [...) and (...] blocks:

$ echo 'this (is) a [darn) funny (and] crazy [test]!!!'
this (is) a [darn) funny (and] crazy [test]!!!
$ echo 'this (is) a [darn) funny (and] crazy [test]!!!' | sed 's/[[(][^])]*[])]//g'
this  a  funny  crazy !!!
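
If the mismatched pairs should be left alone, one option is to keep the two bracket types in separate substitutions and to exclude every bracket character from the inner part. A sketch on the same test string, assuming blocks are never nested (the variable names are only for illustration):

```shell
text='this (is) a [darn) funny (and] crazy [test]!!!'

# One -e per bracket type; [^][()] means "any character that is not ] [ ( or )",
# so '[' can only pair with ']' and '(' only with ')'
strict=$(printf '%s\n' "$text" | sed -e 's/\[[^][()]*]//g' -e 's/([^][()]*)//g')

printf '%s\n' "$strict"
# this  a [darn) funny (and] crazy !!!
```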

ppatra · January 7, 2012, 4:28am

Thanks a ton!!! great!

Now I have one more question for you guys, about how to use variables in a sed command. Please find the situation below:

I have an input file as below:

this is line 1
line number 2
"now" it is three
line 4
note* this five

The output should look like:

line 1
line 2
line three
line 4
line five

To do this I created a rule_file like the one below:

this is|
number|
"now" it is|line
note* this|line

And the following is the script:

rule_file=$1
i=0
j=1
cp ./infile ./ofile_${i}
while read line; do
    curr_text="`echo ${line} | awk -F\"|\" '{print $1}'`"
    new_text="`echo ${line} | awk -F\"|\" '{print $2}'`"
    sed "s/${curr_text}/${new_text}/g" ./ofile_${i} > ./ofile_${j}
    i=${j}
    j=`expr $j + 1`
done < ${rule_file}

This script gives me the following output:

line 1
line 2
"now" it is three
line 4
note* this five

It is not able to handle " (double quotes) and * (asterisk). Can anybody please help me? Thanks!

ctsgnb · January 7, 2012, 5:16am

Can't you just remove all the " and * characters up front,
by adding the s/["*]//g substitution to the previous sed statement?

sed 's/[([][^])]*[])]//g;s/["*]//g' yourfile
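
If the rules are instead meant to be applied as literal text, the search text has to be escaped before it is pasted into the sed command: the * in `note* this` is read by sed as a repetition operator, not as a literal asterisk. Below is a minimal sketch, assuming a POSIX shell and the rule file format shown in the question. `escape_pattern`, `escape_replacement` and `apply_rules` are hypothetical helper names, not part of the original script. The sketch also builds one sed script from all the rules instead of chaining ofile_N files.

```shell
# Hypothetical helper: backslash-escape BRE metacharacters and the '/' delimiter.
# ',' is used as the delimiter here so that '/' can sit inside the bracket list.
escape_pattern() {
    printf '%s\n' "$1" | sed 's,[][\.*^$/],\\&,g'
}

# Hypothetical helper: escape '\', '&' and '/' in the replacement text
escape_replacement() {
    printf '%s\n' "$1" | sed 's,[\&/],\\&,g'
}

# Usage: apply_rules rule_file < infile
apply_rules() {
    script=''
    # Split each rule on '|' into search text and replacement text
    while IFS='|' read -r curr_text new_text; do
        script="${script}s/$(escape_pattern "$curr_text")/$(escape_replacement "$new_text")/g
"
    done < "$1"
    # Apply every rule, in file order, in a single sed pass
    sed "$script"
}
```

It could be run as `apply_rules rule_file < infile > outfile`. Whitespace around a removed phrase is left alone, so `this is line 1` becomes ` line 1` with a leading space unless the rule itself includes that space.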