How to extract text between two strings?

emresearch · July 6, 2011, 2:59am

Hi,

I want to extract some text between two strings in a line. I am using the following command:

awk '/-string1/,/-string2/' filename

The contents of the file are:

line1
line2
aaa -bbb -ccc -string1 c,d,e -string2
line4

But it shows the complete line that contains the searched strings:

aaa -bbb -ccc -string1 c,d,e -string2

I want only

c,d,e

So can anyone help me with this task?

Thanks
Manish

pravin27 · July 6, 2011, 3:10am

Perl

perl -nle 'print $1 if /-string1(.+?)-string2/' inputfile

michaelrozar17 · July 6, 2011, 3:12am

You need to tell awk which part of the line should be printed. By default it prints the whole line it read.

awk '/string/{print $5}' inputfile
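When the field number is not known in advance, one option is to locate the two markers as literal strings inside the line. The following is only a minimal sketch under assumptions: the markers are "-string1 " and " -string2" with single spaces around the value, and the temporary file and the variable names a, b, s, e and rest are made up for illustration.

```shell
# Sample file with the layout from the question (temporary file, illustration only).
tmp=$(mktemp)
printf '%s\n' 'line1' 'line2' 'aaa -bbb -ccc -string1 c,d,e -string2' 'line4' > "$tmp"

# index() searches for a literal string, so the field position does not matter.
result=$(awk -v a='-string1 ' -v b=' -string2' '
{
    s = index($0, a)                  # position of the start marker, 0 if absent
    if (s == 0) next                  # skip lines without the start marker
    rest = substr($0, s + length(a))  # everything after the start marker
    e = index(rest, b)                # position of the end marker in the remainder
    if (e > 0) print substr(rest, 1, e - 1)
}' "$tmp")

echo "$result"
rm -f "$tmp"
```

Lines that lack either marker print nothing, which matches the behaviour the question asks for.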

emresearch · July 6, 2011, 3:55am

Thanks for the quick response, it is working fine. But I want to do this in shell scripting only.

So please tell me a command for shell scripting.

Thanks
Manish

---------- Post updated at 01:25 PM ---------- Previous update was at 01:19 PM ----------

@michaelrozar17

Thanks for your reply

But if I do not know the field number ($5), how can I extract the string after string1?

One more thing:
once string1 is found, I have to extract the string up to the next space after string1.

How can I do this task?

Thanks
Manish
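For a shell-only approach, POSIX parameter expansion can cut a line at a marker without calling awk, sed or perl. This is a rough sketch, assuming the line is already in a variable and the value is followed by a single space or the end of the line; the variable names line, rest and value are hypothetical.

```shell
line='aaa -bbb -ccc -string1 c,d,e -string2'

case $line in
    *'-string1 '*)
        rest=${line#*"-string1 "}   # drop everything up to and including "-string1 "
        value=${rest%% *}           # keep only what precedes the next space
        ;;
    *)
        value=''                    # marker not present on this line
        ;;
esac

echo "$value"
```

To apply it to a file, the same expansions could be placed inside a `while IFS= read -r line` loop reading from that file.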

michaelrozar17 · July 6, 2011, 4:28am

Something like this...?

sed -n '/string/s/.*string. \([^ ]*\) .*/\1/p' inputfile

emresearch · July 6, 2011, 4:41am

@michaelrozar17
Thanks for your reply

But it is not working; it gives no output.
So can you please explain this command to me?

Thanks

Franklin52 · July 6, 2011, 4:48am

sed -n 's/.*string1 \([^ ]*\) .*/\1/p' file

emresearch · July 6, 2011, 4:57am

@Franklin52

Thanks for your reply

But it is still not working; it gives no output.

Please explain this command as well. How does it work?

And I want to extract the text between two strings (string1 and string2).

Thanks

Franklin52 · July 6, 2011, 5:03am

Works fine for me:

$ cat file
line1
line2
aaa -bbb -ccc -string1 c,d,e -string2
line4
$ sed -n 's/.*string1 \([^ ]*\) .*/\1/p' file
c,d,e

Post a better sample of your input file and please use code tags.
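Since an explanation of the command was requested, here is the same sed expression with each part annotated. It is a sketch that recreates the sample file in a temporary location (an assumption for illustration) and otherwise uses the command exactly as posted.

```shell
# Recreate the sample file from the post in a temporary location.
tmp=$(mktemp)
printf '%s\n' 'line1' 'line2' 'aaa -bbb -ccc -string1 c,d,e -string2' 'line4' > "$tmp"

# -n          : do not print lines automatically
# s/X/\1/p    : replace the match X with capture group 1 and print only
#               the lines on which the substitution succeeded
# .*string1   : everything up to and including "string1", followed by one space
# \([^ ]*\)   : capture group 1, the run of non-space characters that follows
# ' .*'       : one space and then the rest of the line
result=$(sed -n 's/.*string1 \([^ ]*\) .*/\1/p' "$tmp")

echo "$result"
rm -f "$tmp"
```

The pattern requires a literal space before and after the value, so a line where the separators are tabs, or where the value is the last thing on the line, would not match and would print nothing.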

ltomuno · July 6, 2011, 5:53am

sed -n '/\-string1.*\-string2$/p' inputfile | awk '{print $(NF-1)}'

emresearch · July 6, 2011, 6:01am

My file content is:

#interpreter
cd directory
spotlight -verilog file.v -$regression -batch -policy=xyz -rules ABC,abc,pqr -wdir ../other_files
cd ...

In this case I want to extract the string after -rules up to the next space, i.e. ABC,abc,pqr

I am first using an awk command to find the line:

awk '/-rules/{print}' script > output

because awk prints the complete line.

So now the output file contains only the line that has -rules, and I have to extract the string after -rules up to the next space.

As I am new to shell scripting, please explain your answer as well.

Thanks
Manish

michaelrozar17 · July 6, 2011, 6:15am

I wonder why you didn't try the sed solution posted by Franklin52 or me! Can you make the necessary changes to the sed command and try again?

sed -n '/rules/s/.*rules \([^ ]*\) .*/\1/p' inputfile

emresearch · July 6, 2011, 6:45am

@Franklin52

@michaelrozar17

I am using the same command again and again but it is not working.

Please go through my input file once and then reply.

Or tell me if you have any alternative solution for this task.

My input file is ----

Script---

#interpreter
cd directory
spotlight -verilog file.v -$regression -batch -policy=xyz -rules ABC,abc,pqr -wdir ../other_files
cd ...

First I am using the awk command:

$ awk '/-rules/{print}' script > output

After this I am using the sed command you provided:

$ sed -n '/rules/s/.*rules \([^ ]*\) .*/\1/p' output > out1

Now the out1 file is empty.
It means the sed command is not working properly in this case.

Thanks
Manish

michaelrozar17 · July 6, 2011, 7:05am

emresearch:

My input file is ----
Script---

#interpreter
cd directory
spotlight -verilog file.v -$regression -batch -policy=xyz -rules ABC,abc,pqr -wdir ../other_files
cd ...
after this i am using the sed command provided by you like---
$ sed -n '/rules/s/.*rules \([^ ]*\) .*/\1/p' output > out1
now out1 file is empty.
it means sed command is not working properly in this case.

OK. Can you post the content of the file output, as highlighted above? If it is the same as the part highlighted in blue, please post a few more lines of the input file given to the sed command. Also post the sed version you have. Command: sed --version (on GNU/Linux).

gprashant · July 6, 2011, 7:19am

Hi,

Try this:

grep "string1" a.txt | cut -d' ' -f5

emresearch · July 6, 2011, 7:59am

@michaelrozar17

The sed version is GNU 4.1.2.

After using the awk command, the output file content is:

spotlight -verilog file.v -$regression -batch -policy=xyz -rules ABC,abc,pqr -wdir ../other_files

Now I want ABC,abc,pqr only.

It will certainly come after -rules, but the position of -rules in the line is not fixed, so the field number can vary between files. I have to search for -rules and then extract the next string up to the next space.

So please reply with a suitable solution.

Thanks
Manish

---------- Post updated at 05:29 PM ---------- Previous update was at 05:25 PM ----------

@gprashant

Thanks for the reply.
I know about this solution.
But the problem is that the field position of the string is not fixed across files.
So I cannot use the field operator $5. It may be five in some cases, but it may be anything.

So I have to search for the string and then print the next string up to the space.

Thanks
Manish

shamrock · July 6, 2011, 9:34am

If the searched string is always "rules" then try this awk one-liner...

awk -F- '{for(i=1;i<=NF;i++) if($i~"^rules"){gsub("rules| ","",$i);print $i}}' file
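A variation on the same idea keeps awk's default whitespace splitting and prints the field that follows -rules, wherever it appears. This is a sketch only, assuming -rules and its value are separated by whitespace and that the first occurrence per line is the one wanted; the temporary file stands in for the script posted earlier in the thread.

```shell
# Recreate the posted script in a temporary file; the quoted 'EOF' keeps
# $regression from being expanded by the shell.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#interpreter
cd directory
spotlight -verilog file.v -$regression -batch -policy=xyz -rules ABC,abc,pqr -wdir ../other_files
cd ...
EOF

# Walk the whitespace-separated fields; when one equals "-rules",
# print the field after it and stop scanning that line.
result=$(awk '{
    for (i = 1; i < NF; i++)
        if ($i == "-rules") { print $(i + 1); break }
}' "$tmp")

echo "$result"
rm -f "$tmp"
```

Because the comparison is an exact match on the whole field, a value that itself contains a hyphen is left intact, and fields such as -policy=xyz are not confused with -rules.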

benitto · July 6, 2011, 9:45am

Hi,
Try this:

sed -n '/WORD1/,/WORD2/p' /path/to/file

michaelrozar17 · July 6, 2011, 10:09am

OK. I guess your input file contains both the words rules and -rules. The input file below is as you posted it.

$ cat infile
Script---

#interpreter
cd directory
spotlight -verilog file.v -$regression -batch -policy=xyz -rules ABC,abc,pqr -wdir ../other_files
cd ...

$ sed -n '/-rules/s/.*rules \([^ ]*\) .*/\1/p' infile
ABC,abc,pqr
$ sed -n '/-rules/s/.*rules \(.*\) -.*/\1/p' infile
ABC,abc,pqr

gprashant · July 7, 2011, 4:39am

Hi Manish,

how about this:

awk -F"-string1" '{print $2}' a.txt | cut -d" " -f 2 | tail -2 | head -1