Search string between the strings

balajikalai · October 12, 2010, 6:41pm

File name : Sample.txt

<ownername>Oracle< ownername>

I am new to the Unix world. I would like to search for a string and return it to another sh script.

Basically, I want to read the file Sample.txt and find the string between <ownername> Sample.txt < ownername>.
I am looking for a generic way to find the string between <>x<>. Could you please help me with the code for this?

Output should be : Oracle

Best Regards,
Baaaalaaaa

DGPickett · October 12, 2010, 8:08pm

grep '<ownername>[^<]*Sample.txt[^<]*</ownername>'

Narrative: grep for the tag and then some characters not < and then the string and then more not < and then the closing tag.

grep can find the lines with string in tags, but sed can dig out the string. This assumes one per line:

sed '
  s/.*<ownername>\([^<]*\)<\/ownername>.*/\1/
  t
  d
 '

Narrative: on every line, match the whole line with a regex whose capture group holds the string between the tags, and replace the whole line with that captured substring. If the substitution occurs, branch to the end of the script (print and go to the next line); otherwise delete the line.
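Since the original question was about handing the value back to another sh script, here is a minimal sketch of capturing the sed output with command substitution. The temp file and the variable names sample and owner are made up for illustration, and the sample assumes a proper </ownername> closing tag.

```shell
# Hypothetical sample file modeled on the question (assumes a real closing tag).
sample=$(mktemp)
printf '%s\n' 'line 1' '<ownername>Oracle</ownername>' 'line 2' > "$sample"

# Command substitution stores whatever sed prints in a shell variable.
owner=$(sed '
  s/.*<ownername>\([^<]*\)<\/ownername>.*/\1/
  t
  d
' "$sample")

echo "$owner"   # prints: Oracle
rm -f "$sample"
```

The calling script can then use "$owner" like any other variable.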

I like sed because the bits and pieces are reusable in sed, vi, ex, ed, grep, egrep, ksh command line editing, C, JAVA, PERL, awk.

balajikalai · October 13, 2010, 12:17am

Hi
Thank you so much for your help. Actually, I left out a few items.

Input File name : Sample.txt

line 1
<schema>database<schema>
line 2
line 3
line 4
<schema>Oracle<schema>
line 5

Actually, I would like to read the <schema>database<schema> string from the input file and return database only once as my output.

output : database

Sorry to bother you. Please guide me on this.
Thanks in advance

Baaalaaa

---------- Post updated at 11:17 PM ---------- Previous update was at 11:01 PM ----------

I just ran this command at the Unix command prompt: grep '<ownername>[^<]*test.txt[^<]*</ownername>'

I didn't get any output, and I had to press Ctrl-C to get out.

jlliagre · October 13, 2010, 1:51am

Please clarify what the input format is. In your first posting, there is a space before the ending tag label; in your last one, the space is gone. Everyone is expecting a / there, but your files might not be XML. In that case, the pattern might be:

'<ownername>[^<]*test.txt[^<]*<ownername>'

balajikalai · October 13, 2010, 2:03am

Input file : test.txt

<owner_name>balaji<owner_name>

I have tried the same command on a UNIX box, but I am not getting any output; control stays in the same place.

w : 32 :/sh
%
grep '<owner_name>[^<]*test.txt[^<]*<owner_name>'

jlliagre · October 13, 2010, 2:06am

grep '<owner_name>[^<]*<owner_name>'

DGPickett · October 13, 2010, 1:42pm

Well, not having given it a file or piped in a stream, it was grep'ing stdin = your keyboard.
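To make that concrete, here is a small sketch of the two usual ways to give grep something to read. The temp file stands in for test.txt, and the variable names are only for illustration.

```shell
# Hypothetical stand-in for test.txt from the earlier post.
sample=$(mktemp)
printf '%s\n' 'line 1' '<owner_name>balaji<owner_name>' 'line 2' > "$sample"

# File named as an argument: grep reads the file and returns right away.
from_file=$(grep '<owner_name>[^<]*<owner_name>' "$sample")

# Same pattern with the data piped in on stdin.
from_pipe=$(cat "$sample" | grep '<owner_name>[^<]*<owner_name>')

echo "$from_file"
echo "$from_pipe"
rm -f "$sample"
```

With neither a file nor a pipe, grep waits for keyboard input until end-of-file (Ctrl-D) or an interrupt.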

The sed command q stops it, printing the current buffer if not -n.

sed '
  s/.*<schema>\([^<]*\)<schema>.*/\1/
  t quit
  d
  :quit
  q
 ' Sample.txt

or use sed -n, not usually my choice as it ends up being longer:

sed -n '
  s/.*<schema>\([^<]*\)<schema>.*/\1/
  t pquit
  b
  :pquit
  p
  q
 ' Sample.txt
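As a quick check, the two forms can be run side by side on the sample from the question. This is only a sketch: the temp file stands in for Sample.txt, and the variable names are made up.

```shell
# Hypothetical copy of the Sample.txt layout from the question.
sample=$(mktemp)
printf '%s\n' 'line 1' '<schema>database<schema>' 'line 2' 'line 3' \
  'line 4' '<schema>Oracle<schema>' 'line 5' > "$sample"

# Without -n: d drops non-matching lines; q auto-prints the match and exits.
with_print=$(sed '
  s/.*<schema>\([^<]*\)<schema>.*/\1/
  t quit
  d
  :quit
  q
' "$sample")

# With -n: nothing is auto-printed, so p has to print before q exits.
with_n=$(sed -n '
  s/.*<schema>\([^<]*\)<schema>.*/\1/
  t pquit
  b
  :pquit
  p
  q
' "$sample")

echo "$with_print"   # prints: database
echo "$with_n"       # prints: database
rm -f "$sample"
```

Both stop at the first hit, so the later Oracle line is never read.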

Franklin52 · October 13, 2010, 2:00pm

Or:

awk -F"<|>" '/<schema>/{print $3}'
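With both < and > as field separators, the line <schema>database<schema> splits into an empty $1, schema as $2, and database as $3. As written, the one-liner prints every matching line; a sketch of a first-hit-only variant, assuming an awk that accepts a regex field separator, adds exit. The temp file and variable names are illustrative.

```shell
# Hypothetical sample with two tagged lines.
sample=$(mktemp)
printf '%s\n' 'line 1' '<schema>database<schema>' 'line 2' \
  '<schema>Oracle<schema>' > "$sample"

# Prints $3 for every line containing <schema>.
all=$(awk -F'<|>' '/<schema>/ { print $3 }' "$sample")

# exit stops awk after the first matching line.
first=$(awk -F'<|>' '/<schema>/ { print $3; exit }' "$sample")

echo "$first"   # prints: database
rm -f "$sample"
```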

Scrutinizer · October 13, 2010, 2:28pm

sed -n 's|.*<\(ownername\)>\(.*\)</*\1>.*|\2|p'

balajikalai · October 13, 2010, 2:55pm

Thank you so much for your help.

The format of my input file Sample.txt has changed a little bit; the closing tag is now </owner_name>:

line 1
<owner_name>balaji</owner_name>
line 2
line 3
line 4

Please help me with this.

DGPickett · October 13, 2010, 3:16pm

Since slash is a meta-char, you need \/ or [/].

---------- Post updated at 03:16 PM ---------- Previous update was at 03:15 PM ----------

sed '
  s/.*<schema>\([^<]*\)<\/schema>.*/\1/
  t quit
  d
  :quit
  q
 ' Sample.txt
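For the generic <>x<> case asked about at the start, the same script can be wrapped in a shell function that takes the tag name as a parameter. This is only a sketch: first_between is a made-up name, and it assumes the tag contains no slash and nothing special to a sed regex.

```shell
# first_between TAG FILE (hypothetical helper): print the text between
# <TAG> and </TAG> on the first line that has both, then stop reading.
# Assumes TAG has no slash and no regex metacharacters.
first_between() {
  sed '
    s/.*<'"$1"'>\([^<]*\)<\/'"$1"'>.*/\1/
    t quit
    d
    :quit
    q
  ' "$2"
}

# Hypothetical sample mixing two kinds of tags.
sample=$(mktemp)
printf '%s\n' 'line 1' '<owner_name>balaji</owner_name>' 'line 2' \
  '<schema>database</schema>' > "$sample"

owner=$(first_between owner_name "$sample")
schema=$(first_between schema "$sample")

echo "$owner"    # prints: balaji
echo "$schema"   # prints: database
rm -f "$sample"
```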

Scrutinizer · October 13, 2010, 3:20pm

sed -n 's|.*<\(owner_name\)>\(.*\)</*\1>.*|\2|p' infile

---------- Post updated at 21:20 ---------- Previous update was at 21:17 ----------

Hi DGPickett, actually I don't think I do, since I am using | as the separator, so it isn't a metacharacter in this case.
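A small sketch of that point: with | as the s delimiter, / is an ordinary character, and /* simply means zero or more slashes, so the same expression accepts either style of closing tag. The function name extract is made up for illustration.

```shell
# Hypothetical wrapper around the one-liner above; reads stdin.
extract() {
  sed -n 's|.*<\(owner_name\)>\(.*\)</*\1>.*|\2|p'
}

# Closing tag with a slash, and the slash-less form from the earlier posts.
with_slash=$(printf '%s\n' '<owner_name>balaji</owner_name>' | extract)
without_slash=$(printf '%s\n' '<owner_name>balaji<owner_name>' | extract)

echo "$with_slash"      # prints: balaji
echo "$without_slash"   # prints: balaji
```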

DGPickett · October 13, 2010, 3:24pm

True, I stand corrected -- must have had my head tilted.

You just do not exit after the first hit is printed.

I have not used many options like the p flag after s, which is only good with -n, because they are utterly redundant to other, more modular, less limited, more general pieces, and space in my head is more precious than disk space. Commands are designed by committee, it seems.

Scrutinizer · October 13, 2010, 4:02pm

You're right, it is not that efficient, but the idea is that there may be more than one pair of tags (each on the same line, otherwise it will not work). To quit, I'll gladly use your solution:

sed 's|.*<\(owner_name\)>\(.*\)</*\1>.*|\2|;te;d;:e;q' infile

---------- Post updated at 22:02 ---------- Previous update was at 21:36 ----------

This should work with content spread over more than one line

awk '$2=="/owner_name"{gsub(/\n/," ",$1);print $1}' FS=\< RS=\> infile
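Here > ends each record and < splits it into fields, so the record holding the content has the text as $1 and /owner_name as $2, regardless of newlines inside it. A sketch with the content deliberately broken across two lines (the sample data is invented):

```shell
# Hypothetical sample: the tagged content spans two lines.
sample=$(mktemp)
printf '%s\n' 'line 1' '<owner_name>shell' 'script</owner_name>' 'line 2' > "$sample"

# gsub joins the embedded newline into a space before printing.
joined=$(awk '$2=="/owner_name"{gsub(/\n/," ",$1);print $1}' FS=\< RS=\> "$sample")

echo "$joined"   # prints: shell script
rm -f "$sample"
```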

balajikalai · October 13, 2010, 4:11pm

file name: text.txt

line 1
<owner_name>shell</owner_name>
line 2
line 3
line 4
<owner_name>shell</owner_name>

It worked, but I am getting the output twice, and I need it only once.

sed -n 's|.*<\(owner_name\)>\(.*\)</*\1>.*|\2|p' text.txt

output
shell
shell

Thank you so much for all your help.

Scrutinizer · October 13, 2010, 4:23pm

Hi, that is by design. See the alternative sed solution in my previous post and the solutions by other posters.
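For reference, a sketch of that quitting variant run against the text.txt layout from the post above. The commands are split across -e options, which is an assumption made here for portability: some sed implementations do not accept a semicolon right after a label. The temp file and variable names are illustrative.

```shell
# Hypothetical copy of text.txt from the post above.
sample=$(mktemp)
printf '%s\n' 'line 1' '<owner_name>shell</owner_name>' 'line 2' 'line 3' \
  'line 4' '<owner_name>shell</owner_name>' > "$sample"

# The -n ... p form prints once per matching line: two lines here.
twice_count=$(sed -n 's|.*<\(owner_name\)>\(.*\)</*\1>.*|\2|p' "$sample" | grep -c .)

# The quitting form stops after the first hit. Each -e acts like its own
# script line, so the label e is never followed by a semicolon.
once=$(sed -e 's|.*<\(owner_name\)>\(.*\)</*\1>.*|\2|' \
  -e 'te' -e 'd' -e ':e' -e 'q' "$sample")

echo "$twice_count"   # prints: 2
echo "$once"          # prints: shell
rm -f "$sample"
```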