Help with extracting text from a string

kminkeller · February 25, 2010, 12:38pm

I don't know if I am making any sense here, but I need to do something like this.

I have a variable that contains the result of the svnlook command in a post-commit hook script.

test=`/usr/bin/svnlook changed $REPOS -r $REV | grep "^A.*index.html$"`

and I get

test=A  /content/qa/lesson1/index.html A  /content/qa/lesson2/index.html

Basically, those two files have just been added to the repository.

Now I need to write a Unix script to extract the directory names lesson1 and lesson2 from that text.

How can I do this? Any help or suggestion is highly appreciated.

Thanks

KM

joeyg · February 25, 2010, 12:59pm

Unclear with your sample -- is the output on just one line? Or, is it on multiple lines?

kminkeller · February 25, 2010, 1:05pm

Thanks for your response, joeyg.

Sorry, my question was not very clear. Let me know if you have any questions.

I would like to have the output in an array because I need to do more things with those names,
for example:

dir[0]=lesson1
dir[1]=lesson2

Thanks.
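
Note: one way to get such an array in bash, as a minimal sketch. It assumes the text is a single whitespace-separated string as in the sample above, that no path contains spaces, and that the wanted name is always the 4th "/"-separated field. The variable name changes is only illustrative.

```shell
#!/usr/bin/env bash
# Sample text copied from this thread; in the hook it would come from svnlook.
changes='A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html'

# Put each word on its own line, keep only the paths, take the 4th "/"-separated field.
# $changes is left unquoted on purpose so the shell splits it on whitespace.
dir=( $(printf '%s\n' $changes | grep '^/' | cut -d/ -f4) )

# Show every element with its index.
for i in "${!dir[@]}"; do
    echo "dir[$i]=${dir[$i]}"
done
```

Arrays are a bash feature, so this would not run under a plain POSIX sh.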

joeyg · February 25, 2010, 1:57pm

Is your output currently

test=A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html

or

test=
A /content/qa/lesson1/index.html 
A /content/qa/lesson2/index.html

or something else

And, do you always want the 3rd field in your

/aaa/bbb/ccc/ddd/eee.html

lines?

kminkeller · February 25, 2010, 2:13pm

Yes, I always want the third field. And yes, my output usually looks like this:
test=A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html

I am checking the svn repository to see whether a new index.html file has been added. There could be any number of index.html files, each in a differently named folder; in the example I have two. Based on that, I need to create a redirect file that points to that location on a server, so I need the folder name to build the URL.

joeyg · February 25, 2010, 2:28pm

>echo test=A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html | gawk '{print $2"\n"$4}' | gawk -F"/"'{print "dir["NR-1"]=",$4}'
dir[0]= lesson1
dir[1]= lesson2

or simply append this to your current command?

| gawk '{print $2"\n"$4}' | gawk -F"/"'{print "dir["NR-1"]=",$4}'

kminkeller · February 25, 2010, 2:36pm

Thanks joeyg

Sorry if I am not understanding something, but I am getting an invalid range error:

$ echo test=A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html | gawk '{print $2"\n"$4}' | gawk -F"/"'{print "dir["NR-1"]=",$4}'
gawk: fatal: Invalid range end: //{print "dir["NR-1"]=",$4}/

$ echo "A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html" | gawk '{print $2"\n"$4}' | gawk -F"/"'{print "dir["NR-1"]=",$4}'
gawk: fatal: Invalid range end: //{print "dir["NR-1"]=",$4}/
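
Note: the error is most likely caused by the missing space between -F"/" and the quoted program. Without it the shell passes both as one argument, so gawk takes everything after -F as the field separator and tries to compile it as a regular expression, in which R-1 inside [NR-1] is an invalid range. A sketch with the space restored, using the sample text from this thread (plain awk is used here as an assumption for portability; gawk should behave the same):

```shell
sample='A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html'

# Note the space between -F"/" and the program text.
# First awk: print fields 2 and 4 (the two paths) on separate lines.
# Second awk: split each path on "/" and print the 4th field with an index.
result=$(echo "$sample" | awk '{print $2"\n"$4}' | awk -F"/" '{print "dir["NR-1"]=",$4}')
echo "$result"
```

The comma in the print statement inserts a space, which is why the values appear as "dir[0]= lesson1" rather than "dir[0]=lesson1".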

joeyg · February 25, 2010, 2:42pm

Are you using bash, or ksh, or ???

kminkeller · February 25, 2010, 2:48pm

i am using bash.

Thanks.

joeyg · February 25, 2010, 3:16pm

Does this

$ echo test=A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html | gawk '{print $2"\n"$4}'

give you

/content/qa/lesson1/index.html
/content/qa/lesson2/index.html

kminkeller · February 25, 2010, 3:22pm

thanks joeyg.

Yes, that works. But when I added that to the grep part of my hook script, I get nothing. Just the echo works fine, as you said.

Here is part of my hook script:

changes=`/usr/bin/svnlook changed $REPOS -r $REV | grep "^A.*index.html$" | gwak 'print $2"\n"$4}'`
echo '>>>'$changes >> $ACTION_LOG

So what I am doing here is checking whether a new index.html file has been added to the repository, and from there I need to extract the folder name.

changes=`/usr/bin/svnlook changed $REPOS -r $REV | grep "^A.*index.html$"`

gives

A  /content/qa/lesson1/index.html A /content/qa/lesson2/index.html

So I added the script you gave me to it.

changes=`/usr/bin/svnlook changed $REPOS -r $REV | grep "^A.*index.html$"| gwak 'print $2"\n"$4}'

It didn't give me anything back.
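
Note: as posted, the command is spelled gwak and the opening brace of the awk program is missing; if the hook script contains the same typos, either one would explain the empty result. Separately, svnlook changed normally prints one path per line, so inside the pipeline each line has only two fields and $4 is empty. A sketch of a per-line version follows; sample_changed is a hypothetical stand-in for the real svnlook call and its data is made up from the paths in this thread.

```shell
# Hypothetical stand-in for `svnlook changed`: one status/path pair per line.
sample_changed() {
    printf 'A   /content/qa/lesson1/index.html\n'
    printf 'U   /content/qa/lesson1/page2.html\n'
    printf 'A   /content/qa/lesson2/index.html\n'
}

# Keep added index.html files, take the path (field 2),
# then the 4th "/"-separated field of that path.
changes=$(sample_changed | grep '^A.*index\.html$' | awk '{print $2}' | cut -d/ -f4)
echo "$changes"
```

If the real paths have no leading slash, the field number passed to cut shifts down by one.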

joeyg · February 25, 2010, 3:29pm

After you set the variable changes

>echo "$changes"

to see it on the screen
then, you might want

echo ">>>$changes" >> $ACTION_LOG

as sometimes spaces and other characters can confuse things
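
Note: the quoting matters more than it may seem. With an unquoted $changes the shell splits the value on whitespace and echo joins the pieces with single spaces, so multi-line output looks like one line in the log. A small sketch of the difference, using made-up two-line sample data:

```shell
# Two lines, as svnlook would normally produce them (sample data).
changes='A   /content/qa/lesson1/index.html
A   /content/qa/lesson2/index.html'

unquoted=$(echo '>>>'$changes)   # word splitting collapses newlines and extra spaces
quoted=$(echo ">>>$changes")     # newlines and spacing are preserved

echo "$unquoted"
echo "$quoted"
```

This would also explain why the logged value appeared as a single line even though the pipeline itself works line by line.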

kminkeller · February 25, 2010, 4:14pm

Yes, that didn't work either. I do get a response printed in the log file for the previous command

changes=`/usr/bin/svnlook changed $REPOS -r $REV | grep "^A.*index.html$"`
echo '>>>'$changes >> $ACTION_LOG

as

>>>A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html

Thanks.

---------- Post updated at 05:14 PM ---------- Previous update was at 04:43 PM ----------

One thing I also realised is that gawk '{print $2"\n"$4}' works only if you have two files. If three files are checked in to the repository, it will still return only two.

$  echo test=A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html A /content/qa/lesson3/index.html| gawk '{print $2"\n"$4}'
/content/qa/lesson1/index.html
/content/qa/lesson2/index.html

This is not what I am looking for.
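
Note: for single-line input like this, awk can loop over every second field instead of naming $2 and $4 explicitly. A minimal sketch, assuming the line strictly alternates between a status letter and a path:

```shell
sample='A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html A /content/qa/lesson3/index.html'

# Print fields 2, 4, 6, ... up to the last field (NF), one per line.
paths=$(echo "$sample" | awk '{ for (i = 2; i <= NF; i += 2) print $i }')
echo "$paths"
```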

joeyg · February 25, 2010, 4:19pm

Can you just enter the command?
The way you have it, you are writing output from the command to a string and then writing that to a file. Thus, why do you need to write the >>> ?
Do this

changes=`/usr/bin/svnlook changed $REPOS -r $REV | grep "^A.*index.html$"`
echo '>>>'$changes >> $ACTION_LOG
/usr/bin/svnlook changed $REPOS -r $REV | grep "^A.*index.html$" | gawk '{print $2"\n"$4}'

which should write your stuff, but also execute the same command and send new output to the screen

/content/qa/lesson1/index.html
/content/qa/lesson2/index.html

kminkeller · February 25, 2010, 4:25pm

Sorry, yes, I write the output to a text file, $ACTION_LOG, and I run
tail -f on it while I check in files through my IDE to see if anything is coming out.

Thanks.

joeyg · February 25, 2010, 4:28pm

Can you do the following:

>echo test=A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html A /content/qa/lesson3/index.html | sed 's/A/~A/g' | tr '~' '\n' | grep "^A" | cut -d" " -f2
/content/qa/lesson1/index.html
/content/qa/lesson2/index.html
/content/qa/lesson3/index.html

Essentially, do your command with a new filter at end

/usr/bin/svnlook changed $REPOS -r $REV | grep "^A.*index.html$" | sed 's/A/~A/g' | tr '~' '\n' | grep "^A" | cut -d" " -f2

---------- Post updated at 04:28 PM ---------- Previous update was at 04:25 PM ----------

append the following to your tail command

| sed 's/A/~A/g' | tr '~' '\n' | grep "^A" | cut -d" " -f2

to get one line for each instance

Does that much work?

kminkeller · February 25, 2010, 5:18pm

I used this and got it to work.

echo test=A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html A /content/qa/lesson3/index.html | \
sed 's/A/~A/g' | tr '~' '\n' | grep "^A" | cut -d" " -f2 | cut -d"/" -f4

I get

lesson1
lesson2
lesson3

Now I need to plug this into my actual code and see if it works as I wanted. Thanks for helping me so far. I will let you know if I get this going.
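
Note: one caveat with sed 's/A/~A/g' is that it rewrites every capital A, including any inside a path, so a folder such as /content/QA/ would be split in the wrong place. A sketch of an alternative that splits on spaces instead and keeps only the words that start with "/"; the sample path with a capital A is hypothetical and it is assumed that no path contains spaces:

```shell
# Hypothetical sample with a capital "A" inside the path, to show the caveat.
sample='A /content/QA/lesson1/index.html A /content/QA/lesson2/index.html'

# One word per line, keep the paths, take the 4th "/"-separated field.
names=$(echo "$sample" | tr -s ' ' '\n' | grep '^/' | cut -d/ -f4)
echo "$names"
```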

fubaya · February 25, 2010, 8:48pm

Here's another way but it's a couple milliseconds slower than sed | tr | grep | cut | cut

# echo "A /content/qa/lesson1/index.html A /content/qa/lesson2/index.html" | awk '{print $2"\n"$4}' | xargs -n1 dirname | xargs -n1 basename
lesson1
lesson2
#
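
Note: the dirname and basename pair can also be written with shell parameter expansion, which avoids starting extra processes. A minimal sketch; parent_name is a hypothetical helper name, and unlike cut -d/ -f4 it returns the immediate parent folder whatever the depth of the path:

```shell
# Print the name of the folder that directly contains the given file.
parent_name() {
    parent=${1%/*}          # like dirname: drop the last path component
    echo "${parent##*/}"    # like basename: keep only the last component
}

parent_name /content/qa/lesson1/index.html
parent_name /content/qa/lesson2/index.html
```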

joeyg · February 26, 2010, 7:27am

Unclear whether appending the sed, tr, grep and cut commands gave you what you needed.