Pull Intermediate Strings

OMLEELA · April 13, 2011, 9:58pm

Experts,

You all have been very supportive of me so far & I'm thankful for it.

I need to extract data between two sets of parentheses and also between quotes.

cat LOGFILE | grep 'number wasnt' | head -2

I. 2011/04/14 01:12:03. process(130) Deleting Text on line 11 (ESN:27723211621B01DJ68AG) because a number wasnt 'AVAILABLE'  and is not found in the database
I. 2011/04/14 01:12:03. process(130) Deleting Text on line 12 (ESN:27723211634ATADJ68AK) because a number wasnt 'AVAILABLE'  and is not found in the database

what I need is "27723211621B01DJ68AG" & "AVAILABLE".

So here is what I do -

cat LOGFILE | grep -i 'number wasnt' | cut -d'(' -f3 | sed -e 's/[a-z].//g' | sed -e 's/SN://g' -e 's/)//g' | tr -d "'" | head -2

E27723211621B01DJ68AG  AVAILABLE    
E27723211634ATADJ68AK  AVAILABLE

The solution that I'm using right now works, but it does so by eliminating all of the unnecessary characters instead of extracting what I need (which definitely is not elegant at all).
But owing to my limited understanding of regex, I coded it this way.

However, there is a new change and we need to pull in even the "130", which is in the first set of parentheses at the beginning, and I'm not sure how to go about this.

Simply stated, here is what I have -

I. 2011/04/14 01:12:03. process(130) Deleting Text on line 11 (ESN:27723211621B01DJ68AG) because a number wasnt 'AVAILABLE'  and is not found in the database
I. 2011/04/14 01:12:03. process(130) Deleting Text on line 12 (ESN:27723211634ATADJ68AK) because a number wasnt 'AVAILABLE'  and is not found in the database

and I need

130  27723211621B01DJ68AG AVAILABLE.

How do I get this?

please help,

regards,
Lee.

aster007 · April 14, 2011, 1:00am

Had to divide it into 2 parts since the separators were different

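grep "number wasnt" LOGFILE | head -2 | while IFS= read -r LINE
do
    # Text inside each pair of parentheses: RS=")" ends a record at every ")",
    # FS="(" makes whatever follows "(" the second field.
    # Note: the second value still carries its "ESN:" prefix.
    BRACKET_TXT=`echo "$LINE" | awk '$0=$2' FS=\( RS=\) | tr -s "\n" " "`
    # Text between the single quotes
    QUOTE_TXT=`echo "$LINE" | awk -F"'" '{print $2}'`

    echo "$BRACKET_TXT $QUOTE_TXT"
done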
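
For anyone who wants the output without the "ESN:" prefix, the same two-step idea can be sketched as a small shell function. This is only an illustration: the function name extract_fields and the variable names are made up here, and the sub() call assumes the prefix is always the literal text "ESN:".

```shell
# Hypothetical helper: prints "<first bracket> <second bracket minus ESN:> <quoted text>"
extract_fields() {
    # Step 1: RS=")" cuts the line into records at each ")", and FS="("
    # leaves the text after "(" in field 2. Records without "(" are skipped.
    brackets=$(printf '%s\n' "$1" |
        awk -F'(' -v RS=')' '$2 != "" { sub(/^ESN:/, "", $2); print $2 }' |
        tr '\n' ' ')
    # Step 2: with "'" as the field separator, field 2 is the quoted text.
    quote=$(printf '%s\n' "$1" | awk -F"'" '{ print $2 }')
    # $brackets already ends with a space (from tr), so no extra separator.
    printf '%s%s\n' "$brackets" "$quote"
}

line="I. 2011/04/14 01:12:03. process(130) Deleting Text on line 11 (ESN:27723211621B01DJ68AG) because a number wasnt 'AVAILABLE'  and is not found in the database"
extract_fields "$line"
```

For the sample line this should print 130 27723211621B01DJ68AG AVAILABLE.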

michaelrozar17 · April 14, 2011, 1:56am

Through sed..

grep 'number wasnt' logfile.txt |head -2| sed "s/^.*(\([^)]*\)).*(....\(.*\)).*'\(.*\)'.*/\1 \2 \3/"
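
The sed expression above can be read piece by piece. The following is a commented sketch that runs the same substitution on one sample line from the question; the variable names line and result are only for illustration.

```shell
line="I. 2011/04/14 01:12:03. process(130) Deleting Text on line 11 (ESN:27723211621B01DJ68AG) because a number wasnt 'AVAILABLE'  and is not found in the database"

# In sed's basic regex syntax a bare ( is a literal parenthesis,
# while \( ... \) marks a group that can be recalled as \1, \2, \3.
#   ^.*(            everything up to the "(" in front of 130
#   \([^)]*\))      group 1: the characters before the next ")"   -> 130
#   .*(....         on to the second "(" plus four characters ("ESN:")
#   \(.*\))         group 2: what is left before the ")"          -> 27723211621B01DJ68AG
#   .*'\(.*\)'.*    group 3: the text between the single quotes   -> AVAILABLE
result=$(printf '%s\n' "$line" |
    sed "s/^.*(\([^)]*\)).*(....\(.*\)).*'\(.*\)'.*/\1 \2 \3/")
printf '%s\n' "$result"
```

Because the pattern covers the whole line, the replacement "\1 \2 \3" leaves only the three remembered pieces. Note that the four dots skip exactly four characters, so this relies on the prefix always being as long as "ESN:".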

OMLEELA · April 14, 2011, 2:07am

michaelrozar17 & aster007,

~ wow ~

I can't believe you guys whipped this up in no time! Makes me feel so "small" & "trivial" ...

Anyway, here is a simple question for michaelrozar17.

Can you please explain your sed ... I can't seem to understand it....

Aster007, your awk is so simple and so easy to read & assimilate.

I stand up and say "thank you" to both of you.

regards,
Lee

cgkmal · April 14, 2011, 2:50am

Hi OMLEELA,

Another with awk:

awk '{print gensub(/.*\((.*)\).*\(.*:(.*)\).*'\''(.*)'\''.*/,"\\1 \\2 \\3","g")}' inputfile
130 27723211621B01DJ68AG AVAILABLE
130 27723211634ATADJ68AK AVAILABLE

Regards

OMLEELA · April 14, 2011, 3:01am

cgkmal,

Thank you.

Yours works wonderfully as well.

Btw, as you took the regular exp approach, can you please explain as to what you are doing here as your explanation would certainly give me a better sense of how to approach this problem, the next time onwards.

regards,
Lee.

cgkmal · April 14, 2011, 4:22am

Sure OMLEELA,

I'll try to explain it well enough.

I'm using the gensub function (a GNU awk extension) with its regexp back reference feature. Within gensub it works something like this:

gensub(/regexp/, replacement, how [, target]), where

regexp: The pattern you want to search for
replacement: The text that replaces each match of regexp
how="g": Indicates that all matches of regexp are replaced with replacement
target: The string to work on; if no target is supplied, $0 is used

To use a back reference you need to surround the part of the regexp you want to remember with parentheses "(" and ")";
anything outside the back reference parentheses won't be buffered or remembered.

# the back reference parentheses are numbered 1, 2, 3 on the line below; the regexp to be remembered is inside them
gensub(/.*\((.*)\).*\(.*:(.*)\).*'\''(.*)'\''.*/ ....
              1            2           3

Explaining first part of regexp:

.*\((.*)\)  =  .* plus \( plus (.*) plus \)

Where,

.* is to match: I. 2011/04/14 01:12:03. process
\( is to match: (  the opening parenthesis  <<the literal "(" is escaped as \( >>
(.*) is to match: 130 = the content within the literal parentheses; we surround it with () to use it as a back reference
\) is to match: )  the closing parenthesis <<the literal ")" is escaped as \) >>

and so on.

Then the 2nd back reference matches and remembers the string after ":" and before ")", i.e. 27723211621B01DJ68AG in (ESN:27723211621B01DJ68AG).

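And the 3rd back reference matches the substring between single quotes. Because the whole awk program is itself wrapped
in single quotes for the shell, a literal single quote cannot be written as just \'; it has to be written as '\''. That is:
close the shell quote with ', add an escaped quote \', and reopen the shell quote with '.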
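
The '\'' trick can be tried on its own, away from the long regexp. This is a small sketch; the variable names joined, line and quoted are only for illustration, and it uses the standard awk match() function instead of gensub so that it does not depend on GNU awk.

```shell
# The shell sees three pieces, 'a' then \' then 'b', and glues them together.
joined='a'\''b'
printf '%s\n' "$joined"

# Same idea inside an awk program: after the shell has processed the quotes,
# the regexp that awk receives is /'[^']*'/ (a quote, non-quotes, a quote).
line="a number wasnt 'AVAILABLE'  and is not found"
quoted=$(printf '%s\n' "$line" |
    awk '{ if (match($0, /'\''[^'\'']*'\''/)) print substr($0, RSTART + 1, RLENGTH - 2) }')
printf '%s\n' "$quoted"
```

The first printf should show a'b and the second AVAILABLE; substr() trims the two quotes off the matched text.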

After following the same process for the rest of the string, we recall the stored substrings
with \\N (N being the group number) in whatever order is needed. I've used 3 pairs of back reference parentheses, so I recall
3 back references in the order 1,2,3, but it could be 3,1,2 or 3,3,1 or 3,2,1 etc., depending on your needs. In this case
they are separated by a space: "\\1 \\2 \\3".

Hope this helps.

Regards
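
The point about recalling the groups in any order can be tried without GNU awk. The sketch below is a swapped-in variant, not the gensub call itself: it feeds the same pattern to sed -E, which uses the same extended regexp syntax (\( is a literal parenthesis, ( ... ) is a group). The variable names are only for illustration, and sed -E is assumed to be available, as it is in GNU and BSD sed.

```shell
line="I. 2011/04/14 01:12:03. process(130) Deleting Text on line 11 (ESN:27723211621B01DJ68AG) because a number wasnt 'AVAILABLE'  and is not found in the database"

# Same three groups as in the gensub pattern; the program is in double quotes
# here, so the single quotes need no special escaping.
pattern=".*\((.*)\).*\(.*:(.*)\).*'(.*)'.*"

# Recall the groups in the original order ...
in_order=$(printf '%s\n' "$line" | sed -E "s/$pattern/\1 \2 \3/")
# ... and in the order 3,1,2
reordered=$(printf '%s\n' "$line" | sed -E "s/$pattern/\3 \1 \2/")

printf '%s\n' "$in_order"
printf '%s\n' "$reordered"
```

In sed the groups are recalled as \1, \2, \3; the doubled backslash in "\\1" is only needed inside an awk string.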

OMLEELA · April 14, 2011, 4:36am

cgkmal,

I can't believe it was such a verbose & poignant explanation.

Honestly, I can't thank you enough for this help.

You have a blessed day.

regards,
Lee.

kurumi · April 14, 2011, 4:38am

$ ruby -ne 'puts $_.scan(/[\(\047](.*?)[\)\047]/).flatten.join("\s")' file
130 ESN:27723211621B01DJ68AG AVAILABLE
130 ESN:27723211634ATADJ68AK AVAILABLE

OMLEELA · April 14, 2011, 4:45am

Kurumi,

I don't know the head & tail of Ruby but your one-liner does a fine job.

It looks like every one of you had a different approach to this issue, and yet I can't seem to wrap my head around most of these approaches or the way you folks are slicing this.

I'm humbled & thanks a lot.

regards,
Lee.