Unix Grep Conundrum - Not for Noobies

owenian · January 18, 2011, 2:35pm

Help,

I have been stuck on this issue for weeks.
I am a unix noobie.

I have a very long string and within that string I am trying to get proc file names
ie
PROCNAME1=SOME_FILENAME_UPDTBASE.SQL

There is a space on either side.

I can't for the life of me peel out the proc name: SOME_FILENAME_UPDTBASE.SQL

I have a file I am comparing it against to verify it is a valid script but I can't verify it as I can't peel it out with grep.

I have hacked with awk and sed but no luck.

Any help would be appreciated.

Thanks in advance

citaylor · January 18, 2011, 2:42pm

I think your question is a little ambiguous...do you simply want:

awk -F= '{ print $2 }'

joeyg · January 18, 2011, 2:43pm

Should be something like

cut -d" " -f6

to get the sixth field broken by a space

awk -F" " '{print $6}'

again to get the sixth field

owenian · January 18, 2011, 2:49pm

sorry guys,

I forgot to mention:

the problem:
PROCNAME1=VEH_INC_PART_UPDTBASE_M.SQL

variable name: PROCNAME1
proc name: VEH_INC_PART_UPDTBASE_M.SQL

I don't know either the variable name or the proc name I am searching for, nor its location in the file.

My job searches hundreds of scripts pulling out the variable names then compares them with a list of procs in a file.

I can get most of them as they have space between calling proc and name.sql

But when there is no space before the script name and the variable name can be anything I have nothing concrete to search for.

Thank you for your time
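
Editor's sketch of the comparison step described above, assuming the list of procs is a plain text file with one script name per line. The helper name check_procs and the name NOT_IN_LIST.SQL are hypothetical, and the temporary file only stands in for the real list.

```shell
# check_procs LISTFILE: read proc names on standard input and report
# whether each one appears in LISTFILE (comparison ignores case).
check_procs() {
    awk 'NR == FNR { valid[toupper($1)] = 1; next }   # first file: load the list
         { print $1, (toupper($1) in valid ? "valid" : "UNKNOWN") }' "$1" -
}

# Stand-in for the real list of procs (hypothetical contents).
list=$(mktemp)
printf '%s\n' VEH_INC_PART_UPDTBASE_M.SQL INC_PART2_UPDTBASE_M.SQL > "$list"

checked=$(printf '%s\n' VEH_INC_PART_UPDTBASE_M.SQL NOT_IN_LIST.SQL | check_procs "$list")
printf '%s\n' "$checked"
rm -f "$list"
```

This needs a POSIX awk; on some older systems that may be nawk or /usr/xpg4/bin/awk rather than plain awk.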

joeyg · January 18, 2011, 2:51pm

Please put a piece of your input file in a message. Make sure to wrap it with CODETAGS. That will make this theoretical discussion practical.

owenian · January 18, 2011, 4:31pm

Hi joeyg,

Sorry, I have not tried to be theoretical.
There is no codetags.
I just want to search this file, find the .SQL scripts and compare them to a file that I have.
There are two other SQL script names in this file and I can find them easily, but this one has no space before it. In reality, when my job is running, I would not know the variable name or the script name.

This is a snipit of code from my input file.
All the following code is on one line.
...EXIT 16 FI JOBNAME=DA331Z_LD_CCC_BSE_TBLS_M PROCNAME1=VEH_INC_PART_UPDTBASE_M.SQL $ORACLE_SCRIPT_PROC/SQLSCRIPT_BATCH INC_PART2_UPDTBASE_M.SQL
SCRERROR=$? IF [ $SCRERROR -NE 0 ] THEN ECHO "ERROR: " INC_PART2_UPDTBASE_M.SQL" FAILED ...

Thanks

methyl · January 18, 2011, 6:54pm

Please post code and data in "Code Tags".

What this means is do a Windows highlight of the code or data in your post then click on the "Wrap [CODE] tags around selected item" toolbar button.
Hover the mouse over the various icons on the toolbar to find the right toolbar icon. On my screen the correct toolbar item appears to be greyed out ... but it works once you can find it. The faint icon looks like a finger pointing at the word "CODE" but at 1024/768 screen resolution you need a magnifying glass and a big torch to read it.

Once posted in "Code Tags" other posters can cut/paste your code or data without corruption or unwanted formatting (like loss of spacing).

When your post is about subtle processing issues with different spacing in the data, posting samples with and without the problem is important.

Precision is everything in computing. Exactly how long is "very long"?

Sorry to be blunt, but in SQL programming terms this is total gibberish. It looks like a mixture of unix script and bits of SQL with additional "..." strings.
We don't mind if you post some practice posts, but please post the entire unix script in "Code Tags", blanking anything confidential with X's.
If you can cut/paste the code from your post and it is identical, we can do the same.

... and while I am on my high horse, the correct spelling is "snippet". LOL

grepeverything · January 20, 2011, 9:58am

I'm not totally sure what you want to do in the end, so maybe I'm way off here. If so, sorry.

Given your snippet,

Assume that is in a file called "bla.txt".
This:

grep -ioP '(?<= )[[:alnum:]_]+=[[:alnum:]_]+\.SQL ' bla.txt | awk -F= '{print $1,$2}'

Will output:
PROCNAME1 VEH_INC_PART_UPDTBASE_M.SQL

If you had more scripts in it it would pick up each one (I think), unless the var/proc pair was broken up by a carriage return.

You can probably tune it quite a bit for your situation, whatever that is.
Brief explanation:
-i case insensitive match
-o just print the matched string, not the whole line like usual
-P use Perl regex's (for the lookbehind)
(?<= ) lookbehind matches the space before the variable name but keeps it out of the -o print
[[:alnum:]_]+=[[:alnum:]_]+\.SQL match one or more alphanumeric or _ characters followed by "=" followed by one or more alphanumerics or _ followed by ".sql"

Then pipe it to awk to print the first and second fields delimited by "=".

You could do other things too like add a -r to the grep to search a whole directory structure recursively, or pipe everything to "sort -u" to eliminate duplicate results, even "sort | uniq -c | sort -n" to get numeric values for how often the same proc/var name pairs occur.
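
Editor's sketch of the two post-processing pipelines mentioned above, run on a few hypothetical "variable proc" pairs. PROCNAME2 is an invented variable name; only the sort and uniq usage is the point.

```shell
# Hypothetical extracted pairs, with one duplicate.
pairs='PROCNAME1 VEH_INC_PART_UPDTBASE_M.SQL
PROCNAME2 INC_PART2_UPDTBASE_M.SQL
PROCNAME1 VEH_INC_PART_UPDTBASE_M.SQL'

# Each distinct pair once.
unique_pairs=$(printf '%s\n' "$pairs" | sort -u)

# Count per pair, least frequent first (uniq -c needs sorted input).
pair_counts=$(printf '%s\n' "$pairs" | sort | uniq -c | sort -n)

printf '%s\n' "$unique_pairs"
printf '%s\n' "$pair_counts"
```

The counts are left-padded with spaces by uniq -c, so any later parsing should split on whitespace rather than rely on fixed columns.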

---------- Post updated Jan 20th, 2011 at 03:58 PM ---------- Previous update was Jan 19th, 2011 at 10:58 PM ----------

I just realized I should have posted that reply to owenian rather than methyl to be in the proper place in this thread....sorry...it's my first post @ unix.com.

owenian · January 20, 2011, 12:55pm

grepeverything, thank you for responding. I can't get it to work but I will continue to play with it.

I did learn from your response as you provided a break down of the code and what each bit does.

grepeverything · January 20, 2011, 2:44pm

If you describe what "can't get it to work" means I might be able to help. I could imagine maybe your version of grep is different, e.g. maybe doesn't support the -P option.

owenian · January 20, 2011, 4:22pm

correct it does not support -P or -o
ie

[ku2q@dwsdv1]/proj/dw/devl/script_job/controlm/map_tables/work
$ grep -ioP '(?<= )[[:alnum:]_]+=[[:alnum:]_]+\.SQL ' cccda331z_ld_ccc_bse_tbls_m.sh | awk -F= '{print $1,$2}'
grep: illegal option -- o
grep: illegal option -- P
Usage: grep -hblcnsviw pattern file . . .
[ku2q@dwsdv1]/proj/dw/devl/script_job/controlm/map_tables/work

Thanks for your time grepeverything.

ps I thought I had hacked out a fix but when i ran a bigger test volume it was not the case ; )

grepeverything · January 20, 2011, 5:17pm

hm, well you could just download (& build & install) GNU grep, which does have the -o and -P switches: http://directory.fsf.org/project/grep/

Otherwise, this sort of works but is quite ugly:

Getting a fuller-featured version of grep would be much cleaner.

owenian · January 20, 2011, 5:53pm

can't download as the company has policies

not working.

egrep -i '[[:alnum:]_]+=[[:alnum]_]+\.SQL ' cccda331z_ld_ccc_bse_tbls_m.sh

does not pipe anything into awk

I will play around with it.
thanks

grepeverything · January 22, 2011, 4:00pm

You have a space missing at the beginning of the pattern, and the ":" is missing from the second named character class, which should read [:alnum:]

egrep -i '***SPACE***[[:alnum:]_]+=[[:alnum**COLON**]_]+\.SQL '
The space is somewhat optional, but the colon is not.
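
Editor's sketch of one way to do the extraction without -o or -P, assuming the variable=script pair is always delimited by spaces or tabs, as in the snippet earlier in the thread. It puts each token on its own line with tr, so an ordinary extended regex anchored with ^ and $ can stand in for the lookbehind. The function name extract_procs is hypothetical, and the egrep in use must understand [[:alnum:]].

```shell
# Print "VARIABLE PROC" for every VARIABLE=NAME.SQL token in the given
# files (or on standard input when no file is named).
extract_procs() {
    cat "$@" |
        tr -s ' \t' '\n\n' |
        grep -i -E '^[[:alnum:]_]+=[[:alnum:]_]+\.SQL$' |
        awk -F= '{ print $1, $2 }'
}

# The one-line snippet from the thread, quoted so $ORACLE_SCRIPT_PROC stays literal.
sample='...EXIT 16 FI JOBNAME=DA331Z_LD_CCC_BSE_TBLS_M PROCNAME1=VEH_INC_PART_UPDTBASE_M.SQL $ORACLE_SCRIPT_PROC/SQLSCRIPT_BATCH INC_PART2_UPDTBASE_M.SQL'

result=$(printf '%s\n' "$sample" | extract_procs)
printf '%s\n' "$result"
```

On the sample this prints PROCNAME1 VEH_INC_PART_UPDTBASE_M.SQL. A pair written with quotes around the script name would not match this pattern and would need a looser expression.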