awk pattern search with inconsistent position

Hi,

Does anybody know how to get the value that follows a regexp match and test it against a pattern? If the pattern matches, the entire line should be printed to a separate file.

Here's my raw file:

^_Name^_string^_Apple   ^_Color^_string^_Red	^_Code^_string^_121 
^_Name^_string^_Banana	^_Code^_string^_123    ^_Color^_string^_Yellow  
^_Name^_string^_Citrus	^_Color^_string^_Green	^_Code^_string^_129    ^_Color^_string^_Green 

I want to check whether the last digit of the Code value is within the range [1-5]. The Code^_ field is not always in the same column, so I can't just use awk's $3.

E.g. if I specify [1-5] as the search range, the output should be:

^_Name^_string^_Apple   ^_Color^_string^_Red	^_Code^_string^_121  
^_Name^_string^_Banana	^_Code^_string^_123    ^_Color^_string^_Yellow  

E.g. if I specify [19], the output should be:

^_Name^_string^_Apple   ^_Color^_string^_Red	^_Code^_string^_121 
^_Name^_string^_Citrus	^_Color^_string^_Green	^_Code^_string^_129    ^_Color^_string^_Green 

What I've done so far is extract the code values, store them in a file, and then grep for [range]. But the loop takes a very long time, especially with large files.

By the way, the "^_" is a control character, and the spaces are tabs.


I could also use grep without looping, but I can't get the * to work in the middle of the pattern. Here's what I have in mind:

grep Code^_string^_*[1-5]$ [filename]

Something like this?

grep "_Code^_string^_[0-9]\{2\}[1-5]" infile

grep "_Code^_string^_[0-9]\{2\}[19]" infile

How many digits will you have (the above is for 3 digits)? And I suppose you want to check only the last digit?

--ahamed

If you modify that grep to:

grep -E 'Code^_string^_[0-9]+[1-5]'

it should work when only digits follow the Code/string construct (the + needs grep -E). I have not tested this.

Andrew

The number of digits is inconsistent as well: it can be 3, it can be 10. Is the solution above going to work?

What does the "+" sign do? Is it like the "*" in ls ?

+ one or more occurrences of the preceding item
* zero or more occurrences of the preceding item
? zero or one occurrence of the preceding item
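
To see the three quantifiers side by side, here is a small sketch. The "Code=" strings are made-up test input, not lines from the raw file. Note that + and ? need grep -E, and that unlike the * in ls (a shell wildcard that stands on its own), a regex * only repeats whatever precedes it.

```shell
# Three made-up sample strings, one per line.
samples=$(printf 'Code=\nCode=7\nCode=77\n')

# + : at least one digit is required, so the bare "Code=" is rejected
plus=$(printf '%s\n' "$samples" | grep -E '^Code=[0-9]+$')

# * : zero or more digits, so all three lines match
star=$(printf '%s\n' "$samples" | grep -E '^Code=[0-9]*$')

# ? : zero or one digit, so "Code=77" is rejected
opt=$(printf '%s\n' "$samples" | grep -E '^Code=[0-9]?$')

printf 'plus:\n%s\nstar:\n%s\nopt:\n%s\n' "$plus" "$star" "$opt"
```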

The problem is that when the line also has unrelated numbers at its end, grep reports a hit for it.

You have tested this? As I said, I had not. Taking another look I can see that it could match something like:

Code^_string^_12127

To stop this you would need to add something to the end of the pattern to match white space, but then it would not match when the code sits at the end of the line. You could have two versions of my pattern: one with white space at the end and the other with the end-of-line anchor ($) at the end.

Andrew
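
A rough sketch of the two-pattern idea, assuming the control character is ASCII 0x1F (Ctrl-_) and the field separators are tabs. The sample rows, including the "Cherry" line with code 12127, are made up for the demonstration, and the variable names are only illustrative.

```shell
US=$(printf '\037')    # the ^_ control character (ASCII 0x1F)
TAB=$(printf '\t')
demo=$(mktemp)

# Code 121 followed by a tab, code 123 at end of line, and code 12127,
# whose last digit (7) is outside [1-5].
{
  printf '%s\n' "${US}Name${US}string${US}Apple${TAB}${US}Code${US}string${US}121${TAB}${US}Color${US}string${US}Red"
  printf '%s\n' "${US}Name${US}string${US}Banana${TAB}${US}Code${US}string${US}123"
  printf '%s\n' "${US}Name${US}string${US}Cherry${TAB}${US}Code${US}string${US}12127"
} > "$demo"

# Unanchored: also hits 12127, because "1212" satisfies [0-9]*[1-5].
loose=$(grep -c "Code${US}string${US}[0-9]*[1-5]" "$demo")

# Two anchored versions: last digit followed by a blank, or by end of line.
strict=$(grep -c -e "Code${US}string${US}[0-9]*[1-5][[:blank:]]" \
                 -e "Code${US}string${US}[0-9]*[1-5]\$" "$demo")

echo "loose=$loose strict=$strict"
rm -f "$demo"
```

Giving grep two -e patterns keeps everything in plain basic regular expressions, so no -E is needed here.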

Try:

grep -E '^_Code^_string^_[0-9]*[1-5]([[:blank:]]|$)' file

where ^_ stands for that single control character (Ctrl-_, ASCII 0x1F), typed literally

or perhaps just

grep -E '.Code.string.[0-9]*[1-5]([[:blank:]]|$)' file

with the dot as a catchall for whatever single control character is used

--
On Solaris use /usr/xpg4/bin/grep -E
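
Since the thread title asks about awk, here is a minimal awk sketch of the same test that takes the digit set as a parameter and writes the matching lines to a separate file. It assumes the control character is ASCII 0x1F and that fields are tab-separated. The function name filter_codes and the variable names are hypothetical, and the sample rows are modelled on the raw file from the first post.

```shell
US=$(printf '\037')    # the ^_ control character (ASCII 0x1F)
TAB=$(printf '\t')
infile=$(mktemp)
outfile=$(mktemp)
trap 'rm -f "$infile" "$outfile"' EXIT

# Sample rows modelled on the thread; Code sits in a different column each time.
{
  printf '%s\n' "${US}Name${US}string${US}Apple${TAB}${US}Color${US}string${US}Red${TAB}${US}Code${US}string${US}121"
  printf '%s\n' "${US}Name${US}string${US}Banana${TAB}${US}Code${US}string${US}123${TAB}${US}Color${US}string${US}Yellow"
  printf '%s\n' "${US}Name${US}string${US}Citrus${TAB}${US}Color${US}string${US}Green${TAB}${US}Code${US}string${US}129${TAB}${US}Color${US}string${US}Green"
} > "$infile"

# filter_codes SET FILE: print lines whose Code value ends in a digit from SET.
filter_codes() {
  awk -v set="$1" -v us="$US" '
    # Build the regex once: Code<US>string<US>, digits, one digit from SET,
    # then a space/tab or the end of the line.
    BEGIN { re = "Code" us "string" us "[0-9]*[" set "]([ \t]|$)" }
    $0 ~ re      # no action given, so the whole line is printed
  ' "$2"
}

filter_codes '1-5' "$infile" > "$outfile"    # matching lines go to a separate file
cat "$outfile"
```

Calling filter_codes '19' "$infile" instead selects codes ending in 1 or 9. The bracket [ \t] is used rather than [[:blank:]] because some older awk builds do not handle POSIX character classes.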
