awk pattern search with inconsistent position

Jin1 · September 13, 2013, 3:58am

Hi,

Does anybody know how to get the value that follows a regexp match and test it against a pattern? If the pattern matches, the entire line should be printed to a separate file.

Here's my raw file:

^_Name^_string^_Apple   ^_Color^_string^_Red	^_Code^_string^_121 
^_Name^_string^_Banana	^_Code^_string^_123    ^_Color^_string^_Yellow  
^_Name^_string^_Citrus	^_Color^_string^_Green	^_Code^_string^_129    ^_Color^_string^_Green

I want to check whether the last digit of the Code value is in the range [1-5]. The Code^_ field is not always in the same column, so I can't just use awk's $3.

E.g. if I search for [1-5], the output should be:

^_Name^_string^_Apple   ^_Color^_string^_Red	^_Code^_string^_121  
^_Name^_string^_Banana	^_Code^_string^_123    ^_Color^_string^_Yellow

E.g. if I search for [19], the output should be:

^_Name^_string^_Apple   ^_Color^_string^_Red	^_Code^_string^_121 
^_Name^_string^_Citrus	^_Color^_string^_Green	^_Code^_string^_129    ^_Color^_string^_Green

What I've done so far is extract the Code values, store them in a file, and then grep for [range]. But the looping takes a very long time, especially with large files.

By the way, the "^_" is a control character, and the spaces are tabs.
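Since the thread title asks about awk, here is a minimal single-pass sketch of the idea: walk every tab-separated field, find the one that starts with the Code prefix, and test only that value. It assumes that "^_" is the single byte 0x1F (the usual meaning of that caret notation) and that fields are separated by tabs, as described above. The function name filter_by_last_digit is made up for illustration.

```shell
# Hypothetical helper, not from the thread.
# $1 = bracket expression for the last digit, e.g. '[1-5]' or '[19]'
# $2 = input file
filter_by_last_digit() {
    awk -F '\t' -v digit_class="$1" '
        BEGIN {
            us = sprintf("%c", 31)               # assumption: "^_" is byte 0x1F
            prefix = us "Code" us "string" us
        }
        {
            for (i = 1; i <= NF; i++) {
                field = $i
                sub(/[ ]+$/, "", field)          # ignore trailing space padding
                if (index(field, prefix) == 1) { # field starts with the Code prefix
                    value = substr(field, length(prefix) + 1)
                    # only digits, and the last one must be in the requested class
                    if (value ~ ("^[0-9]*" digit_class "$")) { print; break }
                }
            }
        }
    ' "$2"
}

# Usage (file names are placeholders):
#   filter_by_last_digit '[1-5]' raw.txt > matched.txt
```

Because the whole file is read once and no intermediate file of Code values is needed, this should avoid the per-line looping described above.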

---------- Post updated at 03:58 PM ---------- Previous update was at 03:52 PM ----------

I can also use grep without looping, but I can't use the * in between, so that doesn't work. Here's what I have in mind:

grep Code^_string^_*[1-5]$ [filename]

ahamed101 · September 13, 2013, 4:15am

Something like this?

grep "_Code^_string^_[0-9]\{2\}[1-5]" infile

grep "_Code^_string^_[0-9]\{2\}[19]" infile

How many digits will you have (the above is for 3 digits)? And I suppose you want to check only the last digit?

--ahamed
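A small sketch of what the \{2\} interval does may help here. It requires exactly two digits before the tested digit, and because grep matches anywhere inside the value, the digit being tested is the third one, not necessarily the last. The sketch assumes "^_" is the byte 0x1F; the helper name fixed_width_match and the sample codes are made up.

```shell
us=$(printf '\037')   # assumption: "^_" is the single byte 0x1F

# Hypothetical helper wrapping the first grep above;
# it reads lines on stdin and prints the ones that match.
fixed_width_match() {
    grep "${us}Code${us}string${us}[0-9]\{2\}[1-5]"
}

# Made-up sample values (not from the raw file):
#   code 12349 -> printed, because "123" matches, although the last digit is 9
#   code 91    -> not printed, because there are not two digits before the 1
# Example:
#   printf '%s\n' "${us}Code${us}string${us}12349" | fixed_width_match
```

So the fixed-width form is only reliable when every code has exactly three digits.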

apmcd47 · September 13, 2013, 4:16am

If you modify that grep to:

grep -E 'Code^_string^_[0-9]+[1-5]'

it should work when only numerals appear after the Code/string construct. I have not tested this.

Andrew

Jin1 · September 13, 2013, 4:27am

Digits are inconsistent as well. Can be 3, can be 10. Is the solution above going to work?

What does the "+" sign do? Is it like the "*" in ls ?

Jotne · September 13, 2013, 7:02am

+ matches one or more of the preceding item
* matches zero or more of the preceding item
? matches zero or one of the preceding item
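The three quantifiers can be tried out with grep -E on a few made-up words. This is only a sketch; count_matches is a hypothetical helper that prints how many of the given words match the pattern as a whole line.

```shell
# Hypothetical helper: $1 is an extended regular expression,
# the remaining arguments are the words to test against it.
count_matches() {
    pattern=$1; shift
    printf '%s\n' "$@" | grep -E -x -c -- "$pattern"
}

count_matches 'ab+c' ac abc abbc   # prints 2: "abc" and "abbc" (one or more b)
count_matches 'ab*c' ac abc abbc   # prints 3: all of them (zero or more b)
count_matches 'ab?c' ac abc abbc   # prints 2: "ac" and "abc" (zero or one b)
```

Note that this is different from the * used with ls. In a shell glob, * on its own matches any string, whereas in a regular expression it only repeats the item before it, so "any characters" is written as .* instead.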

Jin1 · September 15, 2013, 7:46pm

The problem is that when the line also has unrelated numbers at the end, grep reports a hit for it.

apmcd47 · September 17, 2013, 3:43am

Have you tested this? As I said, I had not. Taking another look I can see that it could match something like:

Code^_string^_12127

To stop this you would need to add something to the end of the pattern to match white space, but then it would not match when the code is at the end of the line. You could use two versions of my pattern: one with white space at the end and the other with the end-of-line anchor ($) at the end.
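The two-pattern idea can be sketched with two -e options in a single grep call. This is an untested-in-the-thread illustration: it assumes "^_" is the byte 0x1F, it swaps [0-9]+ for [0-9]* so that plain (basic) regular expressions are enough and one-digit codes also match, and match_code is a made-up name.

```shell
us=$(printf '\037')                        # assumption: "^_" is byte 0x1F
base="${us}Code${us}string${us}[0-9]*[1-5]"

# Hypothetical helper: reads lines on stdin, prints those where the
# code's last digit is 1-5 and is followed by a blank or the end of line.
match_code() {
    grep -e "${base}[[:blank:]]" -e "${base}\$"
}

# Example (file name is a placeholder):
#   match_code < raw.txt
```

With this, a value such as 12127 is no longer reported, because neither a blank nor the end of the line follows any of its digits in the 1-5 range.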

Andrew

Scrutinizer · September 17, 2013, 4:15am

Try:

grep -E '^_Code^_string^_[0-9]*[1-5]([[:blank:]]|$)' file

where ^_ stands for the actual control character (not a caret followed by an underscore)

or perhaps just

grep -E '.Code.string.[0-9]*[1-5]([[:blank:]]|$)' file

with the dot as a catchall for whatever single control character is used

--
On Solaris use /usr/xpg4/bin/grep -E
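To avoid typing the control character by hand, the pattern can be built with printf, and the digit range can be passed in as a parameter so the same command serves both [1-5] and [19]. This is a sketch only: it assumes "^_" is the byte 0x1F, and select_by_last_digit is a hypothetical wrapper name.

```shell
# Hypothetical wrapper around the grep -E command above.
# $1 = bracket expression for the last digit, e.g. '[1-5]' or '[19]'
# $2 = input file
select_by_last_digit() {
    digit_class=$1
    infile=$2
    us=$(printf '\037')      # assumption: "^_" is the single byte 0x1F
    grep -E "${us}Code${us}string${us}[0-9]*${digit_class}([[:blank:]]|\$)" "$infile"
}

# Usage (file names are placeholders):
#   select_by_last_digit '[1-5]' raw.txt > matched_1_5.txt
#   select_by_last_digit '[19]'  raw.txt > matched_19.txt
```

Using the exact byte is stricter than the dot version, which would also accept any other single character in those three positions.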