Egrep a number greater than a given value in a column

SkySmart · May 6, 2013, 12:20pm

I'm aware awk can do what I'm trying to do here, but I can't use awk in this scenario given the circumstances of this box.

But I need to check if a number in a certain column is over a certain value, say, for instance, 20.

data:

|        12 |         19 |         2000 |     9029333 | 2013-05-01_04:15:55 |   291.00h |        0 |    T |        0 | Mailbox.1113      |

My code is below (I'm sure this is wrong):

egrep Mailbox data | egrep "$2" | egrep "[2][0]+"

Don_Cragun · May 6, 2013, 12:41pm

Sorry, but grep is not an option for this. The grep family of utilities doesn't do arithmetic (including numeric comparisons); it matches strings.

------------------------

I take it back. There is a way to do it; it is just messy (and you will need to re-engineer a different expression for each value you're trying to compare). What do you mean by field 2? In your example, is field 2 supposed to be 12 or 19? Will you ever have negative numbers? For non-zero values, will there ever be a leading 0?

With grep you don't have $2 to specify the second field and once you have the second field you don't have >.

balajesuri · May 6, 2013, 12:41pm

grep does not have a way to tokenise the input like awk/perl (it's not built to do so). You can cook up a pattern that matches all the way up to the required field, but by then... you have awk/perl.

To check if the number in field #2 is > 10:

grep Mailbox datafile | egrep "^\| *[0-9]+ *\| *1[0-9]+"
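Feeding a pattern like this a few made-up lines is a quick way to see which values of field #2 it actually accepts. A small bash sketch; `col2_matches` is a hypothetical helper, the sample lines are invented to mimic the OP's layout, and `grep -E` is the POSIX spelling of `egrep`:

```shell
#!/bin/bash
# Hypothetical helper: build made-up lines in the OP's layout, one per value,
# and print the column-2 values that the given ERE accepts.
col2_matches() {
    local ere=$1 v
    shift
    for v in "$@"; do
        printf '|        1 |         %s | Mailbox.1113 |\n' "$v"
    done | grep -E -- "$ere" | cut -d'|' -f3 | tr -d ' '
}

# balajesuri's pattern, tried against a spread of values
col2_matches '^\| *[0-9]+ *\| *1[0-9]+' 5 9 10 11 19 20 99 100 150 200
```

It prints 10, 11, 19, 100 and 150: the pattern accepts any value that starts with 1 followed by at least one more digit, so 10 gets through while 20, 99 and 200 are rejected.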

Yoda · May 6, 2013, 12:46pm

Just a suggestion: if awk is not an option, how about using a bash script?

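#!/bin/bash
# IFS="|" splits each line on "|". Because the line starts with "|",
# f1 gets the empty text before it, f2 the 1st column and skip the rest.
while IFS="|" read -r f1 f2 skip
do
        # Keep lines containing "Mailbox" whose f2 value is greater than 10
        if [[ "$skip" =~ "Mailbox" ]] && [ $f2 -gt 10 ]
        then
                echo "$f1 $f2 $skip"
        fi
done < data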
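As written, the loop compares the 1st column (12 in the sample), because the leading "|" puts an empty field in f1. Since the OP later says the 2nd column (19) is the one to test, a variant along the same lines might look like this. It is a sketch that assumes the columns hold non-negative integers; `mailbox_col2_gt` is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: print the Mailbox lines of FILE whose 2nd column is
# greater than THRESHOLD, using bash builtins only (no awk, no grep).
# Usage: mailbox_col2_gt THRESHOLD FILE
mailbox_col2_gt() {
    local threshold=$1 file=$2 line lead c1 c2 rest
    while IFS= read -r line; do
        # Each line starts with "|", so "lead" gets the empty text before it,
        # c1 the 1st column, c2 the 2nd column and rest everything after.
        IFS='|' read -r lead c1 c2 rest <<< "$line"
        c2=${c2// /}                      # drop the padding spaces
        # Skip non-Mailbox lines and anything that isn't a plain integer.
        [[ $rest == *Mailbox* && $c2 =~ ^[0-9]+$ ]] || continue
        # 10# forces base 10, so a value such as 08 isn't parsed as octal.
        if (( 10#$c2 > threshold )); then
            printf '%s\n' "$line"
        fi
    done < "$file"
}
```

For example, `mailbox_col2_gt 20 data` would print only the Mailbox lines whose 2nd column exceeds 20. Using `read -r` keeps backslashes in the data intact.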

SkySmart · May 6, 2013, 2:10pm

balajesuri:

grep does not have a way to tokenise the input like awk/perl (it's not built to do so). You can cook a pattern to match all the way until the required field, but then.. you have awk/perl.

To check if number in field #2 > 10 :
grep Mailbox datafile | egrep "^\| *[0-9]+ *\| *1[0-9]+"

Looks like this could be the answer, although it would be dirty.

In this case, the second value, 19, would be considered the second column/field:

| 12 | 19 | 2000 | 9029333 | 2013-05-01_04:15:55 | 291.00h | 0 | T | 0 | Mailbox.1113 |

---------- Post updated at 02:04 PM ---------- Previous update was at 01:46 PM ----------

Sorry guys, I'm having issues with this:

I want to show whichever line has a value of 5 or more. I'm trying to use this:

egrep "^\| *[0-9]+ *\| *[5-9]+" data

But it doesn't seem to be working.
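The character class after the second "|" only constrains the first digit of the value, so the pattern accepts values that begin with 5 to 9 and rejects values such as 19 or 100 even though they are 5 or more. A quick bash check with made-up lines; `ops_pattern_matches` is a hypothetical helper and `grep -E` stands in for `egrep`:

```shell
#!/bin/bash
# Made-up lines with different column-2 values, filtered through the OP's
# pattern; only the accepted column-2 values are printed.
ops_pattern_matches() {
    local v
    for v in "$@"; do
        printf '|        1 |         %s | Mailbox.1113 |\n' "$v"
    done | grep -E '^\| *[0-9]+ *\| *[5-9]+' | cut -d'|' -f3 | tr -d ' '
}

ops_pattern_matches 4 5 7 19 50 100
```

This prints 5, 7 and 50; 19 and 100 are dropped because they do not begin with a digit from 5 to 9.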

---------- Post updated at 02:10 PM ---------- Previous update was at 02:04 PM ----------

In this example, the 2nd field would be 19. There's not gonna be a negative number.

Don_Cragun · May 6, 2013, 3:28pm

To match $3 >= 5, followed by "Mailbox" when your field terminator is '|' try:

egrep '^[|][^|]*[|] *0*([5-9]|[1-9][0-9]).*Mailbox' data

After finding the 2nd "|" on the line it ignores any number of spaces, any number of leading zeros, and then it looks for a single digit that is 5 through 9 or two or more digits starting with 1 through 9.
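One way to gain confidence in a hand-built ERE like this is to compare it with plain arithmetic over a range of values. A bash sketch, assuming `grep -E` (the POSIX spelling of `egrep`) and made-up lines that mimic the OP's layout, both plain and with leading zeros:

```shell
#!/bin/bash
# Brute-force check of Don_Cragun's ">= 5" ERE against shell arithmetic,
# using made-up lines with column-2 values 0..999, each written twice:
# plain (e.g. 7) and zero-padded to three digits (e.g. 007).
ere='^[|][^|]*[|] *0*([5-9]|[1-9][0-9]).*Mailbox'
tmp=$(mktemp)
for (( v = 0; v <= 999; v++ )); do
    printf '| 1 | %d | Mailbox.1 |\n| 1 | %03d | Mailbox.1 |\n' "$v" "$v"
done > "$tmp"

# Values the ERE selected, normalised to base 10 (007 -> 7)
got=$(grep -E -- "$ere" "$tmp" | cut -d'|' -f3 | tr -d ' ' |
      while read -r n; do echo $(( 10#$n )); done)
# Values that should be selected: every v >= 5, once per spelling
want=$(for (( v = 5; v <= 999; v++ )); do echo "$v"; echo "$v"; done)
rm -f "$tmp"

if [[ $got == "$want" ]]; then echo "ERE agrees with >= 5"; else echo "ERE and arithmetic disagree"; fi
```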

To match $3 > 20, followed by "Mailbox" try:

egrep '^[|][^|]*[|] *0*(2[1-9]|[3-9][0-9]|[1-9][0-9]{2}).*Mailbox' data

After finding the 2nd "|" on the line it ignores any number of spaces, any number of leading zeros, and then looks for a two digit string that is 21-29, a two digit string that is 30-99, or a 3 or more digit string that is greater than or equal to 100.
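The same pattern can be assembled from named pieces, which makes each part of the explanation visible. A bash sketch; the variable names are illustrative only, and the edge-case values are made up to probe the boundaries and leading zeros:

```shell
#!/bin/bash
# Don_Cragun's "> 20" ERE, assembled from commented parts.
to_col2='^[|][^|]*[|]'                      # 1st "|", column 1, 2nd "|"
padding=' *0*'                              # padding spaces, then leading zeros
gt20='(2[1-9]|[3-9][0-9]|[1-9][0-9]{2})'    # 21-29, 30-99, or 3+ digits >= 100
ere="${to_col2}${padding}${gt20}.*Mailbox"

# Made-up edge cases around the threshold, with and without leading zeros.
matches=$(for v in 20 21 026 040 99 100 200 209 020 7; do
              printf '| 1 | %s | Mailbox.1 |\n' "$v"
          done | grep -E -- "$ere" | cut -d'|' -f3 | tr -d ' ')
printf '%s\n' "$matches"
```

It prints 21, 026, 040, 99, 100, 200 and 209; 20, 020 and 7 are rejected.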

As you can see you'll need to craft the ERE to use to select the numeric values you're trying to match. Hope these examples help.
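Crafting each ERE by hand can be automated. Below is a sketch of a hypothetical bash function, `gt_ere`, built on the same idea as the patterns above: for each digit position of N, keep N's digits to the left, require a larger digit at that position and allow any digits after it, then add one alternative for numbers with more significant digits than N. It assumes N is a non-negative integer written without leading zeros; for ">= N", pass N-1.

```shell
#!/bin/bash
# Build an ERE group matching non-negative integers strictly greater than $1.
# Assumes $1 is a non-negative integer written without leading zeros.
gt_ere() {
    local n=$1 k=${#1} i d alt
    local -a alts=()
    for (( i = 0; i < k; i++ )); do
        d=${n:i:1}
        (( d == 9 )) && continue   # no digit is larger than 9 at this position
        # Same first i digits as N, a larger digit here, then any remaining digits.
        alt="${n:0:i}[$(( d + 1 ))-9]"
        (( k - i - 1 > 0 )) && alt+="[0-9]{$(( k - i - 1 ))}"
        alts+=("$alt")
    done
    # Any number with more significant digits than N is larger.
    alts+=("[1-9][0-9]{$k,}")
    local IFS='|'
    printf '(%s)\n' "${alts[*]}"
}

gt_ere 20    # prints ([3-9][0-9]{1}|2[1-9]|[1-9][0-9]{2,})
```

Plugged into the same prefix, `grep -E "^[|][^|]*[|] *0*$(gt_ere 20).*Mailbox" data` should select the same lines as the hand-written > 20 pattern.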

hanson44 · May 6, 2013, 5:25pm

Here's a way to do it:

$ cat infile
|        6 |         21 |         2000 |     9029333 | 2013-05-01_04:15:55 |   291.00h |        0 |    T |        0 | Mailbox.1113      |
|        5 |         20 |         2000 |     9029333 | 2013-05-01_04:15:55 |   291.00h |        0 |    T |        0 | Mailbox.1113      |
|        4 |         19 |         2000 |     9029333 | 2013-05-01_04:15:55 |   291.00h |        0 |    T |        0 | Mailbox.1113      |

$ cat test.sh
echo Lines with first field GE 5:
grep -e "^ *| *0*[5-9]" \
     -e "^ *| *[1-9][0-9]" infile

echo
echo Lines with second field GT 20:
grep -e "^ *| [^|]* | *2[1-9]" \
     -e "^ *| [^|]* | *[3-9][0-9]" infile

$ ./test.sh
Lines with first field GE 5:
|        6 |         21 |         2000 |     9029333 | 2013-05-01_04:15:55 |   291.00h |        0 |    T |        0 | Mailbox.1113      |
|        5 |         20 |         2000 |     9029333 | 2013-05-01_04:15:55 |   291.00h |        0 |    T |        0 | Mailbox.1113      |

Lines with second field GT 20:
|        6 |         21 |         2000 |     9029333 | 2013-05-01_04:15:55 |   291.00h |        0 |    T |        0 | Mailbox.1113      |

MadeInGermany · May 6, 2013, 5:31pm

I don't understand the requirement here; with awk it's just so simple:

awk -F"|" '$3+0 >= 5 && /Mailbox/' data
awk -F"|" '$3+0 > 20 && /Mailbox/' data

Is the requirement an exit status? Then do this

awk -F"|" '$3+0 > 20 && /Mailbox/' data | grep .

or that

awk -F"|" '$3+0 > 20 && /Mailbox/ {print; found=1} END {exit 1-found}' data
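If the exit status is what matters, the last command can be wrapped so it drives an `if` directly. A sketch; `mailbox_over` is a hypothetical name and the threshold is passed in with `-v`:

```shell
#!/bin/bash
# Hypothetical wrapper around the last awk command above: prints the
# Mailbox lines whose 2nd column ($3, because of the leading "|") exceeds
# LIMIT, and exits 0 if at least one line matched, 1 otherwise.
# Usage: mailbox_over LIMIT FILE
mailbox_over() {
    awk -F"|" -v limit="$1" '
        $3+0 > limit+0 && /Mailbox/ { print; found = 1 }
        END { exit 1 - found }
    ' "$2"
}
```

For example: `if mailbox_over 20 data > /dev/null; then echo "at least one line is over 20"; fi`.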

Don_Cragun · May 6, 2013, 6:41pm

hanson44:


Note that in an ERE, | is a special character and does not match a pipe symbol. Even if that wasn't a problem, note that the 2nd grep in this script would not match 100, 026, or 040 even though all of these are >21. The OP never answered my question about whether there might be leading 0s; so I don't know if the last two matter or not. If they do matter, the 1st grep also wouldn't match 010 even though it is >=5 (assuming that the ERE was fixed to match a literal '|' rather than use it to specify a choice of EREs separated by the special character '|').
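Whether a bare | is literal depends on the regex flavour. In a POSIX basic regular expression (plain `grep`, which is what hanson44's script runs) it is an ordinary character; in an extended regular expression (`egrep` or `grep -E`) it separates alternatives, so `[|]` or `\|` is needed for a literal pipe. A small bash sketch with made-up lines:

```shell
#!/bin/bash
# Made-up sample: column-1 values 4 and 6.
sample='|        4 |  19 | Mailbox.1 |
|        6 |  21 | Mailbox.1 |'

# Basic RE (plain grep): a bare | is an ordinary character.
bre_count=$(printf '%s\n' "$sample" | grep -c '^ *| *0*[5-9]')
# Extended RE (egrep / grep -E): a bare | splits the pattern into alternatives,
# and the "^ *" alternative matches every line.
ere_count=$(printf '%s\n' "$sample" | grep -cE '^ *| *0*[5-9]')
# Bracketing the pipe makes it literal in an ERE too.
ere_bracket_count=$(printf '%s\n' "$sample" | grep -cE '^ *[|] *0*[5-9]')
echo "BRE: $bre_count, ERE: $ere_count, ERE with [|]: $ere_bracket_count"
```

With these two lines the counts come out as 1, 2 and 1.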

hanson44 · May 6, 2013, 10:04pm

I agree it's problematic, zeroes or no zeroes, and I left out 101. Yes, awk is better for this; this is a setup for awk, but the OP said they could not use awk for some reason. Anyway, here's a corrected syntax:

echo Lines with first field GE 5:
grep -e "^ *| *[5-9]" \
     -e "^ *| *[1-9][0-9]" infile

echo
echo Lines with second field GT 20:
grep -e "^ *| [^|]* | *2[1-9]" \
     -e "^ *| [^|]* | *[3-9][0-9]" infile
     -e "^ *| [^|]* | *1[0-9][0-9]" infile

Don_Cragun · May 6, 2013, 11:01pm

hanson44:

I agree it's problematic, zeroes or no zeroes, and I left out 101. Yes, awk is better for this; this is a setup for awk, but the OP said they could not use awk for some reason. Anyway, here's a corrected syntax:
echo Lines with first field GE 5:
grep -e "^ *| *[5-9]" \
   -e "^ *| *[1-9][0-9]" infile

echo
echo Lines with second field GT 20:
grep -e "^ *| [^|]* | *2[1-9]" \
   -e "^ *| [^|]* | *[3-9][0-9]" infile
   -e "^ *| [^|]* | *1[0-9][0-9]" infile

Hi hanson44,
I know the OP said no awk. That is why I provided the two egrep scripts in message #5 in this thread that do what I think you're trying to do with these, with the additional constraint posed by the OP that only lines containing "Mailbox" are to be printed. Am I correct in assuming that you meant to have a \ rather than infile in the spot I marked in red above (the infile at the end of the second -e line)?

I see that your scripts will allow leading spaces before the 1st "|" on the line that my scripts don't allow. None of the OP's samples had any lines that had leading spaces, so I don't know if this is important. Is there anything else your scripts do that the egrep scripts I provided earlier don't do correctly?

Note also that your corrected script still won't accept strings starting with three digits in the range 200-209 inclusive as being greater than 20.
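Feeding those three patterns (with the `\` in place of the first infile) a range of made-up values makes the gap visible. A bash sketch:

```shell
#!/bin/bash
# hanson44's three "second field GT 20" patterns, with the line continuation
# fixed, run against made-up lines; prints the column-2 values that match.
matches=$(for v in 21 99 150 199 200 205 209 210 300; do
              printf '| 1 | %s | Mailbox.1 |\n' "$v"
          done | grep -e "^ *| [^|]* | *2[1-9]" \
                      -e "^ *| [^|]* | *[3-9][0-9]" \
                      -e "^ *| [^|]* | *1[0-9][0-9]" | cut -d'|' -f3 | tr -d ' ')
printf '%s\n' "$matches"
```

It prints 21, 99, 150, 199, 210 and 300: 200, 205 and 209 drop out, while 210 still gets through via `2[1-9]`.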

hanson44 · May 7, 2013, 12:13am

Yes, the final pattern must start with [1-9]. :o I guess this just reinforces the point that awk is so much better for this kind of thing.

echo Lines with first field GE 5:
grep -e "^ *| *[5-9]" \
     -e "^ *| *[1-9][0-9]" infile

echo
echo Lines with second field GT 20:
grep -e "^ *| [^|]* | *2[1-9]" \
     -e "^ *| [^|]* | *[3-9][0-9]" \
     -e "^ *| [^|]* | *[1-9][0-9][0-9]" infile