Grepping only if condition matches

Dear Friends,

I have a flat file which is as follows

$cat sample
123,456,1,1,1,1
sdfas,345,1,1,1,1
dfgd,234,2,3,4,1
ggffgr,234,4,3,2,1
jkhu,354.1,1,1,1
$

I want to output only those lines which have '1' in fields 3 to 5.

So I want output as follows

123,456,1,1,1,1
sdfas,345,1,1,1,1
jkhu,354.1,1,1,1

Kindly guide.
Anu.

What have you tried so far?

I haven't tried anything, as the only way I know is to use an "if" statement, which unfortunately I do not want to use.
Hence I am seeking guidance from those who know grep well.

Please search the forum before posting; this kind of question has been asked several times on the forums with different data.

You may try awk; it's easy:

[akshay@localhost tmp]$ cat sample
123,456,1,1,1,1
sdfas,345,1,1,1,1
dfgd,234,2,3,4,1
ggffgr,234,4,3,2,1
jkhu,354.1,1,1,1

[akshay@localhost tmp]$ awk -F, '$3 == 1 &&  $4 == 1 && $5 == 1' sample
123,456,1,1,1,1
sdfas,345,1,1,1,1
jkhu,354.1,1,1,1

[akshay@localhost tmp]$ awk  -F, '{j=1;for(i=3; i<=5; i++)j*=$i==1}j' sample
123,456,1,1,1,1
sdfas,345,1,1,1,1
jkhu,354.1,1,1,1

So if your file has a fixed set of fields that you can write a regular expression for, then grep should do it.

If the separator is , then an ignored field is .*, meaning zero or more ( * ) of any character ( . ) followed by the field separator ( , ).

So, to count from the beginning of the line, your expression starts as ^.*,.*, to signify start of record ( ^ ) and then ignore two fields. You can then tag on 1,1,1, to specify your requirements; the rest of the line doesn't matter.

I think you can end up with:-

egrep "^.*,.*,1,1,1," input_file

From your sample input, I get one less line because the one starting jkhu does not have the correct field separator between fields 2 & 3.
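
A quick way to check this against the sample data from the first post is sketched below. The temporary file and the variable names sample and matches are only illustrative, and grep -E is used in place of egrep.

```shell
# Recreate the sample file from the first post in a temporary location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
123,456,1,1,1,1
sdfas,345,1,1,1,1
dfgd,234,2,3,4,1
ggffgr,234,4,3,2,1
jkhu,354.1,1,1,1
EOF

# Two skipped fields, then 1 in fields 3, 4 and 5, each followed by a comma.
matches=$(grep -E '^.*,.*,1,1,1,' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

The jkhu line contains only four commas, so it cannot satisfy an expression that requires five.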

I hope that this helps,
Robin

awk -F, '$3==$4==$5==1' sample

Thank you, friends, for the much-needed help. Special thanks to Mr. rbatte1 for taking the extra effort to guide me step by step.
Thank you.

The chained comparison $3==$4==$5==1 might work on some systems, but it certainly is not portable.

The standards state that there is no associativity for the == operator and some versions of awk produce the syntax error:

awk: syntax error at source line 1
 context is
	 >>> $3==$4== <<< 
awk: bailing out at source line 1

If we rewrite the expression as:

awk -F, '$3==($4==($5==1))'

then there are lots of cases where that expression will evaluate to 1 even if not all three of those fields are set to 1. For example, the above command will print any of the following lines:

a,b,1,1,1
a,b,1,0,X for any X other than 1
a,b,0,1,X for any X other than 1
a,b,0,W,X for any W other than 0 or 1 for any X

Of course, it could also be rewritten as:

awk -F, '(($3==$4)==$5)==1'

which would print any of the following lines:

a,b,1,1,1
a,b,X,X,1 for any X
a,b,X,Y,0 for any X that is not Y
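
The difference between these groupings and an explicit test of each field can be checked with a few test lines. This is only a sketch: the input lines and the variable names right, left and strict are made up for illustration and are not from the original sample.

```shell
# Hypothetical test lines; only the first has 1 in fields 3, 4 and 5.
input='a,b,1,1,1
a,b,1,0,7
a,b,0,1,7
a,b,5,5,1
a,b,2,3,0'

# Right-nested grouping: prints lines 1, 2 and 3.
right=$(printf '%s\n' "$input" | awk -F, '$3==($4==($5==1))')

# Left-nested grouping: prints lines 1, 4 and 5.
left=$(printf '%s\n' "$input" | awk -F, '(($3==$4)==$5)==1')

# Explicit comparisons joined with &&: prints only line 1.
strict=$(printf '%s\n' "$input" | awk -F, '$3==1 && $4==1 && $5==1')

printf 'right-nested:\n%s\n\nleft-nested:\n%s\n\nexplicit &&:\n%s\n' \
    "$right" "$left" "$strict"
```

Only the form with && expresses "fields 3, 4 and 5 are all 1", and it avoids chaining == altogether.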

Note that grep will work as well as egrep (or the preferred syntax grep -E ) for the RE being used in this thread.

Note also that the RE suggested works correctly only if there are exactly 6 fields (separated by 5 commas) on each input line. Since BREs and EREs use a greedy match, the RE .*, can match more than one field if there are more than 5 commas on a line. For example, that egrep command will also print the lines:

a,b,c,1,1,1,2
a,b,0,0,1,1,1,2
1,2,1,2,1,2,1,2,1,2,1,1,1,2

in addition to lines with 1 in fields 3, 4, and 5 that only have 6 fields.
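
This over-matching is easy to reproduce. The sketch below feeds the three lines above, plus one hypothetical six-field line (x,y,1,1,1,z, made up for illustration) that really does have 1 in fields 3 to 5, to both the RE and a field-by-field awk test.

```shell
input='a,b,c,1,1,1,2
a,b,0,0,1,1,1,2
1,2,1,2,1,2,1,2,1,2,1,1,1,2
x,y,1,1,1,z'

# With .*, a "field" may swallow commas, so all four lines match.
re_count=$(printf '%s\n' "$input" | grep -E -c '^.*,.*,1,1,1,')

# awk splits on every comma, so only the last line passes.
awk_out=$(printf '%s\n' "$input" | awk -F, '$3 == 1 && $4 == 1 && $5 == 1')

printf 'RE matched %s lines; awk printed: %s\n' "$re_count" "$awk_out"
```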

To make it work correctly on a line containing six commas (i.e. 7 fields), you would need to change the RE to:

.*,.*,1,1,1,.*,

and you would need to add an additional .*, to the end of that RE for each additional field in your input file.

Alternatively, we could use an RE that only matches non-comma characters in each of the first two fields:

grep '^[^,]*,[^,]*,1,1,1,' input_file

which will only print lines with 1 in fields 3, 4, and 5 as long as there are at least six fields on each line. ( [^,]*, is an RE that matches zero or more occurrences ( * ) of any character that is not a comma ( [^,] ) followed by a comma ( , ). And the leading ^ in the entire RE anchors the match to the start of the line.)
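
As a quick check of this last RE, the sketch below runs it over a few hypothetical lines with differing numbers of fields (all made up for illustration). The second command is an assumption about what might be wanted, not something suggested earlier in the thread: it uses an ERE ending in (,|$) so that a line with exactly five fields, like the jkhu line in the original sample, is accepted too.

```shell
input='a,b,1,1,1,2
a,b,c,1,1,1,2
a,b,1,1,1,2,3,4
a,b,1,1,1'

# [^,]* cannot cross a comma, so the first two fields are matched exactly.
# The trailing comma in the RE means the five-field line is not printed.
strict=$(printf '%s\n' "$input" | grep '^[^,]*,[^,]*,1,1,1,')

# Assumption: field 5 may also be the last field on the line.
loose=$(printf '%s\n' "$input" | grep -E '^[^,]*,[^,]*,1,1,1(,|$)')

printf 'strict:\n%s\n\nloose:\n%s\n' "$strict" "$loose"
```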