How to get lines with only one occurrence of a pattern?

migurus · December 21, 2016, 7:12pm

My data (file aa):

20161220 20:30:01 MODE 1 TEST 1 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 2 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 3 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 1 SOURCE 2 SET 1
20161220 20:30:04 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 1 TEST 2 SOURCE 2 SET 1
20161220 20:30:02 MODE 2 TEST 2 SOURCE 1 SET 1
20161220 20:30:06 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 2 TEST 2 SOURCE 2 SET 1

To get lines with more than one MODE, I do this:

$ sed -n '/MODE.*MODE/p' aa
20161220 20:30:04 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 1 TEST 2 SOURCE 2 SET 1
20161220 20:30:06 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 2 TEST 2 SOURCE 2 SET 1

Now I need to get only the lines with a single occurrence of MODE. Before I start writing an awk script that counts these MODE keywords, is there a "proper" way to do it with just a regular expression?

Aia · December 21, 2016, 8:43pm

If every line contains MODE, as in your sample:

$ sed -n '/MODE.*MODE/!p' aa
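To show why the "every line contains MODE" condition matters, here is a minimal sketch, assuming a POSIX shell and sed. It recreates the sample as aa and appends one hypothetical line, HEARTBEAT, that is not in the original data and contains no MODE. The negated match prints that line too.

```shell
#!/bin/sh
# Recreate the sample data as file aa. The last line (HEARTBEAT) is
# hypothetical: it is not in the original post and contains no MODE.
cat > aa <<'EOF'
20161220 20:30:01 MODE 1 TEST 1 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 2 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 3 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 1 SOURCE 2 SET 1
20161220 20:30:04 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 1 TEST 2 SOURCE 2 SET 1
20161220 20:30:02 MODE 2 TEST 2 SOURCE 1 SET 1
20161220 20:30:06 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 2 TEST 2 SOURCE 2 SET 1
20161220 20:30:07 HEARTBEAT
EOF

# !p prints every line that does NOT match MODE.*MODE: the five
# single-MODE lines, but also the HEARTBEAT line with no MODE at all.
sed -n '/MODE.*MODE/!p' aa
```

The output is six lines. The last one is the HEARTBEAT line, which is why lines without MODE have to be excluded separately when they can occur.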

Chubler_XL · December 21, 2016, 8:53pm

For lines that must contain MODE but must not match MODE.*MODE:

sed -n '/MODE/!d;/MODE.*MODE/!p' aa

Or you could use two greps:

grep "MODE" aa | grep -v "MODE.*MODE"
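A quick way to check that the sed and grep variants agree is a small sketch like the one below. It assumes a POSIX shell with sed, grep and cmp, recreates the sample as aa, and appends a hypothetical HEARTBEAT line (not in the original data) to confirm that both variants drop lines without MODE.

```shell
#!/bin/sh
# Sample data as file aa; the last line (no MODE) is hypothetical and is
# only there to show that it gets filtered out.
cat > aa <<'EOF'
20161220 20:30:01 MODE 1 TEST 1 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 2 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 3 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 1 SOURCE 2 SET 1
20161220 20:30:04 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 1 TEST 2 SOURCE 2 SET 1
20161220 20:30:02 MODE 2 TEST 2 SOURCE 1 SET 1
20161220 20:30:06 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 2 TEST 2 SOURCE 2 SET 1
20161220 20:30:07 HEARTBEAT
EOF

# sed: delete lines without MODE, then print those without a second MODE.
sed -n '/MODE/!d;/MODE.*MODE/!p' aa > out_sed

# grep: keep lines with MODE, then drop lines with two or more.
grep "MODE" aa | grep -v "MODE.*MODE" > out_grep

# Both files should hold the same five single-MODE lines.
cmp -s out_sed out_grep && echo "same output"
cat out_sed
```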

Aia · December 21, 2016, 10:32pm

Some alternatives with Perl.

Display only lines with exactly one MODE:

perl -ne 'print if (()=/MODE/g) == 1' migurus.file

Display only lines with exactly two MODEs:

perl -ne 'print if (()=/MODE/g) == 2' migurus.file

Display any line that does not have exactly two MODEs:

perl -ne 'print if (()=/MODE/g) != 2' migurus.file

Display any line with zero or one MODE:

perl -ne 'print if (()=/MODE/g) < 2' migurus.file
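For an environment without perl, the same four filters can be sketched in awk by counting fields instead of matches. The file name migurus.file follows the perl examples. The two extra lines in the sample are hypothetical and were added only so that each filter gives a different result: a HEARTBEAT line with no MODE, and a line with three MODE sets.

```shell
#!/bin/sh
# Sample data from the thread, plus two hypothetical lines (not in the
# original): one with no MODE and one with three MODE sets.
cat > migurus.file <<'EOF'
20161220 20:30:01 MODE 1 TEST 1 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 2 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 3 SOURCE 1 SET 1
20161220 20:30:02 MODE 1 TEST 1 SOURCE 2 SET 1
20161220 20:30:04 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 1 TEST 2 SOURCE 2 SET 1
20161220 20:30:02 MODE 2 TEST 2 SOURCE 1 SET 1
20161220 20:30:06 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 2 TEST 2 SOURCE 2 SET 1
20161220 20:30:07 HEARTBEAT
20161220 20:30:08 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 1 TEST 2 SOURCE 2 SET 1 MODE 2 TEST 3 SOURCE 1 SET 1
EOF

# With MODE as the field separator, a line with k MODEs has k+1 fields,
# so the count is NF-1. An empty line has NF==0, hence the guard.
awk -FMODE '{ n = NF ? NF - 1 : 0 } n == 1' migurus.file   # exactly one
awk -FMODE '{ n = NF ? NF - 1 : 0 } n == 2' migurus.file   # exactly two
awk -FMODE '{ n = NF ? NF - 1 : 0 } n != 2' migurus.file   # anything but two
awk -FMODE '{ n = NF ? NF - 1 : 0 } n < 2' migurus.file    # zero or one

# The wanted count can also be passed in with -v (want=3 here).
awk -FMODE -v want=3 '{ n = NF ? NF - 1 : 0 } n == want' migurus.file
```

On this data the five commands print 5, 2, 7, 6 and 1 lines respectively. The -v form is handy when, as in the real data mentioned later in the thread, records can carry three or more sets.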

looney · December 22, 2016, 2:39am

Hi Aia, could you please explain the highlighted part of the code below?

perl -ne 'print if (()=/MODE/g) == 1' migurus.file

Scrutinizer · December 22, 2016, 2:49am

There is no practical way to do this with a single basic regular expression. You need to either count the occurrences, as some of the solutions do, or use multiple regexes.

A variation on Chubler_XL's approach with two regexes:

sed '/MODE.*MODE/d; /MODE/!d' file

Or counting the number of occurrences:

awk -FMODE 'NF==2' file
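To see what NF==2 is testing, the sketch below prints the MODE count in front of each line, using two lines from the sample in a file named file. It also shows a second counting idiom that does not appear in the thread: gsub() returns the number of substitutions it made, and replacing MODE with "&" (the matched text) leaves the line unchanged.

```shell
#!/bin/sh
# Two lines taken from the sample data: one with one MODE, one with two.
cat > file <<'EOF'
20161220 20:30:01 MODE 1 TEST 1 SOURCE 1 SET 1
20161220 20:30:04 MODE 1 TEST 1 SOURCE 1 SET 1 MODE 1 TEST 2 SOURCE 2 SET 1
EOF

# -FMODE makes MODE the field separator, so k occurrences give k+1 fields.
# Print the count in front of each line to see what NF==2 checks.
awk -FMODE '{ n = NF - 1; print n ": " $0 }' file

# Alternative count: gsub() returns the number of replacements; "&" puts
# the matched text back, so the printed line is unchanged.
awk '{ n = gsub(/MODE/, "&") } n == 1' file
```

The field-count version is shorter. The gsub() version keeps the default whitespace field splitting, so $1, $2 and so on still refer to the usual columns if the rest of the script needs them.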

Aia · December 22, 2016, 2:47pm

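The empty parentheses () put the match in list context, so /MODE/g returns every match. A list assignment evaluated in scalar context (here, by the == comparison) returns the number of elements on its right-hand side, which is the number of matches.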

Written in another way:

 perl -ne '@match_list = $_ =~ m/MODE/g; print $_ if (scalar @match_list) == 1' migurus.file

migurus · December 27, 2016, 5:23pm

Thanks, everybody. I found the awk solution that checks NF==2 the most appropriate, because I oversimplified the sample I posted: the actual data has records with three or more sets, and I don't have Perl available in this environment.