How to remove words that contain 3+ of the same character in a row?

Hello,

I am looking for a way to remove words from a list that contain 3 or more of the same character.

For example, let's say the full list is as follows:

ABCDEF
ABBHJK
AAAHJD
KKPPPP
NAUJKS

AAAHJD and KKPPPP should be removed from this list, since they contain AAA and PPPP respectively.

My first attempt at this was to use

grep -v '\([[:alpha:]]\)\1' filename 

but this removes any word with 2 or more of the same character in a row.

grep -v '\([[:alpha:]][[:alpha:]]\)\1' filename matches a repeated pair of characters, so it only removes runs of 4+.

My knowledge of awk/sed is quite weak. Can anyone lend some advice as to where I should look from here?

Regards,
Colin

You almost had it the first time. Try:

grep -v '\([[:alpha:]]\)\1\1' filename
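For anyone who wants to check this against the list from the first post, here is a minimal sketch. The variable names words and filtered are placeholders I picked for illustration, and the list is piped in rather than read from filename.

```shell
# Sample list from the first post
words='ABCDEF
ABBHJK
AAAHJD
KKPPPP
NAUJKS'

# Drop every line in which one letter appears three times in a row:
# \(...\) captures a letter, and each \1 demands the same letter again.
filtered=$(printf '%s\n' "$words" | grep -v '\([[:alpha:]]\)\1\1')

printf '%s\n' "$filtered"
# prints:
# ABCDEF
# ABBHJK
# NAUJKS
```

ABBHJK survives because BB is only a run of two.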

If you need to remove lines with 3 or more occurrences of a character NOT in succession, try:

awk '{p=1;for(i=1;i<=length;i++) if(gsub(substr($0,i,1),"&")>=3) {p=0;break}}p' file

This will also remove lines with 3 or more occurrences of a character in succession.

Ambiguous request. The reply I posted assumed you want to delete lines with three adjacent occurrences of a character. The reply elixir_sinari posted assumed you want to delete any line with three occurrences of a character whether or not they are adjacent. The input you gave will give the same results for either interpretation. What was it that you wanted?
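To make the difference between the two readings concrete, here is a small sketch. It adds one made-up line, ABACAD, to the sample list (that line is my assumption, not part of the original data), and the variable names are placeholders. The awk command is the one posted above with length($0) spelled out.

```shell
# Sample list plus a hypothetical line with three non-adjacent A's
words='ABCDEF
ABBHJK
AAAHJD
KKPPPP
NAUJKS
ABACAD'

# Reading 1: remove lines with three ADJACENT identical letters
adjacent=$(printf '%s\n' "$words" | grep -v '\([[:alpha:]]\)\1\1')

# Reading 2: remove lines with three occurrences ANYWHERE on the line
anywhere=$(printf '%s\n' "$words" |
    awk '{p=1;for(i=1;i<=length($0);i++) if(gsub(substr($0,i,1),"&")>=3) {p=0;break}}p')

printf 'adjacent:\n%s\n\nanywhere:\n%s\n' "$adjacent" "$anywhere"
```

Under the first reading ABACAD is kept; under the second it is removed. The five original lines come out the same either way.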

The awk approach isn't very robust. The first argument to gsub is an extended regular expression. If the line contains a dot (.), it will match every character. If there's a ?, +, *, or some other metacharacter, there may be a runtime regular expression compilation failure.
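If you do want to stay with awk, one way around the metacharacter problem is to count characters in an array instead of handing them to gsub. This is only a sketch of that idea, not what was originally posted; the function name drop_triples and the test lines are made up for illustration, and it assumes single-byte characters.

```shell
# Hypothetical helper: print only lines where no character occurs 3+ times.
drop_triples() {
    awk '{
        split("", count)            # reset the per-line counters
        keep = 1
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)    # used as an array key, never as a regex
            if (++count[c] >= 3) { keep = 0; break }
        }
        if (keep) print
    }'
}

result=$(printf '%s\n' 'a.bcde' 'x.y.z.' 'ABACAD' 'ABCDEF' | drop_triples)
printf '%s\n' "$result"
# prints:
# a.bcde
# ABCDEF
```

The line a.bcde is kept here because it really contains only one dot. The gsub version would drop it, since the dot is treated as a regular expression that matches all six characters.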

What you're attempting can be done easily with grep and a single regular expression:

grep -v '\(.\).*\1.*\1' file
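Note that the . in that pattern counts any repeated character, punctuation included. If only letters should count, as the [[:alpha:]] in the first post suggests, a variant with the character class in the capture group may be closer to what is wanted. A quick sketch comparing the two, with made-up test lines and placeholder variable names:

```shell
lines='a.bcde
x.y.z.
ABACAD
ABCDEF'

# Any character occurring three times removes the line
any_char=$(printf '%s\n' "$lines" | grep -v '\(.\).*\1.*\1')

# Variant (an assumption about the intent): only letters are counted
letters_only=$(printf '%s\n' "$lines" | grep -v '\([[:alpha:]]\).*\1.*\1')

printf 'any character:\n%s\n\nletters only:\n%s\n' "$any_char" "$letters_only"
```

Both drop ABACAD. The line x.y.z. is dropped by the first command because of its three dots, but kept by the second because no letter repeats.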

Regards,
Alister

I did foresee that possibility while writing the solution. But I assumed that only alphabetic characters would be in the file.

Looking at the first post, that seems a reasonable assumption given the sample data and the use of the [:alpha:] class.

I'll leave my post as is just in case it's of any use (as I'm sure you know, sometimes the sample data isn't representative).

Regards,
Alister

I agree with you, alister. That grep with backrefs is much better suited.