Hi,
I need help with the following. I need to exclude words that match either of these patterns:
a. longer than 4 characters (example: SBRAP)
b. contain a mixture of uppercase and lowercase letters, regardless of length (example: GSpD)
The file contains:
COFpC
MCHX
SP
SNFCA
GEH
SBRAP
DGICA
JPMpE
WFCpP
GSpD
AXL
RGS
PREpD
EVN
CNOB
CUBI
TOWN
GSpD
RGS
EVN
CNOB
CUBI
Thanks
jak
Aia
November 4, 2015, 9:07pm
2
perl -nle 'print unless (length > 4 or (/[A-Z]/ and /[a-z]/))' jakSun8.file
MCHX
SP
GEH
AXL
RGS
EVN
CNOB
CUBI
TOWN
RGS
EVN
CNOB
CUBI
jakSun8
November 5, 2015, 11:19am
3
Thanks Aia, but I don't have Perl installed on my system. Is there an awk or sed solution, please?
Thanks,
jak
RudiC
November 5, 2015, 11:22am
4
Any attempt from your side, please?
awk 'length <=4 && !(/[a-z]/ && /[A-Z]/)' file
or
awk 'length <=4 && (tolower($0) == $0 || toupper($0) == $0)' file
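To see how the two conditions combine, here is a minimal sketch that feeds a few words from the sample file through the first awk command. The `words` and `result` variables are only illustrative, and `LC_ALL=C` is an assumption added so that the `[a-z]` and `[A-Z]` ranges behave as plain ASCII ranges in any locale.

```shell
#!/bin/sh
# A few words taken from the sample file in the question.
words='COFpC
MCHX
SP
SNFCA
GSpD
TOWN'

# Keep a line only if it is at most 4 characters long AND
# it does not contain both a lowercase and an uppercase letter.
# LC_ALL=C is an assumption: it pins the ranges to ASCII.
result=$(printf '%s\n' "$words" |
    LC_ALL=C awk 'length <= 4 && !(/[a-z]/ && /[A-Z]/)')

printf '%s\n' "$result"
```

With this input the sketch should print MCHX, SP and TOWN: COFpC and SNFCA are too long, and GSpD mixes cases.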
jakSun8
November 5, 2015, 12:44pm
5
Thanks RudiC. I am not well versed in scripting languages, but I appreciate folks like you who help out generously on a daily basis.
Thanks,
jak
Sed version:
--
sed '/.\{5\}/d; /[a-z][A-Z]/d; /[A-Z][a-z]/d' file
or (better)
sed '/.\{5\}/d; /[[:lower:]][[:upper:]]/d; /[[:upper:]][[:lower:]]/d' file
--
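One thing to keep in mind: the adjacency patterns above only catch mixed case when an uppercase and a lowercase letter sit next to each other. That always holds for letters-only words like those in the sample file, but a word such as `a1B` would slip through. The following is a rough sketch, not a definitive replacement, that tests for "contains a lowercase letter" and "contains an uppercase letter" separately using a nested sed block. The inputs `a1B` and `X-y` are hypothetical and do not come from the original file.

```shell
#!/bin/sh
# Hypothetical input: 'a1B' and 'X-y' are made up to show mixed case
# without two adjacent letters; the other words mimic the sample file.
words='MCHX
GSpD
a1B
X-y
SBRAP
evn'

# 1. Delete lines with 5 or more characters.
# 2. For lines containing a lowercase letter, delete them if they
#    also contain an uppercase letter anywhere on the line.
result=$(printf '%s\n' "$words" | sed '/.\{5\}/d
/[[:lower:]]/{
/[[:upper:]]/d
}')

printf '%s\n' "$result"
```

With this input only MCHX and evn should remain. The script is split across lines because some sed implementations want the closing `}` on its own line.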
If we want to print only words that are entirely lowercase or entirely uppercase (allowing no other characters):
sed -n '/^[A-Z]\{1,4\}$/p; /^[a-z]\{1,4\}$/p' file
or (better)
sed -n '/^[[:upper:]]\{1,4\}$/p; /^[[:lower:]]\{1,4\}$/p' file
or grep:
grep -Ee '^[[:upper:]]{1,4}$' -e '^[[:lower:]]{1,4}$' file
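The difference between the "delete what is unwanted" and "print only what is allowed" approaches shows up as soon as a line contains something other than letters. Here is a small sketch comparing the two on the same input. The words `AB1` and `CUBIX` are hypothetical test values, and the variable names `allowed` and `excluded` are only for illustration.

```shell
#!/bin/sh
# Hypothetical input: 'AB1' contains a digit, 'CUBIX' is 5 letters long.
words='RGS
evn
AB1
GSpD
TOWN
CUBIX'

# Whitelist: print only lines made of 1-4 uppercase letters
# or of 1-4 lowercase letters, and nothing else.
allowed=$(printf '%s\n' "$words" |
    grep -E -e '^[[:upper:]]{1,4}$' -e '^[[:lower:]]{1,4}$')

# Exclusion: delete long lines and lines with adjacent mixed case;
# everything else, including 'AB1', passes through.
excluded=$(printf '%s\n' "$words" |
    sed '/.\{5\}/d; /[[:lower:]][[:upper:]]/d; /[[:upper:]][[:lower:]]/d')

printf 'whitelist:\n%s\n' "$allowed"
printf 'exclusion:\n%s\n' "$excluded"
```

On this input the whitelist should keep RGS, evn and TOWN, while the exclusion form additionally keeps AB1. Which behaviour is wanted depends on whether the real file can contain digits or punctuation.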