Check if a string exists in a file

kraljic · July 23, 2015, 11:52am

bash in RHEL 6.3

I have these two files containing strings.

$ cat someStrings.txt
LOGICAL1
HUNGARY2
PENGUIN2
MOBILE
GUITAR1
MOUSE1
$

$ cat checkIF.txt
PENGUIN
MOBILE
$

I need to search for strings in the someStrings.txt file that match the patterns in the checkIF.txt file.
The strings in someStrings.txt sometimes end with a number, but this number should be ignored during the pattern match.
If there is at least one match, just print "PENGUIN exists in someStrings.txt file".

The logic will look like this:

if grep -q PENGUIN someStrings.txt
then
    echo 'PENGUIN exists in someStrings.txt file'
    # then proceed with the pattern search in the file
fi

So, with the sample data given above, my expected output will be:

PENGUIN exists in someStrings.txt file (PENGUIN2 exists in someStrings.txt file is fine too)
MOBILE exists in someStrings.txt file
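A minimal bash sketch of the logic described above, assuming each line of checkIF.txt is a plain word (regex metacharacters in it would be interpreted by grep) and that the trailing number consists only of decimal digits. The helper name check_patterns and the scratch directory are illustrative additions, not part of the question.

```shell
#!/bin/bash
# Scratch copies of the sample files from the question (illustrative setup)
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf '%s\n' LOGICAL1 HUNGARY2 PENGUIN2 MOBILE GUITAR1 MOUSE1 > someStrings.txt
printf '%s\n' PENGUIN MOBILE > checkIF.txt

# check_patterns is a hypothetical helper name.
# $1: file to search, $2: file with one pattern per line
check_patterns() {
    local target=$1 patterns=$2 pattern
    while IFS= read -r pattern; do
        [ -n "$pattern" ] || continue   # skip blank lines in the pattern file
        # -x: the whole line must be the pattern followed by zero or more
        #     digits; -q: print nothing, only the exit status matters
        if grep -qx "${pattern}[[:digit:]]*" "$target"; then
            echo "$pattern exists in $target file"
            # ...proceed with further processing of the file here...
        fi
    done < "$patterns"
}

check_patterns someStrings.txt checkIF.txt
```

Because grep -x requires the whole line to match, PENGUIN[[:digit:]]* accepts PENGUIN and PENGUIN2 but not, say, PENGUINS.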

Any idea how I could do this?

RavinderSingh13 · July 23, 2015, 12:02pm

Hello kraljic,

Could you please try the following and let me know if this helps.

awk 'FNR==NR{gsub(/[[:digit:]]/,X,$0);A[$0];next} ($1 in A){print $1 " exists in file " ARGV[1]}' someStrings.txt checkIF.txt

The output will be as follows.

PENGUIN exists in file someStrings.txt
MOBILE exists in file someStrings.txt
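For readers unfamiliar with the FNR==NR idiom, here is the same awk program spread over several lines with comments, run against a scratch copy of the sample data. The wrapper name report_matches is hypothetical, and "" replaces the unset variable X (awk treats an unset variable as an empty string, so the behavior is the same).

```shell
#!/bin/bash
# Scratch copies of the sample files from the question (illustrative setup)
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf '%s\n' LOGICAL1 HUNGARY2 PENGUIN2 MOBILE GUITAR1 MOUSE1 > someStrings.txt
printf '%s\n' PENGUIN MOBILE > checkIF.txt

# report_matches is a hypothetical wrapper around the awk program above
report_matches() {
    awk '
        # FNR==NR is true only while reading the first file:
        # delete every digit and remember the result as an array key
        FNR == NR { gsub(/[[:digit:]]/, ""); A[$0]; next }
        # Second file: print each pattern that was stored as a key
        ($1 in A) { print $1 " exists in file " ARGV[1] }
    ' "$1" "$2"
}

report_matches someStrings.txt checkIF.txt
```

One known caveat of FNR==NR: if the first file is empty, the condition stays true while the second file is read, so nothing is printed.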

Thanks,
R. Singh

RudiC · July 23, 2015, 12:04pm

If the -H option for grep is available on your system, try

grep -Hf checkIF.txt someStrings.txt
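Note that grep -f matches each pattern anywhere in a line, so PENGUIN would also match a hypothetical line such as KINGPENGUIN3. A sketch that anchors each pattern first, assuming the patterns contain no regex metacharacters (anchored.txt is a scratch file name chosen for illustration):

```shell
#!/bin/bash
# Scratch copies of the sample files from the question (illustrative setup)
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf '%s\n' LOGICAL1 HUNGARY2 PENGUIN2 MOBILE GUITAR1 MOUSE1 > someStrings.txt
printf '%s\n' PENGUIN MOBILE > checkIF.txt

# Turn each pattern P into ^P[[:digit:]]*$ so it must match a whole line,
# optionally followed by digits ("&" in the replacement is the matched line)
sed 's/.*/^&[[:digit:]]*$/' checkIF.txt > anchored.txt

# -H: prefix each match with the file name; -f: read patterns from a file
grep -Hf anchored.txt someStrings.txt
```

On the sample data this should print someStrings.txt:PENGUIN2 and someStrings.txt:MOBILE.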

kraljic · July 23, 2015, 12:27pm

Thank you, Ravinder and RudiC.

One last question: how can I grep only those lines that end with a number?
i.e., in the file below, all lines except MOBILE should be returned.

$ cat someStrings.txt
LOGICAL1
HUNGARY2
PENGUIN2
MOBILE
GUITAR1
MOUSE1

$

RavinderSingh13 · July 23, 2015, 1:18pm

Hello kraljic,

Could you please try the following and let me know if this helps. Sorry, I didn't test it because I am travelling right now; I hope this helps.

awk '($NF ~ /[[:digit:]]$/)' Input_file
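A self-contained run of the same command on the sample data, with Input_file replaced by someStrings.txt (the scratch directory is an illustrative addition):

```shell
#!/bin/bash
# Scratch copy of the sample file from the question (illustrative setup)
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf '%s\n' LOGICAL1 HUNGARY2 PENGUIN2 MOBILE GUITAR1 MOUSE1 > someStrings.txt

# A pattern with no action prints the matching line. $NF is the last
# blank-separated field, so trailing blanks on a line do not matter.
awk '$NF ~ /[[:digit:]]$/' someStrings.txt
```

Because $NF is the last field, a line such as "PENGUIN2 " with trailing blanks is still selected.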

Thanks,
R. Singh

Don_Cragun · July 23, 2015, 3:04pm

Or:

grep '[[:digit:]]$' someStrings.txt
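One difference between the two answers: this grep pattern requires the digit to be the very last character on the line, while the awk $NF test ignores trailing blanks. A sketch comparing the strict pattern with a variant that tolerates trailing spaces or tabs; the extra line "PENGUIN2  " is hypothetical test data:

```shell
#!/bin/bash
# Sample data plus one hypothetical line with two trailing blanks
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf '%s\n' LOGICAL1 HUNGARY2 PENGUIN2 MOBILE GUITAR1 MOUSE1 'PENGUIN2  ' > someStrings.txt

# Strict form: the digit must be the last character (prints 5)
grep -c '[[:digit:]]$' someStrings.txt

# Tolerant form: spaces or tabs may follow the final digit (prints 6)
grep -c '[[:digit:]][[:blank:]]*$' someStrings.txt
```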

Scrutinizer · July 23, 2015, 6:28pm

ravindersingh13:

Hello kraljic,

Could you please try the following and let me know if this helps.
awk 'FNR==NR{gsub(/[[:digit:]]/,X,$0);A[$0];next} ($1 in A){print $1 " exists in file " ARGV[1]}' someStrings.txt checkIF.txt

The output will be as follows.
PENGUIN exists in file someStrings.txt
MOBILE exists in file someStrings.txt

Thanks,
R. Singh

Note: Even though this will work with the sample at hand, an anchor ($) should be used, since only the trailing digits need to be ignored. Also, to ignore any kind of surrounding whitespace and make it more robust, $1 should be used rather than $0. So instead of

gsub(/[[:digit:]]/,X,$0)

try:

sub(/[[:digit:]]+$/,X,$1)
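Putting that change into the full command gives something like the sketch below. It assumes the array key is also taken from $1 (A[$1]), which is one reading of the note; the indented "  PENGUIN2  " line is hypothetical data added to show the effect. With the original gsub(...,$0) version, that line would be stored as "  PENGUIN  " and PENGUIN would not be reported.

```shell
#!/bin/bash
# Question's sample data, with hypothetical surrounding blanks on one line
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf '%s\n' LOGICAL1 HUNGARY2 '  PENGUIN2  ' MOBILE GUITAR1 MOUSE1 > someStrings.txt
printf '%s\n' PENGUIN MOBILE > checkIF.txt

# report_matches is a hypothetical wrapper name
report_matches() {
    awk '
        # $1 is the first blank-separated field, so surrounding blanks are
        # ignored; sub() strips only the trailing run of digits from it
        FNR == NR { sub(/[[:digit:]]+$/, "", $1); A[$1]; next }
        ($1 in A) { print $1 " exists in file " ARGV[1] }
    ' "$1" "$2"
}

report_matches someStrings.txt checkIF.txt
```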

Don_Cragun · July 23, 2015, 10:21pm

scrutinizer:

Note: Even though this will work with the sample at hand, an anchor ($) should be used, since only the trailing digits need to be ignored. Also, to ignore any kind of surrounding whitespace and make it more robust, $1 should be used rather than $0. So instead of
gsub(/[[:digit:]]/,X,$0)
try:
sub(/[[:digit:]]+$/,X,$1)

I would have thought (with an example including HUNGARY2) that there could also be entries like NEW ZEALAND7, in which case $0 would be better than $1.

The samples provided all had single-digit decimal numbers, but looking back at the original problem statement, "The strings in someStrings.txt sometimes end with a number." (not "end with a digit"), you may be correct in interpreting it as a non-negative decimal number. Of course, the intent could even be that any string that would be treated as a number when scanned by strtod() should be considered a number. Without a clearer definition of terms from kraljic, we're just guessing at what regular expression is needed.
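If entries can contain blanks, as in the NEW ZEALAND7 example, a sketch that keeps the whole line ($0) on both sides and strips only a trailing run of decimal digits might look like this. It assumes checkIF.txt holds the full name (NEW ZEALAND) and that neither file has trailing blanks; report_matches is again a hypothetical name.

```shell
#!/bin/bash
# Sample data plus the multi-word entry discussed above
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf '%s\n' LOGICAL1 HUNGARY2 PENGUIN2 MOBILE GUITAR1 MOUSE1 'NEW ZEALAND7' > someStrings.txt
printf '%s\n' PENGUIN MOBILE 'NEW ZEALAND' > checkIF.txt

report_matches() {
    awk '
        # Work on the whole line so multi-word names stay intact;
        # only a trailing run of decimal digits is removed
        FNR == NR { sub(/[[:digit:]]+$/, ""); A[$0]; next }
        # Look up the whole pattern line as well, not just its first word
        ($0 in A) { print $0 " exists in file " ARGV[1] }
    ' "$1" "$2"
}

report_matches someStrings.txt checkIF.txt
```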