Grep: check if a string comes up twice

tons92 · June 14, 2017, 4:45pm

I have the following files
list.txt

string1<TAB>ABC
string2<TAB>DEF
string3<TAB>GHI

query.txt

ABC
DEF
GHI
ABC

Now I want to check whether a string in the second column of list.txt appears at least twice in query.txt

so my command is:

while IFS=$'\t' read k v ; do  if (($(grep -i '$v' query.txt | wc -l)>=2)); then echo "$v more than 2"; fi; done<list.txt

but nothing is returned here, so where is the error?

rdrtx1 · June 14, 2017, 5:43pm

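The single quotes around '$v' stop the shell from expanding the variable, so grep searches for the literal text $v and finds nothing. Use double quotes instead:

while IFS=$(printf "\t") read k v ; do if (($(grep -i "$v" query.txt | wc -l) >= 2)); then echo "$v more than 2"; fi; done < list.txt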
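
A variation on the same loop, offered as a sketch rather than a drop-in replacement: grep -c counts matching lines directly, so wc -l is not needed. The -F and -x options are my additions, not part of the original question; they treat the value as a fixed string and match it against whole lines only. The sample files are recreated in a temporary directory so the snippet runs on its own.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf 'string1\tABC\nstring2\tDEF\nstring3\tGHI\n' > "$dir/list.txt"
printf 'ABC\nDEF\nGHI\nABC\n' > "$dir/query.txt"

tab=$(printf '\t')
result=$(
  while IFS="$tab" read -r k v; do
    # Double quotes let the shell expand $v; single quotes would hand
    # the literal text $v to grep.
    n=$(grep -i -c -F -x -e "$v" "$dir/query.txt")
    if [ "$n" -ge 2 ]; then
      echo "$v appears $n times"
    fi
  done < "$dir/list.txt"
)
echo "$result"
rm -rf "$dir"
```

With the sample data this prints "ABC appears 2 times".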

rbatte1 · June 15, 2017, 7:10am

Beware that for large input files, you are reading the entire query.txt for every entry in list.txt.

There may be better ways to approach this, but it depends how much you need to know. If you just need to know that ABC (or whatever) has been repeated and you don't care what the first string part is, then you might be better off with this:

cut -f2 list.txt | grep -Fx -f - query.txt | sort | uniq -c | grep -Ev "^ *1 "

It is a few pipes, but will read list.txt once.

If you then need to go back and get the leading string information, we can probably work on that.
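
One way to get the leading string back, as a rough sketch that I have only checked against the sample data: let awk tally every line of query.txt in a first pass, then print each list.txt row whose second field was seen at least twice. It assumes tab-separated fields, exact case-sensitive matches, and a non-empty query.txt (the NR == FNR test misfires on an empty first file). The output layout is my own choice.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf 'string1\tABC\nstring2\tDEF\nstring3\tGHI\n' > "$dir/list.txt"
printf 'ABC\nDEF\nGHI\nABC\n' > "$dir/query.txt"

result=$(
  awk -F '\t' '
    # First file (query.txt): count how often each whole line occurs.
    NR == FNR { count[$0]++; next }
    # Second file (list.txt): report rows whose second field repeats.
    count[$2] >= 2 { print $1 "\t" $2 "\t" count[$2] }
  ' "$dir/query.txt" "$dir/list.txt"
)
echo "$result"
rm -rf "$dir"
```

Each file is read once. For the sample data the output is a single tab-separated line: string1, ABC, 2.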

So,

Does that help?
Does it confuse?
Is it irrelevant because the files are trivial?
Does anyone else have a better way? I'm open to suggestions too!

I hope that this helps,
Robin