Dear all,
Please help with the following.
I have a file, let's call it data.txt, that has 3 columns and approximately 700,000 lines. It looks like this:
rs1234 A C
rs1236 T G
rs2345 G T
I have a second file, called reference.txt, which has one column and about 500,000 lines. It contains some, but not all, of the values in column 1 of data.txt, e.g.
rs1234
rs2345
...
I want to 'grep' out all the lines in data.txt that have a match in reference.txt, so that I end up with:
rs1234 A C
rs2345 G T
I have tried:
cat data.txt | grep -f reference.txt > output.txt
But this was taking far too long.
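One variant that is usually much faster is sketched here. It assumes the entries in reference.txt are literal IDs rather than regular expressions, and that an ID never appears as a whole word in columns 2 or 3. By default grep treats every line of reference.txt as a regular expression. The -F option makes it treat them as fixed strings, and -w restricts matches to whole words, so that rs1234 does not also match rs12345. The scratch directory and sample rows are only there to keep the sketch self-contained.

```shell
# Illustration only: scratch copies of the sample rows, so the real files are untouched.
workdir=$(mktemp -d)
printf 'rs1234 A C\nrs1236 T G\nrs2345 G T\n' > "$workdir/data.txt"
printf 'rs1234\nrs2345\n' > "$workdir/reference.txt"

# -F: patterns are fixed strings, not regular expressions
# -w: a pattern must match a whole word
# -f: read the patterns from a file, one per line
grep -F -w -f "$workdir/reference.txt" "$workdir/data.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"
```

On the real files the same command would read reference.txt and data.txt directly. The cat at the start of the original pipeline is not needed, because grep accepts the file name as an argument.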
I therefore thought I might need to loop it using a bash script. I had a go, but got nowhere with the following:
for i in reference.txt; do
grep "$i" data.txt
done
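The loop above iterates over the single literal word reference.txt, not over the lines inside that file, so grep only ever searches data.txt for the text "reference.txt". Two alternatives are sketched here, assuming reference.txt holds one ID per line and the IDs sit in the first whitespace-separated column of data.txt. The first reads the file line by line. It is correct, but it starts one grep per ID and is likely to be slower still with 500,000 IDs. The second loads the IDs into an awk array and then makes a single pass over data.txt. The scratch directory, sample rows and output file names are only for illustration.

```shell
# Illustration only: scratch copies of the sample rows, so the real files are untouched.
workdir=$(mktemp -d)
printf 'rs1234 A C\nrs1236 T G\nrs2345 G T\n' > "$workdir/data.txt"
printf 'rs1234\nrs2345\n' > "$workdir/reference.txt"

# Alternative 1: read reference.txt one line at a time (one grep process per ID).
while IFS= read -r id; do
    grep -F -w -- "$id" "$workdir/data.txt"
done < "$workdir/reference.txt" > "$workdir/loop_output.txt"

# Alternative 2: a single pass with awk.
# While reading the first file (NR == FNR), store each ID as an array key.
# While reading the second file, print lines whose first column is a stored key.
awk 'NR == FNR { ids[$1]; next } $1 in ids' \
    "$workdir/reference.txt" "$workdir/data.txt" > "$workdir/awk_output.txt"

cat "$workdir/awk_output.txt"
```

The awk version compares column 1 exactly, so it does not depend on word boundaries and cannot match an ID that happens to occur in another column.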
I am sure that this must be quite simple to do, but would be grateful for your help with this.
Thank you,
AB