grep -f CPU performance

Hi

I would like to thank you all for this excellent forum.
Today I tried to compare two files and ran into a problem.
I have two files, and I want to pull from the second file every line whose first field matches an entry in the first file, like this:

File1 (pattern file)
___________________________
9007
9126
9918
9127
9977
___________________________

File 2
_______________________________
9124 2008-12-11 16:00:00
4963 2007-12-16 17:00:00
9126 2006-11-11 16:00:00
9127 2007-12-10 17:00:00
3912 2008-10-11 18:00:00
______________________________

This is how the output file should be
________________________________
9127 2007-12-10 17:00:00
9126 2006-11-11 16:00:00
________________________________

The first file has more than 50,000 lines and the second file has more than 600,000 lines.
I used "grep -f file1 file2 > output.file", but it takes too long: I let it run on my Intel 2x1.8 GHz (processor load 100% by grep) for 3 hours, but got no results.

I also tried splitting the first file (the pattern file) into smaller parts, but again no results after 3 hours of waiting.
This is the script I used to split the file and run "grep -f" on each chunk:
_________________________________________________
split -l 100 file1 file1.split.   # break the pattern file into 100-line chunks
for CHUNK in file1.split.* ; do
    grep -f "$CHUNK" file2        # search file2 with each chunk of patterns
done
rm file1.split.*                  # clean up the chunks
_________________________________________________
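One likely reason the script above is so slow: without -F, grep compiles all 50,000 patterns as regular expressions and searches for each as a substring anywhere on every line. A sketch of the same search using GNU grep's fixed-string and whole-word flags (the sample files are recreated inline here purely for illustration):

```shell
cd "$(mktemp -d)"   # work on throwaway copies of the thread's sample data
printf '9007\n9126\n9918\n9127\n9977\n' > file1
printf '%s\n' '9124 2008-12-11 16:00:00' '4963 2007-12-16 17:00:00' \
  '9126 2006-11-11 16:00:00' '9127 2007-12-10 17:00:00' \
  '3912 2008-10-11 18:00:00' > file2

# -F: treat every pattern as a fixed string, skipping the regex engine.
# -w: match whole words only, so 9126 cannot hit 19126 or a date fragment.
grep -F -w -f file1 file2 > output.file
cat output.file
```

With -F, GNU grep can use a multi-string matching algorithm instead of trying each pattern separately against each line, which usually makes a large difference on pattern lists this size.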

Does anyone know how I can do this faster?

Thanks in advance.

I would suggest using a database.
Create two tables and join them on the id column:

select table2.* from table1 join table2 on table1.id = table2.id;

Thanks for your answer, but I don't have any experience with databases; I've never used one.
Do you have any guide on how to create a DB in Oracle or SQL, how to import the files into the DB, etc.?
Anything I can use to get started.

Thanks

I think MySQL should be sufficient. I can't provide a detailed howto, but here are the main steps:

  • install and start the MySQL server
  • set the root password: mysqladmin -u root password
  • create one database with two tables
  • use LOAD DATA INFILE to import the data
  • use a SELECT statement to process the data

Maybe you'll find someone in a MySQL forum who can explain this in detail from memory; I would also have to consult the documentation. But you really should use a database.
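A rough sketch of those steps in the mysql client. The database, table, and column names below are made up for illustration, and LOAD DATA LOCAL INFILE must be enabled on your server:

```sql
-- Hypothetical schema: one column for the patterns, three for file2.
CREATE DATABASE grepdb;
USE grepdb;
CREATE TABLE patterns (id INT PRIMARY KEY);
CREATE TABLE records  (id INT, d DATE, t TIME, INDEX (id));

-- Import both files (fields are separated by single spaces).
LOAD DATA LOCAL INFILE 'file1' INTO TABLE patterns FIELDS TERMINATED BY ' ';
LOAD DATA LOCAL INFILE 'file2' INTO TABLE records  FIELDS TERMINATED BY ' ';

-- The join replaces grep -f: keep only rows whose id appears in patterns.
-- The index on records.id lets MySQL do this without scanning per pattern.
SELECT records.* FROM records JOIN patterns ON records.id = patterns.id;
```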

awk '{ if (NR==FNR) { my_array[$1]=$1; next;} if ( $1 in my_array ) {print $0}}' file1 file2

Try the one-liner above. Not sure about the performance, though.
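To see why this is fast: awk loads file1's keys into a hash table in a single pass, then streams file2 once, doing a constant-time lookup per line. A commented equivalent, run on the thread's sample data (the files are recreated inline for illustration):

```shell
cd "$(mktemp -d)"   # throwaway copies of the thread's sample data
printf '9007\n9126\n9918\n9127\n9977\n' > file1
printf '%s\n' '9124 2008-12-11 16:00:00' '4963 2007-12-16 17:00:00' \
  '9126 2006-11-11 16:00:00' '9127 2007-12-10 17:00:00' \
  '3912 2008-10-11 18:00:00' > file2

# While still reading file1 (NR==FNR), remember each key and skip to the
# next line; while reading file2, print any line whose first field was
# remembered. One pass over each file instead of patterns-times-lines work.
awk 'NR==FNR { seen[$1]; next } $1 in seen' file1 file2 > output.file
cat output.file
```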


Thanks a lot, I will try that today.

Thanks manikantants

awk '{ if (NR==FNR) { my_array[$1]=$1; next;} if ( $1 in my_array ) {print $0}}' file1 file2

Unbelievable, it takes only 10 seconds!