How to find the matching data content between two files?

Hi,

Long list of Input file1 content:
1285_t
4860_i
4817_v
8288_c
9626_a
.
.
.

Long list of Input file2 content:
1285_t chris germany
8288_c steve england
9626_a dave swiss
9260_s stephanie denmark
.
.
.

Output file:
1285_t chris germany
8288_c steve england
9626_a dave swiss
.
.
.

How can I extract the lines in file2 that match the entries in file1?
Thanks a lot for any advice :slight_smile:

$ grep -f file1 file2
1285_t chris germany
8288_c steve england
9626_a dave swiss
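A note on this approach: grep -f treats every line of file1 as a regular expression that can match anywhere in a line of file2, so a short key could match inside a longer one. Adding -F (fixed strings) and -w (whole words) makes it both safer and noticeably faster on long pattern lists. A minimal sketch using the sample data from this thread:

```shell
# Recreate the thread's sample files
printf '%s\n' 1285_t 4860_i 4817_v 8288_c 9626_a > file1
printf '%s\n' '1285_t chris germany' '8288_c steve england' \
              '9626_a dave swiss' '9260_s stephanie denmark' > file2

# -F: treat the keys as fixed strings, not regexes (faster for long lists)
# -w: match whole words only, so a key like "28_c" would not match "8288_c"
grep -F -w -f file1 file2
```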

Hi, do you have any idea how to handle a long list of data?
For a short list of data, I believe grep can handle it.
But for a long list of data, it might be difficult :frowning:
Thanks for your advice ^^

With -m, you can save search time.

  -m NUM, --max-count=NUM
         Stop reading a file after NUM matching lines.  If the input is standard input from  a  regular
         file, and NUM matching lines are output, grep ensures that the standard input is positioned to
         just after the last matching line before exiting,  regardless  of  the  presence  of  trailing
         context  lines.  This enables a calling process to resume a search.  When grep stops after NUM
         matching lines, it outputs any trailing context lines.  When the -c or --count option is  also
         used,  grep does not output a count greater than NUM.  When the -v or --invert-match option is
         also used, grep stops after outputting NUM non-matching lines.
grep -m 1 -f file1 file2
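One thing worth noting: -m caps the *total* number of matching lines grep prints, not one match per pattern, so combined with -f the command above stops after the first hit anywhere in file2. A small sketch with the thread's sample data showing that behaviour:

```shell
# Recreate the sample files from the thread
printf '%s\n' 1285_t 8288_c 9626_a > file1
printf '%s\n' '1285_t chris germany' '8288_c steve england' \
              '9626_a dave swiss' > file2

# -m 1 stops after the FIRST matching line overall, not one per pattern,
# so only "1285_t chris germany" is printed here
grep -m 1 -f file1 file2
```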

What about join?

join file1 file2

Hi, I just tried the code you suggested.
Sad to say, it didn't work :frowning:
Do you have a better suggestion?
My file1 and file2 have different numbers of lines.
I just want to print the lines where file1 and file2 match on the first column.
Really, thanks for your help :slight_smile:

---------- Post updated at 04:01 AM ---------- Previous update was at 03:57 AM ----------

Thanks a lot.
It works :slight_smile:
Even though it takes some time for huge data :frowning:

Apologies, you have to specify the output file, like

join file1 file2 > file3
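One caveat not mentioned above: join expects both inputs to be sorted on the join field, and with unsorted files it can silently drop matching lines. A minimal sketch that sorts first (the sample keys here are deliberately out of order):

```shell
# Sample files with unsorted keys
printf '%s\n' 9626_a 1285_t 8288_c > file1
printf '%s\n' '9626_a dave swiss' '1285_t chris germany' \
              '8288_c steve england' > file2

# join needs both inputs sorted on the join field (column 1 here)
sort file1 > file1.sorted
sort file2 > file2.sorted
join file1.sorted file2.sorted > file3
```

In bash you could shorten this to `join <(sort file1) <(sort file2) > file3`.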

I've tried it with your sample data and it outputs:
1285_t chris germany
8288_c steve england
9626_a dave swiss

Yup, you're right.
Thanks a lot ^^
Using join, do you have any idea how to get a tab delimiter between the fields on each line?
I tried to do this by using awk with "\t" on file3:

awk '{print $1"\t"$2"\t"$3}' file3 > file4

Instead of using awk to generate file4,
do you have any other suggestion to improve my code, using just join?
Thanks for your suggestions :slight_smile:
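One way to keep this on a single pipeline: join's default output separator is a single space between fields, so translating spaces to tabs with tr gives tab-delimited fields while leaving the newlines (one record per line) intact. A sketch, assuming no data field itself contains a space:

```shell
# Sample files (names match the thread's examples)
printf '%s\n' 1285_t 8288_c 9626_a > file1
printf '%s\n' '1285_t chris germany' '8288_c steve england' \
              '9626_a dave swiss' > file2

# join separates output fields with a single space by default;
# translating every space to a tab yields tab-delimited records
join file1 file2 | tr ' ' '\t' > file4
```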

Try this:

cat file3 | tr "\n" "\t" > file4

or, more directly:

join file1 file2 | tr "\n" "\t" > file4

Hi,
I just tried both of the commands you suggested.
They ended up linking all the data together and generated output like this:
1285_t chris germany 8288_c steve england 9626_a dave swiss
Did I do anything wrong?
Thanks again, frans :slight_smile:

You did it all right :b: tr "\n" "\t" replaces every newline with a tab, so all the records end up on one line.
I think your awk script looks good. I couldn't do better.

Never mind.
Knowledge is for sharing ^^

Something like this:

awk 'NR==FNR{a[$1]=$1; next} $1 in a' f1 f2

Not sure whether the performance you're expecting can be reached or not.

If it is too slow, you could test whether awk runs faster. mawk, especially, is lightning quick.

awk 'NR==FNR{a[$1]=1; next} a[$1]' file1 file2 > file3
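For reference, here is the same one-liner with the two-file idiom spelled out in comments, using sample data like the thread's:

```shell
# Sample files: 9260_s appears only in file2, so it should be filtered out
printf '%s\n' 1285_t 8288_c 9626_a > file1
printf '%s\n' '1285_t chris germany' '8288_c steve england' \
              '9260_s stephanie denmark' > file2

# NR==FNR is true only while reading the first file: remember each key, skip on.
# For the second file, a[$1] is non-zero when the key was seen, so the line prints.
awk 'NR==FNR{a[$1]=1; next} a[$1]' file1 file2 > file3
```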

---------- Post updated at 03:37 AM ---------- Previous update was at 03:35 AM ----------

Oops somehow missed panyam's answer. Oh well..

Thanks a lot. It works nicely and is faster ^^

---------- Post updated at 02:40 AM ---------- Previous update was at 02:40 AM ----------

Thanks again, your suggested code runs fastest so far :slight_smile:
Congratulations ^^