Match col 1 of File 1 with col 1 of File 2 and create a 3rd file

Hello,

I have a 1.6 GB file that I would like to filter by matching the ids in its col 1 against the ids in col 1 of file2.txt, saving the results into a 3rd file.

For example:

File 1 has 1411 rows; I don't know exactly how many columns it has (thousands)
File 2 has 311 rows, 1 column

I would like to create

File 3 with 311 rows (thousands of columns)

What is the fastest way to do this without consuming too much memory?

Thank you!

The fastest way is syncsort, but I don't know if you have that.
If not, try grep. Don't use awk.

I used this:

grep -A1 -A1 -f file1.txt file2 > file3

but it is taking forever, and I don't know if the result is going to be correct at the end.
I don't know what -A1 -A1 means (I'm assuming it refers to col 1 of File 1 and col 1 of File 2).

Help please!

Please post some sample input from both files,
the desired output, and
the conditions under which the two files should be joined.
PS: Use code tags.

Both files have no headings

Input of file 1 (has 1 column, as shown below):

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364

Input of file 2 (this file has 2,498,588 space-separated columns of single-digit numbers; only column 1, the id, is shown below):

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364
.
.
.
.
.
.
.
.
.
.
.
MXY9423 <--- row #1411

Desired output file 3 (only the 364 rows whose ids match between file1 and file2, with all 2,498,588 columns):

MXY2344
MXY2455
.
.
.
.
.
.
.
MXY9150 <--- row #364

Thank you for any help!

---------- Post updated at 11:10 PM ---------- Previous update was at 11:03 PM ----------

I just checked the results I obtained with grep -A1 -A1 -f file1.txt file2 > file3

and they are wrong. Instead of getting only 364 rows, I get 367, and some of the ids from file 1 are missing from output file 3. I want to match the ids from file1 (my "golden" list) against file2 and write the matching rows to file 3.
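
A note on -A1, since it came up: in grep, -A NUM means "also print NUM lines of trailing context after each matching line". It has nothing to do with columns, and giving it twice has the same effect as giving it once. That may account for the extra rows, because every matched row drags the following row along even if its id is not in the list (GNU grep can also print "--" separator lines between groups). Here is a small sketch of the difference, using tiny made-up files (ids.txt and geno.txt are hypothetical stand-ins, not your real data):

```shell
# Tiny stand-ins for the real files (hypothetical data).
tmp=$(mktemp -d)
printf 'MXY1\nMXY3\n' > "$tmp/ids.txt"
printf 'MXY1 0 1\nMXY2 1 1\nMXY3 2 0\nMXY4 0 0\n' > "$tmp/geno.txt"

# -A1 adds one line of trailing context after every match,
# so the non-matching rows MXY2 and MXY4 are printed as well.
with_context=$(grep -A1 -f "$tmp/ids.txt" "$tmp/geno.txt")

# No -A here. -F treats each id as a fixed string rather than a regex,
# and -w only accepts whole-word matches.
ids_only=$(grep -F -w -f "$tmp/ids.txt" "$tmp/geno.txt")

printf '%s\n' "$with_context"
printf -- '---\n'
printf '%s\n' "$ids_only"
rm -rf "$tmp"
```

Note that grep still looks for each id anywhere in the line, not only in column 1, so this is only safe as long as an id can never appear as a value in the other columns.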

I used grep -A1 -A1 -f file1.txt file2 > file3 but that did not work.

I only got one reply to this thread yesterday, saying to use grep, so I'm posting this again in the hope that somebody can help.

Thank you!

If you want to extract the lines of file2 whose ids are present in file1:

grep -f file1 file2 > file3
or
awk 'FILENAME=="file1"{A[$0]=$0}
FILENAME=="file2"{if(A[$1]==$1){print}}' file1 file2 > file3
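
For what it's worth, the same two-file lookup is often written with the NR==FNR idiom and the "in" operator. This is only a sketch on tiny made-up files (the sample data is hypothetical), but it shows the shape. The main practical difference is that "$1 in ids" tests membership without adding a new array entry for every id that is not in the list, whereas comparing A[$1]==$1 creates one.

```shell
# Tiny stand-ins for the real files (hypothetical data).
tmp=$(mktemp -d)
printf 'MXY1\nMXY3\n' > "$tmp/file1"
printf 'MXY1 0 1\nMXY2 1 1\nMXY3 2 0\nMXY4 0 0\n' > "$tmp/file2"

# While reading the first file (NR==FNR), remember each id as an array key.
# For the second file, print only the lines whose first field is a known id.
awk 'NR==FNR { ids[$1]; next } $1 in ids' "$tmp/file1" "$tmp/file2" > "$tmp/file3"

matched=$(cat "$tmp/file3")
printf '%s\n' "$matched"
rm -rf "$tmp"
```

Only the id list is held in memory; file2 is read one line at a time and the rows come out in file2's order.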

I already tried that grep command and it did not work either. I don't know if it is because of the file extension of file2.ped (ped is a text file format that can handle millions of columns).

---------- Post updated at 08:13 PM ---------- Previous update was at 05:41 PM ----------

Memory is exhausted when I run these commands.
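
One thing that might be worth trying, with the caveat that the cause of the memory problem has not been established in this thread: some awk implementations split the whole line into fields as soon as $1 is referenced, which gets expensive with millions of columns per row. The sketch below assumes that this is the issue and cuts the id off the front of the line with index() and substr(), so no field of the long lines is ever referenced. The sample data is made up; only the file names are taken from the posts above.

```shell
# Tiny stand-ins for the real files (hypothetical data).
tmp=$(mktemp -d)
printf 'MXY1\nMXY3\n' > "$tmp/file1.txt"
printf 'MXY1 0 1\nMXY2 1 1\nMXY3 2 0\nMXY4\n' > "$tmp/file2.ped"

awk '
NR == FNR { ids[$1]; next }          # first file: short lines, one id each
{
    sp = index($0, " ")              # position of the first space, 0 if none
    key = sp ? substr($0, 1, sp - 1) : $0
    if (key in ids) print            # whole line is printed unchanged
}' "$tmp/file1.txt" "$tmp/file2.ped" > "$tmp/file3"

result=$(cat "$tmp/file3")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Whether this actually reduces memory use depends on the awk implementation and on how long the rows really are, so it is best tested on a small slice of file2.ped first.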