Find and sort by first column value

Hi,
I have two text files
file 1 with N lines

AAAAA	2.092290E-12
BBBBB	1.727740E-07
CCCCC	9.608710E-17
DDDDD	0.000000E+00
EEEEE	0.000000E+00
FFFFF	0.000000E+00
GGGGG	0.000000E+00
HHHHH	0.000000E+00
IIIII	3.300320E-04
...

The text in the first column is unique for each row and alphabetically sorted A->Z.

file 2 with M lines (M>=N)

AAAAA	text1	5.07822E-02
DDDDD	text2	8.45965E-03
CCCCC	text3	4.33704E-03
BBBBB	text4	0.00000E+00
EEEEE	text3	5.05173E+00
GGGGG	text4	2.83088E-03
...

The text in the first column is unique for each row.

What I would like to obtain is file 3, containing only the rows of file 2 whose column 1 entry also appears in file 1, sorted in the order they appear in file 1.
If the entry is not present I would like to have a warning message (as below).

file 3 with N lines

AAAAA	text1	5.07822E-02
BBBBB	text4	0.00000E+00
CCCCC	text3	4.33704E-03
DDDDD	text2	8.45965E-03
EEEEE	text3	5.05173E+00
FFFFF	NOT	FOUND
...

Do you have any suggestions?

Many thanks,

Hello f_o_555,

Could you please try the following and let me know if it helps.

 awk 'FNR==NR{X[$1]=$0;next} ($1 in X){print X[$1]} !($1 in X){print $1 OFS "NOT" OFS "FOUND."}' OFS="\t" file2 file1

Output will be as follows.

AAAAA   text1   5.07822E-02
BBBBB   text4   0.00000E+00
CCCCC   text3   4.33704E-03
DDDDD   text2   8.45965E-03
EEEEE   text3   5.05173E+00
FFFFF   NOT     FOUND.
GGGGG   text4   2.83088E-03
HHHHH   NOT     FOUND.
IIIII   NOT     FOUND.
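
For readers who want to see how the one-liner works, here is the same logic spread over several lines with comments. This is only a sketch: the sample files are shortened copies of the ones in the question, and the temporary directory and the variable names tmp and out are made up for the demo.

```shell
# Recreate small sample files in a temporary directory (demo data only).
tmp=$(mktemp -d)
printf 'AAAAA\t2.092290E-12\nBBBBB\t1.727740E-07\nFFFFF\t0.000000E+00\n' > "$tmp/file1"
printf 'AAAAA\ttext1\t5.07822E-02\nDDDDD\ttext2\t8.45965E-03\nBBBBB\ttext4\t0.00000E+00\n' > "$tmp/file2"

out=$(awk '
    # While reading the first file named on the command line (file2),
    # FNR equals NR: remember each whole line, indexed by its first column.
    FNR == NR { X[$1] = $0; next }

    # While reading the second file (file1): if the key was seen in file2,
    # print the stored file2 line. Output follows the order of file1.
    ($1 in X) { print X[$1]; next }

    # Otherwise print the key followed by the warning.
    { print $1 OFS "NOT" OFS "FOUND." }
' OFS="\t" "$tmp/file2" "$tmp/file1")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Because the lookup table is built from file2 and the output is driven by file1, the order of the two file names on the command line matters. Also note that the FNR==NR test assumes file2 is not empty.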

Thanks,
R. Singh


Thank you, Ravinder, it works, but not for all files.
Sometimes I get only

AAAAA
BBBBB
CCCCC
DDDDD
EEEEE

It may be an issue with the formatting, which I'm currently investigating. Although there is no evident difference...
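
The thread never says what was actually wrong with the failing files. One frequent cause of invisible formatting differences, and it is only an assumption that it applies here, is Windows (CRLF) line endings in one of the files; running od -c on a file will show them as \r \n. A minimal sketch that strips a trailing carriage return from every line before matching (the sample data and the names tmp and out are made up for the demo):

```shell
tmp=$(mktemp -d)
# file1 has Unix line endings, file2 has Windows (CRLF) line endings.
printf 'AAAAA\t1.0\nBBBBB\t2.0\n' > "$tmp/file1"
printf 'AAAAA\ttext1\t5.0\r\nCCCCC\ttext3\t4.0\r\n' > "$tmp/file2"

out=$(awk '
    # Drop a trailing carriage return, if present, from every input line.
    { sub(/\r$/, "") }
    # Same logic as before: store file2 lines by key, then walk file1.
    FNR == NR { X[$1] = $0; next }
    ($1 in X) { print X[$1]; next }
    { print $1 OFS "NOT" OFS "FOUND." }
' OFS="\t" "$tmp/file2" "$tmp/file1")

printf '%s\n' "$out"
rm -rf "$tmp"
```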

Hello f_o_555,

I am not sure how you ran the command on the other files, but please make sure the files are passed in the right order: the first file on the command line should be the one with 3 fields and the second should be the one with 2 fields, otherwise you will not get the requested output.
As in the following example.

awk 'FNR==NR{X[$1]=$0;next} ($1 in X){print X[$1]} !($1 in X){print $1 OFS "NOT" OFS "FOUND."}' OFS="\t" file2 file1

Here file2 has 3 fields and file1 has 2 fields. If you still have problems, please post the error together with the complete input and I will try to fix it.
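
A quick way to confirm the field counts before running the command is to print the number of fields on the first line of each file. This is just a sketch with made-up demo data; the names tmp and counts are not part of the original command.

```shell
tmp=$(mktemp -d)
printf 'AAAAA\t2.092290E-12\nBBBBB\t1.727740E-07\n' > "$tmp/file1"
printf 'AAAAA\ttext1\t5.07822E-02\nDDDDD\ttext2\t8.45965E-03\n' > "$tmp/file2"

# FNR == 1 is true on the first line of each input file;
# FILENAME is the current file and NF its number of fields.
counts=$(cd "$tmp" && awk 'FNR == 1 { print FILENAME ": " NF " fields" }' file2 file1)

printf '%s\n' "$counts"
rm -rf "$tmp"
```

With the sample data this reports 3 fields for file2 and 2 fields for file1, which is the order the command expects.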

Thanks,
R. Singh


It seems to work now... I'll keep an eye on it and see if the error reappears. Thanks!