Dear awk users,
I am trying to use awk to match records across two moderately large CSV files. File1 is a pattern file with 173,200 lines, many of which are repeated. The order in which these lines are displayed is important, and I would like to preserve it. File2 is a data file with 456,000 unique lines.
File1.csv:
_Year01
23_01_192001
23_02_192001
23_01_192001
23_04_192001
23_05_192001
23_03_192001
23_02_192001
23_03_192001
23_05_192001
23_04_192001
23_05_192001
23_05_192001
23_06_192001
_192001
25_01_192001
25_02_192001
...
File2.csv:
23,01,192001,0.09,23.40,-0.79,0.,1252.,23_01_192001
23,03,192001,0.79,28.30,-0.63,0.,1110.,23_03_192001
23,04,192002,0.15,37.40,-0.98,0.,748.,23_04_192002
23,06,192002,1.42,38.70,2.78,0.,720.,23_06_192002
23,03,192002,0.54,34.30,-1.05,0.,832.,23_03_192002
23,02,192002,0.54,31.50,-1.04,0.,918.,23_02_192002
23,01,192002,0.77,30.60,-0.82,0.,935.,23_01_192002
23,05,192002,0.65,36.30,-1.00,0.,784.,23_05_192002
23,04,192003,5.18,45.10,0.58,6.,595.,23_04_192003
23,02,192003,5.24,42.30,0.94,0.,682.,23_02_192003
I want to extract the lines in File2 for which column 9 matches the alphanumeric key in File 1, preserving the key order in the latter file. After reading several posts on this and other forums, I tried:
awk -F\, 'FNR==NR{a[$1]=$9;next}{print $0, a[$1]}' File1.csv File2.csv
which just prints out File2.csv, and
awk -F\, 'FNR==NR{a[$1]=$9;next}$9 in a' File1.csv File2.csv
which generates results in the wrong order.
I realize this type of question is a simple and well-documented. I thought I had correctly grasped the logic of the program, but apparently I have not. I would be very grateful for pointers as to where I am going wrong.
-i