Hi, I have the following files:
FILE1
A M 2 3
B E 4 5
C I 5 6
D O 4 5
FILE2
A M 3 4
B E 5 2
F U 7 9
J K 2 3
OUTPUT
A M 2 3 3 4
B E 4 5 5 2
Thanks in advance.
awk 'NR==FNR{a[$1"-"$2]=$0;next}$1"-"$2 in a{print a[$1"-"$2],$3,$4}' file1 file2
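For anyone reading along, here is the same one-liner spread out with comments, run against the sample data from the first post. This is only a sketch: the temporary directory and the variable names (work, merged) are my own additions, not part of the original answer.

```shell
# Scratch directory so nothing in the current directory is overwritten
work=$(mktemp -d)

# Sample inputs from the first post
cat > "$work/file1" <<'EOF'
A M 2 3
B E 4 5
C I 5 6
D O 4 5
EOF
cat > "$work/file2" <<'EOF'
A M 3 4
B E 5 2
F U 7 9
J K 2 3
EOF

merged=$(awk '
  # NR==FNR holds only while the first file is read:
  # store the whole line under a key built from columns 1 and 2.
  NR == FNR { a[$1 "-" $2] = $0; next }
  # Second file: if its key was stored, print the stored line
  # followed by columns 3 and 4 of the current line.
  ($1 "-" $2) in a { print a[$1 "-" $2], $3, $4 }
' "$work/file1" "$work/file2")

printf '%s\n' "$merged"
```

On the sample data this prints the two lines shown under OUTPUT in the first post.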
Hi,
If I use the same command,
awk 'NR==FNR{a[$1"-"$2]=$0;next}$1"-"$2 in a{print a[$1"-"$2],$3,$4}' file1 file2
I get around 293 records. But when I swap the file order,
awk 'NR==FNR{a[$1"-"$2]=$0;next}$1"-"$2 in a{print a[$1"-"$2],$3,$4}' file2 file1
I get around 370 records.
My file1 has 8219 records and file2 has 762 records.
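A note on why the two counts can differ: the command prints one line for every record of the second file whose key was stored from the first file. So the count depends on which file is read second, and differing counts suggest that some column 1 + column 2 pairs occur more than once (most likely in file1, since it gives the larger count when read second). That is not confirmed anywhere in this thread, so a quick check may help. The helper name dup_keys and the demo rows below are made up for illustration.

```shell
# List every column1-column2 key that occurs more than once in a file,
# prefixed with the number of occurrences.
dup_keys() {
  awk '{ n[$1 "-" $2]++ }
       END { for (k in n) if (n[k] > 1) print n[k], k }' "$1"
}

# Hypothetical demo data: the key A-M appears twice
demo=$(mktemp)
printf '%s\n' 'A M 2 3' 'A M 9 9' 'B E 4 5' > "$demo"

dup_report=$(dup_keys "$demo")
printf '%s\n' "$dup_report"   # prints: 2 A-M
```

Running dup_keys file1 and dup_keys file2 on the real data would show whether repeated keys are present. Also note that when a key repeats in the file read first, the array keeps only the last line stored for that key.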
Post some of your actual input data instead of a mock-up sample; it may differ from what you expect.
FILE1
0610009B14Rik NR_037995 38 0
0610040J01Rik NM_029554 21 0
1110012J17Rik NM_001114098 394 0
1110017D15Rik NM_001048005 95 0
1110032A04Rik NM_001164210 147 0
1110059M19Rik NM_026841 53 0
1190003J15Rik NM_029821 40 0
1300014I06Rik NM_025831 56 0
1300017J02Rik NM_027918 3 0
1500009C09Rik NR_037698 828 0
1500015O10Rik NM_024283 366 0
1500016L03Rik NR_038057 414 0
1600029D21Rik NM_029639 15 0
1600029I14Rik NR_028123 10 0
1700001C02Rik NM_029285 24 0
1700001G11Rik NR_038077 1 0
1700001L19Rik NM_027035 406 0
1700003E16Rik NM_027948 27 0
1700003M02Rik NM_027041 2 0
1700007K13Rik NM_027040 26 0
1700009J07Rik NR_015547 4 0
FILE2
0610010O12Rik NM_001081365 0 1
1300017J02Rik NM_027918 0 17
1500015O10Rik NM_024283 0 1
1700003G18Rik NR_029433 0 1
1700011H14Rik NM_025956 0 2
1700016D06Rik NM_024271 0 3
1700047M11Rik NR_015458 0 7
1700061J05Rik NM_028522 0 1
1810010H24Rik NM_001163473 0 4
2010005H15Rik NM_029733 0 4
2010107G23Rik NM_027251 0 23
2200002K05Rik NM_026955 0 15
2310005G13Rik NM_183281 0 6
2510049J12Rik NM_001101431 0 12
2610034M16Rik NM_027001 0 10
2610528J11Rik NM_025572 0 6
4632428C04Rik NR_033631 0 2
4930412F15Rik NM_175517 0 4
4930511M11Rik NM_029141 0 9
4930528F23Rik NM_029197 0 9
4930555I21Rik NM_030189 0 1
4930579C15Rik NM_027089 0 1
4930579G22Rik NM_026916 0 1
4931428L18Rik NR_033445 0 4
For those two sample files, my code outputs two lines regardless of which file comes first. Can you post some sample data for which my code does not work?
I am not sure which part of the input files is being read by your solution.
My files have around 8K records, which is too many to post here.
Thanks anyway.
Hi.
Using join (with the help of sed and sort) I also get 2 lines:
1300017J02Rik NM_027918 3 0 0 17
1500015O10Rik NM_024283 366 0 0 1
although I don't know whether they are the same lines bartus11 got, or whether they are correct, because no one posted any results, expected or obtained, for the second data sets.
I note that your first data sets had space delimiters, and the second sets had TABs ...
cheers, drl
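The exact join/sed/sort commands were not posted, so the following is only a guess at what such a pipeline might look like. join matches on a single field, so the sketch glues columns 1 and 2 together with "|" first (assuming "|" never occurs in the data), sorts on that key, joins, and then splits the key again. The names work, prep and joined are made up, and only a few rows from the FILE1/FILE2 samples are used, written TAB-delimited.

```shell
export LC_ALL=C   # sort and join must agree on collation order
work=$(mktemp -d)

# A few rows from the FILE1 / FILE2 samples in the thread, TAB-delimited
printf '%s\t%s\t%s\t%s\n' \
  1300017J02Rik NM_027918 3 0 \
  1500009C09Rik NR_037698 828 0 \
  1500015O10Rik NM_024283 366 0 > "$work/file1"
printf '%s\t%s\t%s\t%s\n' \
  0610010O12Rik NM_001081365 0 1 \
  1300017J02Rik NM_027918 0 17 \
  1500015O10Rik NM_024283 0 1 > "$work/file2"

# Replace the first run of whitespace with "|" so columns 1 and 2
# become one join key, then sort on that key.
prep() { sed 's/[[:space:]][[:space:]]*/|/' "$1" | sort -k1,1; }

prep "$work/file1" > "$work/f1.sorted"
prep "$work/file2" > "$work/f2.sorted"

# join splits on blanks (spaces or TABs) by default and writes
# single spaces; the final sed turns the key back into two columns.
joined=$(join "$work/f1.sorted" "$work/f2.sorted" | sed 's/|/ /')
printf '%s\n' "$joined"
```

On these rows the result is the same two lines shown above. Because join and sed split on any whitespace here, the space versus TAB difference should not matter for this approach.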