baris35
December 11, 2018, 9:59am
1
Hello Folks,
What I wish to do is:
If first column matches in main and new file, then
paste $COL2
into output file. Something like vlookup
. Please see also bold text in expected output.
mainfile
11
22
33
44
55
66
77
88
99
100
101
102
newfile1
11 MONDAY24
22 WEDNESDAY42
33 THURSDAY52
44 FRIDAY62
55 TUESDAY72
66 FRIDAY99
51 WHAT01
77 SATURDAY22
88 SUNDAY22
newfile2
11 MONDAY
22 WEDNESDAY
33 THURSDAY
44 FRIDAY
55 TUESDAY
66 FRIDAY
51 WHAT
77 SATURDAY
88 SUNDAY
101 MONTAG
109 DIENSTAG
115 MITTWOCH
55 DONNERSTAG
Expected output:
11 MONDAY24 MONDAY
22 WEDNESDAY42 WEDNESDAY
33 THURSDAY52 THURSDAY
44 FRIDAY62 FRIDAY
55 TUESDAY72 TUESDAY DONNERSTAG
66 FRIDAY99 FRIDAY
77 SATURDAY22 SATURDAY
88 SUNDAY22 SUNDAY
99
100
101 MONTAG
102
I run awk
code with two steps:
awk 'NR==FNR {m[$1]=$1"\t"$2; next} {$1=m[$1]; print}' mainfile newfile1 > output
awk 'NR==FNR {m[$1]=$1"\t"$2; next} {$1=m[$1]; print}' mainfile newfile2 >> output
This one does not works as expected.
Could you please let me know how to solve this?
Awk
is not a must. Easily understandable solutions are welcome.
Thanks
Boris
awk -f baris.awk mainfile newfile1 newfile2
where baris.awk
is:
FNR==NR { a[$1];next}
$1 in a {a[$1]=a[$1] OFS $2}
END {
for (i in a)
print i,a
}
1 Like
baris35
December 11, 2018, 10:46am
3
Thanks Vgersh99,
Gives expected output.
Kind regards
Boris
baris35
December 14, 2018, 8:04am
4
Dear Vgersh99,
One note:
I do not understand why output is not in comply with the list of order in mainfile when I change newfile1 and newfile2 as follows:
mainfile
AA_11
BB_22
CC_33
DD_44
EE_55
FF_66
GG_77
HH_88
II_99
JJ_100
KK_101
LL_102
newfile1
AA_11 MONDAY24
BB_22 WEDNESDAY42
CC_33 THURSDAY52
DD_44 FRIDAY62
EE_55 TUESDAY72
FF_66 FRIDAY99
GG_51 WHAT01
HH_77 SATURDAY22
II_88 SUNDAY22
newfile2
AA_11 MONDAY
BB_22 WEDNESDAY
CC_33 THURSDAY
DD_44 FRIDAY
EE_55 TUESDAY
FF_66 FRIDAY
GG_51 WHAT
HH_77 SATURDAY
II_88 SUNDAY
JJ_101 MONTAG
KK_109 DIENSTAG
LL115 MITTWOCH
MM_55 DONNERSTAG
This way, output is:
AA_11 MONDAY24 MONDAY
II_99
KK_101
GG_77
HH_88
FF_66 FRIDAY99 FRIDAY
DD_44 FRIDAY62 FRIDAY
EE_55 TUESDAY72 TUESDAY
BB_22 WEDNESDAY42 WEDNESDAY
JJ_100
CC_33 THURSDAY52 THURSDAY
LL_102
I was expecting:
AA_11 MONDAY24 MONDAY
BB_22 WEDNESDAY42 WEDNESDAY
CC_33 THURSDAY52 THURSDAY
DD_44 FRIDAY62 FRIDAY
EE_55 TUESDAY72 TUESDAY
FF_66 FRIDAY99 FRIDAY
GG_77
HH_88
II_99
JJ_100
KK_101
LL_102
Thank you
Boris
--- Post updated at 08:04 AM ---
Hello,
Have just sorted out.
awk 'FNR == NR { lineno[$1] = NR; next} {print lineno[$1], $0;}' mainfile outputfile | sort -k 1,1n | cut -d' ' -f2-
Many thanks
Boris
baris35:
Dear Vgersh99,
One note:
I do not understand why output is not in comply with the list of order in mainfile when I change newfile1 and newfile2 as follows:
mainfile
AA_11
BB_22
CC_33
DD_44
EE_55
FF_66
GG_77
HH_88
II_99
JJ_100
KK_101
LL_102
newfile1
AA_11 MONDAY24
BB_22 WEDNESDAY42
CC_33 THURSDAY52
DD_44 FRIDAY62
EE_55 TUESDAY72
FF_66 FRIDAY99
GG_51 WHAT01
HH_77 SATURDAY22
II_88 SUNDAY22
newfile2
AA_11 MONDAY
BB_22 WEDNESDAY
CC_33 THURSDAY
DD_44 FRIDAY
EE_55 TUESDAY
FF_66 FRIDAY
GG_51 WHAT
HH_77 SATURDAY
II_88 SUNDAY
JJ_101 MONTAG
KK_109 DIENSTAG
LL115 MITTWOCH
MM_55 DONNERSTAG
This way, output is:
AA_11 MONDAY24 MONDAY
II_99
KK_101
GG_77
HH_88
FF_66 FRIDAY99 FRIDAY
DD_44 FRIDAY62 FRIDAY
EE_55 TUESDAY72 TUESDAY
BB_22 WEDNESDAY42 WEDNESDAY
JJ_100
CC_33 THURSDAY52 THURSDAY
LL_102
I was expecting:
AA_11 MONDAY24 MONDAY
BB_22 WEDNESDAY42 WEDNESDAY
CC_33 THURSDAY52 THURSDAY
DD_44 FRIDAY62 FRIDAY
EE_55 TUESDAY72 TUESDAY
FF_66 FRIDAY99 FRIDAY
GG_77
HH_88
II_99
JJ_100
KK_101
LL_102
Thank you
Boris
--- Post updated at 08:04 AM ---
Hello,
Have just sorted out.
awk 'FNR == NR { lineno[$1] = NR; next} {print lineno[$1], $0;}' mainfile outputfile | sort -k 1,1n | cut -d' ' -f2-
Many thanks
Boris
The order of iteration over the array indicies is not defined/guaranteed.
From gawk manual:
By default, when a for loop traverses an array, the order is undefined, meaning that the awk implementation determines the order in which the array is traversed. This order is usually based on the internal implementation of arrays and will vary from one version of awk to the next.
Sorting the output, assumes that the entries in the mainfile
is sorted as well, which might always be true.
A small change to the script guarantees the output in the same order as in he mainfile
.
FNR==NR { a[$1];ord[FNR]=$1;next}
$1 in a {a[$1]=a[$1] OFS $2}
END {
for (i=1;i in ord;i++)
print ord,a[ord]
}
1 Like