dovah
July 16, 2014, 2:15pm
1
I have two tab-delimited files (the first column is the primary key):
File 1 (there are multiple rows sharing the same key, I cannot merge them)
A 28,29,30,31
A 17,18,19
B 11,13,14,15
B 8,9
File 2 (there is only one row for a given key)
A 2,8,18,30,31
B 3,11
I'd like to append a star symbol (in a new tab-separated column) to a row of File 1 for each element of its second column that also appears in the second column of File 2 under the same key.
The output should look like:
A 28,29,30,31 **
A 17,18,19 *
B 11,13,14,15 *
B 8,9
I'm trying an awk solution, but I'm stuck. Please let me know if you have an idea of how I could deal with this issue.
RudiC
July 16, 2014, 2:27pm
2
Please show us your awk approach.
dovah
July 16, 2014, 5:43pm
3
Something like this. But it really needs a fix; it doesn't give the expected output.
$ awk ' FNR == NR { a[$1] = $2; next; } { split($2,b,","); split(a[$1],c,","); for (i in b) { if (b[i] in c) { printf("%s %s\t*\n",$1,a[$1]);next; }} print $1, a[$1]; } ' file1 file2
Thanks.
You were on the right track. Here is an approach with two-dimensional arrays:
awk '{split($2,F,/,/)} NR==FNR{for(i in F) A[$1,F[i]]; next} {for(i in F) if(($1,F[i]) in A) $3=$3 "*"}1' FS='\t' OFS='\t' file2 file1
or
awk '{split($2,F,/,/); for(i in F) if(NR==FNR){A[$1,F[i]]} else if(($1,F[i]) in A) $3=$3 "*"}NR>FNR' FS='\t' OFS='\t' file2 file1
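To try this against the sample data from the first post, here is a minimal sketch of the same idea written out with comments. The temporary directory and the variable names dir and result are only illustrative; the logic is the first one-liner above, with counted loops in place of for (i in F).

```shell
# Recreate the sample data from the question in a scratch directory (tab-delimited).
dir=$(mktemp -d)
printf 'A\t28,29,30,31\nA\t17,18,19\nB\t11,13,14,15\nB\t8,9\n' > "$dir/file1"
printf 'A\t2,8,18,30,31\nB\t3,11\n' > "$dir/file2"

# file2 is read first to build the lookup table, then file1 is annotated.
result=$(awk '
    BEGIN { FS = OFS = "\t" }
    { n = split($2, F, ",") }                  # split the comma list of the current row
    NR == FNR {                                # still reading the first file (file2)
        for (i = 1; i <= n; i++) A[$1, F[i]]   # referencing an element creates its key
        next
    }
    {                                          # now reading the second file (file1)
        for (i = 1; i <= n; i++)
            if (($1, F[i]) in A) $3 = $3 "*"   # one star per matching element
        print
    }
' "$dir/file2" "$dir/file1")

printf '%s\n' "$result"
rm -rf "$dir"
```

The key point is that in tests array subscripts, not values, so the elements of file2 are stored as subscripts (key, element) rather than being split into a value array. Rows without a match are printed unchanged, with no trailing tab.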