-
if the 1st row IDs of input1 (ID1/ID2.....) is equal to any IDNames of input2
print all relevant values together as defined in the output.
-
A bit tricky part is IDno in the output. All we need to do is numbering same kind of
letters as 1 (aa of ID1) and different letters as 2 (ab of ID1). And 3 is second frequent same kind of letter (bb of ID1)
Hope I made every thing clear.
input1
Column1 ID1 ID2 ID3
a1 a/a b/b c/c
a2 a/a b/b c/c
a3 a/b b/b c/d
a6 a/b b/b c/e
a8 a/a b/c c/e
a9 b/b b/d c/e
input2
IDName Column1 name value
ID1 a1 ppp 10
ID1 a2 ppp 20
ID1 a3 ppp 30
ID1 a4 ppp 40
ID1 a6 ppp 79
ID1 a8 ppp 62
ID2 a9 ppp 12
ID2 a1 qqq 23
ID2 a2 qqq 26
ID2 a3 qqq 28
ID2 a4 qqq 29
ID2 a6 qqq 36
ID2 a8 qqq 46
ID2 a9 qqq 22
ID4 a1 fff 67
output
IDName name Column1 IDs IDNo Value
ID1 ppp a1 a/a 1 10
ID1 ppp a2 a/a 1 20
ID1 ppp a8 a/a 1 62
ID1 ppp a3 a/b 2 30
ID1 ppp a6 a/b 2 79
ID1 ppp a9 b/b 3 12
ID2 qqq a1 b/b 1 23
ID2 qqq a2 b/b 1 26
ID2 qqq a3 b/b 1 28
ID2 qqq a6 b/b 1 36
ID2 qqq a8 b/c 2 46
ID2 qqq a9 b/d 3 22
---------- Post updated at 08:20 PM ---------- Previous update was at 03:49 AM ----------
if IDNames of input1 = IDs of IDNames of input2
if Column1 of input1 = Column1 of input2
print matched-IDName, and it's corresponding name, matched Column1value, and values of IDnames based on Column1value, and Value as last column
if IDs based on column1 values are same letters
print highest frequent letters as 1 in IDNo
print second highest letters as 3 in IDNo
else
print letters as 2 in IDNo
Try the below I hope you like it.
nawk 'NR==1{split($0,a," ");next}
NR>1&&FNR==NR{for (i=2;i<=NF;i++) { b[a$1]=$i };next}
(($1$2) in b){print $1,$3,$2,b[$1$2],$4}
' OFS="\t\t" input1 input2 | sort -k4 | nawk '
BEGIN{print "IDName\t\tname\t\tColumn1\tIDs\tValue\tIDNo" }
{
split($4,_,"/") ;
if (_[1]==_[2]&&$1==p1){c=1;print $0,c;p1=$1 ;next} ; if(_[1]==_[2]&&$1!=p1){c=1;p1=$1;print $0,c;next} ;
if($4!=p){c++;p=$4; print $0,c }else{print $0,c} ;
}
' OFS="\t"
;);)
I really liked it very much.
thanx alot. i thought It will become unreply post. thanx for your time.
actually I divided every thing into smaller tasks and wrote multiple shell scripts. But your single code looks finer than mine.
IDName name Column1 IDs Value IDNo
ID1 ppp a1 a/a 10 1
ID1 ppp a2 a/a 20 1
ID1 ppp a8 a/a 62 1
ID1 ppp a3 a/b 30 2
ID1 ppp a6 a/b 79 2
ID2 qqq a1 b/b 23 1
ID2 qqq a2 b/b 26 1
ID2 qqq a3 b/b 28 1
ID2 qqq a6 b/b 36 1
ID2 qqq a8 b/c 46 2
ID2 ppp a9 b/d 12 3
ID2 qqq a9 b/d 22 3
Could you please take a look at small requirements.
- if the letters are same b/b or d/d it has to be 1 or 3 not 2[Please check the black-bold letters]
- And if any n/n in input1 read it as 4 instead of 1 or 3.[Please check the red-bold letters]
Thanx for your help
ruby
input1
Column1 ID1 ID2 ID3
a1 a/a b/b c/c
a2 a/a b/b c/c
a3 a/b b/b c/d
a6 a/b b/b c/e
a8 a/a b/c c/e
a9 b/b d/d c/e
a10 n/n n/n n/n
input2
IDName Column1 name value
ID1 a1 ppp 10
ID1 a2 ppp 20
ID1 a3 ppp 30
ID1 a4 ppp 40
ID1 a6 ppp 79
ID1 a8 ppp 62
ID1 a10 ppp 99
ID2 a9 ppp 12
ID2 a1 qqq 23
ID2 a2 qqq 26
ID2 a3 qqq 28
ID2 a4 qqq 29
ID2 a6 qqq 36
ID2 a8 qqq 46
ID2 a9 qqq 22
ID4 a1 fff 67
output
IDName name Column1 IDs Value IDNo
ID1 ppp a1 a/a 10 1
ID1 ppp a2 a/a 20 1
ID1 ppp a8 a/a 62 1
ID1 ppp a3 a/b 30 2
ID1 ppp a6 a/b 79 2
ID1 ppp a10 n/n 99 4
ID2 qqq a1 b/b 23 1
ID2 qqq a2 b/b 26 1
ID2 qqq a3 b/b 28 1
ID2 qqq a6 b/b 36 1
ID2 qqq a8 b/c 46 2
ID2 ppp a9 d/d 12 3
ID2 qqq a9 d/d 22 3
after modification use below:-
nawk '
NR==1{split($0,a," ");next}
NR>1&&FNR==NR{for (i=2;i<=NF;i++) { b[a$1]=$i };next}
(($1$2) in b){print $1,$3,$2,b[$1$2],$4}
' OFS="\t\t" input1 input2 | sort -k1,1 -k4,4 | nawk '
BEGIN{print "IDName\t\tname\t\tColumn1\tIDs\tValue\tIDNo" }
{
split($4,_,"/") ;
if (_[1]==_[2]&&$1==p1){if ($4!=p){c++} ;print $0,c;p1=$1;p=$4 ;next} ; if(_[1]==_[2]&&$1!=p1){c=1;p1=$1;p=$4;print $0,c;next} ;
if($4!=p){c++;p=$4; print $0,c }else{print $0,c} ;
} ' OFS="\t"
:D:D:D