Merging and Adding colon to columns

Hello,
I have a tab delim file that looks like this

CHROM    POS    ID    REF    ALT    ID    HGVS_C    HGVS_P
1    17319011    rs2076603    G    A    NM_022089.3,NM_001141973.2,NM_001141974.2    c.1815C>T,c.1800C>T,c.1800C>T    p.Pro605Pro,p.Pro600Pro,p.Pro600Pro
1    20960230    rs45530340    C    T    NM_032409.2,NR_106732.1    c.189C>T,n.59C>T    p.Leu63Leu,.
1    20964328    rs2298298    A    G    NM_032409.2,NR_106732.1,NR_046507.1    c.388-7A>G,n.*4047A>G,n.*4822T>C    .,.,.
1    20972048    rs3131713    G    A    NM_032409.2,NR_046507.1    c.960-5G>A,n.3981+30C>T    .,.
1    43395635    rs2229682    C    T    NM_006516.2    c.588G>A    p.Pro196Pro
1    43396414    rs11537641    G    A    NM_006516.2    c.399C>T    p.Cys133Cys
1    43408966    rs1385129    G    A    NM_006516.2    c.45C>T    p.Ala15Ala

I need the output file to look like this where ID column is merged with the last two. The columns do not have a consistent number of values in it, sometimes its just a single value, sometimes many and sometimes it's a blank.

CHROM    POS    ID    REF    ALT    ID:HGVS_C    ID:HGVS_P
1    17319011    rs2076603    G    A    NM_022089.3:c.1815C>T,NM_001141973.2:c.1800C>T,NM_001141974.2:c.1800C>T    NM_022089.3:p.Pro605Pro,NM_001141973.2:p.Pro600Pro,NM_001141974.2:p.Pro600Pro
1    20960230    rs45530340    C    T    NM_032409.2:c.189C>T,NR_106732.1:n.59C>T    NM_032409.2:p.Leu63Leu,.
1    20964328    rs2298298    A    G    NM_032409.2:c.388-7A>G,NR_106732.1:n.*4047A>G,NR_046507.1:n.*4822T>C    
1    20972048    rs3131713    G    A    NM_032409.2:c.960-5G>A,NR_046507.1:n.3981+30C>T    
1    43395635    rs2229682    C    T    NM_006516.2:c.588G>A    NM_006516.2:p.Pro196Pro
1    43396414    rs11537641    G    A    NM_006516.2:c.399C>T    NM_006516.2:p.Cys133Cys
1    43408966    rs1385129    G    A    NM_006516.2:c.45C>T    NM_006516.2:p.Ala15Ala

I tried this

awk '{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6":"$7"\t"$6":"$8}' input.txt > output.txt

It works for single values (ie last 3-4 rows) but I am not able to merge multiple values together (first few rows).

Any help/suggestions are appreciated.

thank you

Like so:

awk '
        {n = split ($6, T, ",")
         m = split ($7, V, ",")
         l = split ($8, W, ",")
         $6 = $7 = $8 = ""
         for (i=1; i<=n; i++)   {$6 = $6 T ":" V (i==n?"":",")
                                 $7 = $7 T ":" W (i==n?"":",")
                                }
        }
1
' OFS="\t" file
CHROM	POS	ID	REF	ALT	ID:HGVS_C	ID:HGVS_P	
1	17319011	rs2076603	G	A	NM_022089.3:c.1815C>T,NM_001141973.2:c.1800C>T,NM_001141974.2:c.1800C>T	NM_022089.3:p.Pro605Pro,NM_001141973.2:p.Pro600Pro,NM_001141974.2:p.Pro600Pro	
1	20960230	rs45530340	C	T	NM_032409.2:c.189C>T,NR_106732.1:n.59C>T	NM_032409.2:p.Leu63Leu,NR_106732.1:.	
1	20964328	rs2298298	A	G	NM_032409.2:c.388-7A>G,NR_106732.1:n.*4047A>G,NR_046507.1:n.*4822T>C	NM_032409.2:.,NR_106732.1:.,NR_046507.1:.	
1	20972048	rs3131713	G	A	NM_032409.2:c.960-5G>A,NR_046507.1:n.3981+30C>T	NM_032409.2:.,NR_046507.1:.	
1	43395635	rs2229682	C	T	NM_006516.2:c.588G>A	NM_006516.2:p.Pro196Pro	
1	43396414	rs11537641	G	A	NM_006516.2:c.399C>T	NM_006516.2:p.Cys133Cys	
1	43408966	rs1385129	G	A	NM_006516.2:c.45C>T	NM_006516.2:p.Ala15Ala	

?

1 Like

Hi, try:

awk '
  {
    n=split($6,F,/,/)
    split($7,G,/,/)
    split($8,H,/,/)
    $6=$7=$8=""
    for(i=1; i<=n; i++) { 
      s=(i>1)?",":""
      $6=$6 s F ":" G
      $7=$7 s F ":" H
    } 
    print $1,$2,$3,$4,$5,$6,$7
  }
' FS='\t' OFS='\t' file
1 Like

Thank you so much. Both the solutions worked!