Create table based on matched patterns

Hi,

I need help creating a table from an input file like this:

DB|QZX3  140  165  RT_2   VgGIGvGVR
DB|QZX3  155  182  UT_1   rlgslqqLaIvlGiFT
DB|QZX3  345  362  RT_1   GRKpllligS
DB|ZXK6  174  199  RT_2  IstvtvptYlgEiatvkaR
DB|ZXK6  189  216  UT_1    algtiyqLfLviGiLF
DB|AZ264  15  17    RT_2  getapvYlaEmspasiR
DB|A1Z8N1  457  474  RT_1  GGPLIEYLGRRntilatA
DB|A1Z8N1  499  524 RT_2  LaGFCvGIaslsqpevR
DB|A1Z8N1  690  706  RT_1  GIVLIDKillyv.S
DB|A3M0N3  133  158  RT_2  LaGLGvGLiR
DB|A3M0N3  334  351  RT_1  GIGRRklllggS

The output file should be a table like this:

ID                  RT_1                                 RT_2                                        UT1
DB|QZX3        G R K p l l l i g S                    V g G I G v G V R                             r l g s l q q L a I v l G i F T
DB|ZXK6                                               I s t v t v p t Y l g E i a t v k a R         a l g t i y q L f L v i G i L F
DB|AZ264                                              g e t a p v Y l a E m s p a s i R
DB|A1Z8N1      G G P L I E Y L G R R n t i l a t A
DB|A1Z8N1      G I V L I D K i l l y v . S
DB|A3M0N3      G I G R R k l l l g g S                L a G L G v G L i R

As you can see above, there are 4 requirements:
1) The values in $4 should become the column headers after $1.
2) IDs that have no value for a column should be left blank in that column.
3) The same ID with different values should be printed on separate rows.
4) The characters in $5 of the input file need to be separated by spaces.

I have thousands of records like this that I need to arrange. I tried "paste", but the result is not neat and does not show exactly what I want. Please help me do this in awk if possible. Thanks.

Are RT_1, RT_2, UT1 the only values that can appear in column 4?

Hi,
Yes, only these 3 will appear in the 4th column.

I can see 5 distinct values in column 4 though:

RT1
RT2
RT_1
RT_2
UT1

Hi,
Sorry, that was a typo. It should be RT_1, RT_2 and UT_1.

Put this into "script.pl":

#!/usr/bin/perl
use strict;
use warnings;

# Open the input file named on the command line.
open my $input, "<", $ARGV[0] or die "cannot open file: $ARGV[0]: $!";

# One 120-character row per ID: three 40-character columns (RT_1, RT_2, UT_1).
# A repeated ID/column pair overwrites the earlier value.
my %output;
while (my $line = <$input>) {
  chomp $line;
  my @F = split / +/, $line;
  next if @F < 5;   # skip blank or short lines
  $output{$F[0]} = " " x 120 if !$output{$F[0]};
  substr($output{$F[0]}, 0, 40) = sprintf "%-40s", join " ", split //, $F[4] if $F[3] eq "RT_1";
  substr($output{$F[0]}, 40, 40) = sprintf "%-40s", join " ", split //, $F[4] if $F[3] eq "RT_2";
  substr($output{$F[0]}, 80, 40) = sprintf "%-40s", join " ", split //, $F[4] if $F[3] eq "UT_1";
}
close $input;

# Header widths match the row layout: 15 for the ID, then 40 per column.
printf "%-15s%-40s%-40s%s\n", "ID", "RT_1", "RT_2", "UT_1";
foreach my $id (sort keys %output) {
  printf "%-15s%s\n", $id, $output{$id};
}

Then run:

./script.pl input

Try also:

awk     '       {LN[$1]; HD[$4]; gsub (/./, "& ", $5); MX[$1,$4]=$5}
         END    {FMT="%-35s"
                                printf FMT, "ID"; for (i in HD) printf FMT, i; print "";
                 for (j in LN) {printf FMT, j;    for (i in HD) printf FMT, MX[j,i]; print ""}
                }
        ' file
ID                                 UT_1                               RT_1                               RT_2                               
DB|A1Z8N1                                                             G I V L I D K i l l y v . S        L a G F C v G I a s l s q p e v R  
DB|AZ264                                                                                                 g e t a p v Y l a E m s p a s i R  
DB|A3M0N3                                                             G I G R R k l l l g g S            L a G L G v G L i R                
DB|QZX3                            r l g s l q q L a I v l G i F T    G R K p l l l i g S                V g G I G v G V R                  
DB|ZXK6                            a l g t i y q L f L v i G i L F                                       I s t v t v p t Y l g E i a t v k a R 
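
Both scripts above keep one row per ID, so a second RT_1 for the same ID (as with DB|A1Z8N1) replaces the first, and "for (i in HD)" visits the columns in no guaranteed order. If the repeated values have to stay on separate rows, something along these lines might do. It is only a sketch: the function name make_table, the fixed column list RT_1 RT_2 UT_1, and the 15/40-character widths are my assumptions, not something from the original post.

```shell
#!/bin/sh
# make_table is a hypothetical helper name; pass the input file as its argument.
make_table() {
    awk '
    BEGIN { ncol = split("RT_1 RT_2 UT_1", COL, " "); FMT = "%-40s" }
    NF >= 5 {
        id = $1; col = $4
        if (!(id in ROWS)) ORDER[++nid] = id   # remember IDs in first-seen order
        n = ++CNT[id, col]                     # nth occurrence of this ID/column pair
        if (n > ROWS[id]) ROWS[id] = n         # rows needed for this ID so far
        seq = $5
        gsub(/./, "& ", seq)                   # put a space after every character
        sub(/ $/, "", seq)                     # drop the final trailing space
        CELL[id, n, col] = seq                 # column values not listed in COL are never printed
    }
    END {
        line = sprintf("%-15s", "ID")
        for (c = 1; c <= ncol; c++) line = line sprintf(FMT, COL[c])
        sub(/ +$/, "", line); print line
        for (k = 1; k <= nid; k++) {
            id = ORDER[k]
            for (r = 1; r <= ROWS[id]; r++) {
                line = sprintf("%-15s", id)
                for (c = 1; c <= ncol; c++) line = line sprintf(FMT, CELL[id, r, COL[c]])
                sub(/ +$/, "", line); print line   # trim padding at the end of the row
            }
        }
    }
    ' "$@"
}
```

After sourcing the file, run it as: make_table file. The nth RT_1 of an ID shares a row with the nth RT_2 and nth UT_1 of that ID, which is one possible reading of requirement 3.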

Hi bartus11,

Sorry for taking so long to reply. I tried your code and it works! I just need to make some small changes, as some of my data in $5 is really long. Thanks a ton! :)

Hi RudyC,
I tried your code too, but got an error. I am looking into it to see what I missed. Will update you. ;)

---------- Post updated at 05:00 PM ---------- Previous update was at 04:54 PM ----------

Hi RudyC,

It worked! The error was my mistake; your code is perfect. Thanks a zillion! :)
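
For the really long $5 values mentioned above, fixed widths such as 35 or 40 characters will push the later columns out of line. One possible workaround is to read everything first and size the columns from the longest spaced-out sequence. This is a rough sketch only: the name fit_table, the fixed column list, and the two-space gutter are assumptions of mine, and like the one-row-per-ID scripts in this thread it keeps only the last value for a repeated ID/column pair.

```shell
#!/bin/sh
# fit_table is a hypothetical helper name; pass the input file as its argument.
fit_table() {
    awk '
    BEGIN { ncol = split("RT_1 RT_2 UT_1", COL, " "); W = 4; IDW = 2 }
    NF >= 5 {
        if (!($1 in SEEN)) { SEEN[$1]; ORDER[++nid] = $1 }   # first-seen order
        if (length($1) > IDW) IDW = length($1)               # widest ID so far
        seq = $5
        gsub(/./, "& ", seq)                                 # space after every character
        sub(/ $/, "", seq)
        if (length(seq) > W) W = length(seq)                 # widest cell so far
        CELL[$1, $4] = seq                                   # a repeated pair overwrites
    }
    END {
        idw = IDW + 2; w = W + 2                             # add a two-space gutter
        IDFMT = "%-" idw "s"; FMT = "%-" w "s"
        line = sprintf(IDFMT, "ID")
        for (c = 1; c <= ncol; c++) line = line sprintf(FMT, COL[c])
        sub(/ +$/, "", line); print line
        for (k = 1; k <= nid; k++) {
            line = sprintf(IDFMT, ORDER[k])
            for (c = 1; c <= ncol; c++) line = line sprintf(FMT, CELL[ORDER[k], COL[c]])
            sub(/ +$/, "", line); print line
        }
    }
    ' "$@"
}
```

After sourcing the file, run it as: fit_table file. All three columns share one width here to keep the sketch short; per-column widths would need one maximum per entry in COL.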