Arrange words in table matrix format

awil · February 7, 2013, 10:01pm

Hello everyone,

I have a problem with this code:

#!/usr/bin/env python
import sys

try :
    filename = sys.argv[1]
except :
    print 'Specify filename'
    sys.exit()
fd = open(filename)
lines = fd.xreadlines()
compare = {}
for line in lines :
    split_line = line.strip().split('\t')
    if compare.has_key(split_line[0]) :
        for i in range(1,3) :
            try :
                compare[split_line[0]].index(split_line)
            except :
                compare[split_line[0]].append(split_line)
    else :
        compare[split_line[0]] = []
        for i in range(1,3) :
            try :
                compare[split_line[0]].index(split_line)
            except :
                compare[split_line[0]].append(split_line)
fd.close()
for i in compare.keys() :
    print '%s\t%s' %(i,compare)

Input file is:

SN1     data1   A,A
SN1     data2   A,B
SN1     data3   A,C
AC2     data1   A,B
AC2     data2   A,C
TP3     data3   C,C
TP3     data1   C,A

Output from this code showed :

I would like to get the output below:

AC2     data1 A,B data2 A,C data3 N,N
TP3     data1 C,A data2 N,N data3 C,C
SN1     data1 A,A data2 A,B data3 A,C

The expected output should list data1 to data3 in every row. If a column is missing, its value should be shown as N,N.

I don't know how to solve this problem. Please advise.
Thanks in advance.

durden_tyler · February 10, 2013, 10:52am

awil:

...
Input file is
SN1     data1   A,A
SN1     data2   A,B
SN1     data3   A,C
AC2     data1   A,B
AC2     data2   A,C
TP3     data3   C,C
TP3     data1   C,A                      
...
I would like to get the output below:
AC2     data1 A,B data2 A,C data3 N,N
TP3     data1 C,A data2 N,N data3 C,C
SN1     data1 A,A data2 A,B data3 A,C
The expected output should list data1 to data3 in every row. If a column is missing, its value should be shown as N,N.
I don't know how to solve this problem. Please advise.
...

I don't quite understand your implementation in Python, but I do understand your problem. A suggested algorithm is as follows:

(1) Read a line, split it by Tabs and then determine if the first token exists as a hash key.
(2) If it does not, then assign a "template" array that has "N,N"s at all the right places.
    The template array is this: ("data1", "N,N", "data2", "N,N", "data3", "N,N").
(3) From the 2nd token, determine the index of the template array that you want to update.
    So, for example, if the 2nd token is "data1", you extract 1 from it and you know that the "N,N" after "data1" is to be updated.
(4) Update the template array at the relevant index.
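
For readers who would rather stay in Python, steps (1) to (4) can be sketched roughly as below. This is only a sketch under a few assumptions: it targets Python 3 (the original post uses Python 2), the input is tab-separated, every label looks like "data1", "data2" or "data3", and the function name arrange and the sample list are made up for illustration.

```python
import re

def arrange(lines):
    """Build {key: [label, value, label, value, ...]} from tab-separated lines."""
    compare = {}
    for line in lines:
        # Step (1): split the line on tabs.
        tokens = re.split(r"\t+", line.strip())
        if len(tokens) < 3:
            continue  # skip blank or malformed lines
        key, label, value = tokens[:3]
        # Step (2): the first time a key shows up, give it a fresh "N,N" template.
        if key not in compare:
            compare[key] = ["data1", "N,N", "data2", "N,N", "data3", "N,N"]
        # Step (3): "data3" -> 3 -> index 2*3 - 1 = 5 (Python lists are 0-based).
        number = int(re.sub(r"\D+", "", label))
        # Step (4): overwrite the placeholder at that index.
        compare[key][2 * number - 1] = value
    return compare

# Hypothetical sample data, matching the input file from the question.
sample = [
    "SN1\tdata1\tA,A",
    "SN1\tdata2\tA,B",
    "SN1\tdata3\tA,C",
    "AC2\tdata1\tA,B",
    "AC2\tdata2\tA,C",
    "TP3\tdata3\tC,C",
    "TP3\tdata1\tC,A",
]

for key, row in arrange(sample).items():
    print(key, " ".join(row))
```

A label outside data1..data3 would raise an IndexError here, which is the price of the hard-coded template.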

I've implemented this algorithm in Perl, and it is posted below.

A few notes:
(1) A hash value is expected to be a scalar in Perl, so you set the "reference" to an array as the hash value. A "reference" in Perl is similar to a "pointer" in C. No clue what it is called in Python, or if you are using that in your code.

(2) Perl has 0-based arrays, i.e. the first index of an array is 0. (Python lists are 0-based as well; the range(1,3) in your code just starts counting at 1.) So, once you read the second token, say, "data3", and extract 3 from it, you have to update index 5 (= 2*3 - 1) of the template array.

- Read "data1" -> extract 1 -> update index 2*1 - 1 = 1 of array ("data1", "N,N", "data2", "N,N", "data3", "N,N")
- Read "data2" -> extract 2 -> update index 2*2 - 1 = 3 of array ("data1", "N,N", "data2", "N,N", "data3", "N,N")
- Read "data3" -> extract 3 -> update index 2*3 - 1 = 5 of array ("data1", "N,N", "data2", "N,N", "data3", "N,N")

(3) I've adopted this "hard-coded" template array approach because I see this in your code:

which makes me believe that a particular token such as "SN1", "AC2" or "TP3" can have at most three records. If that's not the case, then the problem becomes more interesting!
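
If the number of records per token is not fixed at three, one possible way to drop the hard-coded template is to collect every label first and fill in the "N,N" gaps afterwards. The following is a rough Python 3 sketch of that idea, not something taken from the posted script; the name arrange_any, the missing parameter and the sample data are assumptions made for illustration, and labels are assumed to end in a number so they can be sorted numerically.

```python
import re

def arrange_any(lines, missing="N,N"):
    """Map each key to a flat [label, value, ...] list covering every label seen."""
    seen = {}       # key -> {label: value}
    labels = set()  # every label found anywhere in the input
    for line in lines:
        tokens = re.split(r"\t+", line.strip())
        if len(tokens) < 3:
            continue  # skip blank or malformed lines
        key, label, value = tokens[:3]
        seen.setdefault(key, {})[label] = value
        labels.add(label)

    def numeric_order(label):
        # Sort "data10" after "data2" by comparing the numeric part first.
        digits = re.sub(r"\D+", "", label)
        return (int(digits) if digits else 0, label)

    columns = sorted(labels, key=numeric_order)
    rows = {}
    for key, found in seen.items():
        row = []
        for label in columns:
            row += [label, found.get(label, missing)]
        rows[key] = row
    return rows

# Hypothetical input with a gap and a two-digit label.
sample = ["SN1\tdata1\tA,A", "SN1\tdata10\tA,C", "AC2\tdata2\tA,B"]
for key, row in arrange_any(sample).items():
    print(key, " ".join(row))
```

The trade-off is that the full set of columns is only known after the whole file has been read, so the rows cannot be completed while reading.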

By this approach, once you are done reading the file, your data structure is ready and you can simply print off the results. I hope the script comments are sufficient.

$
$ # check the data file
$ cat test.txt
SN1     data1   A,A
SN1     data2   A,B
SN1     data3   A,C
AC2     data1   A,B
AC2     data2   A,C
TP3     data3   C,C
TP3     data1   C,A
$
$ # check the program file
$ cat process_test.pl
#!/usr/bin/perl
use strict;
use warnings;

die "Specify filename\n" if not defined $ARGV[0];       # Ask for filename
my %compare;                                            # Declare the hash to store all information
my $file = $ARGV[0];                                    # Assign the filename to a variable
open (FH, "<", $file) or die "Can't open $file: $!";    # Open the file handle; balk on error
while (<FH>) {                                          # Loop through the file, line by line
  chomp;                                                # Remove the End-of-Line character
  my @tokens = split/\t+/;                              # Split line on Tab and assign to array "tokens"
  if (not defined $compare{$tokens[0]}) {               # If 1st element of "tokens" is not a key, then
    $compare{$tokens[0]} = [ "data1", "N,N",            # Create the key in the "compare" hash and
                             "data2", "N,N",            # assign a template value with default "N,N"s
                             "data3", "N,N"             # The [] returns a reference to the array, since
                           ];                           # the hash value must be a scalar in Perl.
  }
  (my $index = $tokens[1]) =~ s/\D+//;                  # Determine the array index to be updated
  $compare{$tokens[0]}->[2*$index-1] = $tokens[2];      # And then update the array
}                                                       # Done reading the file
close (FH) or die "Can't close $file: $!";              # So close it; balk on error
while (my ($k, $v) = each %compare) {                   # Loop through the hash
  printf ("%s %s\n", $k, join (" ", @{$compare{$k}}));  # and print out the keys and values
}
$
$ # A run without a filename
$ perl process_test.pl
Specify filename
$
$ # A successful run
$ perl process_test.pl test.txt
AC2 data1 A,B data2 A,C data3 N,N
TP3 data1 C,A data2 N,N data3 C,C
SN1 data1 A,A data2 A,B data3 A,C
$
$
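
One caveat about the run above: Perl does not guarantee any particular order when looping over a hash, so the rows may come out in a different order on another machine. If a stable order matters, sorting the keys before printing is one option. Below is a small Python 3 sketch of that printing step; the helper name format_rows and the hard-coded dictionary are made up for illustration and simply mirror the data structure built while reading the file.

```python
def format_rows(compare):
    """Return one output line per key, in sorted key order."""
    return ["%s %s" % (key, " ".join(compare[key])) for key in sorted(compare)]

# Hypothetical, already-filled structure for the sample input.
compare = {
    "SN1": ["data1", "A,A", "data2", "A,B", "data3", "A,C"],
    "AC2": ["data1", "A,B", "data2", "A,C", "data3", "N,N"],
    "TP3": ["data1", "C,A", "data2", "N,N", "data3", "C,C"],
}

for line in format_rows(compare):
    print(line)
# AC2 data1 A,B data2 A,C data3 N,N
# SN1 data1 A,A data2 A,B data3 A,C
# TP3 data1 C,A data2 N,N data3 C,C
```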

tyler_durden