[Perl] Split lines into array - variable line items - variable no of lines.

ejdv · September 27, 2011, 9:42am

Hi,

I have the following lines that I would like to see in an array for easy comparisons and printing:

Example 1:

field1,field2,field3,field4,field5
value1,value2,value3,value4,value5

Example 2:

field1,field3,field4,field2,field5,field6,field7
value1,value3,value4,value2,value5,value6,value7
value1,value3,value4,value2,value5,value6,value7
value1,value3,value4,value2,value5,value6,value7

So, the number of lines, the number of fields and the field order can differ.

As output I would like to see:

field2,field3,field5
value2,value3,value5

field2,field3,field5
value2,value3,value5
value2,value3,value5
value2,value3,value5

The fields and values to be printed are always present, regardless of the number of fields and the field order.
field1 and value1 always come first.
The background is that this has to run on different systems, and those systems deliver different numbers of fields in a different order.

I started with something like this, but got stuck somehow due to a lack of Perl knowledge.
With more static input it would have been a bit easier.

my @LineItems;
my $LineItems;
my $FieldValue;
my $i;
my $NumItems;
my $LineCount;

open(GETLINES, "cat /tmp/lines.txt |") or die "Cannot read /tmp/lines.txt: $!";
$LineCount = 0;
while ( $Line = <GETLINES> ) {
  if ( $Line =~ /^Field1,/ ) {
    $LineCount++;
    @LineItems = split (/,/, $Line);
    $NumItems = @LineItems;
    for ( $i = 1; $i < $NumItems; $i++ ) {
       $FieldValue{$i} = $LineItems[$i];
    }
  }
  if ( $Line =~ /^Value1,/ ) {
    $LineCount++;
    @LineItems = split (/,/, $Line);
    $NumItems = @LineItems;
    for ( $i = 1; $i < $NumItems; $i++ ) {
      $FieldValue{$i} = $LineItems[$i];
    }
  }
}

I would appreciate any kind of assistance.

ejdv

Skrynesaver · September 27, 2011, 9:50am

Regexes are case sensitive and your field headers are lower case; try the i modifier if you are unsure.

I would be inclined to read the next line within the first block and then assign it to a hash keyed on field name.
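
A minimal sketch of that idea follows. It assumes every block starts with a header line beginning with field1 and that the wanted field names are known in advance; the sub name select_fields and the sample input are illustrative only and do not come from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to the wanted field names plus the
# input lines, and returns output lines holding only the wanted columns,
# in the wanted order.
sub select_fields {
    my ($wanted, @lines) = @_;
    my @header;    # field names of the current block
    my @out;
    for my $line (@lines) {
        chomp $line;
        next if $line eq '';
        my @items = split /,/, $line;
        if ($line =~ /^field1,/i) {
            # Header line: remember the field order of this block.
            @header = @items;
            push @out, join(',', @$wanted);
        } else {
            # Value line: a hash slice pairs each value with its field name.
            my %row;
            @row{@header} = @items;
            push @out, join(',', @row{@$wanted});
        }
    }
    return @out;
}

my @input = (
    "field1,field3,field4,field2,field5,field6,field7\n",
    "value1,value3,value4,value2,value5,value6,value7\n",
);
print "$_\n" for select_fields([qw(field2 field3 field5)], @input);
```

Because the values are looked up by name, the number of fields and their order in the input no longer matter, as long as the wanted names are present in the header.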

ejdv · September 27, 2011, 10:16am

@Skrynesaver,

Thanks for the quick reply.
Point taken about the i modifier.
In the example it should of course have been field1 instead of Field1.

By "a hash keyed on fieldname", do you mean something like this?

$FieldValue{ 'field2' } = 'value2'

Reading the first line builds the hash and then the next lines fill the hash.
I can imagine where this is going for example 1, but not for example 2.
I am not yet familiar with keyed hashes and nested hashes.

Do you have any more hints?

durden_tyler · September 27, 2011, 11:37am

Since the data in your sample file is repetitive, I've used a different sample data file for this problem.

Let's say the data file looks like this:

$
$
$ cat lines.txt
Microsoft,IBM,Oracle,Apple
Windows 95,DB2,Oracle,MacBook
MS Excel,Fortran,Siebel,iPod
XBox,ATM,MySQL,iPad
Zune,Deep Blue,PeopleSoft,Pixar
$
$

The first line has the keys; in this case, the company names.
From the second line onwards, we have the values in columnar fashion.
A single key (Company) may have multiple values (Products).

For example, the first column has the key "Microsoft" and the values as the list ("Windows 95", "MS Excel", "XBox", "Zune"). The case for column 2 is similar, and so on.

We could create a nested data structure to store all this information.
At the top level, we have an array, say, @all_comp_products. Each element of this array is an array reference. This array reference has the Company Name as the first element, and the second element is yet another array reference to the list of products of that company.

Thus, the first element of @all_comp_products looks like this:

$all_comp_products[0] = [ "Microsoft", [ "Windows 95", "MS Excel", "XBox", "Zune" ] ];

The second element looks like this:

$all_comp_products[1] = [ "IBM", [ "DB2", "Fortran", "ATM", "Deep Blue" ] ];

and so on.
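
As a small illustration of how such a structure is read back, here is a sketch that builds the first two elements by hand and dereferences them. The variable names other than @all_comp_products are made up for this example.

```perl
use strict;
use warnings;

# Each element is an array reference: [ company name, [ list of products ] ].
my @all_comp_products = (
    [ "Microsoft", [ "Windows 95", "MS Excel", "XBox", "Zune" ] ],
    [ "IBM",       [ "DB2", "Fortran", "ATM", "Deep Blue" ] ],
);

my $first    = $all_comp_products[0];    # reference to the Microsoft entry
my $company  = $first->[0];              # the company name
my @products = @{ $first->[1] };         # a copy of the product list

# Arrows between adjacent subscripts are optional, so this is the same as
# $all_comp_products[1]->[1]->[1].
my $second_ibm_product = $all_comp_products[1][1][1];

print "$company has ", scalar(@products), " products\n";
print "The second IBM product is $second_ibm_product\n";
```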

The Perl program looks like this:

$
$
$ cat lines.pl
#!perl -w
# ##################################################################################################
#
#  For the data file that looks like this:
#
#  Microsoft,IBM,Oracle,Apple
#  Windows 95,DB2,Oracle,MacBook
#  MS Excel,Fortran,Siebel,iPod
#  XBox,ATM,MySQL,iPad
#  Zune,Deep Blue,PeopleSoft,Pixar
#
#  this Perl program creates a nested data structure @all_comp_products that looks like this:
#
#  $all_comp_products[0] = [ "Microsoft", [ "Windows 95", "MS Excel", "XBox",  "Zune"       ] ];
#  $all_comp_products[1] = [ "IBM",       [ "DB2",        "Fortran",  "ATM",   "Deep Blue"  ] ];
#  $all_comp_products[2] = [ "Oracle",    [ "Oracle",     "Siebel",   "MySQL", "PeopleSoft" ] ];
#  $all_comp_products[3] = [ "Apple",     [ "MacBook",    "iPod",     "iPad",  "Pixar"      ] ];
#
# ##################################################################################################

my $file = "lines.txt";
my $company;
my $product;
my @all_comp_products;
my $idx = 0;

open (FH, "<", $file) or die "Can't open $file for reading: $!";
while (<FH>) {
  chomp;
  if (/^Microsoft/) {
    # Header line: start one [ company ] entry per column
    foreach $company (split /,/) {
       push @all_comp_products, [ $company ];
    }
  } else {
    # Value line: append each product to the entry for its column
    foreach $product (split /,/) {
       push @{${$all_comp_products[$idx]}[1]}, $product;
       $idx++;
    }
    $idx = 0;
  }
}
close (FH) or die "Can't close $file: $!";

# Now, we'll iterate through the nested data structure and display the data
foreach my $item (@all_comp_products) {
  $company = $$item[0];
  print "Company  : $company\n";
  print "Products :\n";
  foreach my $prod (@{$$item[1]}) {
    print "           $prod\n";
  }
  print "=" x 40,"\n";
}
$
$

And here's a test run:

$
$ perl lines.pl
Company  : Microsoft
Products :
           Windows 95
           MS Excel
           XBox
           Zune
========================================
Company  : IBM
Products :
           DB2
           Fortran
           ATM
           Deep Blue
========================================
Company  : Oracle
Products :
           Oracle
           Siebel
           MySQL
           PeopleSoft
========================================
Company  : Apple
Products :
           MacBook
           iPod
           iPad
           Pixar
========================================
$
$

tyler_durden

ejdv · September 28, 2011, 5:07am

@tyler_durden,

Thanks a lot for this great example.
It should enable me to solve my 'problem'.

---------- Post updated at 11:07 AM ---------- Previous update was at 08:52 AM ----------

@tyler_durden,

I have an additional question.

What if I only want to print Apple and IBM and in that exact order ?
Or for example Oracle, Apple and IBM in this exact order.

I tried to add this:

That results in this:

But I need it to be in the order as specified by %OrderedList.

Could you please assist me with this final step ?

Skrynesaver · September 28, 2011, 5:47am

A hash has no specified order; it is not an array. However, if you use ordinal values when defining your list, you could do the following (remember that any non-zero number is true).

my %OrderedList = (
    'Apple'    => "1",
    'IBM'      => "2",
    'Unwanted' => "0",
);
# Sort the keys by their ordinal value; entries with value 0 are skipped
for my $company (sort { $OrderedList{$a} <=> $OrderedList{$b} } keys %OrderedList) {
    if ($OrderedList{$company}) {
        print "$company is in position $OrderedList{$company}\n";
    }
}

Hope that helps

ejdv · September 28, 2011, 6:25am

Thanks, but I fail to see how to use that in printing the desired output.
This is the @all_comp_products:

  DB<2> x @all_comp_products
0  ARRAY(0x22904)
   0  'Microsoft'
   1  ARRAY(0x2912cc)
      0  'Windows 95'
      1  'MS Excel'
      2  'XBox'
      3  'Zune'
1  ARRAY(0x204a94)
   0  'IBM'
   1  ARRAY(0x27f494)
      0  'DB2'
      1  'Fortran'
      2  'ATM'
      3  'Deep Blue'
2  ARRAY(0x204e30)
   0  'Oracle'
   1  ARRAY(0x27f4b8)
      0  'Oracle'
      1  'Siebel'
      2  'MySQL'
      3  'PeopleSoft'
3  ARRAY(0x291248)
   0  'Apple'
   1  ARRAY(0x299bcc)
      0  'MacBook'
      1  'iPod'
      2  'iPad'
      3  'Pixar'

I do not know how to access the array items once I have the desired $company.

durden_tyler · September 28, 2011, 11:07am

You'll need to know the concept of "references" for this.
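
To make that concrete for the array-based structure, here is a rough sketch of looking up the products for a given company by dereferencing each element. The sub name products_for is hypothetical, and the data is written out by hand rather than read from lines.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: return the product list for one company, or an empty
# list if the company is not present in the structure.
sub products_for {
    my ($name, @all) = @_;
    for my $item (@all) {
        # $item is a reference to [ company, [ products ] ]
        return @{ $item->[1] } if $item->[0] eq $name;
    }
    return;
}

my @all_comp_products = (
    [ "Microsoft", [ "Windows 95", "MS Excel", "XBox", "Zune" ] ],
    [ "IBM",       [ "DB2", "Fortran", "ATM", "Deep Blue" ] ],
    [ "Oracle",    [ "Oracle", "Siebel", "MySQL", "PeopleSoft" ] ],
    [ "Apple",     [ "MacBook", "iPod", "iPad", "Pixar" ] ],
);

# Print only Apple and IBM, in exactly this order.
for my $company (qw(Apple IBM)) {
    print "$company: ", join(", ", products_for($company, @all_comp_products)), "\n";
}
```

This scans the whole array for every lookup, which is fine for a handful of companies but is one reason a hash keyed on the company name is the more natural fit.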

As for the custom ordering of keys, you could reorder the array @all_comp_products, but that was not its intention in the first place. The array was used so that the companies (and hence their products) could be displayed exactly in the order in which they appear in the data file.

A hash should be a simpler alternative. The algorithm is as follows:

(1) We create a hash %all_comp_products that has the company names as keys.
(2) The values are the array references that point to arrays of products.
(3) A hash called %keyorder holds our custom order for the keys of %all_comp_products.
(4) We iterate through keys of %all_comp_products sorted as per %keyorder and print company and product values.

Something like this -

$
$
$ cat lines.txt
Microsoft,IBM,Oracle,Apple
Windows 95,DB2,Oracle,MacBook
MS Excel,Fortran,Siebel,iPod
XBox,ATM,MySQL,iPad
Zune,Deep Blue,PeopleSoft,Pixar
$
$
$ cat lines2.pl
#!perl -w
use strict;
# ##################################################################################################
#
#  For the data file that looks like this:
#
#  Microsoft,IBM,Oracle,Apple
#  Windows 95,DB2,Oracle,MacBook
#  MS Excel,Fortran,Siebel,iPod
#  XBox,ATM,MySQL,iPad
#  Zune,Deep Blue,PeopleSoft,Pixar
#
#  this Perl program creates a hash %all_comp_products that looks like this:
#
#  $all_comp_products{Microsoft} = [ "Windows 95", "MS Excel", "XBox",  "Zune"       ]
#  $all_comp_products{IBM}       = [ "DB2",        "Fortran",  "ATM",   "Deep Blue"  ]
#  $all_comp_products{Oracle}    = [ "Oracle",     "Siebel",   "MySQL", "PeopleSoft" ]
#  $all_comp_products{Apple}     = [ "MacBook",    "iPod",     "iPad",  "Pixar"      ]
#
# ##################################################################################################

my $file = "lines.txt";
my @company;
my @values;
my %all_comp_products;
my %keyorder;

open (FH, "<", $file) or die "Can't open $file for reading: $!";
while (<FH>) {
  chomp;
  if (/^Microsoft/) {
    # Header line: remember the company names in column order
    @company = split /,/;
  } else {
    # Value line: append each value to the list of its column's company
    @values = split /,/;
    for (my $i = 0; $i <= $#company; $i++) {
      push @{$all_comp_products{$company[$i]}}, $values[$i];
    }
  }
}
close (FH) or die "Can't close $file: $!";

# Our custom key order is as follows -
%keyorder = qw (Oracle 1 Apple 2 IBM 3 Microsoft 4);

# Now, we'll iterate through the hash with custom sorted key order and display the data
foreach my $comp (sort {$keyorder{$a} <=> $keyorder{$b}} keys %all_comp_products ) {
  print "Company  : $comp\n";
  print "Products :\n";
  foreach my $prod (@{$all_comp_products{$comp}}) {
    print "           $prod\n";
  }
  print "=" x 40,"\n";
}
$
$
$ perl lines2.pl
Company  : Oracle
Products :
           Oracle
           Siebel
           MySQL
           PeopleSoft
========================================
Company  : Apple
Products :
           MacBook
           iPod
           iPad
           Pixar
========================================
Company  : IBM
Products :
           DB2
           Fortran
           ATM
           Deep Blue
========================================
Company  : Microsoft
Products :
           Windows 95
           MS Excel
           XBox
           Zune
========================================
$
$

tyler_durden
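
If only some of the companies should be printed, in a fixed order, one alternative to sorting by a %keyorder hash is to drive the loop from a plain array of names. The sketch below assumes the hash has already been built as in lines2.pl; the names @print_order and report_lines are made up for this example.

```perl
use strict;
use warnings;

# Same shape as the hash built by lines2.pl, written out by hand here.
my %all_comp_products = (
    Microsoft => [ "Windows 95", "MS Excel", "XBox", "Zune" ],
    IBM       => [ "DB2", "Fortran", "ATM", "Deep Blue" ],
    Oracle    => [ "Oracle", "Siebel", "MySQL", "PeopleSoft" ],
    Apple     => [ "MacBook", "iPod", "iPad", "Pixar" ],
);

# An array keeps its order, so it doubles as both filter and sort order.
my @print_order = qw(Oracle Apple IBM);

# Hypothetical helper: build the report lines for the named companies only.
sub report_lines {
    my ($data, @order) = @_;
    my @lines;
    for my $comp (@order) {
        next unless exists $data->{$comp};    # skip names not in the data
        push @lines, "Company  : $comp";
        push @lines, "           $_" for @{ $data->{$comp} };
    }
    return @lines;
}

print "$_\n" for report_lines(\%all_comp_products, @print_order);
```

Companies left out of @print_order (Microsoft here) are simply never visited, so no "unwanted" marker value is needed.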