How to store info from a txt file into a hash?

I'm trying to write a Perl script that uses the "open" command to open and read a file, then stores the information from that file in a hash.

This is what is inside my file:

Celena Standard  F 01/24/94 Cancer 
Jeniffer Orlowski  F 06/24/86 None
Brent Koehler  M 12/05/97  HIV
Mao Schleich  M 04/17/60  Cancer
Goldie Moultrie  F 04/05/96  None
Silva Rizzo  F 10/26/78  Amyloidosis
Leatha Papenfuss  F 10/15/97  CREST
Vita Sabb  F 05/28/87  Autism
Alyce Ugarte  F 12/21/64  HIV
Ela Prout  F 12/05/57  Autism
Mohamed Buchannon  M 07/24/91  Caner
Lael Stall  M 12/05/97  None

The first column is a name, the second is gender, third is birthdate, fourth is disease. The name is supposed to be the key while the other three columns are the values.

Also how would I allow the user to change information and output information to another file?

Since the "columns" in your file seem to be separated by one or more spaces, how do you know where the name "column" ends and the gender "column" starts? If more than one disease is associated with a name, does that add more <space>s to the last "column" in your file? If a disease has more than one word (e.g., diabetes mellitus or mitral valve prolapse), how are diseases separated from each other in the last "column"?

What have you tried to solve this problem on your own?

Hello Eric1,

Let me give you a few examples.
If I were to implement what you are asking at face value, this would be the result:

$ cat read_list.pl
#!/usr/bin/perl
#
use strict;
use warnings;
use Data::Dumper;

my %patient;
while(<>) {
    my @pair = /^(\w+\s\w+)\s+(.+)$/;
    $patient{$pair[0]} = $pair[1];
}
print Dumper \%patient;

Output:

$VAR1 = {
          'Leatha Papenfuss' => 'F 10/15/97  CREST',
          'Celena Standard' => 'F 01/24/94 Cancer ',
          'Vita Sabb' => 'F 05/28/87  Autism',
          'Jeniffer Orlowski' => 'F 06/24/86 None',
          'Alyce Ugarte' => 'F 12/21/64  HIV',
          'Silva Rizzo' => 'F 10/26/78  Amyloidosis',
          'Lael Stall' => 'M 12/05/97  None',
          'Mao Schleich' => 'M 04/17/60  Cancer',
          'Brent Koehler' => 'M 12/05/97  HIV',
          'Mohamed Buchannon' => 'M 07/24/91  Caner',
          'Ela Prout' => 'F 12/05/57  Autism',
          'Goldie Moultrie' => 'F 04/05/96  None'
        };

But I suspect that's not what you want. Probably, you would like something more like:

$ cat read_names.pl
#!/usr/bin/perl
#
use strict;
use warnings;
use Data::Dumper;

my %patient;
while(<>) {
    my @record = split;
    $patient{"@record[0..1]"} = {
        'gender' => "$record[2]",
        'birthday' => "$record[3]",
        'disease' => "@record[4..$#record]",
    };
}
print Dumper \%patient;

Output:

$ perl read_names.pl people.list
$VAR1 = {
          'Leatha Papenfuss' => {
                                  'disease' => 'CREST',
                                  'birthday' => '10/15/97',
                                  'gender' => 'F'
                                },
          'Celena Standard' => {
                                 'disease' => 'Cancer',
                                 'birthday' => '01/24/94',
                                 'gender' => 'F'
                               },
          'Vita Sabb' => {
                           'disease' => 'Autism',
                           'birthday' => '05/28/87',
                           'gender' => 'F'
                         },
          'Jeniffer Orlowski' => {
                                   'disease' => 'None',
                                   'birthday' => '06/24/86',
                                   'gender' => 'F'
                                 },
          'Alyce Ugarte' => {
                              'disease' => 'HIV',
                              'birthday' => '12/21/64',
                              'gender' => 'F'
                            },
          'Silva Rizzo' => {
                             'disease' => 'Amyloidosis',
                             'birthday' => '10/26/78',
                             'gender' => 'F'
                           },
          'Lael Stall' => {
                            'disease' => 'None',
                            'birthday' => '12/05/97',
                            'gender' => 'M'
                          },
          'Mao Schleich' => {
                              'disease' => 'Cancer',
                              'birthday' => '04/17/60',
                              'gender' => 'M'
                            },
          'Brent Koehler' => {
                               'disease' => 'HIV',
                               'birthday' => '12/05/97',
                               'gender' => 'M'
                             },
          'Mohamed Buchannon' => {
                                   'disease' => 'Caner',
                                   'birthday' => '07/24/91',
                                   'gender' => 'M'
                                 },
          'Ela Prout' => {
                           'disease' => 'Autism',
                           'birthday' => '12/05/57',
                           'gender' => 'F'
                         },
          'Goldie Moultrie' => {
                                 'disease' => 'None',
                                 'birthday' => '04/05/96',
                                 'gender' => 'F'
                               }
        };

However, depending on the real input, that might have a serious flaw. First name plus last name is not unique enough as a key. There is a strong possibility that two or more entries contain the same first and last name even though they refer to different people. Translation: you lose data, since a hash keeps only the last record read for a given key.
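
To see the overwrite happen, here is a minimal sketch. It uses two sample lines that share a name, keys them by first and last name only, and records which keys were seen twice. The @collisions array is just an illustration, not something the scripts in this thread use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two sample lines that share a first and last name.
my @lines = (
    'Silva Rizzo  F 10/26/78  Amyloidosis',
    'Silva Rizzo  F 22/5/81  Dissociative Indentity Disorder',
);

my %patient;
my @collisions;    # keys that were seen more than once
for my $line (@lines) {
    my @record = split ' ', $line;
    my $key = "@record[0..1]";
    push @collisions, $key if exists $patient{$key};
    $patient{$key} = "@record[2..$#record]";    # replaces any earlier value
}

my $kept = $patient{'Silva Rizzo'};
print scalar(keys %patient), " record kept: $kept\n";
print "Overwritten key: $_\n" for @collisions;
```

Only the second line survives in the hash; the first one is gone without any warning from Perl.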

Adding the birthday to the key might help to prevent that. Here's a modification of the previous code, using a modified input to demonstrate handling of a name collision and a multi-word disease:

INPUT:

$ cat name.list
Celena Standard  F 01/24/94 Cancer
Jeniffer Orlowski  F 06/24/86 None
Brent Koehler  M 12/05/97  HIV
Mao Schleich  M 04/17/60  Cancer
Goldie Moultrie  F 04/05/96  None
Silva Rizzo  F 10/26/78  Amyloidosis
Leatha Papenfuss  F 10/15/97  CREST
Vita Sabb  F 05/28/87  Autism
Alyce Ugarte  F 12/21/64  HIV
Ela Prout  F 12/05/57  Autism
Silva Rizzo  F 22/5/81  Dissociative Indentity Disorder
Mohamed Buchannon  M 07/24/91  Caner
Lael Stall  M 12/05/97  None
$ cat read_names.pl
#!/usr/bin/perl
#
use strict;
use warnings;
use Data::Dumper;

my %patient;
while(<>) {
    my @record = split;
    $patient{"@record[0..1,3]"} = {
        'gender' => "$record[2]",
        'birthday' => "$record[3]",
        'disease' => "@record[4..$#record]",
    };
}
print Dumper \%patient;

Output:

$ perl read_names.pl name.list
$VAR1 = {
          'Ela Prout 12/05/57' => {
                                    'disease' => 'Autism',
                                    'birthday' => '12/05/57',
                                    'gender' => 'F'
                                  },
          'Silva Rizzo 10/26/78' => {
                                      'disease' => 'Amyloidosis',
                                      'birthday' => '10/26/78',
                                      'gender' => 'F'
                                    },
          'Mohamed Buchannon 07/24/91' => {
                                            'disease' => 'Caner',
                                            'birthday' => '07/24/91',
                                            'gender' => 'M'
                                          },
          'Vita Sabb 05/28/87' => {
                                    'disease' => 'Autism',
                                    'birthday' => '05/28/87',
                                    'gender' => 'F'
                                  },
          'Mao Schleich 04/17/60' => {
                                       'disease' => 'Cancer',
                                       'birthday' => '04/17/60',
                                       'gender' => 'M'
                                     },
          'Brent Koehler 12/05/97' => {
                                        'disease' => 'HIV',
                                        'birthday' => '12/05/97',
                                        'gender' => 'M'
                                      },
          'Jeniffer Orlowski 06/24/86' => {
                                            'disease' => 'None',
                                            'birthday' => '06/24/86',
                                            'gender' => 'F'
                                          },
          'Lael Stall 12/05/97' => {
                                     'disease' => 'None',
                                     'birthday' => '12/05/97',
                                     'gender' => 'M'
                                   },
          'Leatha Papenfuss 10/15/97' => {
                                           'disease' => 'CREST',
                                           'birthday' => '10/15/97',
                                           'gender' => 'F'
                                         },
          'Silva Rizzo 22/5/81' => {
                                     'disease' => 'Dissociative Indentity Disorder',
                                     'birthday' => '22/5/81',
                                     'gender' => 'F'
                                   },
          'Alyce Ugarte 12/21/64' => {
                                       'disease' => 'HIV',
                                       'birthday' => '12/21/64',
                                       'gender' => 'F'
                                     },
          'Goldie Moultrie 04/05/96' => {
                                          'disease' => 'None',
                                          'birthday' => '04/05/96',
                                          'gender' => 'F'
                                        },
          'Celena Standard 01/24/94' => {
                                          'disease' => 'Cancer',
                                          'birthday' => '01/24/94',
                                          'gender' => 'F'
                                        }
        };

Note:
The code assumes that the patient's name is always exactly a first name and a last name, not a variation such as a single name, or a first, middle, and last name.
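
If names can vary in length, one possible way around that assumption is to anchor on the columns whose shape is predictable (a single M or F, then a date) and treat whatever comes before as the name and whatever comes after as the disease. This is only a sketch: parse_line is a hypothetical helper, the two test lines are made up, and it assumes the gender is always M or F and the date always looks like digits/digits/two digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one line into fields by anchoring on the gender and date columns,
# so the name and the disease may each contain any number of words.
# Returns a hash reference, or nothing if the line does not fit the layout.
sub parse_line {
    my ($line) = @_;
    my ($name, $gender, $birthday, $disease) =
        $line =~ m{^\s*(.+?)\s+([MF])\s+(\d{1,2}/\d{1,2}/\d{2})\s+(.*?)\s*$}
        or return;
    $name =~ s/\s+/ /g;    # collapse runs of spaces inside the name
    return {
        name     => $name,
        gender   => $gender,
        birthday => $birthday,
        disease  => $disease,
    };
}

# Made-up lines: a three-word name and a single-word name.
for my $line ('Ana Maria Lopez  F 03/09/88  Mitral Valve Prolapse',
              'Ravi  M 11/30/75  None') {
    my $rec = parse_line($line) or next;
    print join(' | ', @{$rec}{qw(name gender birthday disease)}), "\n";
}
```

A line that does not match (a blank line, for instance) is skipped instead of creating a bogus entry.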

Once you have decided on a structure and practiced extracting the fields from your actual data, show us what you have and follow up with your second question.

By the way, I hope the example does not contain real people's names and birthdays that you happen to be trusted with. That would be a 'terrible' thing to post.

Don't worry, the info I posted isn't about real people. I'll try out your code in a bit, Aia. Thank you for the examples. Is there no way for me to accomplish this task with the open command, though? As in something like:

open($patient_names, "+<Patient_Names.txt");

Here's an example of how you might open one file to read and another to write.
It opens the patient file, searches for cancer records, and writes the result to another file.

$ cat read_and_write_names.pl
#!/usr/bin/perl
#
use strict;
use warnings;

my $patient_names = 'patient.list';

#
# open patient.list to read or exit
#
open my $in_file, '<', $patient_names or die "Could not open file $patient_names: $!\n";

#
# structure the database
#
my %patients;
while(<$in_file>) {
    my @record = split;
    $patients{"@record[0..1,3]"} = {
        'name' => "$record[0]",
        'lastname' => "$record[1]",
        'gender' => "$record[2]",
        'birthday' => "$record[3]",
        'disease' => "@record[4..$#record]",
    };
}
close $in_file;

#
# to reassemble the original order of fields
#
my @fields = qw(name lastname gender birthday disease);

#
# open new processed list of names or exit
#
my $new_patient_names = 'processed_names.list';
open my $out_file, '>', $new_patient_names or die "Could not open file $new_patient_names: $!\n";

#
# save only patients with cancer. Since the word Cancer also appears misspelled
# as Caner, the pattern makes the second "c" optional to catch both spellings.
#
for my $record (keys %patients) {
    if ($patients{$record}{'disease'} =~ /^Canc?er/i) {
      print $out_file join (" ", @{$patients{$record}}{@fields}) . "\n";
    }
}
close $out_file;

Output:

$ cat processed_names.list
Mohamed Buchannon M 07/24/91 Caner
Mao Schleich M 04/17/60 Cancer
Celena Standard F 01/24/94 Cancer
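
As for the other half of the question (letting the user change a record and writing the result to another file), something along these lines might be a starting point. It is only a sketch under a few assumptions: update_patient is a hypothetical helper, updated_names.list is a made-up file name, the three input lines are hard-coded so the script runs on its own, and the change is hard-coded where a real script would take it from @ARGV or a prompt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fields = qw(name lastname gender birthday disease);

# A few lines of the sample data; a real script would read them from
# the input file with open, as in the script above.
my @lines = (
    'Mao Schleich  M 04/17/60  Cancer',
    'Mohamed Buchannon  M 07/24/91  Caner',
    'Lael Stall  M 12/05/97  None',
);

my %patients;
for my $line (@lines) {
    my @record = split ' ', $line;
    my %entry;
    @entry{@fields} = (@record[0 .. 3], "@record[4..$#record]");
    $patients{"@record[0..1,3]"} = \%entry;
}

# Change one field of one record (hypothetical helper).
# Returns 1 on success, 0 if the key or the field name is unknown.
sub update_patient {
    my ($db, $key, $field, $value) = @_;
    return 0 unless exists $db->{$key} && exists $db->{$key}{$field};
    $db->{$key}{$field} = $value;
    return 1;
}

# Example change: correct the misspelled disease.
update_patient(\%patients, 'Mohamed Buchannon 07/24/91', 'disease', 'Cancer')
    or warn "No such record or field\n";

# Write every record, sorted by key so the output order is repeatable.
my $out_name = 'updated_names.list';    # made-up file name
open my $out, '>', $out_name or die "Could not open file $out_name: $!\n";
for my $key (sort keys %patients) {
    print {$out} join(' ', @{ $patients{$key} }{@fields}), "\n";
}
close $out or die "Could not close file $out_name: $!\n";
```

Note that the hash key still contains the name and birthday, so changing one of those fields would leave the key out of date; in that case the record would have to be stored again under a new key.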