Need help with extracting data to MySQL format

Hi guys,

I'm doing a project now and extracting tables from a webpage to MySQL table format.

I dumped the webpage with lynx and it is like this
   
   id
   Spec
   524543
   Developed especially for seniors
   Spec
   No
   Java
   Spec
   Yes
   Java MIDP
   Spec
   Available using application
   Operating system
   Spec
   Android 2.3 (Gingerbread)
   Qwerty keyboard (buttons)
   Spec
   No
   Qwerty keyboard (screen)
   Spec
   Yes
   SAR
   Spec
    0.34 W/kg
   Stand by time
   Spec
    710 h
   Talk time
   Spec
    18.2 h
   Talk time (3G)
   Spec
    8.4 h
   Stand By-Time (3G)
   Spec
    610 h
   Audio features
   Built-in speakers
   Spec
   Mono
   Camera
   Built-in camera
   Spec
   Yes
   Auto focus
   Spec
   Yes
   Built-in flash
   Spec
   Yes
   Flash type
   Spec
   LED-flash   

it is specification of a cellphone. And I got the line number of each 'Spec' word with grep cut and sed and it is like this:

2 
5 
8 
11
14
17
20
23
26
29
32
35
39
43
46
49
52

now I want to get just the lines after each 'Spec' word and put in the MySQL format like this:

INSERT INTO `specifications` VALUES (524543,'No','Yes','Available using application','Android 2.3 (Gingerbread)',...);

I know that I need to get the lines with number +1 and put them between ',' in one line, but this part was very hard for me could you please help me about that?

Cheers,
Johanni

The following would create your statement

You could use the DBI and DBD::mysql modules to add the values directly to the database.

WARNING CODE UNTESTED!!

#!/usr/bin/perl

use strict;
use warnings;

open (my $data, "<", $ARGV[0]); #Supply the spec file as the argument
my $after_spec=0;
my @values;
while(<$data>){
   chomp;
   if ($after_spec){
      push @values $_;
      $after_spec--;
   }
   elsif (/^Spec$/){
      $after_spec++;
   }
}
for (@values){ #add quotes to non numerical values and escape single quotes
   s/'/\\'/g;
   if (! /\d+(\.\d+)/){
       $_="'".$_."'";
   }
}
$values_string=join(', ',@values);
print "INSERT INTO `specifications` VALUES ($values);"
1 Like

Thanks Skrynesaver,

this could is a bit pro for me :slight_smile:

I got these errors

$ data2.sh
/usr/bin/data2.sh: line 2: use: command not found
/usr/bin/data2.sh: line 2: $'\r': command not found
/usr/bin/data2.sh: line 3: use: command not found
/usr/bin/data2.sh: line 3: $'\r': command not found
/usr/bin/data2.sh: line 4: syntax error near unexpected token `my'
/usr/bin/data2.sh: line 4: `open (my $data, "<", $temp1.dat[0]); #Supply the spec
' file as the argument

I installed perl also.

I don't know what is the problem?

I input the file like this:

open (my $data, "<", $temp1.dat[0]);

is it correct?

use is a Perl command , try renaming the file data2.pl and then run chmod +x ./data2.pl to make the script runnable.

Probably not, the original script is intended to be runnable as follows...

./datas2.pl PATH_TO_ORIGINAL_DATA_DUMP_FROM_LYNX

$ARGV[0] is the first argument provided to the script on the command line.

And here's the code after fixing up the bugs.

#!/usr/bin/perl

#These two pragmas catch typos for you.
use strict; 
use warnings;

open (my $data, "<", $ARGV[0]); #Supply the spec file as the argument
my $after_spec=0; #flag which is set if the line is a spec line.
my @values; # an array to hold all the lines that match
while(<$data>){ # loop through the file one line at a time
   chomp; #remove trailing newlines.
   if ($after_spec){ # if we set the flag on the previous line.
      push @values, $_; # add this string to the values array
      $after_spec--; # unset the flag.
   }
   elsif (/^Spec$/){ # if the line matches the pattern 
      $after_spec++; # set the flag
   }
}
for (@values){ #go through the values array cleaning up the data.
   s/^\s*(.+)\s*$/$1/;#remove leading and trailing spaces
   s/'/\\'/g; # escape single quotes
   if (! /^\d+(\.\d+)?$/){ # if this is not a number
       $_="'".$_."'"; # surround the value with quotes
   }
}
my $values_string=join(', ',@values); # create a value string 
print "INSERT INTO `specifications` VALUES ($values_string);\n" # print the result.

Output with snippet of your data above

~/tmp$ ./tmp.pl tmp.dat 
INSERT INTO `specifications` VALUES (524543, 'No', 'Yes', 'Available using application', 'Android 2.3 (Gingerbread)', 'No', 'Yes', '0.34 W/kg', '710 h', '18.2 h', '8.4 h', '610 h', 'Mono', 'Yes', 'Yes', 'Yes', 'LED-flash');
1 Like

tnx man for the reply, this seems very complete code, but it seems that my perl is not going to work,

I'm recieving this error now:

$ data2.pl
bash: /usr/bin/data2.pl: /bin/per1^M: bad interpreter: Permission denied

I,ve googled and they say to put exec in the fstab file in etc folder (I'm using cygwin)
but it didn't worked and I'm recieiving the sam error. Do you have any idea about that?

---------- Post updated at 10:07 PM ---------- Previous update was at 08:59 PM ----------

okay, I fixed the perl :slight_smile:

now I'm receiving this:

INSERT INTO `specifications` VALUES ();

with no value

I've checked the code, it is reading the file, but I don't know what is the problem?

I input the file like this:

open (my $data, "<", "temp1.dat");

is it correct? I didn't understand the

$ARGV[0]

Could you please tell me how should I input the file?

tnx:)

---------- Post updated at 10:30 PM ---------- Previous update was at 10:07 PM ----------

worked :))

there were some blank spaces before Spec :wink: