I'm doing a project now and extracting tables from a webpage to MySQL table format.
I dumped the webpage with lynx and it is like this
id
Spec
524543
Developed especially for seniors
Spec
No
Java
Spec
Yes
Java MIDP
Spec
Available using application
Operating system
Spec
Android 2.3 (Gingerbread)
Qwerty keyboard (buttons)
Spec
No
Qwerty keyboard (screen)
Spec
Yes
SAR
Spec
0.34 W/kg
Stand by time
Spec
710 h
Talk time
Spec
18.2 h
Talk time (3G)
Spec
8.4 h
Stand By-Time (3G)
Spec
610 h
Audio features
Built-in speakers
Spec
Mono
Camera
Built-in camera
Spec
Yes
Auto focus
Spec
Yes
Built-in flash
Spec
Yes
Flash type
Spec
LED-flash
it is specification of a cellphone. And I got the line number of each 'Spec' word with grep cut and sed and it is like this:
2
5
8
11
14
17
20
23
26
29
32
35
39
43
46
49
52
now I want to get just the lines after each 'Spec' word and put in the MySQL format like this:
INSERT INTO `specifications` VALUES (524543,'No','Yes','Available using application','Android 2.3 (Gingerbread)',...);
I know that I need to get the lines with number +1 and put them between ',' in one line, but this part was very hard for me could you please help me about that?
You could use the DBI and DBD::mysql modules to add the values directly to the database.
WARNING CODE UNTESTED!!
#!/usr/bin/perl
use strict;
use warnings;
open (my $data, "<", $ARGV[0]); #Supply the spec file as the argument
my $after_spec=0;
my @values;
while(<$data>){
chomp;
if ($after_spec){
push @values $_;
$after_spec--;
}
elsif (/^Spec$/){
$after_spec++;
}
}
for (@values){ #add quotes to non numerical values and escape single quotes
s/'/\\'/g;
if (! /\d+(\.\d+)/){
$_="'".$_."'";
}
}
$values_string=join(', ',@values);
print "INSERT INTO `specifications` VALUES ($values);"
$ data2.sh
/usr/bin/data2.sh: line 2: use: command not found
/usr/bin/data2.sh: line 2: $'\r': command not found
/usr/bin/data2.sh: line 3: use: command not found
/usr/bin/data2.sh: line 3: $'\r': command not found
/usr/bin/data2.sh: line 4: syntax error near unexpected token `my'
/usr/bin/data2.sh: line 4: `open (my $data, "<", $temp1.dat[0]); #Supply the spec
' file as the argument
use is a Perl command , try renaming the file data2.pl and then run chmod +x ./data2.pl to make the script runnable.
Probably not, the original script is intended to be runnable as follows...
./datas2.pl PATH_TO_ORIGINAL_DATA_DUMP_FROM_LYNX
$ARGV[0] is the first argument provided to the script on the command line.
And here's the code after fixing up the bugs.
#!/usr/bin/perl
#These two pragmas catch typos for you.
use strict;
use warnings;
open (my $data, "<", $ARGV[0]); #Supply the spec file as the argument
my $after_spec=0; #flag which is set if the line is a spec line.
my @values; # an array to hold all the lines that match
while(<$data>){ # loop through the file one line at a time
chomp; #remove trailing newlines.
if ($after_spec){ # if we set the flag on the previous line.
push @values, $_; # add this string to the values array
$after_spec--; # unset the flag.
}
elsif (/^Spec$/){ # if the line matches the pattern
$after_spec++; # set the flag
}
}
for (@values){ #go through the values array cleaning up the data.
s/^\s*(.+)\s*$/$1/;#remove leading and trailing spaces
s/'/\\'/g; # escape single quotes
if (! /^\d+(\.\d+)?$/){ # if this is not a number
$_="'".$_."'"; # surround the value with quotes
}
}
my $values_string=join(', ',@values); # create a value string
print "INSERT INTO `specifications` VALUES ($values_string);\n" # print the result.
tnx man for the reply, this seems very complete code, but it seems that my perl is not going to work,
I'm recieving this error now:
$ data2.pl
bash: /usr/bin/data2.pl: /bin/per1^M: bad interpreter: Permission denied
I,ve googled and they say to put exec in the fstab file in etc folder (I'm using cygwin)
but it didn't worked and I'm recieiving the sam error. Do you have any idea about that?
---------- Post updated at 10:07 PM ---------- Previous update was at 08:59 PM ----------
okay, I fixed the perl
now I'm receiving this:
INSERT INTO `specifications` VALUES ();
with no value
I've checked the code, it is reading the file, but I don't know what is the problem?
I input the file like this:
open (my $data, "<", "temp1.dat");
is it correct? I didn't understand the
$ARGV[0]
Could you please tell me how should I input the file?
tnx:)
---------- Post updated at 10:30 PM ---------- Previous update was at 10:07 PM ----------