I wrote a shell script that checks the status code of a set of pages. For 1300 URLs it takes 15 minutes to run. Can anyone suggest a way to make the script run faster, ideally in about 5 minutes?
In my script I used a while loop and "wget --server-response".
Without knowing the script it would be hard to tell. And 1300 URLs in 15 minutes is already pretty fast to me (15 minutes * 60 / 1300 = ~0.7 seconds per URL, including starting the process and establishing the connection).
The script just has to generate the status codes for the 1300 URLs. I am going to integrate it with a build process that takes 40 minutes to run. Once the build finishes, this status-code checker will be triggered, so it would be useful if the script ran in about 5 minutes.
If the script ran as 10 instances taking 10 URLs at a time (i.e., in parallel), it would finish in about 5 minutes. Any ideas how to do this in parallel?
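For reference, a sequential check along the lines the poster describes (a loop over wget --server-response) might look like this sketch; urls.txt and the check_url name are assumptions, not the poster's actual script:

```shell
# Print "URL STATUS" for one URL. --spider skips downloading the body;
# wget writes the server's status lines to stderr, so capture that and
# keep the code from the last HTTP/ line (after any redirects).
check_url() {
    status=$(wget --spider --server-response "$1" 2>&1 |
        awk '/HTTP\//{code=$2} END{print code}')
    printf '%s %s\n' "$1" "${status:-ERR}"
}

# Sequential driver as described: one URL per line in urls.txt
# (an assumed filename):
# while IFS= read -r url; do check_url "$url"; done < urls.txt
```

Run sequentially like this, each URL waits for the previous one to finish, which is where the 15 minutes go.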
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTTP::Status qw(RC_OK);    # RC_OK is the numeric 200 constant

my @sites = qw(www.microsoft.com www.google.com www.yahoo.com);

foreach my $site (@sites) {
    my $filename = $site;
    $site = 'http://' . $site;
    # getstore returns the numeric HTTP status code
    my $response = getstore( $site, $filename );
    if ( $response == RC_OK ) {
        print "$site is up.\n";
    }
    else {
        print "There's some error on the site. Returned error code $response.\n";
    }
}
There are a few ways to parallelize this, but for that we'd need more info. What system are you on? Is using GNU parallel an option? How do you get the input? What do you do with the status code once you've got it? What shell are you using?
My response wasn't a suggestion on how to make it faster, but a question on how you're doing it now, so that we can get an idea on what might work, and what side effects should be considered.
This will run four simultaneous instances of wget. The --max-args option stops parallel from feeding too many args into one wget, so if one download hangs for a while, the other instances can take up most of the slack.
The --spider tells it not to download the page, just check its existence, which should also speed the script up.
The -nv tells it to print one success or failure per line.
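Put together, the command described above would look something like this sketch; urls.txt and the cap of 25 URLs per wget are assumptions:

```shell
# Feed URLs (one per line in urls.txt, an assumed filename) to wget,
# four instances at a time. --max-args caps how many URLs each wget
# invocation gets, so one hung download can't hold up too many checks.
test -f urls.txt &&
  parallel -j4 --max-args=25 wget -nv --spider < urls.txt
```

With 1300 URLs and four workers, the wall-clock time should drop to roughly a quarter of the sequential run; raising -j further trades more concurrent connections for more speed.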
I used GNU parallel in my script. If I run the script from the command prompt with "sh code.sh", it executes as expected. But if I run it via crontab, it fails with the error "parallel: command not found", i.e., it does not recognize the parallel command, even though I installed the GNU parallel package in the same path where my script is present.
That question is so common it's in our FAQ. cron has a very minimal PATH compared to a user shell. You can either set your own PATH, source /etc/profile to get a proper default PATH, or call parallel with its full path, i.e. /path/to/parallel.
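For example, any one of these near the top of code.sh would fix it; /usr/local/bin below is only a placeholder for wherever parallel was actually installed:

```shell
# 1. Extend PATH yourself (adjust the directory to your install):
PATH="$PATH:/usr/local/bin"
export PATH

# 2. Or source the system profile to get a proper default PATH:
# . /etc/profile

# 3. Or skip PATH entirely and call parallel by its full path:
# /usr/local/bin/parallel -j4 wget -nv --spider < urls.txt
```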