Random web page download wget script

Hi,

I've been attempting to create a script that downloads web pages at random intervals to mimic typical user activity. However, I'm struggling to link $url to the URL list, so wget complains of a missing URL. Any ideas?

Thanks

#!/bin/sh
#URL List
url1="http://www.bbc.co.uk"
url2="http://www.cnn.com"
url3="http://www.msn.com"

#Number of users to mimic simultaneously
users=20

for (( c=1; c<=$users; c++ ))
do 
(
wait=`echo $(( RANDOM% 120 + 30 ))`
rand=`echo $(( RANDOM% 3 + 1 ))`
url=url"$rand"
wget -p $url
sleep $wait
)&
done
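
The reason wget never gets a real URL is that url=url"$rand" stores the name of the variable (the string url1, url2 or url3), not its value. If you want to keep the numbered variables, bash indirect expansion is one way to pick up the value (just a sketch, and bash-specific, so the shebang would need to be #!/bin/bash):

#!/bin/bash
url1="http://www.bbc.co.uk"
url2="http://www.cnn.com"
url3="http://www.msn.com"

rand=$(( RANDOM % 3 + 1 ))
name="url$rand"     # e.g. "url2"
url=${!name}        # indirect expansion: the value of the variable named by $name
wget -p "$url"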

A simpler approach is to put the URLs into an array and pick one by index:

#!/bin/bash
url_list=( http://www.bbc.co.uk http://www.cnn.com http://www.msn.com )

#Number of users to mimic simultaneously
users=20

for (( c = 1; c <= $users; c++ )); do
  wait=`expr $RANDOM % 120 + 30`
  n=`expr $RANDOM % 3`
  url=${url_list[$n]}
  wget -p $url
  sleep $wait
done

There's no need to use expr or backticks.

i.e.

wait=$(( RANDOM % 120 + 30 ))

would work just fine.

Works great, however I included the ampersand to mimic x number of users ($users) simultaneously. The working example waits for wget to finish before moving on to the next URL, and it ends after x URLs ($users) have been downloaded.

So, the script should forever loop with each "user" downloading a URL from the list at random intervals.

Thanks again.

If the script should loop forever, then I would think it would not use the for loop. Right? Because the for loop runs 20 times, and then the script is finished. Is that correct, or am I missing something? Should we just use an infinite loop like while [ 1 -eq 1 ] to make it go forever?
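
As an aside, while true (or while :) is the more idiomatic spelling of such an infinite loop. A minimal sketch, with url and wait standing in for whatever the real loop computes:

while true; do        # same effect as "while [ 1 -eq 1 ]"
  wget -p "$url"
  sleep "$wait"
done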

Not sure what the best approach is here. Please bear with me while I attempt to explain in a rudimentary fashion. What I need it to do is this:

user 1
download url
wait random
loop

&

user 2
download url
wait random
loop

&

user n
download url
wait random
loop

What I'm trying to avoid is running the following code in 20 different sessions (i.e. via setsid); I'd like a single script to do the same job.

#!/bin/bash
url_list=( http://www.bbc.co.uk http://www.cnn.com http://www.msn.com )

for (( ; ; )); do
  wait=`expr $RANDOM % 120 + 30`
  n=`expr $RANDOM % 3`
  url=${url_list[$n]}
  wget -p $url
  sleep $wait
done

I hope that makes sense.

Thanks for explaining more. It's quite clear what you are aiming for. Even your original explanation was pretty clear. Try the following:

#!/bin/bash
url_list=( http://www.bbc.co.uk http://www.cnn.com http://www.msn.com )

#Number of users to mimic simultaneously
users=20

function one_user () {
  local user=$1
  while [ 1 -eq 1 ]; do
    local wait=`expr $RANDOM % 120 + 30`
    local n=`expr $RANDOM % 3`
    local url=${url_list[$n]}
    wget -p $url
    # echo user = $user wait = $wait url = $url
    sleep $wait
  done
  }

for (( user = 1; user <= $users; user++ )); do
  one_user $user &
done

What I don't like is that it starts 20 background processes. When you want to eventually kill them it might be a little problematic. But I don't see any way around having a bunch of background processes. And it seems like the original script was designed to have multiple background processes, so you are OK with that. :)
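
If you ever do need to clean them up, one option is to remember each background PID and kill them on exit. Just a sketch building on the script above (the pids array and the trap are my additions, not something the script already has); i.e., replace the final for loop with something like:

pids=()
for (( user = 1; user <= $users; user++ )); do
  one_user $user &
  pids+=($!)                                # remember each background "user"
done

trap 'kill "${pids[@]}" 2>/dev/null' EXIT   # terminate all of them when the script exits
wait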

Works great!! I amended it to include my very simple additions.

#!/bin/bash
url_list=( http://www.bbc.co.uk http://www.cnn.com http://www.msn.com )

#Number of users to mimic simultaneously
users=20

function one_user () {
  local user=$1
  while [ 1 -eq 1 ]; do
    local wait=`expr $RANDOM % 120 + 30`
    local n=`expr $RANDOM % 3`
    local url=${url_list[$n]}
    time=`date +"%T"`
    date=`date +"%m-%d-%y"`
    wget=`wget -E -H -T 30 -k -K -p --delete-after --no-cache -e robots=off $url 2>&1 | grep Downloaded | awk -F " " '{print $6}'`
    echo $date,$time,client$user,$url,$wget
    # echo user = $user wait = $wait url = $url
    sleep $wait
  done
  }

for (( user = 1; user <= $users; user++ )); do
  one_user $user &
done

Sample output.

04-21-13,00:34:01,client1,http://www.msn.com,1.8s
04-21-13,00:34:01,client14,http://www.bbc.co.uk,3.6s
04-21-13,00:34:34,client2,http://www.msn.com,1.5s
04-21-13,00:34:39,client19,http://www.msn.com,1.7s
04-21-13,00:34:34,client12,http://www.bbc.co.uk,3.6s
04-21-13,00:34:34,client4,http://www.cnn.com,4.9s
04-21-13,00:34:40,client20,http://www.bbc.co.uk,3.4s
04-21-13,00:34:49,client11,http://www.msn.com,1.9s
04-21-13,00:34:58,client14,http://www.bbc.co.uk,0.9s
04-21-13,00:34:50,client8,http://www.bbc.co.uk,3.8s
04-21-13,00:34:58,client5,http://www.bbc.co.uk,3.6s
04-21-13,00:35:19,client12,http://www.msn.com,1.4s
04-21-13,00:35:25,client10,http://www.msn.com,1.5s
04-21-13,00:35:20,client13,http://www.bbc.co.uk,3.3s
04-21-13,00:35:29,client3,http://www.bbc.co.uk,3.1s
04-21-13,00:35:35,client8,http://www.bbc.co.uk,3.1s
04-21-13,00:35:46,client9,http://www.msn.com,1.4s
04-21-13,00:35:55,client17,http://www.msn.com,2.1s
04-21-13,00:35:58,client7,http://www.msn.com,1.4s
04-21-13,00:35:50,client18,http://www.cnn.com,4.4s

Much appreciated hanson44!!

That's really a great testing system you have. It seems like it will be very interesting to observe how it behaves, as it mimics those downloading users.

Back to the script, the only (perhaps nit-picky) thing I would suggest would be to put the wget arguments into a separate variable to improve readability:

args="-E -H -T 30 -k -K -p --delete-after --no-cache -e robots=off"
wget=`wget $args $url 2>&1 | grep Downloaded | awk -F " " '{print $6}'`
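
If one of those arguments ever needs to contain a space, a bash array is the safer container (just a sketch; the current flag list doesn't actually need it):

args=( -E -H -T 30 -k -K -p --delete-after --no-cache -e robots=off )
wget=`wget "${args[@]}" $url 2>&1 | grep Downloaded | awk -F " " '{print $6}'`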

Thanks Hanson, much of it was your work.

I'm using it to differentiate between two mobile systems although it could be used for stress testing.

Another useful addition would be to include some file downloads, say 10MB. However, I just tested this and the grep in the script wouldn't work if I included the file URLs. In the example below, I would need the info "5.7s". Any idea?

[root@scripts]# wget http://download.thinkbroadband.com/5MB.zip
--2013-04-21 01:32:47--  http://download.thinkbroadband.com/5MB.zip
Resolving download.thinkbroadband.com... 80.249.99.148, 2a02:68:1:7::1
Connecting to download.thinkbroadband.com|80.249.99.148|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5242880 (5.0M) [application/zip]
Saving to: `5MB.zip'

100%[==============================================================================================================================>] 5,242,880   1.34M/s   in 5.7s

2013-04-21 01:32:54 (900 KB/s) - `5MB.zip' saved [5242880/5242880]


Ah, figured it out. I used the last occurrence of "in" as the target, as that's common across wget web page and file downloads.

#!/bin/bash
url_list=( http://www.bbc.co.uk http://www.cnn.com http://www.msn.com )

#Number of users to mimic simultaneously
users=20

#wget arguments
args="-E -H -T 30 -k -K -p --delete-after --no-cache -e robots=off"

function one_user () {
  local user=$1
  while [ 1 -eq 1 ]; do
    local wait=`expr $RANDOM % 120 + 30`
    local n=`expr $RANDOM % 3`
    local url=${url_list[$n]}
    time=`date +"%T"`
    date=`date +"%m-%d-%y"`
    wget=`wget $args $url 2>&1 | awk '/in/{a=$0}END{print a}' | awk -F "in" '{print$2}'`
    echo $date,$time,client$user,$url,$wget
    # echo user = $user wait = $wait url = $url
    sleep $wait
  done
  }

for (( user = 1; user <= $users; user++ )); do
  one_user $user &
done
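
A further hardening I might add later (untested beyond the output shown above, so treat it as a sketch): split on " in " with the surrounding spaces, so that words such as "awaiting" or host names containing "in" can never be picked up, and strip anything after the time:

wget=$(wget $args $url 2>&1 | awk -F ' in ' '/ in /{t=$2} END{sub(/ .*/, "", t); print t}')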

As Scott wrote, why do you still use backticks `` and expr?
Your code looks better without them.

#!/bin/bash
url_list=( http://www.bbc.co.uk http://www.cnn.com http://www.msn.com )

#Number of users to mimic simultaneously
users=20

#wget arguments
args="-E -H -T 30 -k -K -p --delete-after --no-cache -e robots=off"

function one_user () {
  local user=$1
  while [ 1 -eq 1 ]; do
    local wait=$((RANDOM % 120 + 30))
    local n=$((RANDOM % 3))
    local url=${url_list[$n]}
    time=$(date +"%T")
    date=$(date +"%m-%d-%y")
    wget=$(wget $args $url 2>&1 | awk '/in/{a=$0}END{print a}' | awk -F "in" '{print$2}')
    echo $date,$time,client$user,$url,$wget
    # echo user = $user wait = $wait url = $url
    sleep $wait
  done
  }

for (( user = 1; user <= $users; user++ )); do
  one_user $user &
done

I would suggest that either way is fine.

Either way does work, but backticks are easy to miss since they are so small (`), and nesting is not easy with them.
BashFAQ/082 - Greg's Wiki (http://mywiki.wooledge.org/BashFAQ/082)

Thanks for sending the reference. We're in basic agreement.

I agree $( ) is better for nested commands. But I think it's bad practice to nest commands. Seems much clearer to use an intermediate variable.
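
For instance (the file names here are purely hypothetical):

# nested command substitution:
size=$(wc -c < $(ls -t | head -1))

# with an intermediate variable:
newest=$(ls -t | head -1)
size=$(wc -c < "$newest")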

I perhaps agree with the example about backslashes. But this seems pretty rare. And neither format seems "obvious". Multiple backslashes are confusing to most people, including me.

For "nested quoting", I didn't realize there was a difference between the shells. I use bash, so would not matter. But I agree this case is clearer with the $( ) syntax for compatibility with sh and ksh.

You're right that the backticks are tiny. But for some reason, they are easy for me to see and still leave the code clean. The article says "easily confused with a single quote". Sounds reasonable. But that has NEVER happened to me. Maybe I'm just used to those backticks, from repeated usage...

If the author of the article were really objective, they would say backticks take one less character. Yes, it's nit-picky. But so are the other points.

BTW, I have nothing against the $( ) syntax. It's perfectly fine. I just don't think it matters in the great majority of cases. I see these "crusades" (I am NOT referring to any posts here) against certain practices, and think they are a tempest in a teapot, because the practices seem useful and easy to understand to me. :)

Another thing: both forms are defined by POSIX, but $( ) is the newer one and is generally preferred because it nests and quotes more predictably; backticks are the legacy form.

An example of nesting:

echo `echo \`ls\``
echo $(echo $(ls))

And yes, you are free to select what you like as long as it works. :)