Using the GET and POST methods in bash (multi-process)?

hi
I want to call a lot of links with the POST method.

What can I do to speed it up?

This method is slow:


#!/bin/bash


func2() {
   index1=0
   while read line ; do
     index1=$(($index1+1))
     url[$index1]=$line
   done < tmp/url1.txt
}

func3() {
   for ((j=1;j<=$countline;j++)) ; do
      i=`curl -LI ${url[$j]} -o /dev/null -w '%{http_code}\n' -s` ;
      echo "$i --- ${url[$j]}"
   done
}

funcrun() {
   for ((n=1;n<=5;n++)) ; do
      func3 $countline ${url[@]} >> 2.txt &
   done
}

func2
funcrun






Well, first off, the script you showed us can't do anything, neither slowly nor fast: you create an array url[] inside func2(), but it will be private to that function and no longer known once you leave it.

Second, this method of filling an array is overly complicated, regardless of whether it is inside a function or not:

func2() {
   index1=0
   while read line ; do
     index1=$(($index1+1))
     url[$index1]=$line
   done < tmp/url1.txt
}

You can easily do it this way:

while read -r line ; do
     url[$(( index++ ))]="$line"
done < tmp/url1.txt

Next, your way of going through the array can also be improved, not to mention your command substitutions - BACKTICKS ARE DEPRECATED, use $( ... ) instead:

func3() {
   i=0
   while [ $i -lt ${#url[@]} ] ; do
      echo "${url[$i]} --- $(curl -LI "${url[$i]}" -o /dev/null -w '%{http_code}\n' -s)"
      i=$(( i + 1 ))
   done
}

Finally: in func3() you run through every element of the array. funcrun() then starts func3() five times in the background (the arguments you pass to func3() are simply ignored), so every URL gets fetched five times: for n array elements you end up with 5*n curl invocations instead of n. You do five times the necessary work before you even start.

You might want to consolidate this mess first. If it still is "too slow", you might think about this: you could put the curl invocations in the background so that they run in parallel instead of one after the other. Notice, though, that whether that works out depends on the number of URLs you want to pull: a dozen is perhaps no problem, a few hundred might be, a few thousand are definitely going to be a problem. In that case you will need a "fanout" value, so that only a certain maximum number of parallel processes run at the same time.
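To illustrate, here is a minimal sketch of that fanout idea (only a sketch: the limit of 10 is an assumption, the input file tmp/url1.txt is taken from your script). It starts the curl calls in the background in batches and waits for each batch before starting the next:

#!/bin/bash
# Sketch only: background curl calls with a simple fanout limit.
max_jobs=10                       # assumed maximum number of parallel processes
count=0

while read -r u ; do
   curl -LI "$u" -o /dev/null -w '%{http_code} %{url_effective}\n' -s &
   count=$(( count + 1 ))
   if [ $(( count % max_jobs )) -eq 0 ] ; then
      wait                        # let the current batch finish before starting more
   fi
done < tmp/url1.txt

wait                              # wait for the last batch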

I hope this helps.

bakunin


On top of what bakunin already said, I'd like to add a few comments:

  • Did you consider the readarray / mapfile bash builtin for populating the array? (See the sketch after this list.)
  • Why use arrays at all? You could read the URL file directly in the loop that runs the curl command.
  • You reference $countline twice but never assign to it.
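For example, a minimal sketch of the mapfile variant (bash 4 or newer; the file name tmp/url1.txt is taken from the original script):

# Read the whole file into the array with one builtin call
mapfile -t url < tmp/url1.txt
echo "read ${#url[@]} URLs"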

Thank you, the syntax is better now, but the problem is still there.

In another function I fetch about 1 million values from a database and build a URL from each of them, for example:

http://mysite.com/api/tags/$var[$n]/

When I call the POST method, each link is requested and its answer is read back. With that many links the run may take up to a day.

Is there a way to split the work, for example:

URLs 1 to 1000 in one thread,
1001 to 2000 in another thread,
2001 to 3000 in the next thread, and so on?

I think that would be faster, but I do not know exactly how to manage this in the code.
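A minimal sketch of that chunking idea (the file name urls.txt and the chunk size of 1000 are assumptions). Note that with a million URLs this would start 1000 background processes at once, so in practice you would combine it with a fanout limit as described above:

#!/bin/bash
# Sketch only: split the URL list into slices of $chunk entries and
# let one background process work on each slice.
chunk=1000
mapfile -t url < urls.txt                  # assumed input file, one URL per line (bash 4+)

worker() {                                 # POST every URL it is given
   local u
   for u in "$@" ; do
      curl -s -o /dev/null -X POST -w '%{http_code} %{url_effective}\n' "$u"
   done
}

for (( start=0; start<${#url[@]}; start+=chunk )) ; do
   worker "${url[@]:start:chunk}" >> results.txt &   # one process per slice; all output is appended to one file
done
wait                                       # wait for all slices to finish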

You want to update a million web sites on the internet? No surprise it's taking days...
If I got you wrong, please rephrase your problem and supply more details, like sample input files (not a million lines, though - some 10 to 20).

There is no website; the indexing process works this way. I need to send the requests in multiple threads, because otherwise it takes far too long.

If there's no website, what is curl for?

And if this is local, then multithreading it may not be much help. If it's local, curl is about the worst way to do it.

What might be better is packing multiple retrievals into each curl. That'd remove a lot of overhead I think.
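For instance, a minimal sketch of that batching idea (the file name urls.txt and the batch size of 100 are assumptions): xargs hands curl up to 100 URLs per invocation, and curl reuses the connection for URLs on the same host.

# Fetch up to 100 URLs per curl process instead of one per process;
# with -I only the headers (including the HTTP status line) end up in headers.txt.
xargs -n 100 curl -sLI < urls.txt >> headers.txt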

It is required to send the requests to the API with the POST method. But there are a lot of requests, so they need to be split up and run in multiple processes to keep the runtime reasonable.
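If GNU xargs is available, its -P option gives you those multiple processes directly; a sketch, again assuming the URLs sit one per line in urls.txt:

# Run up to 10 curl processes at a time, one URL each; the status code
# and URL of every request are appended to results.txt.
xargs -P 10 -n 1 curl -s -o /dev/null -X POST -w '%{http_code} %{url_effective}\n' < urls.txt >> results.txt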

So it is a website, then.

Local or not?

Between me and the pods there is a VPN tunnel.