One of the great things about Python is how easy it makes creating a pool of threads and assigning them tasks.
A practical example: suppose we're looping through a list of 100,000 objects, each containing a URL or IP address we need to hit. We wouldn't want to process them one at a time, but we also don't want to spawn 100,000 jobs/threads simultaneously as we iterate through the list. Instead, we may wish to process 50 at a time, and Python makes that a piece of cake.
from multiprocessing.dummy import Pool as ThreadPool
A worker function
This is the function that will be called by each thread. If we specify 50 threads, up to 50 instances of this function will run in parallel. Let's keep the example simple: our function will only echo back the input value, which, for simplicity, we will say is a string. In reality, it is whatever element comes through your loop (a list item, object, string, etc.) as you iterate through the elements.
def echoInput(value):
    print(value)
Create the pool and start the threads
Let's first create some sample data to help illustrate the example.
words = ["apple", "orange", "mango", "watermelon", "banana"]
Finally, we create the pool and start processing our list. Note that when creating the pool we specify the number of threads we wish to run at a time, and in the call to map we supply our worker function and the data list.
pool = ThreadPool(2)
result = pool.map(echoInput, words)
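Since Python 3.3, Pool objects (including the thread-backed one from multiprocessing.dummy) can also be used as context managers, which cleans up the pool for us when the block exits. A minimal sketch of the same call in that style:

```python
from multiprocessing.dummy import Pool as ThreadPool

words = ["apple", "orange", "mango", "watermelon", "banana"]

def echoInput(value):
    print(value)

# The with-statement tears the pool down when the block exits;
# pool.map itself blocks until every item has been processed.
with ThreadPool(2) as pool:
    result = pool.map(echoInput, words)
```

This avoids having to call pool.close() and pool.join() by hand.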
The return value of pool.map is a list of the values returned by our worker function, in the same order as the input. In our example the worker returns nothing, so result would contain a list resembling [None, None, None, None, None]. We don't have to assign the return value of pool.map to a variable; if we don't, the returned list is simply discarded (in an interactive interpreter it would be echoed to the console once all processing is complete).
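To see the return list carry real data, here is a small sketch with a hypothetical worker, wordLength, that returns a value instead of printing one:

```python
from multiprocessing.dummy import Pool as ThreadPool

words = ["apple", "orange", "mango", "watermelon", "banana"]

# A worker that returns a value rather than printing it
def wordLength(word):
    return len(word)

pool = ThreadPool(2)
result = pool.map(wordLength, words)
print(result)  # [5, 6, 5, 10, 6]
```

Even though the threads may finish in any order, pool.map keeps the results aligned with the order of the input list.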