xkln.net


Multithreading In Python - Running A Specific Number Of Threads

Posted by md on October 17, 2019

One of the great things about Python is how easy it makes creating a pool of threads and assigning them tasks. While PowerShell has -AsJob functionality, it is cumbersome to specify how many jobs should run in parallel at a time. As a practical example, say we’re looping through an array of 100,000 objects, each containing a URL or IP address we need to hit. We wouldn’t want to process them one at a time, but we also don’t want to spawn 100,000 jobs/threads simultaneously as we iterate through the array. Instead, we may wish to process 50 at a time, and Python makes this a piece of cake.

Imports

from multiprocessing.dummy import Pool as ThreadPool

A worker function

This is the function that will be called by each thread. If we specify 50 threads, up to 50 instances of this function will be running at the same time. Let’s keep the example simple - our function will only echo back the input value which, for simplicity, we will say is a string. In reality, it is whatever element comes through your loop (an array, object, string, etc.) as you iterate through the elements.

def echoInput(value):
    # Named 'value' rather than 'input' to avoid shadowing the built-in input()
    print(value)

Create the pool and start the threads

Let's first create some sample data to help illustrate the example.

words = ["apple", "orange", "mango", "watermelon", "banana"]

Finally, we create the pool and start processing our list. Note that on line 1 we specify the number of threads we wish to run at a time, and on line 3 we supply our worker function and the data list.

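pool = ThreadPool(2)
try:
    result = pool.map(echoInput, words)
except Exception as e:
    print(f"Processing failed: {e}")
finally:
    pool.close()  # no more work will be submitted
    pool.join()   # wait for the worker threads to exit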
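To see that the pool really does cap the number of workers running at once, here is a minimal sketch. The `slow_echo` worker, the `active`/`peak` counters, and the 0.05 second sleep are all made up for illustration; the sleep simply stands in for a slow network call.

```python
import threading
import time
from multiprocessing.dummy import Pool as ThreadPool

active = 0               # workers currently inside slow_echo
peak = 0                 # highest value 'active' has reached
lock = threading.Lock()  # protects the two counters above

def slow_echo(value):
    """Hypothetical worker: pretends to do slow I/O, then returns its input."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)     # stand-in for a network request
    with lock:
        active -= 1
    return value

words = ["apple", "orange", "mango", "watermelon", "banana"]

# The pool is cleaned up when the with-block exits.
with ThreadPool(2) as pool:
    results = pool.map(slow_echo, words)

print("peak concurrent workers:", peak)  # never more than the pool size
print(results)                           # same order as the input list
```

Changing `ThreadPool(2)` to a larger number raises the cap; with five items, `peak` can never exceed five regardless of the pool size.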

pool.map returns a list containing the return value of each worker call, in the same order as the input. In our example the worker returns nothing, so result would be [None, None, None, None, None]. We do not have to assign the result of pool.map to a variable; in an interactive session, the returned list is simply echoed to the console once all processing is complete.
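If the worker does return something, those values are what end up in the list. A small sketch, using a hypothetical `word_length` worker in place of our echo function:

```python
from multiprocessing.dummy import Pool as ThreadPool

def word_length(word):
    """Hypothetical worker: returns a value instead of printing it."""
    return len(word)

words = ["apple", "orange", "mango", "watermelon", "banana"]

with ThreadPool(2) as pool:
    lengths = pool.map(word_length, words)

print(lengths)  # [5, 6, 5, 10, 6]
```

The results line up with the input positions even though the threads may finish in a different order, so `lengths[0]` always belongs to `words[0]`.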