Multithreading In PowerShell - Running A Specific Number Of Threads

In this post we’re aiming to accomplish in PowerShell what the previous post did in Python: create a pool of threads to carry out a set of tasks concurrently. The closest PowerShell equivalent to what we achieved in Python is a Runspace Pool. PowerShell also has background jobs (PSJobs) via Start-Job and -AsJob, but these don’t let us specify the maximum number of concurrent jobs, at least not without a significant amount of scaffolding or non-default modules. Start-Job also runs each job in a separate PowerShell process rather than on a thread, which adds overhead.

PowerShell Runspace Pools

Let’s use a simple example to illustrate how Runspace Pools are used. We start by defining our worker function as a ScriptBlock; this is the code that does whatever work we want to run in parallel. For this example, it creates a file whose name is passed in as a parameter. We’ll also add a Start-Sleep command to simulate a lengthy process, which is what would make this work worth multithreading.

$Worker = {
    param($Filename)
    Write-Host "Processing $Filename"
    Start-Sleep -Seconds 5 # Doing some work....
    $Item = New-Item -Name $Filename
    Write-Output $Item.FullName
}
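Because $Worker is an ordinary ScriptBlock, it can be sanity-checked serially with the call operator (&) before it goes anywhere near a runspace pool. The sketch below assumes a hypothetical scratch folder under the system temp directory ($TestDir) and shortens the sleep to one second; neither is part of the original example.

```powershell
# Same worker as above, with the sleep shortened to one second for a quick check
$Worker = {
    param($Filename)
    Write-Host "Processing $Filename"
    Start-Sleep -Seconds 1 # Doing some work....
    $Item = New-Item -Name $Filename
    Write-Output $Item.FullName
}

# Hypothetical scratch folder so the test file doesn't land in your working directory
$TestDir = Join-Path ([System.IO.Path]::GetTempPath()) "runspace-demo-serial"
New-Item -ItemType Directory -Path $TestDir -Force | Out-Null
Remove-Item -Path (Join-Path $TestDir "serial-test.txt") -ErrorAction SilentlyContinue

Push-Location $TestDir
try {
    # The call operator runs the ScriptBlock; the argument binds to its param() block
    $Created = & $Worker "serial-test.txt"
}
finally {
    Pop-Location
}
"Created $Created"
```

Called this way, each file blocks for the full sleep, so the eleven sample files at five seconds each would take roughly 55 seconds in sequence.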

We then create our runspace pool and configure the maximum number of threads we wish to run. We also create an ArrayList to hold the running jobs, which will help us monitor the status of the batch.

$MaxRunspaces = 5

$RunspacePool = [runspacefactory]::CreateRunspacePool(1, $MaxRunspaces)
$RunspacePool.Open()

$Jobs = New-Object System.Collections.ArrayList

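The CreateRunspacePool method takes two values: the minimum and maximum number of runspaces in the pool. The maximum caps how many tasks run concurrently; any further tasks are queued until a runspace becomes free. Below we define the sample data that will serve as the filenames for the files we’re creating.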

$Filenames = @("file1.txt", "file2.txt", "file3.txt", "file4.txt", "file5.txt", "file6.txt", "file7.txt", "file8.txt", "file9.txt", "file10.txt", "file11.txt")
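Before running anything, it can be handy to confirm the limits the pool was created with. A short sketch using the RunspacePool methods GetMinRunspaces(), GetMaxRunspaces() and SetMaxRunspaces(); the values 5 and 3 are only for illustration:

```powershell
$MaxRunspaces = 5

# Create and open a pool allowing between 1 and $MaxRunspaces runspaces
$RunspacePool = [runspacefactory]::CreateRunspacePool(1, $MaxRunspaces)
$RunspacePool.Open()

# Read back the limits the pool was created with
$Min = $RunspacePool.GetMinRunspaces()
$Max = $RunspacePool.GetMaxRunspaces()
"Pool allows between $Min and $Max runspaces"

# The maximum can be changed after the pool is open; the call reports success as a Boolean
$Changed = $RunspacePool.SetMaxRunspaces(3)
$NewMax = $RunspacePool.GetMaxRunspaces()
"Maximum changed: $Changed, new maximum: $NewMax"

# Release the pool when finished with it
$RunspacePool.Close()
$RunspacePool.Dispose()
```

SetMaxRunspaces() returns a Boolean indicating whether the change was made; a maximum below the pool's minimum is rejected.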

Finally, we can bring everything together and run our tasks in parallel.

foreach ($File in $Filenames) {
    Write-Host "Creating runspace for $File"
    $PowerShell = [powershell]::Create()
    $PowerShell.RunspacePool = $RunspacePool
    $PowerShell.AddScript($Worker).AddArgument($File) | Out-Null

    $JobObj = New-Object -TypeName PSObject -Property @{
        Runspace = $PowerShell.BeginInvoke()
        PowerShell = $PowerShell
    }

    $Jobs.Add($JobObj) | Out-Null
}

while ($Jobs.Runspace.IsCompleted -contains $false) {
    Write-Host (Get-Date).ToString() "Still running..."
    Start-Sleep -Seconds 1
}
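One way to see the maximum in action is to have every worker record how many workers are running at the same moment. In this sketch the worker ($Probe) and the shared $State hashtable are hypothetical; the pool is capped at two runspaces and given six half-second tasks, so the recorded peak should never exceed two. The Monitor lock makes each counter update atomic across runspaces.

```powershell
$MaxRunspaces = 2
$RunspacePool = [runspacefactory]::CreateRunspacePool(1, $MaxRunspaces)
$RunspacePool.Open()

# Hypothetical shared state: workers running right now, and the highest count seen
$State = [hashtable]::Synchronized(@{ Running = 0; Peak = 0 })

$Probe = {
    param($State)
    # Register as running; the lock makes the check-and-update atomic
    [System.Threading.Monitor]::Enter($State.SyncRoot)
    try {
        $State.Running = $State.Running + 1
        if ($State.Running -gt $State.Peak) { $State.Peak = $State.Running }
    }
    finally {
        [System.Threading.Monitor]::Exit($State.SyncRoot)
    }

    Start-Sleep -Milliseconds 500 # Simulated work

    # Deregister when done
    [System.Threading.Monitor]::Enter($State.SyncRoot)
    try {
        $State.Running = $State.Running - 1
    }
    finally {
        [System.Threading.Monitor]::Exit($State.SyncRoot)
    }
}

$Jobs = foreach ($i in 1..6) {
    $PowerShell = [powershell]::Create()
    $PowerShell.RunspacePool = $RunspacePool
    [void]$PowerShell.AddScript($Probe).AddArgument($State)
    [pscustomobject]@{ PowerShell = $PowerShell; Runspace = $PowerShell.BeginInvoke() }
}

# Same polling approach as above
while ($Jobs.Runspace.IsCompleted -contains $false) {
    Start-Sleep -Milliseconds 100
}

foreach ($Job in $Jobs) {
    [void]$Job.PowerShell.EndInvoke($Job.Runspace)
    $Job.PowerShell.Dispose()
}
$RunspacePool.Close()
$RunspacePool.Dispose()

"Peak concurrent workers: $($State.Peak) (limit $MaxRunspaces)"
```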

Putting it all together

Combining the above snippets gives us a reasonable boilerplate for future runspace pool usage.

$Worker = {
    param($Filename)
    Write-Host "Processing $filename"
    Start-Sleep -Seconds 5 # Doing some work....
    $Item = New-Item -Name $Filename
    Write-Output $Item.FullName
}

$MaxRunspaces = 5

$RunspacePool = [runspacefactory]::CreateRunspacePool(1, $MaxRunspaces)
$RunspacePool.Open()

$Jobs = New-Object System.Collections.ArrayList

$Filenames = @("file1.txt", "file2.txt", "file3.txt", "file4.txt", "file5.txt", "file6.txt", "file7.txt", "file8.txt", "file9.txt", "file10.txt", "file11.txt")

foreach ($File in $Filenames) {
    Write-Host "Creating runspace for $File"
    $PowerShell = [powershell]::Create()
    $PowerShell.RunspacePool = $RunspacePool
    $PowerShell.AddScript($Worker).AddArgument($File) | Out-Null

    $JobObj = New-Object -TypeName PSObject -Property @{
        Runspace = $PowerShell.BeginInvoke()
        PowerShell = $PowerShell
    }

    $Jobs.Add($JobObj) | Out-Null
}

while ($Jobs.Runspace.IsCompleted -contains $false) {
    Write-Host (Get-Date).ToString() "Still running..."
    Start-Sleep -Seconds 1
}

Caveats and Workarounds

There are a few gotchas when using runspaces that will probably cause a few headaches the first time you use them.

Getting return data

Below is the output of the above code:

Creating runspace for file1.txt
Creating runspace for file2.txt
Creating runspace for file3.txt
Creating runspace for file4.txt
Creating runspace for file5.txt
Creating runspace for file6.txt
Creating runspace for file7.txt
Creating runspace for file8.txt
Creating runspace for file9.txt
Creating runspace for file10.txt
Creating runspace for file11.txt
4/12/2019 7:47:22 PM Still running...
4/12/2019 7:47:23 PM Still running...
4/12/2019 7:47:24 PM Still running...
4/12/2019 7:47:25 PM Still running...
4/12/2019 7:47:26 PM Still running...
4/12/2019 7:47:27 PM Still running...
4/12/2019 7:47:28 PM Still running...
4/12/2019 7:47:29 PM Still running...
4/12/2019 7:47:30 PM Still running...
4/12/2019 7:47:31 PM Still running...

The most obvious thing is the lack of output from our worker function. It contains both Write-Host and Write-Output commands, yet neither shows up. This is where some of the complexity around runspaces starts to show. To retrieve a job’s output (what Write-Output emits; Write-Host text is not part of it), we need to call the EndInvoke() method on that job’s PowerShell instance, passing it the handle that BeginInvoke() returned, which we stored in the Runspace property. Both are present in the $Jobs ArrayList.

PS C:\> $Jobs

PowerShell                              Runspace
----------                              --------
System.Management.Automation.PowerShell System.Management.Automation.PowerShellAsyncResult
System.Management.Automation.PowerShell System.Management.Automation.PowerShellAsyncResult
System.Management.Automation.PowerShell System.Management.Automation.PowerShellAsyncResult
System.Management.Automation.PowerShell System.Management.Automation.PowerShellAsyncResult
System.Management.Automation.PowerShell System.Management.Automation.PowerShellAsyncResult
System.Management.Automation.PowerShell System.Management.Automation.PowerShellAsyncResult
System.Management.Automation.PowerShell System.Management.Automation.PowerShellAsyncResult
System.Management.Automation.PowerShell System.Management.Automation.PowerShellAsyncResult
System.Management.Automation.PowerShell System.Management.Automation.PowerShellAsyncResult
System.Management.Automation.PowerShell System.Management.Automation.PowerShellAsyncResult
System.Management.Automation.PowerShell System.Management.Automation.PowerShellAsyncResult


PS C:\> $Jobs[0].PowerShell.EndInvoke($Jobs[0].Runspace)
C:\Users\md\file1.txt

This has some ramifications for error handling: we either need to handle errors completely inside the worker function, or append error data to our return object.
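A further option is to let errors surface naturally and inspect each PowerShell instance after EndInvoke(): non-terminating errors (Write-Error) are collected in its Streams.Error collection and flagged by HadErrors, while a terminating error (throw) is re-thrown by EndInvoke() itself. The sketch below also reads Write-Host text from Streams.Information (PowerShell 5.0 and later), and disposes each instance and the pool afterwards, which the boilerplate above never does. The worker and the good.txt/bad.txt/crash.txt inputs are hypothetical and create no files.

```powershell
# Hypothetical stand-in worker: creates no files, so it is safe to run anywhere
$Worker = {
    param($Filename)
    if ($Filename -eq "bad.txt") {
        # Non-terminating error: the script carries on afterwards
        Write-Error "Could not fully process $Filename"
    }
    if ($Filename -eq "crash.txt") {
        # Terminating error: the script stops here
        throw "Fatal problem with $Filename"
    }
    Write-Host "Processing $Filename"
    Write-Output "Done: $Filename"
}

$RunspacePool = [runspacefactory]::CreateRunspacePool(1, 3)
$RunspacePool.Open()

$Jobs = foreach ($File in @("good.txt", "bad.txt", "crash.txt")) {
    $PowerShell = [powershell]::Create()
    $PowerShell.RunspacePool = $RunspacePool
    [void]$PowerShell.AddScript($Worker).AddArgument($File)
    [pscustomobject]@{
        Filename   = $File
        PowerShell = $PowerShell
        Runspace   = $PowerShell.BeginInvoke()  # async handle, named as in the article
    }
}

$Report = foreach ($Job in $Jobs) {
    $Output = @()
    $Failure = $null
    try {
        # EndInvoke waits for the job and returns its output stream;
        # a terminating error inside the worker is re-thrown here
        $Output = $Job.PowerShell.EndInvoke($Job.Runspace)
    }
    catch {
        $Failure = $_.Exception.Message
    }
    [pscustomobject]@{
        Filename  = $Job.Filename
        Output    = @($Output)
        HadErrors = $Job.PowerShell.HadErrors
        Errors    = @($Job.PowerShell.Streams.Error)        # non-terminating errors
        HostText  = @($Job.PowerShell.Streams.Information)  # Write-Host text (PowerShell 5.0+)
        Failure   = $Failure
    }
    # Release the PowerShell instance once its results have been read
    $Job.PowerShell.Dispose()
}

$RunspacePool.Close()
$RunspacePool.Dispose()
$Report | Format-Table Filename, Output, HadErrors, Failure
```

Because EndInvoke() blocks until its job finishes, looping over the jobs like this also works without the separate polling loop.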

Access to the current PowerShell environment

If you ran the above code from anywhere other than the default path your PowerShell session starts in (typically C:\Users\<username>), you would have noticed that the files weren’t created in your working directory but in that default path. This is because each runspace has its own session state, including its own current location, which is not inherited from the location you changed to in your console. Even this can be a little tricky to understand, though. What behavior would you expect if we slightly modified our worker function to this?

$Worker = {
    param($Filename)
    Write-Host "Processing $Filename"
    if ($Filename -eq "file1.txt") { Set-Location C:\Temp }
    Start-Sleep -Seconds 5 # Doing some work....
    $Item = New-Item -Name $Filename
    Write-Output $Item.FullName
}

The initial assumption tends to be that only file1.txt ends up in C:\Temp, whereas what actually happened when we ran it is that file1.txt, file6.txt, and file11.txt all ended up there. The reason is that runspaces get reused. While file1.txt runs in the first runspace, the remaining four runspaces are used by file2.txt, file3.txt, file4.txt, and file5.txt. The first runspace changes its working directory to C:\Temp because the if condition is satisfied. When it is free again, having completed the work for file1.txt, it picks up the next queued job, which happens to be file6.txt. Its location is still C:\Temp, so that is where the file is created, and the same applies to file11.txt. Which later jobs inherit the changed location depends on which runspace frees up first, so the affected files aren’t guaranteed to be the same on every run.
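The reuse is easy to reproduce with a pool that has exactly one runspace, since every job then lands in the same runspace. This sketch assumes a hypothetical scratch folder under the temp directory (standing in for C:\Temp) and a hypothetical Invoke-InPool helper that runs a script synchronously in the pool. It also shows one way to avoid the problem: wrapping the location change in Push-Location / Pop-Location inside try/finally so the worker always restores where it started.

```powershell
# A pool with exactly one runspace, so every script below reuses the same runspace
$RunspacePool = [runspacefactory]::CreateRunspacePool(1, 1)
$RunspacePool.Open()

# Hypothetical scratch folder standing in for C:\Temp
$ScratchDir = Join-Path ([System.IO.Path]::GetTempPath()) "runspace-demo-location"
New-Item -ItemType Directory -Path $ScratchDir -Force | Out-Null

# Hypothetical helper: run a script synchronously in the pool and return its output
function Invoke-InPool {
    param($Pool, $Code, $Argument)
    $PowerShell = [powershell]::Create()
    $PowerShell.RunspacePool = $Pool
    [void]$PowerShell.AddScript($Code).AddArgument($Argument)
    try { $PowerShell.Invoke() } finally { $PowerShell.Dispose() }
}

$Before = [string](Invoke-InPool $RunspacePool { (Get-Location).Path })

# Job 1 changes its location, like the file1.txt worker changing to C:\Temp
$null = Invoke-InPool $RunspacePool { param($Dir) Set-Location $Dir } $ScratchDir
# Job 2 never calls Set-Location, yet it starts where job 1 left off
$Leaked = [string](Invoke-InPool $RunspacePool { (Get-Location).Path })

# Put the runspace back, then use a worker that restores its own location
$null = Invoke-InPool $RunspacePool { param($Dir) Set-Location $Dir } $Before
$Careful = {
    param($Dir)
    Push-Location $Dir
    try { (Get-Location).Path }   # location-dependent work goes here
    finally { Pop-Location }      # always undo the change, even on errors
}
$Inside = [string](Invoke-InPool $RunspacePool $Careful $ScratchDir)
$After = [string](Invoke-InPool $RunspacePool { (Get-Location).Path })

$RunspacePool.Close()
$RunspacePool.Dispose()
"Before: $Before | Leaked: $Leaked | After careful worker: $After"
```

Passing a full path to New-Item’s -Path parameter, rather than relying on the current location at all, sidesteps the issue entirely.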

Variable access

Runspaces don’t have access to variables defined in the parent PowerShell session. This means that if we store the path where we want to place the files in a variable in the parent scope, the worker won’t see it.

$Filepath = "C:\Temp\"
$Worker = {
    param($Filename)
    try {
        $Result = New-Item -Name $Filename -Path $Filepath -ErrorAction Stop
    }
    catch {
        $Result = $_.Exception.Message
    }
    Write-Output $Result
}

What the New-Item line actually executes here is equivalent to the following:

New-Item -Name $Filename -Path $Null

We can confirm this if we look at the output.

PS C:\> $Jobs[0].PowerShell.EndInvoke($Jobs[0].Runspace)
Cannot bind argument to parameter 'Path' because it is null.
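The simplest fix is to pass the path in explicitly. A sketch, assuming a hypothetical target folder under the system temp directory in place of C:\Temp\; the worker takes the path as a second parameter and the caller supplies it with a second AddArgument() call:

```powershell
# Hypothetical target folder standing in for C:\Temp\
$Filepath = Join-Path ([System.IO.Path]::GetTempPath()) "runspace-demo-args"
New-Item -ItemType Directory -Path $Filepath -Force | Out-Null

$Worker = {
    # The path now arrives as a parameter instead of a parent-scope variable
    param($Filename, $Filepath)
    try {
        # -Force overwrites an existing file so the sketch can be re-run
        $Result = New-Item -Name $Filename -Path $Filepath -Force -ErrorAction Stop
        $Result.FullName
    }
    catch {
        $_.Exception.Message
    }
}

$RunspacePool = [runspacefactory]::CreateRunspacePool(1, 2)
$RunspacePool.Open()

$PowerShell = [powershell]::Create()
$PowerShell.RunspacePool = $RunspacePool
# AddArgument binds positionally: first to $Filename, then to $Filepath
[void]$PowerShell.AddScript($Worker).AddArgument("file1.txt").AddArgument($Filepath)
$Handle = $PowerShell.BeginInvoke()
$Output = $PowerShell.EndInvoke($Handle)

$PowerShell.Dispose()
$RunspacePool.Close()
$RunspacePool.Dispose()
$Output
```

AddArgument() binds positionally; AddParameter("Filepath", $Filepath) binds the same value by name if you prefer not to depend on parameter order.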

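Other than passing values in as arguments (or pre-defining variables in the pool’s InitialSessionState), a common way to share data with runspaces is a synchronized hashtable, which is itself passed in as an argument. Because every runspace receives a reference to the same hashtable, a worker can also write to it and the parent session sees the change. The Synchronized wrapper makes each individual read or write of a key thread-safe; a read-modify-write such as += is not atomic, though, so appending to a shared list is safer with a collection that is itself thread-safe. To illustrate this, we can use the following code.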

$Configuration = [hashtable]::Synchronized(@{})
$Configuration.FilePath = "C:\Temp\"
# A synchronized ArrayList, so concurrent appends from different runspaces aren't lost
$Configuration.CreatedFiles = [System.Collections.ArrayList]::Synchronized((New-Object System.Collections.ArrayList))

$Worker = {
    Param($Filename, $Configuration)
    Write-Host "Processing $Filename"
    Start-Sleep -Seconds 5 # Doing some work....
    Try {
        $Result = New-Item -Name $Filename -Path $Configuration.FilePath -ErrorAction Stop
        [void]$Configuration.CreatedFiles.Add($Result.FullName)
    }
    Catch {
        $Result = $_.Exception.Message
    }

    Write-Output $Result
}

$MaxRunspaces = 5

$SessionState = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $MaxRunspaces, $SessionState, $Host)
$RunspacePool.Open()

$Jobs = New-Object System.Collections.ArrayList

$Filenames = @("file1.txt", "file2.txt", "file3.txt", "file4.txt", "file5.txt", "file6.txt", "file7.txt", "file8.txt", "file9.txt", "file10.txt", "file11.txt")

foreach ($File in $Filenames) {
    Write-Host "Creating runspace for $File"
    $PowerShell = [powershell]::Create()
    $PowerShell.RunspacePool = $RunspacePool
    $PowerShell.AddScript($Worker).AddArgument($File).AddArgument($Configuration) | Out-Null

    $JobObj = New-Object -TypeName PSObject -Property @{
        Runspace = $PowerShell.BeginInvoke()
        PowerShell = $PowerShell
    }

    $Jobs.Add($JobObj) | Out-Null
}

while ($Jobs.Runspace.IsCompleted -contains $false) {
    Write-Host (Get-Date).ToString() "Still running..."
    Start-Sleep -Seconds 1
}

And our hashtable has been modified as expected.

PS C:\> $Configuration

Name                           Value
----                           -----
FilePath                       C:\Temp\
CreatedFiles                   {C:\Temp\file1.txt, C:\Temp\file2.txt, C:\Temp\file3.txt, C:\Temp\file4.txt...}

Lastly, because we passed $Host to CreateRunspacePool, the Write-Host output from the workers now appears in our parent console - neat!

Creating runspace for file1.txt
Creating runspace for file2.txt
Processing file1.txt
Creating runspace for file3.txt
Processing file2.txt
Creating runspace for file4.txt
Processing file3.txt
Creating runspace for file5.txt
Processing file4.txt
Creating runspace for file6.txt
Creating runspace for file7.txt
Creating runspace for file8.txt
Processing file5.txt
Creating runspace for file9.txt
Creating runspace for file10.txt
Creating runspace for file11.txt
4/12/2019 7:58:39 PM Still running...
4/12/2019 7:58:40 PM Still running...
4/12/2019 7:58:41 PM Still running...
4/12/2019 7:58:42 PM Still running...
4/12/2019 7:58:43 PM Still running...
Processing file6.txt
Processing file7.txt
Processing file8.txt
Processing file9.txt
Processing file10.txt
4/12/2019 7:58:44 PM Still running...
4/12/2019 7:58:45 PM Still running...
4/12/2019 7:58:46 PM Still running...
4/12/2019 7:58:47 PM Still running...
4/12/2019 7:58:48 PM Still running...
Processing file11.txt
4/12/2019 7:58:49 PM Still running...
4/12/2019 7:58:50 PM Still running...
4/12/2019 7:58:51 PM Still running...
4/12/2019 7:58:52 PM Still running...
4/12/2019 7:58:53 PM Still running...
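One caveat about writing to the synchronized hashtable: the Synchronized wrapper locks each individual get and set, but an update such as $Configuration.CreatedFiles += ... reads the old value, builds a new one and writes it back, so two runspaces can interleave and an update can be lost. When several workers update the same value, holding the hashtable's SyncRoot lock across the whole read-modify-write avoids this. A sketch with a hypothetical Total counter incremented 1,000 times by each of five workers:

```powershell
# Shared state; "Total" is a hypothetical key name
$Shared = [hashtable]::Synchronized(@{ Total = 0 })

$Worker = {
    param($Shared)
    for ($i = 0; $i -lt 1000; $i++) {
        # Hold the hashtable's lock for the whole read-modify-write
        [System.Threading.Monitor]::Enter($Shared.SyncRoot)
        try {
            $Shared.Total = $Shared.Total + 1
        }
        finally {
            [System.Threading.Monitor]::Exit($Shared.SyncRoot)
        }
    }
}

$RunspacePool = [runspacefactory]::CreateRunspacePool(1, 5)
$RunspacePool.Open()

$Jobs = foreach ($n in 1..5) {
    $PowerShell = [powershell]::Create()
    $PowerShell.RunspacePool = $RunspacePool
    [void]$PowerShell.AddScript($Worker).AddArgument($Shared)
    [pscustomobject]@{ PowerShell = $PowerShell; Runspace = $PowerShell.BeginInvoke() }
}

# EndInvoke blocks until each job has finished
foreach ($Job in $Jobs) {
    [void]$Job.PowerShell.EndInvoke($Job.Runspace)
    $Job.PowerShell.Dispose()
}
$RunspacePool.Close()
$RunspacePool.Dispose()

"Total: $($Shared.Total)"
```

With the Monitor calls removed, the final total can come up short when workers collide; with them in place it should always be 5000.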