PowerShell - Background Jobs, Runspace Jobs, and Thread Jobs
Update Jan 30, 2018: Updated with information about Invoke-Parallel and Split-Pipeline.
Parallelization in PowerShell has been a painful endeavor. There are the native BackgroundJobs, also referred to as PSJobs, which you may recall kicking off using the Start-Job cmdlet. However, they felt clunky to use. This can be attributed to the fact that each job runs in its own process. This, combined with the lack of built-in throttling, could lead to some bad outcomes :-).
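To make the "own process" point concrete, here is a minimal sketch (not part of the original test) that compares the process ID of the calling session with the process ID seen inside a background job. The variable names are only illustrative.

```powershell
# Start a classic background job that simply reports its own process ID.
$job = Start-Job -ScriptBlock { $PID }

# Wait for the job to finish and collect its output.
$jobPid = $job | Wait-Job | Receive-Job
$job | Remove-Job

# The job ran in a separate PowerShell process, so the two IDs should differ.
[pscustomobject]@{
    SessionProcessId = $PID
    JobProcessId     = $jobPid
    SameProcess      = ($jobPid -eq $PID)
}
```

Every Start-Job call pays for spinning up a new process like this, which is where most of the overhead discussed in this post comes from.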
Runspaces
Along came Boe Prox’s PoshRSJob, which leverages runspaces and runspace pools to reduce the overhead of running background jobs.
Runspaces have been discussed going back to at least early 2012. There was an initial webcast by Dr. Tobias Weltner. Afterwards, the topic was picked up by other MVPs, such as Boe Prox and Tome Tanasovski.
PoshRSJob offers a nice abstraction layer on top of runspaces, since they can be a bit complicated and seem daunting. It packages up the functionality and offers it in a way that’s familiar to PSJob users.
In addition to Boe’s PoshRSJob, the PowerShell community pointed out Invoke-Parallel by Warren Frame and Split-Pipeline by Roman Kuzmin. Both of these solutions rely on runspaces.
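For a sense of what these modules wrap, below is a rough sketch of using a runspace pool directly through the .NET API. It is not taken from PoshRSJob, Invoke-Parallel, or Split-Pipeline; the squaring workload and the pool size of 4 are placeholders chosen for illustration.

```powershell
# Create a pool that runs at most 4 runspaces at a time (this is the throttle).
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

# Queue one PowerShell instance per work item and start each asynchronously.
$work = foreach ($n in 1..8) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript('param($x) $x * $x').AddArgument($n)
    [pscustomobject]@{
        PowerShell = $ps
        Handle     = $ps.BeginInvoke()
    }
}

# Collect the results in submission order, then clean up each instance.
$results = foreach ($item in $work) {
    $item.PowerShell.EndInvoke($item.Handle)
    $item.PowerShell.Dispose()
}

$pool.Close()
$pool.Dispose()

$results
```

Even this small example needs explicit pool setup, handle tracking, and disposal, which is the bookkeeping the modules above take off your hands.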
Threading
On January 25th, 2018, Paul Higinbotham of the PowerShell team released the PSThreadJob module.
As noted in the description:
“[t]his module extends the existing PowerShell BackgroundJob to include a new thread based ThreadJob job. It is a lighter weight solution for running concurrent PowerShell scripts that works within the existing PowerShell job infrastructure. So these jobs work with existing PowerShell job cmdlets.”
Furthermore, it introduces a much needed ThrottleLimit parameter.
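As a small illustration, the sketch below starts a handful of thread jobs with a throttle of 2 and records the process ID each one sees. It assumes the Start-ThreadJob cmdlet is available in your session (from the PSThreadJob module, or the ThreadJob module bundled with later PowerShell releases); the sleep is a stand-in for real work.

```powershell
# Assumption: Start-ThreadJob is available (PSThreadJob / ThreadJob module).
$jobs = 1..6 | ForEach-Object {
    # At most 2 jobs run at once; the rest wait in the queue.
    Start-ThreadJob -ThrottleLimit 2 -ArgumentList $_ -ScriptBlock {
        param($n)
        Start-Sleep -Milliseconds 200   # placeholder workload
        [pscustomobject]@{ Item = $n; ProcessId = $PID }
    }
}

# Thread jobs work with the regular job cmdlets.
$threadResults = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job

# Every job should report the same process ID as the calling session.
$distinctPids = @($threadResults.ProcessId | Select-Object -Unique)
$threadResults | Format-Table -AutoSize
```

If the jobs really run as threads, `$distinctPids` contains a single value equal to `$PID`, in contrast to Start-Job, where each job gets its own process.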
This post does not weigh all the pros and cons of each approach. It just explores the performance impact of these approaches.
The Test
The test is based on the PoshRSJob’s first example, which computes the file sizes in the subdirectories of a given directory in parallel.
I have chosen to execute the test code against the src directory of the dotnet/corefx repository. It contains 185 sub-directories, with 17928 files. It is large enough to visually see the impact and is not expected to change between test runs, and it saves me from having to come up with data to test with :-).
Test Code
Below is the test code for each method. It was executed in the following manner:
Test executions:
# PSJobs
powershell.exe -Command "measure-command -expression { .\Test-Jobs.ps1 -PSJobs }"
# RSjobs
powershell.exe -Command "measure-command -expression { .\Test-Jobs.ps1 -RSJobs }"
# ThreadJob
powershell.exe -Command "measure-command -expression { .\Test-Jobs.ps1 -ThreadJobs }"
# Invoke-Parallel
powershell.exe -Command "measure-command -expression { .\Test-InvokeParallel.ps1 }"
# Split-Pipeline
powershell.exe -Command "measure-command -expression { .\Test-SplitPipeline.ps1 }"
Test-Jobs.ps1
[CmdletBinding()]
Param
(
    [switch]$PSJobs,
    [switch]$RSJobs,
    [switch]$ThreadJobs
)

$Throttle = 10
$Directories = Get-ChildItem -Path D:\Work\Dev\corefx\src -Directory

# ScriptBlock to compute the file sizes in the directory.
$GetDirectorySize = {
    Param($Directory)
    $measure = (Get-ChildItem $Directory -Recurse -Force -ErrorAction SilentlyContinue | Measure-Object -Property Length -Sum)
    $sum = $measure.Sum
    $count = $measure.Count
    $size = ([math]::round(($sum/1MB),2))
    [pscustomobject]@{
        Name = $Directory
        FileCount = $count
        SizeMB = $size
    }
}

#######################################
# Test using native BackgroundJob/PSjob
#######################################
if($PSJobs.IsPresent)
{
    $Directories | ForEach-Object {
        # No built-in throttling, so wait until a slot frees up.
        while( (Get-Job -State Running).Count -ge $Throttle)
        {
            Write-Verbose "Sleeping..."
            Start-Sleep -Seconds 1
        }
        Write-Verbose "Starting job $($_.Name)"
        Start-Job -Name $_.Name -ScriptBlock $GetDirectorySize -ArgumentList $_.FullName
    }
    Get-Job | Wait-Job | Receive-Job | Select-Object Name,FileCount,SizeMB | Format-Table -AutoSize
}
#########################################
# Test using Runspace Jobs (PoshRSJob)
#########################################
elseif($RSJobs.IsPresent)
{
    $Directories | Select-Object -ExpandProperty FullName | Start-RSjob -Name {$_} -ScriptBlock $GetDirectorySize -Throttle $Throttle |
        Wait-RSJob -ShowProgress | Receive-RSJob | Select-Object Name,FileCount,SizeMB | Format-Table -AutoSize
}
#######################################
# Test using Thread Jobs
#######################################
elseif($ThreadJobs.IsPresent)
{
    $Directories | ForEach-Object {
        Start-ThreadJob -Name $_.Name -ThrottleLimit $Throttle -ScriptBlock $GetDirectorySize -ArgumentList $_.FullName
    }
    Get-Job | Wait-Job | Receive-Job | Select-Object Name,FileCount,SizeMB | Format-Table -AutoSize
}
else
{
    Write-Error "Forgot to ask for something?"
}
Test-InvokeParallel.ps1
Note: Due to a slight variation in ScriptBlock syntax, I decided to keep this test file separate from Test-Jobs.ps1.
[CmdletBinding()]
param()

$Throttle = 10
$Directories = Get-ChildItem -Path D:\Work\Dev\corefx\src -Directory

# ScriptBlock to compute the file sizes in the directory.
$GetDirectorySize = {
    $measure = (Get-ChildItem $_ -Recurse -Force -ErrorAction SilentlyContinue | Measure-Object -Property Length -Sum)
    $sum = $measure.Sum
    $count = $measure.Count
    $size = ([math]::round(($sum/1MB),2))
    [pscustomobject]@{
        Name = $_
        FileCount = $count
        SizeMB = $size
    }
}

# Dot-source Invoke-Parallel from the same folder as this script.
. (Join-Path $PSScriptRoot "Invoke-Parallel.ps1")
$Directories | Select-Object -ExpandProperty FullName | Invoke-Parallel -ScriptBlock $GetDirectorySize -Throttle $Throttle
Test-SplitPipeline.ps1
Note: Due to a slight variation in ScriptBlock syntax, I decided to keep this test file separate from Test-Jobs.ps1.
[CmdletBinding()]
param()

Import-Module SplitPipeline

$Throttle = 10
$Directories = Get-ChildItem -Path D:\Work\Dev\corefx\src -Directory

# ScriptBlock to compute the file sizes in the directory.
$GetDirectorySize = {
    process {
        $measure = (Get-ChildItem $_ -Recurse -Force -ErrorAction SilentlyContinue | Measure-Object -Property Length -Sum)
        $sum = $measure.Sum
        $count = $measure.Count
        $size = ([math]::round(($sum/1MB),2))
        [pscustomobject]@{
            Name = $_
            FileCount = $count
            SizeMB = $size
        }
    }
}

$Directories | Select-Object -ExpandProperty FullName | Split-Pipeline -Script $GetDirectorySize -Count $Throttle
Results
The following results are from the Measure-Command output:
Metric | PSJob | Thread Job | Runspace Job | Invoke-Parallel | Split-Pipeline |
---|---|---|---|---|---|
Minutes | 1 | 0 | 0 | 0 | 0 |
Seconds | 10 | 5 | 3 | 3 | 1 |
Milliseconds | 268 | 353 | 364 | 382 | 251 |
Ticks | 702682609 | 53537051 | 33644760 | 33821112 | 12514944 |
TotalMinutes | 1.171137682 | 0.089228418 | 0.0560746 | 0.05636852 | 0.02085824 |
TotalSeconds | 70.2682609 | 5.3537051 | 3.364476 | 3.3821112 | 1.2514944 |
TotalMilliseconds | 70268.2609 | 5353.7051 | 3364.476 | 3382.1112 | 1251.4944 |
Additionally, here are the stats on new processes and threads based on procmon traces (and basic Import-Csv magic):
Metric | PSJob | Thread Job | Runspace Job | Invoke-Parallel | Split-Pipeline |
---|---|---|---|---|---|
Processes | 185 | 1 | 1 | 1 | 1 |
Threads | 4300 (23-49/proc) | 216 | 51 | 33 | 41 |
PSThreadJob does a fantastic job of addressing the process bloat of PSJobs. It goes from 185 individual processes to 1 and shaves off about 65 seconds (70.3 s down to 5.4 s), an astounding time savings of roughly 92%. It does all this without needing to make drastic changes to your code. You can begin by swapping Start-Job with Start-ThreadJob :-)
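The time savings can be recomputed from the TotalSeconds row of the results table. The snippet below is just that arithmetic, using PSJob as the baseline; the variable and property names are my own.

```powershell
# TotalSeconds values copied from the Measure-Command results table above.
$totalSeconds = [ordered]@{
    'PSJob'           = 70.2682609
    'Thread Job'      = 5.3537051
    'Runspace Job'    = 3.364476
    'Invoke-Parallel' = 3.3821112
    'Split-Pipeline'  = 1.2514944
}

# Use the PSJob run as the baseline for every comparison.
$baseline = $totalSeconds['PSJob']

$comparison = foreach ($name in $totalSeconds.Keys) {
    $seconds = $totalSeconds[$name]
    [pscustomobject]@{
        Method       = $name
        Seconds      = [math]::Round($seconds, 2)
        SecondsSaved = [math]::Round($baseline - $seconds, 2)
        PercentSaved = [math]::Round(100 * (1 - $seconds / $baseline), 1)
        Speedup      = [math]::Round($baseline / $seconds, 1)
    }
}

$comparison | Format-Table -AutoSize
```

Keep in mind these are single runs on one machine, so the percentages are indicative rather than precise.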
That being said, PSThreadJob is still a work in progress, as there is no support for the $Using variable, and people have noticed issues with exception handling and hanging jobs. I’m sure all that will be worked out over time, as the code is refined and the module matures.
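Until $Using is supported, values from the calling scope can be passed in explicitly, the same way the test script does it with -ArgumentList and a Param block. A minimal sketch, assuming Start-ThreadJob is available in your session; the `$label` value and the `$Name` parameter are placeholders:

```powershell
# Assumption: Start-ThreadJob is available (PSThreadJob / ThreadJob module).
$label = 'src'

# Pass the local value in as an argument instead of referencing $Using:label.
$job = Start-ThreadJob -ArgumentList $label -ScriptBlock {
    param($Name)
    "Processing $Name"
}

$output = $job | Wait-Job | Receive-Job
$job | Remove-Job

$output
```

This pattern also works unchanged with Start-Job, so code written this way stays portable between the two job types.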
As for me, I’ll be sticking with PoshRSJob for now. Boe’s code still has a slight edge, and has served me well for my use cases. Also, I am too lazy to revisit some of the old code. PSThreadJob will definitely be in my toolbox for future projects.
PSThreadJob and PoshRSJob are on the roadmap for PowerShell Core 6.1. So keep an eye out for an official stamp of approval for both modules.
Update: As is evident from the numbers, runspace-based solutions provide significant time and resource savings. Amongst the runspace-based solutions, Split-Pipeline has an edge over Invoke-Parallel and PoshRSJob. I’ll leave my previous statement about my laziness in place for posterity. 😀
References
- PSThreadJob by Paul Higinbotham
- PoshRSJob by Boe Prox
- Invoke-Parallel by Warren Frame
- Split-Pipeline by Roman Kuzmin
- PowerShell Core 6.1 Roadmap
- ForEach-Parallel by Tome Tanasovski
- Using Background Runspaces Instead of PSJobs For Better Performance by Boe Prox
- Using Threadjob for Performance by Nicholas M. Getchell