
PowerShell - Background Jobs, Runspace Jobs, and Thread Jobs


Update Jan 30, 2018: Updated with information about Invoke-Parallel and Split-Pipeline.

Parallelization in PowerShell has long been a painful endeavor. There are the native background jobs, also referred to as PSJobs, which you may recall kicking off with the Start-Job cmdlet. However, they have always felt clunky to use, largely because each job runs in its own process. That, combined with the lack of built-in throttling, can lead to some bad outcomes :-).
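
For anyone who hasn’t touched PSJobs in a while, here is a minimal sketch of the classic pattern (not taken from the test code later in this post). Every Start-Job call spins up a separate powershell.exe process, and nothing limits how many run at once:

    # Classic PSJob pattern: each Start-Job spawns its own powershell.exe process,
    # and there is no built-in throttle.
    $jobs = 1..20 | ForEach-Object {
        Start-Job -ScriptBlock { param($n) Start-Sleep -Seconds 1; $n } -ArgumentList $_
    }
    $jobs | Wait-Job | Receive-Job
    $jobs | Remove-Job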

Runspaces

Along came Boe Prox’s PoshRSJob, which leverages runspaces and runspace pools to reduce the overhead of running background jobs.

Runspaces have been discussed since at least early 2012, starting with a webcast by Dr. Tobias Weltner. The topic was later picked up by other MVPs, such as Boe Prox and Tome Tanasovski.

PoshRSJob offers a nice abstraction layer on top of runspaces, which can be a bit complicated and daunting on their own. It packages up the functionality and offers it in a way that’s familiar to PSJob users.
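
To illustrate that familiar shape, here is a rough sketch (assuming the PoshRSJob module is installed); the cmdlet names mirror Start-Job / Wait-Job / Receive-Job, with a -Throttle parameter thrown in:

    # Same pattern as PSJobs, but backed by a runspace pool and throttled.
    Import-Module PoshRSJob
    1..20 | Start-RSJob -ScriptBlock { param($n) Start-Sleep -Seconds 1; $n } -Throttle 5 |
        Wait-RSJob | Receive-RSJob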

In addition to Boe’s PoshRSJob, the PowerShell community pointed out Invoke-Parallel by Warren Frame and Split-Pipeline by Roman Kuzmin. Both of these solutions also rely on runspaces.

Threading

On January 25th, 2018, Paul Higinbotham of the PowerShell team released the PSThreadJob module.

As noted in the module’s description:

“[t]his module extends the existing PowerShell BackgroundJob to include a new thread based ThreadJob job. It is a lighter weight solution for running concurrent PowerShell scripts that works within the existing PowerShell job infrastructure. So these jobs work with existing PowerShell job cmdlets.”

Furthermore, it introduces a much-needed ThrottleLimit parameter.
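
In practice that means the existing job cmdlets keep working and only the start cmdlet changes. A minimal sketch (assuming the module is installed):

    # Thread jobs run inside the current process; -ThrottleLimit caps concurrent threads.
    1..20 | ForEach-Object {
        Start-ThreadJob -ScriptBlock { param($n) Start-Sleep -Seconds 1; $n } -ArgumentList $_ -ThrottleLimit 5
    }
    Get-Job | Wait-Job | Receive-Job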


This post does not weigh all the pros and cons of each solution per se; it simply explores the performance impact of the three approaches.

The Test

The test is based on the PoshRSJob’s first example, which computes the file sizes in the subdirectories of a given directory in parallel.

I have chosen to execute the test code against the src directory of the dotnet/corefx repository. It contains 185 sub-directories with 17,928 files, which makes it large enough to see the impact clearly, it is not expected to change between test runs, and it saves me from having to come up with test data :-).

Test Code

Below is the test code for each approach. It was executed in the following manner:

Test executions:

    # PSJobs
    powershell.exe -Command "measure-command -expression { .\Test-Jobs.ps1 -PSJobs }"

    # RSjobs
    powershell.exe -Command "measure-command -expression { .\Test-Jobs.ps1 -RSJobs }"

    # ThreadJob
    powershell.exe -Command "measure-command -expression { .\Test-Jobs.ps1 -ThreadJobs }"

    # Invoke-Parallel
    powershell.exe -Command "measure-command -expression { .\Test-InvokeParallel.ps1 }"

    # Split-Pipeline
    powershell.exe -Command "measure-command -expression { .\Test-SplitPipeline.ps1 }"

Test-Jobs.ps1

[CmdletBinding()]
Param 
(
    [switch]$PSJobs,
    [switch]$RSJobs,
    [switch]$ThreadJobs
)

$Throttle = 10

$Directories = Get-ChildItem -Path D:\Work\Dev\corefx\src -Directory 

# ScriptBlock to compute the file sizes in the directory.
$GetDirectorySize = {

    Param($Directory)
    $measure = (Get-ChildItem $Directory -Recurse -Force -ErrorAction SilentlyContinue | Measure-Object -Property Length -Sum)
    $sum = $measure.Sum
    $count = $measure.Count
    $size = ([math]::round(($sum/1MB),2))

    [pscustomobject]@{
        Name = $Directory
        FileCount = $count
        SizeMB = $size
    }
}

#######################################
# Test using native BackgroundJob/PSjob
#######################################

if($PSJobs.IsPresent)
{
    $Directories | ForEach-Object {
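        # Manual throttling: PSJobs have no throttle parameter, so wait until fewer than $Throttle jobs are running.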

        while( (Get-Job -State Running).Count -ge $Throttle)
        {
            Write-Verbose "Sleeping..."
            Start-Sleep -Seconds 1
        }
    
        Write-Verbose "Starting job $($_.Name)"
        Start-Job -Name $_.Name -ScriptBlock $GetDirectorySize -ArgumentList $_.FullName
    } 
            
    Get-Job | Wait-Job | Receive-Job | Select Name,FileCount,SizeMB | Format-Table -AutoSize
}

#########################################
# Test using Runspace Jobs (PoshRSJob)
#########################################

elseif($RSJobs.IsPresent)
{
    $Directories | Select-Object -ExpandProperty FullName | Start-RSJob -Name {$_} -ScriptBlock $GetDirectorySize -Throttle $Throttle |
        Wait-RSJob -ShowProgress | Receive-RSJob | Select Name,FileCount,SizeMB | Format-Table -AutoSize

}

#######################################
# Test using Thread Jobs
#######################################
elseif($ThreadJobs.IsPresent)
{
    $Directories | ForEach-Object {
        Start-ThreadJob -Name $_.Name -ThrottleLimit $Throttle -ScriptBlock $GetDirectorySize -ArgumentList $_.FullName
    }
    Get-Job | Wait-Job | Receive-Job | Select Name,FileCount,SizeMB | Format-Table -AutoSize
}
else
{
    Write-Error "Forgot to ask for something?"
}

Test-InvokeParallel.ps1

Note: Due to a slight variation in ScriptBlock syntax (Invoke-Parallel exposes the piped item as $_ rather than as a parameter), I decided to keep this test in a separate file from Test-Jobs.ps1.

[CmdletBinding()]
param()

$Throttle = 10

$Directories = Get-ChildItem -Path D:\Work\Dev\corefx\src -Directory 

# ScriptBlock to compute the file sizes in the directory.
$GetDirectorySize = {
    
    $measure = (Get-ChildItem $_ -Recurse -Force -ErrorAction SilentlyContinue | Measure-Object -Property Length -Sum)
    $sum = $measure.Sum
    $count = $measure.Count
    $size = ([math]::round(($sum/1MB),2))

    [pscustomobject]@{
        Name = $_
        FileCount = $count
        SizeMB = $size
    }
}

. (Join-Path $PSScriptRoot "Invoke-Parallel.ps1")

$Directories | Select -ExpandProperty FullName | Invoke-Parallel -ScriptBlock $GetDirectorySize -Throttle $Throttle

Test-SplitPipeline.ps1

Note: Due to a slight variation in ScriptBlock syntax (Split-Pipeline expects a process block and exposes the piped item as $_), I decided to keep this test in a separate file from Test-Jobs.ps1.

[CmdletBinding()]
param()

Import-Module SplitPipeline

$Throttle = 10

$Directories = Get-ChildItem -Path D:\Work\Dev\corefx\src -Directory 

# ScriptBlock to compute the file sizes in the directory.
$GetDirectorySize = {

    process {
    
        $measure = (Get-ChildItem $_ -Recurse -Force -ErrorAction SilentlyContinue | Measure-Object -Property Length -Sum)
        $sum = $measure.Sum
        $count = $measure.Count
        $size = ([math]::round(($sum/1MB),2))

        [pscustomobject]@{
            Name = $_
            FileCount = $count
            SizeMB = $size
        }
    }
}

$Directories | Select -ExpandProperty FullName | Split-Pipeline -Script $GetDirectorySize -Count $Throttle

Results

The following results are from the Measure-Command output:

                   PSJob        Thread Job   Runspace Job  Invoke-Parallel  Split-Pipeline
Minutes            1            0            0             0                0
Seconds            10           5            3             3                1
Milliseconds       268          353          364           382              251
Ticks              702682609    53537051     33644760      33821112         12514944
TotalMinutes       1.171137682  0.089228418  0.0560746     0.05636852       0.02085824
TotalSeconds       70.2682609   5.3537051    3.364476      3.3821112        1.2514944
TotalMilliseconds  70268.2609   5353.7051    3364.476      3382.1112        1251.4944

Additionally, here are the stats on new processes and threads, based on Procmon traces and some basic Import-Csv magic (a sketch of which follows the table):

           PSJob               Thread Job   Runspace Job  Invoke-Parallel  Split-Pipeline
Processes  185                 1            1             1                1
Threads    4300 (23-49/proc)   216          51            33               41
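
The counting itself is nothing fancy; it went roughly along these lines (the CSV file name here is made up, and the Operation values are Procmon’s defaults):

    # Count process and thread creation events in a Procmon trace exported to CSV.
    $events = Import-Csv .\procmon-trace.csv
    ($events | Where-Object { $_.Operation -eq 'Process Create' }).Count
    ($events | Where-Object { $_.Operation -eq 'Thread Create' }).Count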

PSThreadJob does a fantastic job of addressing the process bloat of PSJobs: it goes from 185 individual processes down to 1 and shaves off roughly 65 seconds, a time savings of over 92%. It does all this without requiring drastic changes to your code. You can begin by swapping Start-Job with Start-ThreadJob :-)
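
In the test script above, the change amounts to swapping one cmdlet and dropping the manual throttle loop in favor of -ThrottleLimit:

    # Before (PSJobs):
    Start-Job -Name $_.Name -ScriptBlock $GetDirectorySize -ArgumentList $_.FullName

    # After (thread jobs):
    Start-ThreadJob -Name $_.Name -ScriptBlock $GetDirectorySize -ArgumentList $_.FullName -ThrottleLimit $Throttle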

That being said, PSThreadJob is still a work in progress: there is no support for the $Using: variable scope modifier yet, and people have noticed issues with exception handling and hanging jobs. I’m sure all of that will be worked out over time as the code is refined and the module matures.
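
For now, anything you would normally reference with $Using: has to be passed in explicitly. A hypothetical snippet (not from the test code):

    $Path = 'D:\Work\Dev\corefx\src'

    # $Using:Path works with Start-Job, but not (yet) with Start-ThreadJob:
    #   Start-ThreadJob -ScriptBlock { Get-ChildItem $Using:Path }

    # Workaround: pass the value in as an argument instead.
    Start-ThreadJob -ScriptBlock { param($p) Get-ChildItem $p } -ArgumentList $Path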

As for me, I’ll be sticking with PoshRSJob for now. Boe’s code still has a slight edge and has served me well for my use cases. Also, I am too lazy to revisit some of the old code. PSThreadJob will definitely be in my toolbox for future projects.

PSThreadJob and PoshRSJob are on the roadmap for PowerShell Core 6.1. So keep an eye out for an official stamp of approval for both modules.

Update: As the numbers make evident, runspace-based solutions provide significant time and resource savings. Among them, Split-Pipeline has an edge over Invoke-Parallel and PoshRSJob. I’ll leave my earlier statement about my laziness in place for posterity. 😀

References