To give your task a name: You're looking for the relative complement aka set difference between two arrays:
In set-theory notation, it would be $ItemArray \ $ExclusionArray, i.e., those elements in $ItemArray that aren't also in $ExclusionArray.
This related question is looking for the symmetric difference between two sets, i.e., the set of elements that are unique to either side - at least that's what the Compare-Object-based solutions there implement, but only under the assumption that each array has no duplicates.
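To make the distinction concrete, here is a minimal sketch that contrasts the two operations. The variable names ($left, $right, $onlyInLeft, $eitherSide) are made up for illustration, and the Compare-Object call is only an assumption about what such solutions typically look like, not code taken from the related question:

```powershell
# Hypothetical sample data; neither array contains duplicates.
$left  = 'a', 'b', 'c'
$right = 'b', 'c', 'd'

# Relative complement (this question): elements of $left that aren't in $right.
$onlyInLeft = $left | Where-Object { $_ -notin $right }

# Symmetric difference: elements that are unique to either side.
# -PassThru outputs the differing elements themselves rather than wrapper objects;
# Sort-Object merely makes the output order predictable.
$eitherSide = Compare-Object $left $right -PassThru | Sort-Object

$onlyInLeft   # -> 'a'
$eitherSide   # -> 'a', 'd'
```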
EyIM's helpful answer is conceptually simple and concise.
A potential problem is performance: a lookup in the exclusion array must be performed for each element in the input array.
With small arrays, this likely won't matter in practice.
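EyIM's answer isn't quoted here; as a rough sketch of the kind of native approach being discussed, assuming a filter based on the -notin operator (PSv3+), it could look like this. The result variable names are illustrative only:

```powershell
$ItemArray      = 'a', 'b', 'c', 'd'
$ExclusionArray = 'b', 'c'

# Pipeline-based: for every input element, -notin performs a linear
# search of $ExclusionArray, which is where the cost comes from.
$viaCmdlet = $ItemArray | Where-Object { $_ -notin $ExclusionArray }

# Method-based (PSv4+): same logic, but without pipeline overhead,
# which helps when the array is already in memory.
$viaMethod = $ItemArray.Where({ $_ -notin $ExclusionArray })

$viaCmdlet   # -> 'a', 'd'
$viaMethod   # -> 'a', 'd'
```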
With larger arrays, LINQ offers a substantially faster solution:
Note: In order to benefit from the LINQ solution, your arrays should be in memory already, and the benefit is greater the larger the exclusion array is. If your input is streaming via the pipeline, the overhead from executing the pipeline may make attempts to optimize array processing pointless or even counterproductive, in which case sticking with the native PowerShell solution makes sense - see iRon's answer.
# Declare the arrays as [string[]]
# so that calling the LINQ method below works as-is.
# (You could also cast to [string[]] ad hoc.)
[string[]] $ItemArray = 'a','b','c','d'
[string[]] $exclusionArray = 'b','c'
# Return only those elements in $ItemArray that aren't also in $exclusionArray
# and convert the result (a lazy enumerable of type [IEnumerable[string]])
# back to an array to force its evaluation
# (If you directly enumerate the result in a pipeline, that step isn't needed.)
[string[]] [Linq.Enumerable]::Except($ItemArray, $exclusionArray) # -> 'a', 'd'
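One caveat worth illustrating: Except() has set semantics, so it returns distinct elements only and, with the default comparer, compares strings case-sensitively, whereas PowerShell's -notin / -notcontains operators are case-insensitive by default and leave duplicates in place. The following sketch uses made-up sample data and variable names; passing [StringComparer]::OrdinalIgnoreCase as the third argument is one way to get case-insensitive matching, assuming PowerShell infers the generic type argument from the arguments as it does in the two-argument call:

```powershell
[string[]] $items      = 'a', 'A', 'b', 'a', 'd'
[string[]] $exclusions = 'B'

# Default comparer: case-sensitive, duplicates removed.
$caseSensitive = [string[]] [Linq.Enumerable]::Except($items, $exclusions)

# Explicit comparer: case-insensitive, duplicates (including case variants) removed.
$caseInsensitive = [string[]] [Linq.Enumerable]::Except(
    $items, $exclusions, [StringComparer]::OrdinalIgnoreCase)

# Native operator for comparison: case-insensitive, duplicates kept.
$native = $items.Where({ $_ -notin $exclusions })

$caseSensitive    # -> 'a', 'A', 'b', 'd'
$caseInsensitive  # -> 'a', 'd'
$native           # -> 'a', 'A', 'a', 'd'
```

If your data may contain duplicates that must be preserved, or differs only in case, the two approaches are therefore not interchangeable.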
Note the need to use the LINQ types explicitly, via their static methods, because PowerShell, as of v7, has no support for extension methods.
However, there is a proposal on GitHub to add such support; this related proposal asks for improved support for calling generic methods.
See this answer for an overview of how to currently call LINQ methods from PowerShell.
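As a small illustration of the static-method calling pattern, here are two other set-related LINQ methods called the same way. The C# forms in the comments show the extension-method syntax that PowerShell lacks; the sample arrays and variable names are made up:

```powershell
[string[]] $first  = 'a', 'b', 'c', 'd'
[string[]] $second = 'b', 'c', 'e'

# C#: first.Intersect(second)
$common = [string[]] [Linq.Enumerable]::Intersect($first, $second)

# C#: first.Union(second)
$all = [string[]] [Linq.Enumerable]::Union($first, $second)

$common   # -> 'b', 'c'
$all      # -> 'a', 'b', 'c', 'd', 'e'
```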
Performance comparison:
Tip of the hat to iRon for his input.
The following benchmark code uses the Time-Command function to compare the two approaches, using arrays with roughly 4000 and 2000 elements, respectively, which - as in the question - differ by only 2 elements.
Note that in order to level the playing field, the .Where() array method (PSv4+) is used instead of the pipeline-based Where-Object cmdlet, as .Where() is faster with arrays already in memory.
Here are the results averaged over 10 runs, from a single-core Windows 10 VM running Windows PowerShell v5.1; note the relative performance, as shown in the Factor column:
Factor Secs (10-run avg.) Command                     TimeSpan
------ ------------------ -------                     --------
1.00   0.046              # LINQ...                   00:00:00.0455381
8.40   0.382              # Where ... -notContains... 00:00:00.3824038
The LINQ solution is substantially faster - by a factor of 8+ (though even the much slower solution only took about 0.4 seconds to run).
It seems that the performance gap is even wider in PowerShell Core, where I've seen a factor of around 19 with v7.0.0-preview.4; interestingly, both tests ran faster individually than in Windows PowerShell.
Benchmark code:
# Script block to initialize the arrays.
# The filler arrays are randomized to eliminate caching effects in LINQ.
$init = {
  $fillerArray = 1..1000 | Get-Random -Count 1000
  [string[]] $ItemArray = $fillerArray + 'a' + $fillerArray + 'b' + $fillerArray + 'c' + $fillerArray + 'd'
  [string[]] $exclusionArray = $fillerArray + 'b' + $fillerArray + 'c'
}

# Compare the average of 10 runs.
Time-Command -Count 10 { # LINQ
  . $init
  $result = [string[]] [Linq.Enumerable]::Except($ItemArray, $exclusionArray)
}, { # Where ... -notContains
  . $init
  $result = $ItemArray.Where({ $exclusionArray -notcontains $_ })
}
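Time-Command is not a built-in cmdlet, so if you don't have it installed, a rough equivalent can be sketched with the built-in Measure-Command. The helper function Measure-AverageSeconds below is hypothetical (it is not part of PowerShell or of Time-Command), and its output won't match the table format shown above; it only averages the elapsed seconds over a given number of runs:

```powershell
# Same array setup as in the benchmark above.
$init = {
  $fillerArray = 1..1000 | Get-Random -Count 1000
  [string[]] $ItemArray = $fillerArray + 'a' + $fillerArray + 'b' + $fillerArray + 'c' + $fillerArray + 'd'
  [string[]] $exclusionArray = $fillerArray + 'b' + $fillerArray + 'c'
}

# Hypothetical helper: average run time in seconds over $Count runs.
function Measure-AverageSeconds {
  param([scriptblock] $ScriptBlock, [int] $Count = 10)
  $total = 0.0
  foreach ($i in 1..$Count) {
    $total += (Measure-Command $ScriptBlock).TotalSeconds
  }
  $total / $Count
}

$linqSecs = Measure-AverageSeconds -Count 10 {
  . $init
  $null = [string[]] [Linq.Enumerable]::Except($ItemArray, $exclusionArray)
}

$whereSecs = Measure-AverageSeconds -Count 10 {
  . $init
  $null = $ItemArray.Where({ $exclusionArray -notcontains $_ })
}

# Relative slowdown of the .Where() approach compared to LINQ.
'LINQ: {0:N3}s, .Where(): {1:N3}s, factor: {2:N2}' -f $linqSecs, $whereSecs, ($whereSecs / $linqSecs)
```

Absolute timings will vary by machine and PowerShell version, so only the ratio between the two numbers is meaningful.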