Your own Start-Process-based solution that uses -RedirectStandardOutput and -RedirectStandardError indeed creates (BOM-less) UTF-8-encoded output files, but note that they too invariably have a trailing newline.
However, you do not need Start-Process, as you can make PowerShell's redirection operator, >, produce UTF-8 files (also with a trailing newline) too.
The following examples use a sample cmd.exe call that produces both stdout and stderr output.
In PowerShell (Core) v6+, no extra effort is needed, because > produces (BOM-less) UTF-8 files by default (a default that is used consistently; if you want UTF-8 with a BOM, you can use the technique detailed for Windows PowerShell below, but with value 'utf8bom'):
cmd /c 'echo hü & dir c:\nosuch' 2>stderr.txt >stdout.txt
In Windows PowerShell, > produces UTF-16LE ("Unicode") by default, but in version 5.1 you can (temporarily) reconfigure it to use UTF-8 instead, albeit invariably with a BOM; see this answer for details. Another caveat is that the first stderr line captured in the file will be formatted "noisily", like a PowerShell error:
# Windows PowerShell v5.1:
# Make `>` and its effective alias, Out-File, use UTF-8 with a BOM in the
# remainder of the session.
# Save and restore any previous value if you want to scope the behavior
# to select commands only.
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
cmd /c 'echo hü & dir c:\nosuch' 2>stderr.txt >stdout.txt
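If you want to follow the "save and restore" advice from the comments above, the following is a minimal sketch of one way to do it in Windows PowerShell v5.1. The variable names $key, $hadValue and $prev are purely illustrative, and the try / finally wrapper is one possible approach, not the only one:

```powershell
# Sketch: make `>` / Out-File use UTF-8 (with BOM) for one command only,
# then put $PSDefaultParameterValues back the way it was.
$key      = 'Out-File:Encoding'
$hadValue = $PSDefaultParameterValues.ContainsKey($key)  # was an entry already present?
$prev     = $PSDefaultParameterValues[$key]              # $null if there was none

try {
  $PSDefaultParameterValues[$key] = 'utf8'
  cmd /c 'echo hü & dir c:\nosuch' 2>stderr.txt >stdout.txt
}
finally {
  # Restore the previous entry, or remove the one added above.
  if ($hadValue) { $PSDefaultParameterValues[$key] = $prev }
  else           { $PSDefaultParameterValues.Remove($key) }
}
```

The finally block runs even if the command inside try fails, so the session default is not left modified by accident.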
Caveat:
- Whenever PowerShell processes an external program's output, it invariably decodes it into .NET strings first. Any external program is assumed to produce output based on the character encoding stored in [Console]::OutputEncoding, which defaults to the system's active OEM code page. This works as expected with cmd.exe, but there are other console applications that use different encodings - notably node.exe (Node.js) and python, which use UTF-8 and the system's active ANSI code page, respectively - in which case [Console]::OutputEncoding must be set to that encoding first; see this answer for more information.
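As a rough illustration of the caveat above, here is a sketch of temporarily switching [Console]::OutputEncoding to UTF-8 around a call to a program that emits UTF-8. It assumes node.exe is installed and in the PATH; the JavaScript one-liner and the variable names $prevEnc and $out are made up for the example:

```powershell
# Sketch: decode a UTF-8-emitting program's output correctly,
# then restore the original console output encoding.
$prevEnc = [Console]::OutputEncoding
try {
  # The parameterless UTF8Encoding constructor creates a BOM-less UTF-8 encoding.
  [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
  # Assumption: node.exe is available; any UTF-8-emitting program would do.
  $out = node -e "console.log('hü')"
}
finally {
  [Console]::OutputEncoding = $prevEnc
}
$out
```

Restoring the previous value matters because the setting applies to the console as a whole, so later calls to programs that use the OEM code page, such as cmd.exe, would otherwise be misdecoded.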
As for your statements and questions:
The trailing new line is not a valid UTF-8 character apparently
PowerShell's > operator and file-output cmdlets apply their character encoding consistently, so the trailing newline's encoding is always consistent with that of the other characters in the file.
Most likely it was the UTF-16LE ("Unicode") encoding used by Windows PowerShell by default that was the true problem, and you may have only noticed it with respect to the newline.
Perhaps there's a way to capture the stderr and stdout to separate variables
Stdout can be captured by a simple variable assignment, which captures multiple output lines as an array of strings:
$stdout = cmd /c 'echo hü & dir c:\nosuch'
You cannot separately capture stderr output, but you can merge stderr into stdout with 2>&1 and even later separate the streams' respective output lines again, based on their data types: stdout lines are always strings, whereas stderr lines are always [ErrorRecord] instances:
# Note the 2>&1 redirection.
$stdoutAndErr = cmd /c 'echo hü & dir c:\nosuch' 2>&1
# If desired, you can split the captured output into stdout and stderr output.
# The [string[]] cast converts the [ErrorRecord] instances to strings too.
$stdout, [string[]] $stderr = $stdoutAndErr.Where({ $_ -is [string] }, 'Split')
# Now $stdout is the array of stdout lines, and $stderr the array of stderr lines.
# If desired, you could write them to files *without a trailing newline* as follows:
$stdout -join [Environment]::NewLine | Set-Content -NoNewLine -Encoding utf8 stdout.txt
$stderr -join [Environment]::NewLine | Set-Content -NoNewLine -Encoding utf8 stderr.txt
You can also apply these techniques to PowerShell-native commands (and you can even merge all other streams that PowerShell supports into the success output stream, PowerShell's analog to stdout, with *>&1).
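To make the *>&1 case concrete, here is a small sketch that uses a throwaway script block in place of a real command. The messages and the variable name $all are made up, and it assumes the default $ErrorActionPreference of 'Continue':

```powershell
# Sketch: merge all streams of a PowerShell-native command into the success stream.
$all = & {
  Write-Output  'success output'
  Write-Warning 'a warning'
  Write-Error   'an error'
} *>&1

# Each captured object retains its stream-specific type,
# so the streams can still be told apart afterwards.
$all | ForEach-Object { $_.GetType().Name }
```

The type names reveal which stream each object came from: a plain string for success output, a WarningRecord for the warning, and an ErrorRecord for the error.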
However, if a given PowerShell-native command is a cmdlet / advanced script or function, the more convenient alternative is to use the common -OutVariable parameter (for success-stream output) and common -ErrorVariable parameter (for error-stream output).
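As a brief sketch of the two common parameters, the following call asks Get-Item for one path that exists and one that is assumed not to exist. The item name 'no-such-item-for-demo' and the variable names out and err are hypothetical choices for this example:

```powershell
# Sketch: collect success output and errors in separate variables.
# Assumption: no item named 'no-such-item-for-demo' exists in $PSHOME.
$missing = Join-Path $PSHOME 'no-such-item-for-demo'

# Note: the target variable names are passed *without* the `$` sigil.
$null = Get-Item -Path $PSHOME, $missing -OutVariable out -ErrorVariable err -ErrorAction SilentlyContinue

$out   # the item(s) that were found
$err   # the [ErrorRecord] instance(s) for the path(s) that were not
```

-ErrorAction SilentlyContinue only suppresses the display of the error; it is still collected in the variable named by -ErrorVariable.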