I didn't like the compression suggestions, so here's what I would do.
You don't want to compress a video stream (so MPEG, AVI, etc. are out of the question -- those codecs are built for offline encoding, not real-time use) and you don't want to compress individual pictures either (that throws away everything you already know about the previous frame).
Basically what you want to do is detect whether things change and send only the differences. You're on the right track with that; most video compressors do the same. You also want a fast compression/decompression algorithm; especially if you go to a higher FPS, that becomes even more relevant.
Differences. First off, eliminate all branches in your code, and make sure memory access is sequential (e.g. iterate x in the inner loop). The latter will give you cache locality. As for the differences, I'd probably use a 64-bit XOR; it's easy, branchless and fast.
If you want performance, it's probably better to do this in C++: the current C# JIT doesn't vectorize your code, and vectorization will help you a great deal here.
Do something like this (I'm assuming a 32-bit pixel format and an unsafe context, since we're working with raw pointers):
for (int y = 0; y < height; ++y) // change to Parallel.For if you like
{
    ulong* row1 = (ulong*)(image1BasePtr + image1Stride * y);
    ulong* row2 = (ulong*)(image2BasePtr + image2Stride * y);
    for (int x = 0; x < width / 2; ++x) // each ulong holds 2 pixels
        row2[x] ^= row1[x];
}
Fast compression and decompression usually means simpler compression algorithms. https://code.google.com/p/lz4/ is such an algorithm, and there's a proper .NET port available for it as well. You might want to read up on how it works, too; LZ4 has a streaming feature, and if you can make it handle 2 images instead of 1, that will probably give you a nice compression boost.
All in all, if you're trying to compress white noise, it simply won't work and your frame rate will drop. One way to solve this is to reduce the colors if you have too much 'randomness' in a frame. A measure for randomness is entropy, and there are several ways to get a measure of the entropy of a picture ( https://en.wikipedia.org/wiki/Entropy_(information_theory) ). I'd stick with a very simple one: check the size of the compressed picture -- if it's above a certain limit, reduce the number of bits; if below, increase the number of bits.
Note that reducing the number of bits isn't done with shifting in this case; you don't need the bits to be removed, you just need the compression to work better, so a simple 'AND' with a bitmask is just as good. For example, if you want to drop the 2 low bits of each color channel, you can do it like this (a sketch of the size-based feedback follows after the code):
for (int y = 0; y < height; ++y) // change to Parallel.For if you like
{
    ulong* row1 = (ulong*)(image1BasePtr + image1Stride * y);
    ulong* row2 = (ulong*)(image2BasePtr + image2Stride * y);
    ulong mask = 0xFFFCFCFCFFFCFCFC; // keeps alpha, clears the 2 low bits of B, G and R (for both pixels)
    for (int x = 0; x < width / 2; ++x)
        row2[x] = (row2[x] ^ row1[x]) & mask;
}
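Here's a minimal sketch of what that size-based feedback could look like. The names and the limits are made up purely for illustration; tune them to your bandwidth:

// Hypothetical: pick the mask for the next frame based on how well the current one compressed.
ulong[] masks =
{
    0xFFFFFFFFFFFFFFFF, // keep everything
    0xFFFEFEFEFFFEFEFE, // drop 1 bit per color channel
    0xFFFCFCFCFFFCFCFC, // drop 2 bits per color channel
    0xFFF8F8F8FFF8F8F8, // drop 3 bits per color channel
};
int maskIndex = 0;

void AdjustMask(int compressedSize)
{
    const int upperLimit = 256 * 1024; // made-up limits
    const int lowerLimit = 64 * 1024;
    if (compressedSize > upperLimit && maskIndex < masks.Length - 1)
        ++maskIndex; // frame too big: drop more bits next time
    else if (compressedSize < lowerLimit && maskIndex > 0)
        --maskIndex; // plenty of headroom: restore bits
}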
PS: I'm not sure what I would do with the alpha component; I'll leave that up to your experimentation.
Good luck!
The long answer
I had some time to spare, so I just tested this approach. Here's some code to support it all.
This code normally runs at over 130 FPS with a nice constant memory pressure on my laptop, so the bottleneck shouldn't be here anymore. Note that you need LZ4 to get this working and that LZ4 is aimed at high speed, not high compression ratios. A bit more on that later.
First we need something that we can use to hold all the data we're going to send. I'm not implementing the socket code itself here (although that should be pretty simple using this as a start); I mainly focused on producing the data you need to send.
// The thing you send over a socket
public class CompressedCaptureScreen
{
public CompressedCaptureScreen(int size)
{
this.Data = new byte[size];
this.Size = 4; // the first 4 bytes of Data are reserved for the length header (see Compress)
}
public int Size;
public byte[] Data;
}
We also need a class that will hold all the magic:
public class CompressScreenCapture
{
Next, when I'm writing high-performance code, I make it a habit to preallocate all the buffers first. That saves you time during the actual algorithmic work. Four buffers at 1080p (1920 * 1080 * 4 bytes is roughly 8.3 MB each) is about 33 MB, which is fine -- so let's allocate that.
public CompressScreenCapture()
{
// Initialize with black screen; get bounds from screen.
this.screenBounds = Screen.PrimaryScreen.Bounds;
// Initialize 2 buffers - 1 for the current and 1 for the previous image
prev = new Bitmap(screenBounds.Width, screenBounds.Height, PixelFormat.Format32bppArgb);
cur = new Bitmap(screenBounds.Width, screenBounds.Height, PixelFormat.Format32bppArgb);
// Clear the 'prev' buffer - this is the initial state
using (Graphics g = Graphics.FromImage(prev))
{
g.Clear(Color.Black);
}
// Compression buffer -- we don't really need this but I'm lazy today.
compressionBuffer = new byte[screenBounds.Width * screenBounds.Height * 4];
// Compressed buffer -- where the data goes that we'll send.
this.backbufSize = LZ4.LZ4Codec.MaximumOutputLength(this.compressionBuffer.Length) + 4;
backbuf = new CompressedCaptureScreen(this.backbufSize);
}
private Rectangle screenBounds;
private Bitmap prev;
private Bitmap cur;
private byte[] compressionBuffer;
private int backbufSize;
private CompressedCaptureScreen backbuf;
private int n = 0;
First thing to do is capture the screen. This is the easy part: simply fill the bitmap of the current screen:
private void Capture()
{
// Fill 'cur' with a screenshot
using (var gfxScreenshot = Graphics.FromImage(cur))
{
gfxScreenshot.CopyFromScreen(screenBounds.X, screenBounds.Y, 0, 0, screenBounds.Size, CopyPixelOperation.SourceCopy);
}
}
As I said, I don't want to compress 'raw' pixels. Instead, I'd much rather compress the XOR mask of the previous and the current image. Most of the time this will give you a whole lot of 0's, which is easy to compress:
private unsafe void ApplyXor(BitmapData previous, BitmapData current)
{
byte* prev0 = (byte*)previous.Scan0.ToPointer();
byte* cur0 = (byte*)current.Scan0.ToPointer();
int height = previous.Height;
int width = previous.Width;
int halfwidth = width / 2; // 2 pixels per ulong (this assumes an even width)
fixed (byte* target = this.compressionBuffer)
{
ulong* dst = (ulong*)target;
for (int y = 0; y < height; ++y)
{
ulong* prevRow = (ulong*)(prev0 + previous.Stride * y);
ulong* curRow = (ulong*)(cur0 + current.Stride * y);
for (int x = 0; x < halfwidth; ++x)
{
*(dst++) = curRow[x] ^ prevRow[x];
}
}
}
}
For the compression algorithm I simply pass the buffer to LZ4 and let it do its magic.
private int Compress()
{
// Grab the backbuf in an attempt to update it with new data
var backbuf = this.backbuf;
backbuf.Size = LZ4.LZ4Codec.Encode(
this.compressionBuffer, 0, this.compressionBuffer.Length,
backbuf.Data, 4, backbuf.Data.Length-4);
Buffer.BlockCopy(BitConverter.GetBytes(backbuf.Size), 0, backbuf.Data, 0, 4);
return backbuf.Size;
}
One thing to note here is that I make it a habit to put everything I need to send over the TCP/IP socket into my buffer. I don't want to move data around if I can easily avoid it, so I'm simply putting everything that the other side needs right there.
As for the sockets themselves, you can use async TCP sockets here (I would), but if you do, you will need to add an extra buffer.
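For completeness, here's a rough sketch of what the receiving end could look like. None of this is in the code above: the class name, the blocking Stream reads and the LZ4.LZ4Codec.Decode call are my assumptions (check the exact Decode overload of the .NET port you use). It just mirrors the protocol: a 4-byte length header, the LZ4 payload, and an xor back into the previous frame.

// Hypothetical receiver-side counterpart (sketch only).
public unsafe class CaptureReceiver
{
    // Note: the sender's 'prev' starts as a black bitmap; both sides need to
    // agree on the initial frame (here it's simply all zeros).
    private readonly byte[] frame;
    private readonly byte[] xorBuffer;

    public CaptureReceiver(int width, int height)
    {
        frame = new byte[width * height * 4];
        xorBuffer = new byte[width * height * 4];
    }

    public void ReadFrame(Stream stream)
    {
        // 1. Read the 4-byte length header written by Compress().
        byte[] header = new byte[4];
        ReadExactly(stream, header, 4);
        int compressedSize = BitConverter.ToInt32(header, 0);

        // 2. Read the compressed xor mask.
        byte[] payload = new byte[compressedSize];
        ReadExactly(stream, payload, compressedSize);

        // 3. Decompress it (verify the Decode overload of your LZ4 port).
        LZ4.LZ4Codec.Decode(payload, 0, compressedSize, xorBuffer, 0, xorBuffer.Length, true);

        // 4. Xor it into the previous frame to reconstruct the current frame.
        fixed (byte* dst = frame)
        fixed (byte* src = xorBuffer)
        {
            ulong* d = (ulong*)dst;
            ulong* s = (ulong*)src;
            for (int i = 0; i < frame.Length / 8; ++i)
                d[i] ^= s[i];
        }
    }

    private static void ReadExactly(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read <= 0) throw new EndOfStreamException();
            offset += read;
        }
    }
}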
The only thing that remains is to glue everything together and put some statistics on the screen:
public void Iterate()
{
Stopwatch sw = Stopwatch.StartNew();
// Capture a screen:
Capture();
TimeSpan timeToCapture = sw.Elapsed;
// Lock both images:
var locked1 = cur.LockBits(new Rectangle(0, 0, cur.Width, cur.Height),
ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
var locked2 = prev.LockBits(new Rectangle(0, 0, prev.Width, prev.Height),
ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
try
{
// Xor screen:
ApplyXor(locked2, locked1);
TimeSpan timeToXor = sw.Elapsed;
// Compress screen:
int length = Compress();
TimeSpan timeToCompress = sw.Elapsed;
if ((++n) % 50 == 0)
{
Console.Write("Iteration: {0:0.00}s, {1:0.00}s, {2:0.00}s " +
"{3} Kb => {4:0.0} FPS
",
timeToCapture.TotalSeconds, timeToXor.TotalSeconds,
timeToCompress.TotalSeconds, length / 1024,
1.0 / sw.Elapsed.TotalSeconds);
}
// Swap buffers:
var tmp = cur;
cur = prev;
prev = tmp;
}
finally
{
cur.UnlockBits(locked1);
prev.UnlockBits(locked2);
}
}
Note that I reduce Console output to ensure that's not the bottleneck. :-)
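To try it out, a trivial driver loop is enough (this Main is mine, not part of the class above; hooking the result up to a socket is still up to you):

static void Main()
{
    var capture = new CompressScreenCapture();
    while (true)
    {
        // Capture, xor and compress one frame; sending the result is left out here.
        capture.Iterate();
    }
}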
Simple improvements
It's a bit wasteful to compress all those 0's, right? It's pretty easy to track whether a frame contains any changes at all with a simple boolean (and, with a bit more bookkeeping, the min and max y positions that actually have data).
ulong tmp = curRow[x] ^ prevRow[x];
*(dst++) = tmp;
hasdata |= tmp != 0;
You also probably don't want to call Compress if you don't have to.
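A sketch of how that could look inside Iterate, assuming you change ApplyXor to return the hasdata flag (that return value is my addition, it's not in the code above):

// Xor screen; 'hasdata' tells us whether anything changed at all.
bool hasdata = ApplyXor(locked2, locked1);
TimeSpan timeToXor = sw.Elapsed;

// Only compress (and send) when something actually changed.
int length = hasdata ? Compress() : 0;
TimeSpan timeToCompress = sw.Elapsed;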
After adding this feature you'll get something like this on your screen:
Iteration: 0.00s, 0.01s, 0.01s 1 Kb => 152.0 FPS
Using another compression algorithm might also help. I stuck to LZ4 because it's simple to use, it's blazing fast and compresses pretty well -- still, there are other options that might work better. See http://fastcompression.blogspot.nl/ for a comparison.
If you have a bad connection, or if you're streaming video over the remote connection, all of this won't work well. The best thing to do then is to reduce the pixel values. That's quite simple: apply a simple 64-bit mask during the xor to both the previous and the current picture (a one-line sketch follows below). You can also try using indexed colors -- anyhow, there's a ton of different things you can try here; I just kept it simple because that's probably good enough.
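In terms of the ApplyXor loop above, that boils down to changing the inner loop like this (mask would be a new field, for example driven by the size check sketched earlier; the name is mine):

// Reduced color resolution on both images; (a & m) ^ (b & m) == (a ^ b) & m,
// so the receiver stays consistent as long as the mask doesn't change.
*(dst++) = (curRow[x] & mask) ^ (prevRow[x] & mask);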
You can also use Parallel.For for the xor loop; personally I didn't really bother with that, but a sketch of what it could look like is below.
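For reference, here's a sketch of what a parallel version of ApplyXor could look like. The method name and the GCHandle pinning are my choices (a fixed local can't be used inside the lambda), so treat it as a starting point rather than a drop-in replacement:

private unsafe void ApplyXorParallel(BitmapData previous, BitmapData current)
{
    int height = previous.Height;
    int halfwidth = previous.Width / 2;
    IntPtr prevBase = previous.Scan0;
    IntPtr curBase = current.Scan0;
    int prevStride = previous.Stride;
    int curStride = current.Stride;

    // Pin the compression buffer for the duration of the loop; a 'fixed' block
    // can't be used across the lambda, so use a GCHandle instead.
    var handle = GCHandle.Alloc(this.compressionBuffer, GCHandleType.Pinned);
    try
    {
        IntPtr dstBase = handle.AddrOfPinnedObject();
        Parallel.For(0, height, y =>
        {
            ulong* prevRow = (ulong*)((byte*)prevBase.ToPointer() + prevStride * y);
            ulong* curRow = (ulong*)((byte*)curBase.ToPointer() + curStride * y);
            ulong* dstRow = (ulong*)dstBase.ToPointer() + (long)y * halfwidth;
            for (int x = 0; x < halfwidth; ++x)
                dstRow[x] = curRow[x] ^ prevRow[x];
        });
    }
    finally
    {
        handle.Free();
    }
}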
A bit more challenging
If you have one server serving multiple clients, things get a bit more challenging, as the clients will refresh at different rates. We want the fastest-refreshing client to determine the server speed -- not the slowest. :-)
To implement this, the relation between prev and cur has to change. If we simply keep 'xor'-ing away like above, we'll end up with a completely garbled picture on the slower clients.
To solve that, we don't want to swap prev anymore; it should hold key frames (which you refresh when the compressed data becomes too big), while cur holds the incremental data from the 'xor' results. This means you can basically grab an arbitrary 'xor'-ed frame and send it over the line - as long as the prev key frame it was made against is the same on both sides.