For large binary transfers, the standard approach is chunking. Chunking can serve two purposes:
- reduce the maximum amount of memory required to process each message
- provide a boundary for recovering partial uploads
For your use case, purpose #2 probably isn't necessary.
In gRPC, a client-streaming call allows fairly natural chunking: it gives you flow control and pipelining, and it is easy to maintain per-upload context in both the client and server code. If you care about recovering partial uploads, bidirectional streaming works well, since the server can respond with acknowledgements of progress that the client can use to resume.
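As a rough illustration, a client-streaming upload in Go could look like the sketch below. The service shape (`rpc Upload(stream UploadChunk) returns (UploadStatus)`), the generated package `pb`, and the field names are assumptions made up for this example, not a real API.

```go
package uploader

import (
	"context"
	"io"
	"log"
	"os"

	pb "example.com/upload/pb" // hypothetical generated stubs
)

// chunkSize bounds the memory needed per message (purpose #1 above).
const chunkSize = 64 * 1024 // 64 KiB

// upload streams one file to the server over a single client-streaming RPC.
func upload(ctx context.Context, client pb.FileServiceClient, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// Open the client-streaming call; the stream keeps per-upload context
	// on both sides for the lifetime of the RPC.
	stream, err := client.Upload(ctx)
	if err != nil {
		return err
	}

	buf := make([]byte, chunkSize)
	for {
		n, readErr := f.Read(buf)
		if n > 0 {
			// Send is subject to gRPC flow control, so the client cannot
			// run arbitrarily far ahead of the server.
			if err := stream.Send(&pb.UploadChunk{Data: buf[:n]}); err != nil {
				return err
			}
		}
		if readErr == io.EOF {
			break
		}
		if readErr != nil {
			return readErr
		}
	}

	// Half-close the send side and wait for the server's single response.
	resp, err := stream.CloseAndRecv()
	if err != nil {
		return err
	}
	log.Printf("server acknowledged %d bytes", resp.GetBytesReceived())
	return nil
}
```

If resumability matters, the same shape extends to bidirectional streaming: the server periodically reports the highest contiguous offset it has persisted, and after a reconnect the client seeks to that offset before resuming its sends.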
Chunking using individual RPCs is also possible, but it has more complications. Behind a load balancer, the backend may have to coordinate with other backends for each chunk. If you upload the chunks serially, network latency limits throughput, since you spend most of the time waiting for responses from the server. You then either have to upload in parallel (but how many in parallel?) or increase the chunk size. But increasing the chunk size increases the memory required to process each chunk and coarsens the granularity at which failed uploads can be resumed. Parallel upload also requires the server to handle chunks arriving out of order.
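For comparison, a per-chunk-RPC client has to carry an explicit offset and pick a parallelism limit itself. A minimal sketch, again with hypothetical generated types (a unary `rpc PutChunk(ChunkRequest) returns (ChunkResponse)` whose request carries an `offset` field):

```go
package uploader

import (
	"context"

	"golang.org/x/sync/errgroup"

	pb "example.com/upload/pb" // hypothetical generated stubs
)

// uploadByUnaryRPCs splits data into fixed-size chunks and uploads each one
// with a separate unary RPC, at most maxInFlight at a time. The explicit
// offset lets the server place chunks correctly even when they arrive out
// of order.
func uploadByUnaryRPCs(ctx context.Context, client pb.FileServiceClient, data []byte) error {
	const chunkSize = 1 << 20 // 1 MiB: larger chunks amortize latency but cost memory
	const maxInFlight = 4     // "how many in parallel?" is a tuning decision

	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(maxInFlight)

	for off := 0; off < len(data); off += chunkSize {
		off := off // capture for the goroutine (pre-Go 1.22 loop semantics)
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		g.Go(func() error {
			_, err := client.PutChunk(ctx, &pb.ChunkRequest{
				Offset: int64(off),
				Data:   data[off:end],
			})
			return err
		})
	}
	return g.Wait()
}
```

Even then, resuming after a failure requires the server (or whatever sits behind the load balancer) to track which offsets have been persisted, which is the coordination problem mentioned above.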