Wire Protocol Specification

Version 1 — Normative — For cross-language implementors

1. Overview & Conventions

vgi-rpc is an RPC framework where serialization uses the Apache Arrow IPC Streaming Format.

  • All integers are little-endian unless stated otherwise
  • All metadata strings are UTF-8 encoded
  • Wire protocol version: "1" (ASCII 0x31)
  • Metadata keys in the vgi_rpc.* namespace are framework-reserved

Term             Definition
IPC stream       A complete Arrow IPC streaming-format sequence: schema + record batches + EOS marker
Batch            An Arrow RecordBatch: zero or more rows conforming to a schema
Custom metadata  Per-batch KeyValueMetadata (distinct from schema-level metadata)
Zero-row batch   A batch with num_rows == 0. Used for log, error, pointer, and completion signals
Data batch       A batch with num_rows > 0, or a zero-row batch without log/error metadata

2. Arrow IPC Framing

Each logical message exchange uses one or more IPC streams written sequentially on the same byte stream (pipe, TCP socket, HTTP body, etc.).

An IPC stream consists of:

  1. Schema message — describes the columns and their Arrow types
  2. Zero or more RecordBatch messages — each optionally carrying per-batch custom metadata
  3. EOS marker — the 8-byte sequence 0xFF 0xFF 0xFF 0xFF 0x00 0x00 0x00 0x00

Consecutive streams are concatenated with no additional framing between them. Each reader opens one stream, reads until EOS, and stops. The next reader picks up at the byte immediately after the EOS marker.
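
As an illustration of the EOS rule, the sketch below inspects the 8-byte prefix that precedes every message in the Arrow IPC streaming format. The prefix layout (a 0xFFFFFFFF continuation marker followed by a little-endian int32 metadata length) comes from the Arrow IPC specification rather than from this document, and the function name `read_message_prefix` is hypothetical. It assumes the current Arrow framing, not the legacy format without a continuation marker.

```python
import struct

CONTINUATION = 0xFFFFFFFF
EOS_MARKER = bytes([0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00])


def read_message_prefix(prefix: bytes):
    """Return the metadata length of the next message, or None at EOS.

    `prefix` is the 8 bytes found at a message boundary.
    """
    if len(prefix) != 8:
        raise ValueError("expected exactly 8 prefix bytes")
    # uint32 continuation marker, then int32 metadata length, little-endian.
    continuation, metadata_len = struct.unpack("<Ii", prefix)
    if continuation != CONTINUATION:
        raise ValueError("missing 0xFFFFFFFF continuation marker")
    if metadata_len < 0:
        raise ValueError("negative metadata length")
    # A zero length after the continuation marker is the EOS marker.
    return None if metadata_len == 0 else metadata_len
```

A reader that sees `None` has reached the end of the current IPC stream; the next byte on the underlying byte stream belongs to the following IPC stream, if any.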

3. Metadata Key Reference

Request metadata

Key                       Value              Description
vgi_rpc.method            UTF-8 method name  Target RPC method. Required.
vgi_rpc.request_version   "1"                Wire protocol version. Required.
vgi_rpc.request_id        16-char hex        Per-request correlation ID. Optional.
traceparent               W3C Trace Context  OpenTelemetry trace propagation. Optional.
vgi_rpc.shm_segment_name  UTF-8 OS name      Shared memory segment name. Optional.
vgi_rpc.shm_segment_size  Decimal integer    SHM segment total size in bytes. Optional.
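
A minimal sketch of assembling the request metadata as a plain string dictionary. The helper name `build_request_metadata` is hypothetical, and generating a request ID when none is supplied is an assumption of this example (the key itself is optional).

```python
import secrets

PROTOCOL_VERSION = "1"


def build_request_metadata(method, request_id=None, traceparent=None):
    """Build the custom metadata for a request batch."""
    if not method:
        raise ValueError("vgi_rpc.method is required")
    metadata = {
        "vgi_rpc.method": method,
        "vgi_rpc.request_version": PROTOCOL_VERSION,
        # Assumption: generate an ID if the caller gave none.
        # token_hex(8) yields 8 random bytes as 16 hex characters.
        "vgi_rpc.request_id": request_id or secrets.token_hex(8),
    }
    if traceparent is not None:
        metadata["traceparent"] = traceparent
    return metadata
```

The resulting dictionary would be attached to the single request batch as per-batch custom metadata, not to the schema.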

Response / log / error metadata

Key                  Value                                       Description
vgi_rpc.log_level    EXCEPTION, ERROR, WARN, INFO, DEBUG, TRACE  Severity level. Present on log/error batches.
vgi_rpc.log_message  UTF-8 string                                Human-readable message text.
vgi_rpc.log_extra    JSON string                                 Additional structured data. Optional.
vgi_rpc.server_id    12-char hex                                 Server instance identifier.
vgi_rpc.request_id   UTF-8 string                                Echoed request correlation ID.

4. Type Mapping

RPC parameters and return values are serialized as Arrow columns. Cross-language implementations MUST use these Arrow types for interoperability.

Abstract Type   Arrow Type               Notes
string          utf8                     UTF-8 encoded
bytes / binary  binary                   Raw byte sequence
int / integer   int64                    64-bit signed integer
float / double  float64                  IEEE 754 double precision
bool            bool
list[T]         list(T)                  Recursive
dict[K, V]      map(K, V)                Serialized as (key, value) tuples
set[T]          list(T)                  Order undefined
enum            dictionary(int16, utf8)  Serialized as member name
optional[T]     T (nullable=true)        null = absent value
dataclass       binary                   Serialized as IPC stream

5. Request Batch Format

Every RPC request is a single IPC stream containing exactly one batch with one row:

IPC Stream:
  Schema message:
    - One field per method parameter
    - Field types per the type mapping
  RecordBatch message:
    - Exactly 1 row
    - custom_metadata:
        vgi_rpc.method = "<method_name>"       (REQUIRED)
        vgi_rpc.request_version = "1"          (REQUIRED)
  EOS marker

For methods with no parameters, the schema has zero fields and the batch has one row with zero columns.

6. Response Format (Unary)

A unary response is a single IPC stream on the result schema:

IPC Stream:
  Schema message:
    - Single field named "result" (or empty for void)
  0..N log batches (zero-row, with log metadata)
  1 result or error batch:
    - Result: 1-row batch with return value
    - Void: 0-row batch on empty schema
    - Error: 0-row batch with EXCEPTION metadata
  EOS marker

Log batches MUST appear before the result/error batch.

7. Batch Classification Algorithm

receive(batch, custom_metadata):
  IF custom_metadata is NULL         → DATA batch
  IF batch.num_rows > 0              → DATA batch

  // Zero rows, metadata exists:
  IF has "vgi_rpc.log_level" AND "vgi_rpc.log_message":
    IF level == "EXCEPTION"          → ERROR batch → raise RpcError
    ELSE                             → LOG batch → on_log callback

  IF has "vgi_rpc.location"          → EXTERNAL POINTER batch
  IF has "vgi_rpc.shm_offset"        → SHM POINTER batch
  IF has "vgi_rpc.stream_state"      → STATE TOKEN batch

  → DATA batch (void return, stream-finish)
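
The same decision order can be written as a small Python function. This is a sketch, not a reference implementation: the names `BatchKind` and `classify_batch` are hypothetical, and the metadata is assumed to have been decoded into a dictionary with string keys and values.

```python
from enum import Enum


class BatchKind(Enum):
    DATA = "data"
    ERROR = "error"
    LOG = "log"
    EXTERNAL_POINTER = "external_pointer"
    SHM_POINTER = "shm_pointer"
    STATE_TOKEN = "state_token"


def classify_batch(num_rows, custom_metadata):
    """Classify a received batch using the checks above, in order."""
    if custom_metadata is None or num_rows > 0:
        return BatchKind.DATA
    # From here on: zero rows and metadata present.
    md = custom_metadata
    if "vgi_rpc.log_level" in md and "vgi_rpc.log_message" in md:
        if md["vgi_rpc.log_level"] == "EXCEPTION":
            return BatchKind.ERROR  # caller raises RpcError
        return BatchKind.LOG        # caller invokes its log callback
    if "vgi_rpc.location" in md:
        return BatchKind.EXTERNAL_POINTER
    if "vgi_rpc.shm_offset" in md:
        return BatchKind.SHM_POINTER
    if "vgi_rpc.stream_state" in md:
        return BatchKind.STATE_TOKEN
    # Void return or stream-finish.
    return BatchKind.DATA
```

Returning a value instead of raising keeps the classifier free of side effects; the caller decides how to react to each kind.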

8. Log & Error Batch Format

A log batch is a zero-row batch on the response stream's schema with vgi_rpc.log_level and vgi_rpc.log_message in custom metadata.

When log_level is "EXCEPTION", the client MUST raise/throw an error with:

  • error_type: from log_extra.exception_type
  • error_message: from vgi_rpc.log_message
  • remote_traceback: from log_extra.traceback
  • request_id: from vgi_rpc.request_id
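
A possible client-side mapping from an EXCEPTION batch's metadata to an error object is sketched below. The class layout and the helper `error_from_metadata` are hypothetical. Treating a missing vgi_rpc.log_extra as an empty JSON object, and leaving absent fields as None, are assumptions of this example.

```python
import json


class RpcError(Exception):
    """Error raised on the client for an EXCEPTION batch."""

    def __init__(self, error_type, error_message, remote_traceback, request_id):
        super().__init__(f"{error_type}: {error_message}")
        self.error_type = error_type
        self.error_message = error_message
        self.remote_traceback = remote_traceback
        self.request_id = request_id


def error_from_metadata(metadata):
    """Build an RpcError from decoded custom metadata (str -> str)."""
    # log_extra is optional; assume an empty object when it is absent.
    extra = json.loads(metadata.get("vgi_rpc.log_extra") or "{}")
    return RpcError(
        error_type=extra.get("exception_type"),
        error_message=metadata["vgi_rpc.log_message"],
        remote_traceback=extra.get("traceback"),
        request_id=metadata.get("vgi_rpc.request_id"),
    )
```

A client would typically call `raise error_from_metadata(md)` once a batch has been classified as an error batch.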

9. Stream Protocol (Pipe / Subprocess)

Streaming methods use a multi-phase exchange over a bidirectional byte stream.

Phase 1: Request parameters

Identical to a unary request: IPC stream with params schema, 1 request row, EOS.

Phase 1.5: Optional header stream

When the stream method declares a header type, the server sends a header IPC stream (header schema, log batches, 1 header row, EOS) before the main data exchange.

Phase 2: Lockstep data exchange

Producer:                        Exchange:
  Client    →   tick (0-row)       Client    →   input batch
  Server    ←   log* + data        Server    ←   log* + output
  Client    →   tick (0-row)       Client    →   input batch
  Server    ←   log* + data        Server    ←   log* + output
  Client    →   tick               Client    →   [EOS]
  Server    ←   [EOS] (finish)     Server    ←   [EOS]
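
The lockstep rhythm of the two modes can be modeled in memory, without any Arrow I/O. In this sketch the functions `run_producer` and `run_exchange` and the callables passed to them are hypothetical stand-ins for the real transport, log batches are omitted, and a `None` reply represents the server closing its stream with EOS.

```python
def run_producer(produce_next):
    """Client loop for producer mode.

    `produce_next()` stands in for one round trip: the client sends a
    zero-row tick and the server answers with a data batch, or with
    None when it finishes its stream (EOS).
    Returns the collected data and the number of ticks sent.
    """
    results = []
    ticks = 0
    while True:
        ticks += 1                 # client -> tick (0-row batch)
        reply = produce_next()     # server -> data, or EOS
        if reply is None:
            break
        results.append(reply)
    return results, ticks


def run_exchange(inputs, transform):
    """Client loop for exchange mode: one output per input, in order.

    The client ends the exchange by closing its input stream (EOS),
    which here is simply the end of `inputs`.
    """
    outputs = []
    for batch in inputs:
        outputs.append(transform(batch))  # server -> output
    return outputs
```

Note that in producer mode the client sends one more tick than it receives data batches, because the final tick is answered by EOS.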

10. HTTP Transport

The HTTP transport maps the pipe-based protocol to stateless HTTP request/response pairs. All requests use Content-Type: application/vnd.apache.arrow.stream.

Endpoint                    Method   Description
{prefix}/{method}           POST     Unary RPC call
{prefix}/{method}/init      POST     Stream initialization
{prefix}/{method}/exchange  POST     Stream continuation
{prefix}/__describe__       POST     Introspection
{prefix}/__capabilities__   OPTIONS  Server capability discovery

Streaming state is serialized into HMAC-signed tokens passed in Arrow batch metadata. Tokens include a configurable TTL (default 1 hour).
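
The token layout is not defined in this section, so the following is only a sketch of the general technique (an HMAC over an expiry timestamp plus opaque state). The layout used here (8-byte little-endian expiry, then state bytes, then a 32-byte HMAC-SHA256 tag) and the names `sign_state` and `verify_state` are assumptions made for illustration, not the framework's actual format.

```python
import hashlib
import hmac
import struct
import time

DEFAULT_TTL_SECONDS = 3600  # default TTL of 1 hour
_TAG_LEN = 32               # HMAC-SHA256 digest size


def sign_state(key, state, ttl=DEFAULT_TTL_SECONDS, now=None):
    """Return expiry || state || HMAC-SHA256(key, expiry || state)."""
    issued = int(time.time() if now is None else now)
    payload = struct.pack("<Q", issued + ttl) + state
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify_state(key, token, now=None):
    """Check signature and expiry, then return the state bytes."""
    if len(token) < 8 + _TAG_LEN:
        raise ValueError("token too short")
    payload, tag = token[:-_TAG_LEN], token[-_TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature")
    (expires_at,) = struct.unpack("<Q", payload[:8])
    current = int(time.time() if now is None else now)
    if current >= expires_at:
        raise ValueError("token expired")
    return payload[8:]
```

Because the expiry is covered by the signature, a client cannot extend a token's lifetime without invalidating it, which is what allows the server to remain stateless between requests.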

11. Shared Memory (SHM) Transport

The shared memory side-channel enables zero-copy batch transfer between co-located processes. The pipe carries control messages; large batches are written to shared memory and replaced with pointer batches.

Segment header (64 KiB):
  Offset  Size    Field
  0       4       magic: "VGIS" (0x56 0x47 0x49 0x53)
  4       4       version: uint32 = 1
  8       8       data_size: uint64
  16      4       num_allocs: uint32
  20      4       padding: uint32 = 0
  24      N*16    allocations: (offset, length) pairs

SHM pointer batches are zero-row batches with vgi_rpc.shm_offset and vgi_rpc.shm_length in custom metadata. The allocator uses first-fit with implicit coalescing. Maximum 4,094 allocations.

12. External Storage Pointer Batches

When batches exceed a configurable size threshold, they can be externalized to remote storage (S3, GCS) and replaced with pointer batches containing a vgi_rpc.location URL.

Resolution: fetch URL, optionally decompress (zstd), open as IPC stream, dispatch log batches, extract data batch. Supports retries (max 3 attempts) and redirect-loop detection.

13. Version Negotiation & Error Handling

Every request batch MUST carry vgi_rpc.request_version = "1".

Condition                                    HTTP Status
Bad IPC, missing metadata, version mismatch  400
Authentication failure                       401
Unknown method                               404
Wrong Content-Type                           415
Server implementation error                  500
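
How a server might apply part of this table before dispatching a request is sketched below. The function `precheck_status` is hypothetical, the order of the checks is an assumption, the Content-Type comparison is a simple exact match, and authentication (401) and handler failures (500) are left out.

```python
ARROW_STREAM = "application/vnd.apache.arrow.stream"
SUPPORTED_VERSION = "1"


def precheck_status(content_type, metadata, known_methods):
    """Return an HTTP error status, or None if the request may proceed.

    `metadata` is the decoded custom metadata of the request batch
    (str -> str), or None if the batch carried no metadata.
    """
    if content_type != ARROW_STREAM:
        return 415  # wrong Content-Type
    if metadata is None:
        return 400  # missing metadata
    method = metadata.get("vgi_rpc.method")
    version = metadata.get("vgi_rpc.request_version")
    if method is None or version is None:
        return 400  # a required key is missing
    if version != SUPPORTED_VERSION:
        return 400  # version mismatch
    if method not in known_methods:
        return 404  # unknown method
    return None
```

For instance, a request with the correct Content-Type and version but a method name the server does not expose would yield 404.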

14. Introspection (__describe__)

The __describe__ method is a built-in synthetic unary method that returns machine-readable metadata about all methods exposed by the server.

The response contains one row per method with columns including name, method_type, doc, params_schema_ipc, result_schema_ipc, and more.

See the full documentation for complete introspection response schema details.