Wire Protocol Specification
Version 1 — Normative — For cross-language implementors
1. Overview & Conventions
vgi-rpc is an RPC framework where serialization uses the Apache Arrow IPC Streaming Format.
- All integers are little-endian unless stated otherwise
- All metadata strings are UTF-8 encoded
- Wire protocol version: "1" (ASCII 0x31)
- Metadata keys in the vgi_rpc.* namespace are framework-reserved
| Term | Definition |
|---|---|
| IPC stream | A complete Arrow IPC streaming-format sequence: schema + record batches + EOS marker |
| Batch | An Arrow RecordBatch — zero or more rows conforming to a schema |
| Custom metadata | Per-batch KeyValueMetadata (distinct from schema-level metadata) |
| Zero-row batch | A batch with num_rows == 0. Used for log, error, pointer, and completion signals |
| Data batch | A batch with num_rows > 0, or a zero-row batch without log/error metadata |
2. Arrow IPC Framing
Each logical message exchange uses one or more IPC streams written sequentially on the same byte stream (pipe, TCP socket, HTTP body, etc.).
An IPC stream consists of:
- Schema message — describes the columns and their Arrow types
- Zero or more RecordBatch messages — each optionally carrying per-batch custom metadata
- EOS marker — the 8-byte sequence 0xFF 0xFF 0xFF 0xFF 0x00 0x00 0x00 0x00
Because the streams are concatenated back to back with no extra framing between them, each reader opens one stream, reads until EOS, and stops. The next reader picks up at the byte immediately after the EOS marker.
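As a minimal sketch of how a reader might recognize the end of one stream, the check below decodes an 8-byte message prefix as a little-endian continuation token followed by a 32-bit length. The names `EOS_MARKER` and `is_eos` are illustrative only and are not part of the specification.

```python
import struct

# End-of-stream marker from the framing rules above:
# continuation token 0xFFFFFFFF followed by a zero metadata length.
EOS_MARKER = bytes([0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00])


def is_eos(prefix: bytes) -> bool:
    """Return True if an 8-byte IPC message prefix is the EOS marker."""
    if len(prefix) != 8:
        raise ValueError("an IPC message prefix is exactly 8 bytes")
    # Little-endian: uint32 continuation token, then int32 metadata length.
    continuation, metadata_len = struct.unpack("<Ii", prefix)
    return continuation == 0xFFFFFFFF and metadata_len == 0
```

A real reader would normally let an Arrow library consume the stream; this check is only useful when splitting a raw byte stream by hand.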
3. Metadata Key Reference
Request metadata
| Key | Value | Description |
|---|---|---|
| vgi_rpc.method | UTF-8 method name | Target RPC method. Required. |
| vgi_rpc.request_version | "1" | Wire protocol version. Required. |
| vgi_rpc.request_id | 16-char hex | Per-request correlation ID. Optional. |
| traceparent | W3C Trace Context | OpenTelemetry trace propagation. Optional. |
| vgi_rpc.shm_segment_name | UTF-8 OS name | Shared memory segment name. Optional. |
| vgi_rpc.shm_segment_size | Decimal integer | SHM segment total size in bytes. Optional. |
Response / log / error metadata
| Key | Value | Description |
|---|---|---|
| vgi_rpc.log_level | EXCEPTION, ERROR, WARN, INFO, DEBUG, TRACE | Severity level. Present on log/error batches. |
| vgi_rpc.log_message | UTF-8 string | Human-readable message text. |
| vgi_rpc.log_extra | JSON string | Additional structured data. Optional. |
| vgi_rpc.server_id | 12-char hex | Server instance identifier. |
| vgi_rpc.request_id | UTF-8 string | Echoed request correlation ID. |
4. Type Mapping
RPC parameters and return values are serialized as Arrow columns. Cross-language implementations MUST use these Arrow types for interoperability.
| Abstract Type | Arrow Type | Notes |
|---|---|---|
| string | utf8 | UTF-8 encoded |
| bytes / binary | binary | Raw byte sequence |
| int / integer | int64 | 64-bit signed integer |
| float / double | float64 | IEEE 754 double precision |
| bool | bool | — |
| list[T] | list(T) | Recursive |
| dict[K, V] | map(K, V) | Serialized as (key, value) tuples |
| set[T] | list(T) | Order undefined |
| enum | dictionary(int16, utf8) | Serialized as member name |
| optional[T] | T (nullable=true) | null = absent value |
| dataclass | binary | Serialized as IPC stream |
5. Request Batch Format
Every RPC request is a single IPC stream containing exactly one batch with one row:
IPC Stream:
    Schema message:
        - One field per method parameter
        - Field types per the type mapping
    RecordBatch message:
        - Exactly 1 row
        - custom_metadata:
            vgi_rpc.method = "<method_name>" (REQUIRED)
            vgi_rpc.request_version = "1" (REQUIRED)
    EOS marker
For methods with no parameters, the schema has zero fields and the batch has one row with zero columns.
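The custom metadata for a request batch could be assembled as in the sketch below. The helper name `build_request_metadata` is hypothetical, and generating a random ID when none is supplied is an assumption; the specification only says the ID is optional and 16 hex characters long.

```python
import secrets
from typing import Dict, Optional

WIRE_VERSION = "1"


def build_request_metadata(method: str, request_id: Optional[str] = None) -> Dict[bytes, bytes]:
    """Build per-batch custom metadata for a request batch as UTF-8 bytes."""
    if not method:
        raise ValueError("vgi_rpc.method is required")
    if request_id is None:
        # Assumption: mint a correlation ID; 8 random bytes give 16 hex chars.
        request_id = secrets.token_hex(8)
    metadata = {
        "vgi_rpc.method": method,
        "vgi_rpc.request_version": WIRE_VERSION,
        "vgi_rpc.request_id": request_id,
    }
    # All metadata strings are UTF-8 encoded on the wire.
    return {k.encode("utf-8"): v.encode("utf-8") for k, v in metadata.items()}
```

The resulting mapping would be attached to the single request RecordBatch, not to the schema.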
6. Response Format (Unary)
A unary response is a single IPC stream on the result schema:
IPC Stream:
    Schema message:
        - Single field named "result" (or empty for void)
    0..N log batches (zero-row, with log metadata)
    1 result or error batch:
        - Result: 1-row batch with return value
        - Void: 0-row batch on empty schema
        - Error: 0-row batch with EXCEPTION metadata
    EOS marker
Log batches MUST appear before the result/error batch.
7. Batch Classification Algorithm
receive(batch, custom_metadata):
    IF custom_metadata is NULL → DATA batch
    IF batch.num_rows > 0 → DATA batch
    // Zero rows, metadata exists:
    IF has "vgi_rpc.log_level" AND "vgi_rpc.log_message":
        IF level == "EXCEPTION" → ERROR batch → raise RpcError
        ELSE → LOG batch → on_log callback
    IF has "vgi_rpc.location" → EXTERNAL POINTER batch
    IF has "vgi_rpc.shm_offset" → SHM POINTER batch
    IF has "vgi_rpc.stream_state" → STATE TOKEN batch
    → DATA batch (void return, stream-finish)
8. Log & Error Batch Format
A log batch is a zero-row batch on the response stream's schema with vgi_rpc.log_level and vgi_rpc.log_message in custom metadata.
When log_level is "EXCEPTION", the client MUST raise/throw an error with:
- error_type: from log_extra.exception_type
- error_message: from vgi_rpc.log_message
- remote_traceback: from log_extra.traceback
- request_id: from vgi_rpc.request_id
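The classification algorithm in section 7 and the error mapping above can be sketched together as follows. Metadata is modeled as a plain string dictionary, and the returned kind strings, the `RpcError` constructor signature, and the fallback values for missing fields are all assumptions made for illustration.

```python
import json
from typing import Mapping, Optional


class RpcError(Exception):
    """Client-side error raised for an EXCEPTION batch."""

    def __init__(self, error_type, error_message, remote_traceback="", request_id=None):
        super().__init__(f"{error_type}: {error_message}")
        self.error_type = error_type
        self.error_message = error_message
        self.remote_traceback = remote_traceback
        self.request_id = request_id


def classify(num_rows: int, metadata: Optional[Mapping[str, str]]) -> str:
    """Classify a received batch following the order of checks in section 7."""
    if metadata is None or num_rows > 0:
        return "data"
    # Zero rows and metadata present.
    if "vgi_rpc.log_level" in metadata and "vgi_rpc.log_message" in metadata:
        return "error" if metadata["vgi_rpc.log_level"] == "EXCEPTION" else "log"
    if "vgi_rpc.location" in metadata:
        return "external_pointer"
    if "vgi_rpc.shm_offset" in metadata:
        return "shm_pointer"
    if "vgi_rpc.stream_state" in metadata:
        return "state_token"
    return "data"  # void return or stream finish


def error_from_metadata(metadata: Mapping[str, str]) -> RpcError:
    """Build the error a client raises for an EXCEPTION batch."""
    extra = json.loads(metadata.get("vgi_rpc.log_extra", "{}"))
    return RpcError(
        error_type=extra.get("exception_type", "RpcError"),  # fallback is assumed
        error_message=metadata["vgi_rpc.log_message"],
        remote_traceback=extra.get("traceback", ""),
        request_id=metadata.get("vgi_rpc.request_id"),
    )
```

A client loop would call `classify` on each batch, forward "log" batches to its logging callback, and raise `error_from_metadata(...)` on "error".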
9. Stream Protocol (Pipe / Subprocess)
Streaming methods use a multi-phase exchange over a bidirectional byte stream.
Phase 1: Request parameters
Identical to a unary request: IPC stream with params schema, 1 request row, EOS.
Phase 1.5: Optional header stream
When the stream method declares a header type, the server sends a header IPC stream (header schema, log batches, 1 header row, EOS) before the main data exchange.
Phase 2: Lockstep data exchange
| Producer | Exchange |
|---|---|
| Client → tick (0-row) | Client → input batch |
| Server ← log* + data | Server ← log* + output |
| Client → tick (0-row) | Client → input batch |
| Server ← log* + data | Server ← log* + output |
| Client → tick | Client → [EOS] |
| Server ← [EOS] (finish) | Server ← [EOS] |
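The producer side of the lockstep exchange can be modeled without any I/O, as in the toy sketch below. Each call to the hypothetical `server_step` callable stands for one client tick followed by the server's reply, and a reply whose data is `None` stands for the server closing its stream with EOS. None of these names come from the specification.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One server turn: (log messages, data). data=None models the server's EOS.
ServerTurn = Tuple[List[str], Optional[list]]


def run_producer(
    server_step: Callable[[], ServerTurn],
    on_log: Callable[[str], None],
) -> Iterator[list]:
    """Drive a producer stream: one tick per turn until the server finishes."""
    while True:
        # Calling server_step() stands for sending a zero-row tick batch
        # and reading the server's reply for that turn.
        logs, data = server_step()
        for message in logs:
            on_log(message)  # log batches arrive before the data batch
        if data is None:
            return  # server sent EOS: the stream is finished
        yield data
```

In exchange mode the same loop shape applies, except that the client sends a real input batch instead of a tick and ends the stream by sending EOS itself.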
10. HTTP Transport
The HTTP transport maps the pipe-based protocol to stateless HTTP request/response pairs.
All requests use Content-Type: application/vnd.apache.arrow.stream.
| Endpoint | Method | Description |
|---|---|---|
| {prefix}/{method} | POST | Unary RPC call |
| {prefix}/{method}/init | POST | Stream initialization |
| {prefix}/{method}/exchange | POST | Stream continuation |
| {prefix}/__describe__ | POST | Introspection |
| {prefix}/__capabilities__ | OPTIONS | Server capability discovery |
Streaming state is serialized into HMAC-signed tokens passed in Arrow batch metadata. Tokens include a configurable TTL (default 1 hour).
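The specification does not define the token layout here, so the following is only a sketch of one way an HMAC-signed, TTL-bound state token could work. The JSON payload, the HMAC-SHA256 choice, the tag-then-payload layout, and the function names are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import time

DEFAULT_TTL_SECONDS = 3600  # default TTL of 1 hour
_TAG_LEN = 32  # SHA-256 digest size in bytes


def sign_state(state: bytes, key: bytes, now=None, ttl=DEFAULT_TTL_SECONDS) -> bytes:
    """Wrap opaque stream state in a signed token with an expiry time."""
    issued = int(time.time() if now is None else now)
    payload = json.dumps(
        {"exp": issued + ttl, "state": base64.b64encode(state).decode("ascii")},
        sort_keys=True,
    ).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + payload)


def verify_state(token: bytes, key: bytes, now=None) -> bytes:
    """Check signature and expiry, then return the original state bytes."""
    raw = base64.urlsafe_b64decode(token)
    tag, payload = raw[:_TAG_LEN], raw[_TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature")
    body = json.loads(payload)
    current = time.time() if now is None else now
    if current >= body["exp"]:
        raise ValueError("token expired")
    return base64.b64decode(body["state"])
```

Because the server verifies the signature before trusting the payload, any server instance holding the key can resume the stream, which is what keeps the HTTP transport stateless.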
11. Shared Memory (SHM) Transport
The shared memory side-channel enables zero-copy batch transfer between co-located processes. The pipe carries control messages; large batches are written to shared memory and replaced with pointer batches.
Segment header (64 KiB):
| Offset | Size | Field |
|---|---|---|
| 0 | 4 | magic: "VGIS" (0x56 0x47 0x49 0x53) |
| 4 | 4 | version: uint32 = 1 |
| 8 | 8 | data_size: uint64 |
| 16 | 4 | num_allocs: uint32 |
| 20 | 4 | padding: uint32 = 0 |
| 24 | N*16 | allocations: (offset, length) pairs |
SHM pointer batches are zero-row batches with vgi_rpc.shm_offset and vgi_rpc.shm_length in custom metadata.
The allocator uses first-fit with implicit coalescing. Maximum 4,094 allocations.
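The fixed part of the segment header is 24 bytes and can be decoded with a single little-endian struct format, as sketched below. Treating each allocation entry as two uint64 values is an assumption inferred from the 16-byte entry size, and `parse_segment_header` is an illustrative name.

```python
import struct

HEADER_FORMAT = "<4sIQII"  # magic, version, data_size, num_allocs, padding
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes
ALLOC_FORMAT = "<QQ"  # assumed: offset and length are uint64 each
ALLOC_SIZE = struct.calcsize(ALLOC_FORMAT)  # 16 bytes
MAX_ALLOCS = 4094


def parse_segment_header(buf):
    """Return (data_size, [(offset, length), ...]) from a segment header."""
    magic, version, data_size, num_allocs, _padding = struct.unpack_from(
        HEADER_FORMAT, buf, 0
    )
    if magic != b"VGIS":
        raise ValueError(f"bad magic: {magic!r}")
    if version != 1:
        raise ValueError(f"unsupported segment version: {version}")
    if num_allocs > MAX_ALLOCS:
        raise ValueError(f"too many allocations: {num_allocs}")
    # The allocation table starts right after the fixed fields, at offset 24.
    allocations = [
        struct.unpack_from(ALLOC_FORMAT, buf, HEADER_SIZE + i * ALLOC_SIZE)
        for i in range(num_allocs)
    ]
    return data_size, allocations
```

With 4,094 entries the table occupies 24 + 4,094 × 16 = 65,528 bytes, which fits inside the 64 KiB header.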
12. External Storage Pointer Batches
When batches exceed a configurable size threshold, they can be externalized to remote storage (S3, GCS) and replaced with pointer batches containing a vgi_rpc.location URL.
Resolution: fetch URL, optionally decompress (zstd), open as IPC stream, dispatch log batches, extract data batch. Supports retries (max 3 attempts) and redirect-loop detection.
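The retry and loop-detection parts of resolution could be structured as in the sketch below. The `fetch` and `decode` callables are hypothetical injection points: `fetch` returns the raw bytes at a URL, and `decode` is assumed to return either `("data", batch)` or `("pointer", next_url)` after decompressing and reading the IPC stream. Treating a pointer that leads to another pointer as the redirect case is also an assumption.

```python
from typing import Any, Callable, Tuple

MAX_ATTEMPTS = 3


def fetch_with_retries(url: str, fetch: Callable[[str], bytes],
                       max_attempts: int = MAX_ATTEMPTS) -> bytes:
    """Call fetch(url), retrying on OSError up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch(url)
        except OSError as exc:
            last_error = exc
    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts") from last_error


def resolve_pointer(url: str, fetch: Callable[[str], bytes],
                    decode: Callable[[bytes], Tuple[str, Any]]) -> Any:
    """Follow pointer URLs until a data batch is found, rejecting loops."""
    seen = set()
    while True:
        if url in seen:
            raise RuntimeError(f"redirect loop detected at {url}")
        seen.add(url)
        kind, value = decode(fetch_with_retries(url, fetch))
        if kind == "data":
            return value
        url = value  # another pointer: follow it
```

Passing the I/O in as callables keeps the control flow testable without touching real storage.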
13. Version Negotiation & Error Handling
Every request batch MUST carry vgi_rpc.request_version = "1".
| Condition | HTTP Status |
|---|---|
| Bad IPC, missing metadata, version mismatch | 400 |
| Authentication failure | 401 |
| Unknown method | 404 |
| Wrong Content-Type | 415 |
| Server implementation error | 500 |
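A server-side pre-dispatch check that applies the table above might look like the following sketch. The order of the checks, the 200 success value, and the function name `status_for_request` are assumptions; authentication (401) and implementation errors (500) are left out because they depend on server internals.

```python
from typing import Collection, Mapping

ARROW_STREAM = "application/vnd.apache.arrow.stream"
WIRE_VERSION = "1"


def status_for_request(content_type: str, metadata: Mapping[str, str],
                       known_methods: Collection[str]) -> int:
    """Map a decoded request to an HTTP status before dispatching it."""
    if content_type != ARROW_STREAM:
        return 415  # wrong Content-Type
    method = metadata.get("vgi_rpc.method")
    version = metadata.get("vgi_rpc.request_version")
    if method is None or version != WIRE_VERSION:
        return 400  # missing metadata or version mismatch
    if method not in known_methods:
        return 404  # unknown method
    return 200  # assumed success status; not listed in the table
```

For example, a request that omits vgi_rpc.request_version is rejected with 400 even when the method name is valid.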
14. Introspection (__describe__)
The __describe__ method is a built-in synthetic unary method that returns machine-readable metadata about all methods exposed by the server.
The response contains one row per method with columns including name, method_type, doc, params_schema_ipc, result_schema_ipc, and more.
See the full documentation for complete introspection response schema details.