Stream Phase 2 Implementation Plan
This document defines the next architectural step for vhttpd stream mode.
Today, long-lived stream responses still behave much closer to a connection-hosted model:
- vhttpd opens a worker connection
- worker emits start/chunk/error/end
- the worker stays tied to that stream until completion
That model works, but it keeps worker occupancy proportional to stream lifetime. For AI token streaming, SSE feeds, and long text streams, that is the same scaling problem WebSocket phase 1 had.
WebSocket phase 2 already proved a better model:
- connection stays in vhttpd
- worker handles short-lived events
- worker count and connection count become decoupled
Stream phase 2 applies the same principle to HTTP streaming.
Goals
- keep long-lived stream sockets in vhttpd
- make stream generation worker-driven but not worker-owned
- support SSE and text chunk streams first
- keep VSlim\Stream\Response and VPhp\VHttpd\PhpWorker\StreamResponse userland APIs stable where possible
- make AI token streaming scale without one worker per live client
Non-goals
- replacing the existing stream mode immediately
- adding durable queues or message persistence
- multi-node stream fanout
- bidirectional transport semantics
Current problem
In the current stream pipeline:
client
-> vhttpd
-> php-worker
-> app returns StreamResponse
-> worker emits stream frames over one long-lived worker socket
-> vhttpd forwards chunks until end
That means a long stream still occupies a worker for its full duration.
This is acceptable for MVP, but not for the model we want long-term.
Target model
Stream phase 2 should move to a pull-style dispatch model:
client
-> vhttpd stream connection
-> stream state lives in vhttpd
-> vhttpd dispatches short-lived stream events to a worker
-> worker returns stream commands/chunks
-> vhttpd writes chunks to client
-> repeat until end
The worker no longer owns the stream socket.
Current MVP note:
- VPhp\VHttpd\PhpWorker\StreamApp::fromSequence(...) can turn a finite chunk/event sequence into a replayable phase-2 stream.
- VPhp\VHttpd\PhpWorker\StreamApp::fromStreamResponse(...) can adapt a finite VPhp\VSlim\Stream\Response into the same open / next / close loop.
- VPhp\VSlim\Stream\Factory::dispatchSse(...), dispatchText(...), and dispatchResponse(...) are the preferred high-level builders for package users.
- This is intentionally aimed at synthetic or fully materializable streams first. Live upstream handles such as Ollama sockets still need a later phase-2 specific adapter.
Why stream is different from websocket
WebSocket is message/event driven by the client. HTTP streaming is usually producer-driven by the server.
That means phase 2 stream mode should likely be pull-based, not event-push duplex.
The most natural shape is:
- stream.open
- repeated stream.next
- optional stream.close
vhttpd drives those steps.
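The open / next / close sequence can be sketched as a small driver loop. This is a PHP simulation standing in for vhttpd (which is not written in PHP), only meant to make the control flow concrete. `driveStream`, `$fakeWorker`, the `maxSteps` safety cap, and the `completed` close reason are all assumptions made for illustration; the field names follow the dispatch shapes in this plan.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the vhttpd side: open, then next until done,
 * then close. $dispatch plays the role of "send one short-lived event to
 * any available worker and return its response".
 */
function driveStream(callable $dispatch, string $id, int $maxSteps = 1000): array
{
    $base = ['mode' => 'stream', 'strategy' => 'dispatch', 'id' => $id];
    $written = [];

    $res = $dispatch($base + ['event' => 'open', 'state' => []]);
    for ($step = 0; $step < $maxSteps; $step++) {   // safety cap for the sketch
        foreach ($res['chunks'] ?? [] as $chunk) {
            $written[] = $chunk['data'];            // a real server writes to the socket here
        }
        if ($res['done'] ?? false) {
            break;
        }
        // The state returned by the worker is sent back unchanged.
        $res = $dispatch($base + ['event' => 'next', 'state' => $res['state'] ?? []]);
    }
    // 'completed' is an assumed reason value; the plan only names client_disconnect.
    $dispatch($base + ['event' => 'close', 'state' => $res['state'] ?? [], 'reason' => 'completed']);

    return $written;
}

// Fake worker for the demo: counts to 3 and records which events it saw.
$events = [];
$fakeWorker = function (array $req) use (&$events): array {
    $events[] = $req['event'];
    if ($req['event'] === 'close') {
        return [];
    }
    $cursor = (int) ($req['state']['cursor'] ?? 0) + 1;
    return [
        'state'  => ['cursor' => (string) $cursor],
        'chunks' => [['data' => "tick $cursor"]],
        'done'   => $cursor >= 3,
    ];
};

$out = driveStream($fakeWorker, 'req-123');
```

Each call to `$dispatch` is independent, so nothing in the loop requires the same worker process to handle consecutive steps.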
Proposed stream lifecycle
1. Open
vhttpd receives a normal HTTP request.
Instead of binding the worker for the entire stream, it sends:
{
  "mode": "stream",
  "strategy": "dispatch",
  "event": "open",
  "id": "req-123",
  "method": "GET",
  "path": "/ollama/sse",
  "query": {"prompt": "hello"},
  "headers": {"accept": "text/event-stream"},
  "body": "",
  "state": {}
}
Worker returns:
- stream type
- headers
- initial stream state token/snapshot
- optional first batch of chunks/events
- whether stream is done
2. Next
If stream is not done, vhttpd periodically or immediately dispatches:
{
  "mode": "stream",
  "strategy": "dispatch",
  "event": "next",
  "id": "req-123",
  "state": {
    "...": "opaque worker state"
  }
}
Worker returns:
- updated state
- next batch of chunks/events
- done flag
3. Close
If client disconnects or stream completes, vhttpd may send:
{
  "mode": "stream",
  "strategy": "dispatch",
  "event": "close",
  "id": "req-123",
  "state": {...},
  "reason": "client_disconnect"
}
This allows cleanup in userland if needed.
Core design decision: state token
Phase 2 stream mode needs one of these:
1. opaque state snapshot returned by worker and sent back on every next
2. worker-side stream id registry
3. vhttpd-side stream state machine with explicit commands
I recommend option 1 first:
- worker returns a serializable state payload
- vhttpd stores it per request
- each next call sends it back
Why:
- easiest to reason about
- avoids long-lived worker memory ownership
- works across any worker process
- fits the same stateless worker philosophy as websocket phase 2
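To illustrate why the state-token option works across any worker process, here is a minimal sketch in which the next step is a pure function of the state it receives. The `nextStep` function and the hard-coded token list are hypothetical; the JSON round trip between calls mimics vhttpd holding the state while no worker keeps anything in memory.

```php
<?php
declare(strict_types=1);

/** Hypothetical pure "next" step: everything it needs arrives in $state. */
function nextStep(array $state): array
{
    $tokens = ['Hel', 'lo', ' world'];           // stand-in for a materialized source
    $cursor = (int) ($state['cursor'] ?? '0');
    $chunks = $cursor < count($tokens) ? [['data' => $tokens[$cursor]]] : [];
    $cursor = min($cursor + 1, count($tokens));

    return [
        'state'  => ['cursor' => (string) $cursor],
        'chunks' => $chunks,
        'done'   => $cursor >= count($tokens),
    ];
}

// Simulate vhttpd storing the state as serialized JSON between dispatches.
$stored = '{}';
$text = '';
do {
    $res = nextStep(json_decode($stored, true));
    foreach ($res['chunks'] as $chunk) {
        $text .= $chunk['data'];
    }
    $stored = json_encode($res['state']);
} while (!$res['done']);
```

Because the only input is the decoded state, any worker could have served any of the three calls.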
Stream request/response shapes
Dispatch request
{
  "mode": "stream",
  "strategy": "dispatch",
  "event": "next",
  "id": "req-123",
  "state": {
    "cursor": 12,
    "upstream": {
      "kind": "ollama_ndjson",
      "buffer": ""
    }
  }
}
Dispatch response
{
  "mode": "stream",
  "strategy": "dispatch",
  "event": "result",
  "id": "req-123",
  "stream_type": "sse",
  "content_type": "text/event-stream",
  "headers": {
    "cache-control": "no-cache"
  },
  "state": {
    "cursor": 13
  },
  "chunks": [
    {
      "event": "chunk",
      "data": "token text"
    }
  ],
  "done": false
}
For SSE, each chunk can carry:
- event
- id
- data
- retry
For text streams, each chunk can carry:
- data
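For reference, the SSE chunk fields map onto the standard event-stream wire format roughly as follows. This is a hedged PHP sketch of the serialization step, not vhttpd's actual writer; `formatSseChunk` is a hypothetical helper, and treating the chunk's event value as the SSE event name is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: serialize one dispatch chunk as an SSE frame.
 * Field order is not significant in SSE; a blank line ends the event.
 */
function formatSseChunk(array $chunk): string
{
    $out = '';
    if (($chunk['event'] ?? '') !== '') {
        $out .= 'event: ' . $chunk['event'] . "\n";
    }
    if (($chunk['id'] ?? '') !== '') {
        $out .= 'id: ' . $chunk['id'] . "\n";
    }
    if (($chunk['retry'] ?? 0) > 0) {
        $out .= 'retry: ' . $chunk['retry'] . "\n";
    }
    // Multi-line data is sent as one "data:" line per line of text.
    foreach (preg_split('/\r\n|\r|\n/', (string) ($chunk['data'] ?? '')) as $line) {
        $out .= 'data: ' . $line . "\n";
    }
    return $out . "\n";
}
```

A text stream would skip this framing and write the data field as-is.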
Suggested V data structures
struct StreamDispatchRequest {
    mode        string
    event       string // open|next|close
    id          string
    method      string
    path        string
    query       map[string]string
    headers     map[string]string
    body        string
    remote_addr string
    request_id  string
    trace_id    string
    state       map[string]string
    reason      string
}
struct StreamDispatchChunk {
    event string
    id    string
    data  string
    retry int
}
struct StreamDispatchResponse {
    mode         string
    event        string
    id           string
    stream_type  string
    content_type string
    headers      map[string]string
    state        map[string]string
    chunks       []StreamDispatchChunk
    done         bool
    error        string
    error_class  string
}
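For the MVP, state can stay map[string]string. That only holds flat string values, so a nested state such as the upstream object in the dispatch request example would have to be flattened or JSON-encoded into a string value.
If that becomes too tight, move to map[string]json.Any later.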
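The dispatch request example carries a numeric cursor and a nested upstream object. One way the worker could fit such a state into a string-to-string map is to JSON-encode each top-level value. This is only a sketch under that assumption; `packState` and `unpackState` are hypothetical helper names.

```php
<?php
declare(strict_types=1);

/** Hypothetical: encode each top-level value so the state fits map[string]string. */
function packState(array $state): array
{
    $packed = [];
    foreach ($state as $key => $value) {
        $packed[(string) $key] = json_encode($value, JSON_THROW_ON_ERROR);
    }
    return $packed;
}

/** Inverse of packState: decode every string value back into its original type. */
function unpackState(array $packed): array
{
    $state = [];
    foreach ($packed as $key => $value) {
        $state[$key] = json_decode($value, true, 512, JSON_THROW_ON_ERROR);
    }
    return $state;
}
```

The worker would call `unpackState` on the incoming state and `packState` before returning, so vhttpd never needs to understand the values.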
vhttpd responsibilities
vhttpd should own:
- live client stream connection
- response headers / content type
- per-request dispatch state
- client disconnect detection
- write loop
- pacing / backoff for next
New helpers likely needed:
fn dispatch_stream(mut app App, req StreamDispatchRequest) !StreamDispatchResponse
fn stream_open(...)
fn stream_next(...)
fn stream_close(...)
php-worker responsibilities
Add a new branch:
if (($req['mode'] ?? '') === 'stream' && ($req['strategy'] ?? '') === 'dispatch') {
    return $this->handleStream($req);
}
That branch should:
- load app
- resolve a stream-dispatch-capable handler
- call open/next/close
- return state + chunk batch
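The steps above could look roughly like this. It is written as a free function rather than the worker method so the sketch stays self-contained; the `$handlers` array shape, the error strings, and the `bad_request` error class are assumptions, not the worker's actual contract.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical dispatcher for the stream branch. $handlers is an assumed
 * shape: ['open' => callable, 'next' => callable, 'close' => callable].
 */
function handleStream(array $req, array $handlers): array
{
    $base = [
        'mode'     => 'stream',
        'strategy' => 'dispatch',
        'event'    => 'result',
        'id'       => $req['id'] ?? '',
    ];
    $state = $req['state'] ?? [];

    try {
        switch ($req['event'] ?? '') {
            case 'open':
                return $base + $handlers['open']($req);
            case 'next':
                return $base + $handlers['next']($state);
            case 'close':
                $handlers['close']($state);
                return $base + ['state' => [], 'chunks' => [], 'done' => true];
            default:
                return $base + ['error' => 'unknown stream event', 'error_class' => 'bad_request', 'done' => true];
        }
    } catch (Throwable $e) {
        // Report handler failures in the response instead of crashing the worker.
        return $base + ['error' => $e->getMessage(), 'error_class' => get_class($e), 'done' => true];
    }
}
```

The array union (`+`) keeps the envelope keys from `$base` and adds whatever the handler returned.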
PHP API strategy
There are two reasonable options.
Option A: add new dispatch-specific stream app
Example:
$stream = new VPhp\VHttpd\PhpWorker\StreamApp(
    open: function (array $req): array { ... },
    next: function (array $state): array { ... },
    close: function (array $state): void { ... },
);
Pros:
- explicit phase-2 shape
- does not overload existing StreamResponse
Cons:
- new userland API
Option B: keep StreamResponse, but make some factories phase-2 aware
This is more attractive long-term, but trickier immediately.
For MVP, I recommend Option A first.
Once it is stable, we can layer StreamResponse or VSlim\Stream\Factory on top.
VSlim implications
VSlim already has:
- VSlim\Stream\Response
- VSlim\Stream\Factory
But these are still phase-1 friendly.
For stream phase 2, VSlim likely needs a parallel API first, for example:
- VSlim\Stream\Dispatch\App
- VSlim\Stream\Dispatch\OllamaSession
or a compact helper:
return VSlim\Stream\Factory::dispatch_sse(...);
I would still avoid forcing this into the existing Response abstraction too early.
Best first use case
The strongest first target is:
- Ollama SSE/text streaming
Why:
- already implemented in phase 1
- clearly long-lived
- usually token-by-token
- easy to compare old and new behavior
The first phase-2 stream MVP does not need to support every stream source.
It only needs to prove:
- one long SSE connection no longer occupies one worker
- one worker can serve multiple concurrent stream clients by handling short-lived next calls
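The second point can be made concrete with a toy simulation: a single worker callable serves several streams by answering one short next call at a time. Everything here (`serveRoundRobin`, the letter-emitting worker, the round-robin order) is a hypothetical illustration, not how vhttpd schedules dispatches.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical simulation: one worker callable serves several streams by
 * handling a single short "next" call per stream, round-robin.
 * Returns [text written per stream id, total number of worker calls].
 */
function serveRoundRobin(callable $worker, array $states): array
{
    $output = array_fill_keys(array_keys($states), '');
    $calls = 0;

    while ($states !== []) {
        foreach (array_keys($states) as $id) {
            $res = $worker($states[$id]);
            $calls++;
            foreach ($res['chunks'] as $chunk) {
                $output[$id] .= $chunk['data'];
            }
            if ($res['done']) {
                unset($states[$id]);          // stream finished; drop its state
            } else {
                $states[$id] = $res['state'];
            }
        }
    }
    return [$output, $calls];
}

// One stateless worker: emits the stream's remaining letters, one per call.
$worker = function (array $state): array {
    $rest = $state['rest'];
    return [
        'state'  => ['rest' => substr($rest, 1)],
        'chunks' => [['data' => $rest[0]]],
        'done'   => strlen($rest) <= 1,
    ];
};

[$output, $calls] = serveRoundRobin($worker, [
    'a' => ['rest' => 'abc'],
    'b' => ['rest' => 'xy'],
]);
```

Both streams complete while the worker is only ever busy for the duration of one call.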
Scheduling model for next
There are two choices:
Immediate pull loop
After each response, vhttpd immediately asks for next again until:
- the worker returns an empty chunk batch while the stream is not done
- or a small backoff is needed
Timed polling
vhttpd schedules next with a small delay.
For MVP, I recommend:
- immediate pull when chunks were returned
- short backoff when chunk batch is empty and stream is not done
That keeps implementation simple while avoiding a busy loop.
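A minimal sketch of that pacing rule, written in PHP for illustration only. The doubling backoff and the 10 ms / 250 ms bounds are assumptions; the plan only asks for a "short backoff". `nextDelayMs` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pacing rule: 0 ms after a non-empty batch, otherwise a
 * doubling backoff capped at $maxMs. Returns null once the stream is done.
 */
function nextDelayMs(array $res, int $previousMs, int $baseMs = 10, int $maxMs = 250): ?int
{
    if ($res['done'] ?? false) {
        return null;                       // no further next dispatch
    }
    if (($res['chunks'] ?? []) !== []) {
        return 0;                          // data is flowing: pull again immediately
    }
    // Empty batch and not done: back off, starting from $baseMs.
    return $previousMs <= 0 ? $baseMs : min($previousMs * 2, $maxMs);
}
```

Resetting to zero as soon as chunks arrive keeps token latency low, while the cap bounds how long an idle stream waits between polls.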
Failure model
If open fails:
- return normal HTTP error response
If next fails:
- emit stream error to event log
- terminate client stream
If client disconnects:
- best-effort close dispatch
- clean local stream state
Rollout plan
Step 1
Add mode=stream plus strategy=dispatch to php-worker and vhttpd transport.
Step 2
Implement one internal demo source:
- synthetic SSE counter stream
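A synthetic counter could be as small as the following sketch. It returns plain open / next / close closures rather than constructing a real StreamApp, because the exact return contract is not pinned down in this plan; `counterStream`, the `n` state key, and the `count` event name are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical synthetic source: counts from 1 to $limit, emitting one SSE
 * event per dispatch. All progress lives in the returned state.
 */
function counterStream(int $limit): array
{
    $step = function (array $state) use ($limit): array {
        $n = (int) ($state['n'] ?? '0') + 1;
        return [
            'state'  => ['n' => (string) $n],
            'chunks' => [['event' => 'count', 'id' => (string) $n, 'data' => (string) $n]],
            'done'   => $n >= $limit,
        ];
    };

    return [
        // open adds the stream type and headers to the first batch
        'open'  => fn (array $req): array => [
            'stream_type'  => 'sse',
            'content_type' => 'text/event-stream',
            'headers'      => ['cache-control' => 'no-cache'],
        ] + $step([]),
        'next'  => $step,
        'close' => function (array $state): void {
            // nothing to clean up for a synthetic counter
        },
    ];
}
```

Since the counter needs no upstream connection, it isolates the dispatch path itself before the Ollama adapter is attempted.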
Step 3
Implement Ollama phase-2 adapter.
Step 4
Expose a VSlim example:
stream_app.php
Step 5
Compare:
- phase-1 stream mode
- phase-2 stream dispatch mode
especially under:
- one worker
- multiple simultaneous SSE clients
First MVP slice
The minimum useful stream phase-2 MVP is:
- SSE only
- open, next, close
- string-keyed state map
- one simple synthetic stream app
- one Ollama-backed demo after the synthetic path works
If that works, then the same decoupling principle is proven for stream mode too:
- long-lived connections stay in vhttpd
- PHP workers are used for short-lived work, not connection occupancy