📅 2026/3/13 ✍️ Bullsoft

Upstream Plan (Phase 3)

This document defines the next step after the stream dispatch strategy.

Phase 2 already decouples the client stream connection from the PHP worker:

  • client connection stays in vhttpd
  • worker only handles short-lived open / next / close

That is enough for synthetic streams and replayable finite sequences.

It is not enough for live upstream streams such as Ollama, because the PHP worker would still have to hold an upstream socket open for the lifetime of the stream.

Problem

If a PHP worker opens an Ollama stream itself:

  • it owns a live upstream socket
  • that socket is not serializable into state
  • a later next call might hit a different worker

So the worker would still be pinned to the upstream stream for as long as that stream stays open.

Direction

The next architecture step is:

  • PHP builds a generic Upstream\Plan
  • vhttpd owns the upstream connection
  • vhttpd decodes upstream chunks
  • vhttpd maps upstream chunks into downstream SSE/text output

That keeps both sides decoupled:

  • browser/client connection stays in vhttpd
  • upstream AI stream also stays in vhttpd
  • PHP worker becomes a short-lived planner

Plan object

Package-side class:

  • VPhp\VHttpd\Upstream\Plan

Current purpose:

  • describe a generic upstream stream request
  • stay transport-oriented, not provider-specific
  • let Ollama be the first adapter, not a special case in vhttpd

Current fields include:

  • transport
  • url
  • method
  • request_headers
  • body
  • codec
  • mapper
  • output_stream_type
  • output_content_type
  • response_headers
  • fixture_path
  • name
  • meta

Current MVP schema:

{
  "transport": "http",
  "url": "http://127.0.0.1:11434/api/chat",
  "method": "POST",
  "request_headers": {
    "content-type": "application/json",
    "accept": "application/x-ndjson"
  },
  "body": "{\"model\":\"...\",\"stream\":true,\"messages\":[...]}",
  "codec": "ndjson",
  "mapper": "ndjson_text_field",
  "output_stream_type": "text",
  "output_content_type": "text/plain; charset=utf-8",
  "response_headers": {
    "x-ollama-model": "qwen2.5:7b-instruct"
  },
  "fixture_path": "",
  "name": "ollama_chat",
  "meta": {
    "field_path": "message.content",
    "fallback_field_path": "response",
    "sse_event": "token"
  }
}

Field notes:

  • transport
    • current MVP only supports http
  • codec
    • current MVP only supports ndjson
  • mapper
    • current MVP supports ndjson_text_field and ndjson_sse_field
  • meta.field_path
    • primary field to read from each decoded NDJSON row
  • meta.fallback_field_path
    • optional fallback field when the primary field is empty
  • meta.sse_event
    • event name used when output mode is SSE
  • fixture_path
    • when non-empty, vhttpd reads a local deterministic fixture instead of opening a live upstream connection
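
The field notes above amount to a small contract check. The following is a minimal sketch of that check, written in Python purely for illustration; it is not vhttpd's code. The function name `plan_contract_errors` is hypothetical, and the rule that a plan needs either a `url` or a `fixture_path` is an assumption that the notes imply but do not state.

```python
# Values taken from the field notes; everything else is illustrative.
SUPPORTED_TRANSPORTS = {"http"}
SUPPORTED_CODECS = {"ndjson"}
SUPPORTED_MAPPERS = {"ndjson_text_field", "ndjson_sse_field"}


def plan_contract_errors(plan: dict) -> list:
    """Return a list of contract violations; an empty list means the plan passes."""
    errors = []
    if plan.get("transport") not in SUPPORTED_TRANSPORTS:
        errors.append("unsupported transport: %r" % plan.get("transport"))
    if plan.get("codec") not in SUPPORTED_CODECS:
        errors.append("unsupported codec: %r" % plan.get("codec"))
    if plan.get("mapper") not in SUPPORTED_MAPPERS:
        errors.append("unsupported mapper: %r" % plan.get("mapper"))
    # Assumption: a fixture-backed plan needs no URL, a live plan does.
    if not plan.get("fixture_path") and not plan.get("url"):
        errors.append("either url or fixture_path must be set")
    return errors
```

A plan with `"codec": "sse"` would yield one error here, which corresponds to the invalid-contract case in the error model.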

Ollama as first adapter

Package-side helper entrypoints now exist:

  • VPhp\VSlim\Stream\Factory::ollamaUpstreamTextPlan(...)
  • VPhp\VSlim\Stream\Factory::ollamaUpstreamSsePlan(...)
  • VPhp\VSlim\Stream\Factory::ollamaUpstreamPlan(...)

The plans they build are executed by the phase-3 executor in vhttpd.

Current MVP supports:

  • transport = http
  • codec = ndjson
  • mapper = ndjson_text_field
  • mapper = ndjson_sse_field
  • meta.field_path = message.content
  • meta.fallback_field_path = response
  • meta.sse_event = token
  • local fixture_path for deterministic tests
  • live upstream HTTP streaming via net.http progress callbacks

The current contract is:

  • request goes to POST /api/chat
  • request body uses stream: true
  • upstream codec is ndjson
  • mapper is ndjson_text_field or ndjson_sse_field
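
To make the contract concrete, here is a rough Python sketch of what a plan builder could produce. It mirrors the MVP schema, but `build_ollama_plan` is a hypothetical name rather than one of the package helpers, and the `"sse"` stream type and `text/event-stream` content type for the SSE variant are assumptions (the schema above only shows the text variant).

```python
import json


def build_ollama_plan(model: str, messages: list, sse: bool = False) -> dict:
    """Build a plan dict shaped like the MVP schema for POST /api/chat."""
    return {
        "transport": "http",
        "url": "http://127.0.0.1:11434/api/chat",
        "method": "POST",
        "request_headers": {
            "content-type": "application/json",
            "accept": "application/x-ndjson",
        },
        # The body is a JSON string, and the contract requires stream: true.
        "body": json.dumps({"model": model, "stream": True, "messages": messages}),
        "codec": "ndjson",
        "mapper": "ndjson_sse_field" if sse else "ndjson_text_field",
        # Assumed values for the SSE variant; only the text variant is documented.
        "output_stream_type": "sse" if sse else "text",
        "output_content_type": "text/event-stream" if sse else "text/plain; charset=utf-8",
        "response_headers": {"x-ollama-model": model},
        "fixture_path": "",
        "name": "ollama_chat",
        "meta": {
            "field_path": "message.content",
            "fallback_field_path": "response",
            "sse_event": "token",
        },
    }
```

The builder holds no socket and no stream state, which is the point of the design: the worker returns a plain data structure and is free again.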

Why generic plan instead of Ollama-specific runtime code

This keeps vhttpd reusable.

The runtime should understand:

  • upstream transport
  • upstream codec
  • downstream mapper/output mode

It should not hardcode:

  • provider URLs
  • Ollama-only request semantics

That way the same mechanism can later support:

  • Ollama-compatible endpoints
  • OpenAI-compatible NDJSON/SSE variants
  • other streaming upstreams

Current MVP result

vhttpd now accepts a worker result that returns an upstream plan, opens the upstream stream itself, decodes NDJSON, and maps it into downstream text or sse output.

This is the first version where:

  • browser stream does not lock a worker
  • upstream Ollama stream also does not lock a worker

The validated fixture output is:

  • /ollama/text -> Hello from VSlim
  • /ollama/sse -> token events followed by done
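
The decode-and-map step can be sketched as follows. This is an illustrative Python model, not the vhttpd implementation: the function names are hypothetical, the exact SSE framing and the `{}` payload of the `done` event are assumptions, and the sample rows in the usage note are invented rather than taken from the real fixture.

```python
import json
from typing import Iterable, Iterator


def decode_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one JSON object per complete line, buffering partial lines across chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    if buffer.strip():  # final row without a trailing newline
        yield json.loads(buffer)


def read_path(row, path: str):
    """Follow a dotted path such as 'message.content'; None if any step is missing."""
    value = row
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def extract_text(row: dict, meta: dict) -> str:
    """Read meta.field_path, falling back to meta.fallback_field_path when empty."""
    text = read_path(row, meta["field_path"])
    if not text and meta.get("fallback_field_path"):
        text = read_path(row, meta["fallback_field_path"])
    return text if isinstance(text, str) else ""


def map_rows(rows: Iterable[dict], mapper: str, meta: dict) -> Iterator[str]:
    """Turn decoded rows into downstream text pieces or SSE frames."""
    for row in rows:
        text = extract_text(row, meta)
        if not text:
            continue
        if mapper == "ndjson_text_field":
            yield text
        else:
            # One "data:" line per line of text keeps multi-line tokens valid SSE.
            data = "".join("data: %s\n" % part for part in text.split("\n"))
            yield "event: %s\n%s\n" % (meta.get("sse_event", "token"), data)
    if mapper == "ndjson_sse_field":
        yield "event: done\ndata: {}\n\n"  # payload is an assumption
```

Fed the three invented rows `{"message":{"content":"Hello "}}`, `{"message":{"content":"from "}}` and `{"response":"VSlim","done":true}`, the text mapper joins to `Hello from VSlim`; the third row exercises the fallback path because it has no `message.content`.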

Error model in current MVP

Current MVP behavior:

  • invalid plan contract (transport/codec/mapper) -> direct 502 Bad Gateway
  • upstream failure before the first downstream chunk:
    • text -> direct 502 with a plain-text body
    • sse -> direct 502 whose body carries an error event, then a final done
  • upstream failure after downstream streaming has already started:
    • text -> append a plain-text error tail and terminate chunked response
    • sse -> emit error, then final done
  • worker planning/runtime failures still surface through the normal worker error model

This keeps one simple rule:

  • before the first downstream bytes, vhttpd can still surface a real HTTP error status
  • after downstream streaming has started, errors become stream-level events/tails
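
That rule reduces to a single branch on whether any downstream bytes have been sent. The sketch below models it in Python under stated assumptions: `upstream_failure_response` is a hypothetical name, the wording of the plain-text tail and the SSE payloads are invented, and the error message is assumed to be a single line.

```python
def upstream_failure_response(stream_type: str, started: bool, message: str) -> dict:
    """Describe what to send downstream when the upstream fails.

    'status' is 502 while headers are still unsent, and None once streaming
    has started (the status line can no longer change). 'body' is what to
    write before closing the response.
    """
    if stream_type == "sse":
        # SSE reports the failure in-band in both cases: error, then done.
        body = "event: error\ndata: %s\n\nevent: done\ndata: {}\n\n" % message
    elif started:
        # Text streams can only append a tail once bytes have gone out.
        body = "\n[upstream error] %s\n" % message
    else:
        body = "%s\n" % message
    return {"status": None if started else 502, "body": body}
```

Invalid-contract failures are not modeled here because they are rejected with a 502 before any upstream connection is attempted.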