HTTP Streaming

HTTP streaming is the incremental delivery of HTTP response data to a client as the server produces each piece, rather than waiting for the entire response to be ready before sending. Streaming reduces time-to-first-byte, keeps memory usage low on both sides, and enables real-time data flows, from live dashboards and chat feeds to server-side rendered HTML arriving in progressive chunks.

Baseline: Widely available

Server-Sent Events and the Web Streams API are supported in all major browsers.

Usage

Traditional HTTP responses are buffered: the server generates the full body, sets Content-Length, and sends everything at once. Streaming flips this model. The server starts sending as soon as the first bytes are available and continues pushing data until the response completes.

This pattern applies at multiple levels. At the transport layer, HTTP/1.1 uses chunked transfer encoding to stream bytes over a single connection. HTTP/2 and HTTP/3 stream through DATA frames on multiplexed streams. At the application layer, Server-Sent Events (SSE) provide a standardized format for pushing events from server to client. Frameworks like React use streaming to deliver server-rendered HTML progressively, sending the page shell first and filling in dynamic content as data becomes available.

Not media streaming

HTTP streaming refers to the incremental delivery of response data over HTTP. This is distinct from media streaming protocols like HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP), which deliver pre-segmented audio and video files using standard HTTP downloads of discrete media segments. HLS and DASH do not stream a single continuous HTTP response. They issue separate requests for each media segment.

Chunked transfer encoding

In HTTP/1.1, the primary mechanism for streaming is chunked transfer encoding. The server omits Content-Length and instead sends the body as a series of chunks, each prefixed with a hexadecimal size. A zero-length chunk signals the end of the response.

HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

17
{"status":"processing"}

1f
{"progress":50,"item":"report"}

0

Each chunk arrives as soon as the server produces the data. The client processes chunks incrementally without knowing the total response size in advance. This is how long-running API endpoints, log tailing, and streaming exports work over HTTP/1.1.
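The chunk framing can be decoded with a short sketch. This is illustrative only: a real client also handles partial reads, chunk extensions, and trailers, and works on bytes rather than strings.

```javascript
// Minimal decoder for an HTTP/1.1 chunked body, for illustration only.
// Assumes the full body is already available as one string with CRLF
// separators; real clients decode incrementally from the socket.
function decodeChunked(body) {
  let out = "";
  let pos = 0;
  while (pos < body.length) {
    const lineEnd = body.indexOf("\r\n", pos);
    // Chunk-size line: hexadecimal length, possibly with extensions after ";"
    const size = parseInt(body.slice(pos, lineEnd).split(";")[0], 16);
    if (size === 0) break; // zero-length chunk terminates the body
    out += body.slice(lineEnd + 2, lineEnd + 2 + size);
    pos = lineEnd + 2 + size + 2; // skip chunk data and its trailing CRLF
  }
  return out;
}

const wire =
  "17\r\n" + '{"status":"processing"}' + "\r\n" +
  "1f\r\n" + '{"progress":50,"item":"report"}' + "\r\n" +
  "0\r\n\r\n";
console.log(decodeChunked(wire));
// → {"status":"processing"}{"progress":50,"item":"report"}
```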

Chunked encoding also supports trailer fields, headers sent after the body completes. A Content-Digest trailer, for example, provides a hash computed over the entire streamed body.

HTTP/2 and HTTP/3 streaming

HTTP/2 and HTTP/3 replace chunked encoding with native frame-based streaming. The response body is sent as a sequence of DATA frames on a single stream. Each frame carries a length prefix and arrives as the server generates the content.

Multiplexing allows multiple streams on one connection, so a slow streaming response on one stream does not block other responses. HTTP/3 runs over QUIC, which eliminates head-of-line blocking at the transport layer. A lost packet on one stream does not stall other streams.

The END_STREAM flag on the final DATA or HEADERS frame signals the end of the response, replacing the zero-length chunk from HTTP/1.1.

Server push is deprecated

HTTP/2 server push (PUSH_PROMISE) was a separate mechanism allowing servers to proactively send resources before the client requested them. Most browsers have disabled server push support, and HTTP/3 implementations largely do not use the feature. Streaming and server push are unrelated concepts.

Server-Sent Events

Server-Sent Events (SSE) provide a standardized protocol for streaming text-based events from server to client over a long-lived HTTP connection. The response uses text/event-stream as the Content-Type and follows a line-based format defined in the WHATWG HTML Standard (§9.2).

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

data: {"temperature":22.5,"unit":"celsius"}

event: alert
data: {"level":"warning","message":"High wind"}

id: 42
data: {"temperature":23.1,"unit":"celsius"}

Each event is a block of field-value lines separated by a blank line. The supported fields are:

  • data: the event payload (multiple data: lines are concatenated with newlines)
  • event: a named event type (defaults to message when omitted)
  • id: a unique identifier enabling automatic reconnection from where the stream left off
  • retry: a reconnection interval in milliseconds

EventSource API

Browsers expose SSE through the EventSource interface. The API handles connection management, automatic reconnection with the Last-Event-ID header, and event dispatching.

const source = new EventSource(
  "https://api.example.re/events"
);

source.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data.temperature);
};

source.addEventListener("alert", (event) => {
  const alert = JSON.parse(event.data);
  console.log(alert.message);
});

source.onerror = () => {
  console.log("Connection lost, reconnecting...");
};

When the connection drops, EventSource automatically reconnects and sends the last received id value in the Last-Event-ID request header. The server uses this to resume the stream from the correct position.
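On the server side, resuming might be sketched as follows. The in-memory event buffer and numeric id scheme are illustrative assumptions, not part of the SSE specification, which leaves resume semantics to the application.

```javascript
// Server-side sketch: replay events missed during a reconnect.
// The history buffer and numeric ids are assumptions for illustration.
function formatEvent({ id, type, data }) {
  let out = "";
  if (type && type !== "message") out += `event: ${type}\n`;
  out += `id: ${id}\ndata: ${data}\n\n`;
  return out;
}

function replaySince(history, lastEventId) {
  // Resume from the first event newer than the client's Last-Event-ID
  return history
    .filter((e) => e.id > Number(lastEventId))
    .map(formatEvent)
    .join("");
}

const history = [
  { id: 41, data: '{"temperature":22.9}' },
  { id: 42, data: '{"temperature":23.1}' },
  { id: 43, data: '{"temperature":23.4}' },
];
console.log(replaySince(history, "42"));
// id: 43
// data: {"temperature":23.4}
```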

SSE is unidirectional: data flows from server to client only. For bidirectional communication, WebSocket is the appropriate protocol.

SSE connection limits

Browsers enforce a per-domain limit on simultaneous SSE connections (typically six in HTTP/1.1). HTTP/2 and HTTP/3 multiplex SSE streams on a single connection, effectively removing this constraint.

Streaming in applications

Streaming server-side rendering

Modern frameworks use HTTP streaming to deliver server-rendered HTML progressively. Instead of waiting for all data fetches to complete before sending any HTML, the server streams the page shell immediately and fills in dynamic sections as data arrives.

React's renderToPipeableStream API is the primary example. The server sends the HTML document head, navigation, and layout as soon as rendering starts. Components wrapped in <Suspense> boundaries display a fallback (a loading spinner or skeleton) in the initial HTML. When the data for a suspended component resolves, React streams the rendered HTML together with a small inline <script> tag that swaps it in for the fallback, requiring no additional network requests.

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked

<!-- initial shell arrives immediately -->
<!doctype html>
<html>
<head><title>Dashboard</title></head>
<body>
  <nav>...</nav>
  <main>
    <!--$?-->
    <template id="B:0"></template>
    <div class="skeleton">Loading...</div>
    <!--/$-->
  </main>

<!-- later, when data resolves -->
<div hidden id="S:0">
  <div class="stats">Revenue: $1.2M</div>
</div>
<script>
  // React swaps the Suspense fallback
  $RC("B:0","S:0")
</script>
</body>
</html>

The browser renders the shell and skeleton instantly. Seconds later, the streamed script replaces the skeleton with the real data. The user sees a fast initial paint followed by progressive content population, all from a single HTTP response.

This approach sits between full client-side rendering (CSR) and traditional server-side rendering (SSR). CSR sends an empty HTML shell and fetches all data on the client. Traditional SSR waits for all data before sending any HTML. Streaming SSR sends HTML as pieces become ready, combining fast time-to-first-byte with full server-rendered content.

NDJSON streaming

Newline-Delimited JSON (NDJSON) streams a sequence of JSON objects separated by newline characters (\n). Each line is a complete, self-contained JSON value. The Content-Type is application/x-ndjson.

HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked

{"id":1,"status":"indexed","url":"/about"}
{"id":2,"status":"crawled","url":"/products"}
{"id":3,"status":"error","url":"/old-page"}

NDJSON is common in log streaming, data export APIs, and monitoring feeds. Each line is parsed independently, so the client processes records as they arrive without buffering the entire response.
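Incremental processing requires one piece of care: network chunk boundaries rarely align with line boundaries, so a client buffers any trailing partial line. A sketch:

```javascript
// Sketch of incremental NDJSON parsing: buffer partial lines across
// network chunks and parse each complete line independently.
function ndjsonParser(onRecord) {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (line.trim()) onRecord(JSON.parse(line));
    }
  };
}

const records = [];
const feed = ndjsonParser((r) => records.push(r));
// Chunk boundaries need not align with line boundaries:
feed('{"id":1,"status":"indexed"}\n{"id":2,');
feed('"status":"crawled"}\n');
// records now holds both complete objects
```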

AI and LLM response streaming

Large language model APIs stream responses token-by-token using SSE. The server sends each generated token as an SSE event, allowing the client to display text as the model produces the output rather than waiting for the complete response.

HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"token":"HTTP"}

data: {"token":" streaming"}

data: {"token":" delivers"}

data: {"token":" data"}

data: {"token":" incrementally"}

data: [DONE]

This streaming pattern is standard across AI API providers and accounts for a significant share of modern SSE traffic.
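Because EventSource only issues GET requests, clients of these APIs typically read the SSE stream manually with fetch(). A sketch, assuming a hypothetical endpoint and the {"token": ...} payload shape shown above; real provider payloads differ:

```javascript
// Sketch: consuming a token stream with fetch(), since EventSource
// cannot send POST requests. The URL and response shape are assumptions.
async function streamCompletion(url, body, onToken) {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const blocks = buffer.split("\n\n");
    buffer = blocks.pop(); // keep a partial event for the next read
    for (const block of blocks) {
      const payload = block.replace(/^data: /, "");
      if (payload === "[DONE]") return; // end-of-stream sentinel
      onToken(JSON.parse(payload).token);
    }
  }
}
```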

Web Streams API

The Web Streams API (WHATWG Streams Standard) provides JavaScript primitives for consuming and transforming streamed data in the browser. The fetch() API returns a ReadableStream on the response body, enabling chunk-by-chunk processing of HTTP responses.

const response = await fetch(
  "https://api.example.re/export"
);
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value, {
    stream: true
  });
  processChunk(text);
}

The API defines three stream types:

  • ReadableStream: a source of data (the response.body from fetch())
  • WritableStream: a destination for data
  • TransformStream: a transform applied between a readable and writable stream (decompression, parsing, filtering)

Streams are composable through pipeThrough() and pipeTo(). A response body piped through a TextDecoderStream and then a custom NDJSON parser creates a pipeline processing JSON objects as they arrive over the network.

const response = await fetch(
  "https://api.example.re/stream"
);

// Split decoded text into complete lines, buffering partial lines
// across chunks
function splitByNewline() {
  let buffer = "";
  return new TransformStream({
    transform(chunk, controller) {
      buffer += chunk;
      const parts = buffer.split("\n");
      buffer = parts.pop();
      for (const line of parts) controller.enqueue(line);
    },
    flush(controller) {
      if (buffer) controller.enqueue(buffer);
    },
  });
}

const lines = response.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(splitByNewline());

const reader = lines.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const record = JSON.parse(value);
  renderRow(record);
}

Backpressure

The Web Streams API supports backpressure. When the consumer processes data slower than the producer sends, the stream automatically signals the source to slow down. This prevents memory buildup from unbounded buffering.
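A minimal sketch of this behavior: with small queues on both sides, the source's pull() callback only runs when the slow consumer has drained its queue, so a fast producer never buffers unboundedly ahead of a slow sink.

```javascript
// Sketch of Web Streams backpressure: pipeTo() awaits the sink's write()
// before pulling more from the source, so production is paced by
// consumption instead of piling up in memory.
async function demo() {
  let produced = 0;
  const source = new ReadableStream(
    {
      pull(controller) {
        produced++;
        controller.enqueue(`chunk ${produced}`);
        if (produced === 10) controller.close();
      },
    },
    { highWaterMark: 1 } // queue at most one chunk ahead
  );

  const slowSink = new WritableStream(
    {
      write(chunk) {
        // Simulate a slow consumer; the producer waits on this promise
        return new Promise((resolve) => setTimeout(resolve, 5));
      },
    },
    { highWaterMark: 1 }
  );

  await source.pipeTo(slowSink);
  return produced; // the source was pulled only as fast as the sink drained
}
```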

Comparison with other patterns

Pattern          Direction        Protocol  Connection
HTTP streaming   Server → client  HTTP      Single response
SSE              Server → client  HTTP      Long-lived
WebSocket        Bidirectional    WS/WSS    Persistent
Long polling     Server → client  HTTP      Repeated requests
HLS/DASH         Server → client  HTTP      Segment requests

HTTP streaming sends a single response incrementally. The connection closes when the response completes.

SSE is a specific form of HTTP streaming with a standardized event format, automatic reconnection, and browser API support.

WebSocket provides full-duplex communication, where both client and server send messages at any time on the same connection. WebSocket is appropriate for interactive applications like chat, gaming, and collaborative editing.

Long polling simulates streaming through repeated requests: the client sends a request, the server holds the connection open until data is available, then responds, and the client immediately sends a new request. This creates a series of short-lived connections rather than a true stream.
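A long-polling client is essentially a loop of ordinary requests. A sketch, where the endpoint and the 204-on-timeout convention are assumptions:

```javascript
// Sketch of a long-polling client loop. Each iteration is a fresh HTTP
// request that the server holds open until data is available; a 204
// response here is assumed to mean the server timed out with no data.
async function longPoll(pollUrl, onMessage, fetchFn = fetch) {
  while (true) {
    const response = await fetchFn(pollUrl);
    if (response.status === 204) continue; // timeout with no data: re-poll
    if (!response.ok) break;               // give up on hard errors
    onMessage(await response.json());      // deliver, then immediately re-poll
  }
}
```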

HLS and DASH are media delivery protocols where the client downloads a manifest file listing media segment URLs, then fetches each segment as a separate HTTP request. The client controls playback and adapts quality based on network conditions. These are not HTTP streaming in the protocol sense.

Example

A complete streaming exchange. The server streams search results as NDJSON over chunked encoding.

Request

GET /api/search?q=http+streaming HTTP/1.1
Host: api.example.re
Accept: application/x-ndjson

Response (streamed)

HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked
Cache-Control: no-cache

{"id":1,"title":"HTTP Streaming Guide","score":0.98}
{"id":2,"title":"Server-Sent Events","score":0.91}
{"id":3,"title":"Chunked Encoding","score":0.87}

Each JSON line arrives as the search engine scores and ranks the result. The client renders each result immediately without waiting for the full result set.

An SSE stream delivering live metrics:

Request

GET /metrics/live HTTP/1.1
Host: api.example.re
Accept: text/event-stream

Response (streamed)

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache

event: cpu
data: {"usage":42,"cores":8}

event: memory
data: {"used_gb":12.4,"total_gb":32}

event: cpu
data: {"usage":38,"cores":8}

The EventSource API in the browser reconnects automatically if the connection drops, resuming from the last received event ID.

Takeaway

HTTP streaming delivers response data incrementally, through chunked transfer encoding in HTTP/1.1, DATA frames in HTTP/2 and HTTP/3, or the SSE protocol for event streams. The Web Streams API provides browser-side primitives for consuming streamed data chunk by chunk. From streaming server-rendered HTML to real-time dashboards and AI model outputs, HTTP streaming reduces latency, lowers memory usage, and enables real-time data delivery without leaving the HTTP protocol.

Last updated: March 6, 2026