HTTP Streaming
HTTP streaming is the incremental delivery of HTTP response data to a client as the server produces each piece, rather than waiting for the entire response to be ready before sending. Streaming reduces time-to-first-byte, keeps memory usage low on both sides, and enables real-time data flows, from live dashboards and chat feeds to server-side rendered HTML arriving in progressive chunks.
Baseline: Widely available
Server-Sent Events and the Web Streams API are supported in all major browsers.
Usage
Traditional HTTP responses are buffered: the server generates the full body, sets Content-Length, and sends everything at once. Streaming flips this model. The server starts sending as soon as the first bytes are available and continues pushing data until the response completes.
This pattern applies at multiple levels. At the transport layer, HTTP/1.1 uses chunked transfer encoding to stream bytes over a single connection. HTTP/2 and HTTP/3 stream through DATA frames on multiplexed streams. At the application layer, Server-Sent Events (SSE) provide a standardized format for pushing events from server to client. Frameworks like React use streaming to deliver server-rendered HTML progressively, sending the page shell first and filling in dynamic content as data becomes available.
Not media streaming
HTTP streaming refers to the incremental delivery of response data over HTTP. This is distinct from media streaming protocols like HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP), which deliver pre-segmented audio and video files using standard HTTP downloads of discrete media segments. HLS and DASH do not stream a single continuous HTTP response. They issue separate requests for each media segment.
Chunked transfer encoding
In HTTP/1.1, the primary mechanism for streaming is chunked transfer encoding. The server omits Content-Length and instead sends the body as a series of chunks, each prefixed with a hexadecimal size. A zero-length chunk signals the end of the response.
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
17
{"status":"processing"}
1f
{"progress":50,"item":"report"}
0
Each chunk arrives as soon as the server produces the data. The client processes chunks incrementally without knowing the total response size in advance. This is how long-running API endpoints, log tailing, and streaming exports work over HTTP/1.1.
Chunked encoding also supports trailer fields, headers sent after the body completes. A Content-Digest trailer, for example, provides a hash computed over the entire streamed body.
HTTP/2 and HTTP/3 streaming
HTTP/2 and HTTP/3 replace chunked encoding with native frame-based streaming. The response body is sent as a sequence of DATA frames on a single stream. Each frame carries a length prefix and arrives as the server generates the content.
Multiplexing allows multiple streams on one connection, so a slow streaming response on one stream does not block other responses. HTTP/3 runs over QUIC, which eliminates head-of-line blocking at the transport layer. A lost packet on one stream does not stall other streams.
The END_STREAM flag on the final DATA or HEADERS frame signals the end of the response, replacing the zero-length chunk from HTTP/1.1.
Server push is deprecated
HTTP/2 server push (PUSH_PROMISE) was a separate mechanism allowing servers to proactively send resources before the client requested them. Most browsers have disabled server push support, and HTTP/3 implementations largely do not use the feature. Streaming and server push are unrelated concepts.
Server-Sent Events
Server-Sent Events (SSE) provide a standardized protocol for streaming text-based events from server to client over a long-lived HTTP connection. The response uses text/event-stream as the Content-Type and follows a line-based format defined in the WHATWG HTML Standard (§9.2).
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
data: {"temperature":22.5,"unit":"celsius"}
event: alert
data: {"level":"warning","message":"High wind"}
id: 42
data: {"temperature":23.1,"unit":"celsius"}
Each event is a block of field-value lines separated by a blank line. The supported fields are:
- data: the event payload (multiple data: lines are concatenated with newlines)
- event: a named event type (defaults to message when omitted)
- id: a unique identifier enabling automatic reconnection from where the stream left off
- retry: a reconnection interval in milliseconds
EventSource API
Browsers expose SSE through the EventSource interface. The API handles connection management, automatic reconnection with the Last-Event-ID header, and event dispatching.
const source = new EventSource(
"https://api.example.re/events"
);
source.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log(data.temperature);
};
source.addEventListener("alert", (event) => {
const alert = JSON.parse(event.data);
console.log(alert.message);
});
source.onerror = () => {
console.log("Connection lost, reconnecting...");
};
When the connection drops, EventSource automatically reconnects and sends the last received id value in the Last-Event-ID request header. The server uses this to resume the stream from the correct position.
SSE is unidirectional: data flows from server to client only. For bidirectional communication, WebSocket is the appropriate protocol.
SSE connection limits
Browsers enforce a per-domain limit on simultaneous SSE connections (typically six in HTTP/1.1). HTTP/2 and HTTP/3 multiplex SSE streams on a single connection, effectively removing this constraint.
Streaming in applications
Streaming server-side rendering
Modern frameworks use HTTP streaming to deliver server-rendered HTML progressively. Instead of waiting for all data fetches to complete before sending any HTML, the server streams the page shell immediately and fills in dynamic sections as data arrives.
React's renderToPipeableStream API is the primary example. The server sends the HTML document head, navigation, and layout as soon as rendering starts. Components wrapped in <Suspense> boundaries display a fallback (a loading spinner or skeleton) in the initial HTML. When the data for a suspended component resolves, React streams an inline <script> tag to the client that replaces the fallback with the final content, with no additional network requests needed for the swap.
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
<!-- initial shell arrives immediately -->
<!doctype html>
<html>
<head><title>Dashboard</title></head>
<body>
<nav>...</nav>
<main>
<!--$?-->
<template id="B:0"></template>
<div class="skeleton">Loading...</div>
<!--/$-->
</main>
<!-- later, when data resolves -->
<div hidden id="S:0">
<div class="stats">Revenue: $1.2M</div>
</div>
<script>
// React swaps the Suspense fallback
$RC("B:0","S:0")
</script>
</body>
</html>
The browser renders the shell and skeleton instantly. Seconds later, the streamed script replaces the skeleton with the real data. The user sees a fast initial paint followed by progressive content population, all from a single HTTP response.
This approach sits between full client-side rendering (CSR) and traditional server-side rendering (SSR). CSR sends an empty HTML shell and fetches all data on the client. Traditional SSR waits for all data before sending any HTML. Streaming SSR sends HTML as pieces become ready, combining fast time-to-first-byte with full server-rendered content.
NDJSON streaming
Newline-Delimited JSON (NDJSON) streams a sequence of JSON objects separated by newline characters (\n). Each line is a complete, self-contained JSON value. The Content-Type is application/x-ndjson.
HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked
{"id":1,"status":"indexed","url":"/about"}
{"id":2,"status":"crawled","url":"/products"}
{"id":3,"status":"error","url":"/old-page"}
NDJSON is common in log streaming, data export APIs, and monitoring feeds. Each line is parsed independently, so the client processes records as they arrive without buffering the entire response.
AI and LLM response streaming
Large language model APIs stream responses token-by-token using SSE. The server sends each generated token as an SSE event, allowing the client to display text as the model produces the output rather than waiting for the complete response.
HTTP/1.1 200 OK
Content-Type: text/event-stream
data: {"token":"HTTP"}
data: {"token":" streaming"}
data: {"token":" delivers"}
data: {"token":" data"}
data: {"token":" incrementally"}
data: [DONE]
This streaming pattern is standard across AI API providers and accounts for a significant share of modern SSE traffic.
Web Streams API
The Web Streams API (WHATWG Streams Standard) provides JavaScript primitives for consuming and transforming streamed data in the browser. The fetch() API returns a ReadableStream on the response body, enabling chunk-by-chunk processing of HTTP responses.
const response = await fetch(
"https://api.example.re/export"
);
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const text = decoder.decode(value, {
stream: true
});
processChunk(text);
}
The API defines three stream types:
- ReadableStream: a source of data (the response.body from fetch())
- WritableStream: a destination for data
- TransformStream: a transform applied between a readable and writable stream (decompression, parsing, filtering)
Streams are composable through pipeThrough() and pipeTo(). A response body piped through a TextDecoderStream and then a custom NDJSON parser creates a pipeline that processes JSON objects as they arrive over the network.
const response = await fetch(
"https://api.example.re/stream"
);
const lines = response.body
.pipeThrough(new TextDecoderStream())
.pipeThrough(splitByNewline());
const reader = lines.getReader();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const record = JSON.parse(value);
renderRow(record);
}
Backpressure
The Web Streams API supports backpressure. When the consumer processes data slower than the producer sends, the stream automatically signals the source to slow down. This prevents memory buildup from unbounded buffering.
Comparison with other patterns
| Pattern | Direction | Protocol | Connection |
|---|---|---|---|
| HTTP streaming | Server → client | HTTP | Single response |
| SSE | Server → client | HTTP | Long-lived |
| WebSocket | Bidirectional | WS/WSS | Persistent |
| Long polling | Server → client | HTTP | Repeated requests |
| HLS/DASH | Server → client | HTTP | Segment requests |
HTTP streaming sends a single response incrementally. The connection closes when the response completes.
SSE is a specific form of HTTP streaming with a standardized event format, automatic reconnection, and browser API support.
WebSocket provides full-duplex communication, where both client and server send messages at any time on the same connection. WebSocket is appropriate for interactive applications like chat, gaming, and collaborative editing.
Long polling simulates streaming: the client sends a request, the server holds the connection open until data is available and then responds, and the client immediately sends a new request. This creates a series of short-lived connections rather than a true stream.
HLS and DASH are media delivery protocols where the client downloads a manifest file listing media segment URLs, then fetches each segment as a separate HTTP request. The client controls playback and adapts quality based on network conditions. These are not HTTP streaming in the protocol sense.
Example
A complete streaming exchange. The server streams search results as NDJSON over chunked encoding.
Request
GET /api/search?q=http+streaming HTTP/1.1
Host: api.example.re
Accept: application/x-ndjson
Response (streamed)
HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked
Cache-Control: no-cache
{"id":1,"title":"HTTP Streaming Guide","score":0.98}
{"id":2,"title":"Server-Sent Events","score":0.91}
{"id":3,"title":"Chunked Encoding","score":0.87}
Each JSON line arrives as the search engine scores and ranks the result. The client renders each result immediately without waiting for the full result set.
An SSE stream delivering live metrics:
Request
GET /metrics/live HTTP/1.1
Host: api.example.re
Accept: text/event-stream
Response (streamed)
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
event: cpu
data: {"usage":42,"cores":8}
event: memory
data: {"used_gb":12.4,"total_gb":32}
event: cpu
data: {"usage":38,"cores":8}
The EventSource API in the browser reconnects automatically if the connection drops, resuming from the last received event ID.
Takeaway
HTTP streaming delivers response data incrementally, through chunked transfer encoding in HTTP/1.1, DATA frames in HTTP/2 and HTTP/3, or the SSE protocol for event streams. The Web Streams API provides browser-side primitives for consuming streamed data chunk by chunk. From streaming server-rendered HTML to real-time dashboards and AI model outputs, HTTP streaming reduces latency, lowers memory usage, and enables real-time data delivery without leaving the HTTP protocol.
See also
- RFC 9112: HTTP/1.1, Chunked Transfer Coding
- RFC 9113: HTTP/2
- RFC 9114: HTTP/3
- WHATWG HTML Standard: Server-Sent Events
- WHATWG Streams Standard
- Transfer-Encoding
- Content-Type
- WebSocket
- Protocol Upgrade
- Compression
- HTTP Connection
- HTTP Request
- HTTP Response
- HTTP headers