Zero-Copy Data Transfer Patterns
A WebAssembly function runs at near-native speed, but the data it operates on still lives in a
JavaScript ArrayBuffer until you move it. The naive move is a copy: serialize the bytes into the
module’s linear memory, compute, then copy the result back out. For a 4 MB RGBA frame that copy
costs real time on every call, and it happens twice — in and out. This guide shows how to skip the
copies entirely by constructing a typed-array view directly over the module’s memory at a known
pointer, writing in place, and reading the result through the same view. Done right, marshaling a
megabyte buffer becomes a pointer hand-off instead of a memcpy.
Prerequisites
- [ ] A built .wasm module that exports memory and an allocator pair (alloc/free), or a wasm-bindgen build that exposes __wbindgen_malloc/__wbindgen_free.
- [ ] Node.js 18+ or any browser with WebAssembly.instantiateStreaming (Chrome 61+, Firefox 58+, Safari 15+).
- [ ] performance.now() available (browser or Node perf_hooks) for timing comparisons.
- [ ] Familiarity with TypedArray constructors and DataView, for example new Uint8Array(buffer, byteOffset, length).
- [ ] For image work: a <canvas> (or OffscreenCanvas) and a 2D context.
Why copies cost, in numbers
Linear memory is a single growable block of bytes that JavaScript sees as an ArrayBuffer. Every byte the module reads must already be at
some offset inside that buffer, and every result it produces lands there too. A copy across the
boundary is a TypedArray.prototype.set call (or a structured clone via postMessage), and it moves
bytes at memory-bandwidth speed — roughly 8–12 GB/s for a single core on current laptops.
Do the arithmetic for a 1080p RGBA frame: 1920 × 1080 × 4 = 8,294,400 bytes ≈ 8.3 MB. Copying that in
at 10 GB/s costs about 0.83 ms; copying the processed frame back out costs another 0.83 ms. That is
~1.66 ms of pure data movement per frame before a single pixel is touched. At a 60 fps budget of
16.7 ms per frame, you have just spent 10% of the frame on copies that produce no work. A 720p RGBA
buffer (1280 × 720 × 4 = 3,686,400 bytes ≈ 3.7 MB) still costs ~0.37 ms each way. Eliminate both copies
and that time goes straight back into either headroom or higher resolution.
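The same arithmetic can be wrapped in a small helper to estimate copy cost for other frame sizes. This is only a sketch: the name copyCostMs and the 10 GB/s default are illustrative assumptions taken from the ballpark figure above, not measurements of any particular machine.

```javascript
// Hypothetical helper: estimate the time to copy one frame at a given bandwidth.
// The 10 GB/s default mirrors the rough figure in the text; measure your own hardware.
function copyCostMs(width, height, bytesPerPixel = 4, gigabytesPerSecond = 10) {
  const bytes = width * height * bytesPerPixel;
  const seconds = bytes / (gigabytesPerSecond * 1e9);
  return { bytes, ms: seconds * 1000 };
}

const hd = copyCostMs(1920, 1080);
const roundTripMs = hd.ms * 2;          // copy in + copy out
const frameBudgetMs = 1000 / 60;        // 60 fps
const budgetShare = roundTripMs / frameBudgetMs;

console.log(hd.bytes, hd.ms.toFixed(2), (budgetShare * 100).toFixed(0) + "%");
```

Plugging in a measured bandwidth instead of the default gives a more honest estimate for a specific device.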
The view approach makes the cost zero. A typed-array view is a window onto bytes that already exist;
constructing it is O(1) regardless of length, because it stores only a buffer reference, an offset,
and a length — it never touches the bytes. The module writes through the same window. Nothing is
duplicated.
How a view aliases linear memory
The core idea: a Uint8Array constructed with the three-argument form
new Uint8Array(memory.buffer, ptr, len) does not allocate. It aliases the region
[ptr, ptr + len) of the existing buffer. Reads and writes through it hit the same bytes the module’s
load/store instructions hit. By contrast, view.slice() allocates a fresh buffer and copies — use
it only when you deliberately want an independent snapshot.
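A minimal sketch of that difference, runnable without a compiled module: a standalone WebAssembly.Memory stands in for instance.exports.memory, and the offset 1024 is an arbitrary illustrative pointer rather than one returned by a real allocator.

```javascript
// One 64 KiB page of linear memory, standing in for instance.exports.memory.
const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 1024; // illustrative offset; a real pointer comes from the module's allocator
const len = 16;

const view = new Uint8Array(memory.buffer, ptr, len);  // aliases [ptr, ptr + len)
const other = new Uint8Array(memory.buffer, ptr, len); // second window onto the same bytes
const window4 = view.subarray(4, 8);                   // also an alias, no copy
const snapshot = view.slice();                         // independent copy, taken now

view[4] = 42;

console.log(other[4]);    // 42: same underlying bytes
console.log(window4[0]);  // 42: subarray shares the buffer
console.log(snapshot[4]); // 0: the copy was taken before the write
console.log(view.buffer === memory.buffer, snapshot.buffer === memory.buffer); // true false
```

Note that subarray() is the non-copying counterpart of slice(): it returns another window onto the same buffer.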
The view comes in several element widths, each interpreting the same bytes differently:
- Uint8Array / Uint8ClampedArray: one byte per element; the clamped variant is what canvas ImageData.data uses, saturating writes to [0, 255].
- Float32Array: four bytes per element, native byte order; ideal for audio samples and tensors.
- Float64Array: eight bytes per element; double-precision numeric arrays.
- DataView: no fixed element width; you choose width and endianness per read, the right tool for mixed-layout structs.
The choice of view width is a contract with the module’s data layout, not a free decision. If the
module wrote 32-bit floats, you read them through a Float32Array (or DataView.getFloat32); reading
the same bytes through a Uint8Array gives you the raw IEEE-754 bytes, not the numbers. The bytes
never change — only your interpretation of them does. This is why a single region of linear memory
can be aliased by several views at once: a Uint8Array for a raw memcpy, a Float32Array for the
sample values, and a DataView for a header, all pointing at overlapping byte ranges of the same
buffer. None of them owns the bytes; the module’s allocator does.
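To make the "same bytes, different interpretation" point concrete, here is a small sketch with three views over one 4-byte region. The offset 64 is an arbitrary aligned pointer chosen for illustration, and the Float32Array read assumes a little-endian host, which matches Wasm's little-endian linear memory on virtually all current hardware.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 64; // illustrative, 4-byte aligned

const bytes = new Uint8Array(memory.buffer, ptr, 4);
const floats = new Float32Array(memory.buffer, ptr, 1);
const dv = new DataView(memory.buffer, ptr, 4);

// Write 1.5 as a little-endian 32-bit float, the byte order Wasm stores use.
dv.setFloat32(0, 1.5, true);

console.log(Array.from(bytes));      // [0, 0, 192, 63]: the raw IEEE-754 bytes of 1.5
console.log(dv.getFloat32(0, true)); // 1.5
console.log(floats[0]);              // 1.5 on little-endian hosts
```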
Media and numeric payloads
The three workloads where zero-copy pays off most all reduce to the same pattern — large contiguous buffers that the module reads or writes once per call:
- Canvas pixels. ImageData.data is a Uint8ClampedArray of straight RGBA bytes. Place a frame in linear memory once, run an in-place filter, and read it back through the aliased view, as detailed in avoiding copies when passing image buffers.
- Audio channel data. AudioBuffer.getChannelData(n) returns a Float32Array of samples in [-1, 1]. A DSP module (reverb, EQ, resampling) processes that channel in place; aliasing the channel region as a Float32Array over linear memory avoids copying every audio block, which matters when the audio thread runs on a 128-sample quantum.
- Tensors. Inference inputs and outputs are flat Float32Array (or quantized Int8Array) blocks. A model that runs in Wasm reads its input tensor and writes its output tensor directly in linear memory; the JavaScript side only ever sees pointers and shapes, never a copied tensor.
In every case the rule is identical: the bytes live in the module’s memory, and JavaScript holds a window onto them. The window is what makes the transfer free.
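As one illustration of the "pointers and shapes" idea, the sketch below aliases a float32 tensor from a pointer and a shape. The helper tensorView and the function scaleInPlace are hypothetical; scaleInPlace is a JavaScript stand-in for whatever export a real module would provide, and the pointer 256 is an arbitrary aligned offset.

```javascript
// Hypothetical helper: alias a float32 tensor stored at `ptr` with the given shape.
function tensorView(memory, ptr, shape) {
  const count = shape.reduce((n, dim) => n * dim, 1);
  return new Float32Array(memory.buffer, ptr, count); // no bytes move
}

const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 256; // illustrative, 4-byte aligned
const input = tensorView(memory, ptr, [2, 3]);

input.set([1, 2, 3, 4, 5, 6]); // one fill from JavaScript-side data

// Stand-in for a Wasm export that scales the tensor in place.
function scaleInPlace(mem, p, count, k) {
  const t = new Float32Array(mem.buffer, p, count);
  for (let i = 0; i < count; i++) t[i] *= k;
}
scaleInPlace(memory, ptr, input.length, 0.5);

console.log(Array.from(input)); // [0.5, 1, 1.5, 2, 2.5, 3]
```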
Step-by-step workflow
The canonical pattern is allocate once, write in place, call, read in place, free. Each numbered step maps to one line of glue.
1. Instantiate and grab memory. Hold a reference to instance.exports.memory, but never cache memory.buffer long-term, because it can change.

    const { instance } = await WebAssembly.instantiateStreaming(
      fetch("/process.wasm"),
    );
    const { memory, alloc, free, process } = instance.exports;

2. Allocate a region in the module’s heap. The allocator returns a pointer, which is a byte offset into linear memory.

    const len = data.length; // bytes you intend to transfer
    const ptr = alloc(len);  // offset into linear memory

3. Build a view over that exact region. This is the zero-copy step: no bytes move.

    const view = new Uint8Array(memory.buffer, ptr, len);

4. Write your source data through the view. view.set(data) copies into linear memory once; for true zero-copy you instead produce data directly into view (e.g. getImageData already gave you bytes, and you alias them in step 6 for the readback).

    view.set(data); // one fill of the destination region

5. Call the export with the pointer and length. Only integers cross the boundary.

    process(ptr, len); // mutates the bytes in place

6. Read the result through the same view. Because the module wrote in place, view already reflects the output, so there is no copy out.

    // view now holds the processed bytes; consume directly
    ctx.putImageData(new ImageData(new Uint8ClampedArray(memory.buffer, ptr, len), w, h), 0, 0);

7. Free the region. Whoever allocated owns the free. Wrap in try/finally so a thrown error still releases the heap.

    free(ptr, len);
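Put together, the seven steps might look like the sketch below. To keep it runnable without a .wasm file, makeStandInExports is a hypothetical JavaScript stand-in for instance.exports: the memory is a real WebAssembly.Memory, but alloc, free, and process are written in JavaScript purely for illustration and do no bounds checking.

```javascript
// Hypothetical stand-in for instance.exports so the sketch runs without a module.
function makeStandInExports() {
  const memory = new WebAssembly.Memory({ initial: 1 });
  let top = 8; // next free offset; offset 0 is left unused as a "null" pointer
  return {
    memory,
    alloc(len) {
      const ptr = top;
      top += (len + 7) & ~7; // keep every allocation 8-byte aligned
      return ptr;
    },
    free(ptr, len) {
      // A bump allocator can only reclaim the most recent allocation.
      if (ptr + ((len + 7) & ~7) === top) top = ptr;
    },
    process(ptr, len) {
      const bytes = new Uint8Array(memory.buffer, ptr, len);
      for (let i = 0; i < len; i++) bytes[i] = 255 - bytes[i]; // invert in place
    },
  };
}

function runOnce(exports, data) {
  const { memory, alloc, free, process } = exports;          // step 1
  const len = data.length;
  const ptr = alloc(len);                                    // step 2
  try {
    const view = new Uint8Array(memory.buffer, ptr, len);    // step 3
    view.set(data);                                          // step 4
    process(ptr, len);                                       // step 5
    return Array.from(view);                                 // step 6: consume before freeing
  } finally {
    free(ptr, len);                                          // step 7
  }
}

console.log(runOnce(makeStandInExports(), Uint8Array.of(0, 1, 254, 255))); // [255, 254, 1, 0]
```

With a real module, only makeStandInExports changes: the exports come from WebAssembly.instantiateStreaming and runOnce stays the same.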
When the same buffer is processed every frame (video, audio, a live tensor) you hoist steps 2–3 out
of the loop and reuse the pointer, re-creating the view only after a call that may have grown memory.
That is the single biggest win for streaming workloads: one allocation amortized over thousands of
frames. The allocator mechanics behind alloc/free are covered in linear memory management and allocators.
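A sketch of the streaming variant follows. It is a conservative take on the hoisting described above: the allocation happens once, but the view is rebuilt on every frame, which is O(1) and stays valid even if memory grew in between. The names framePtr, processFrame, and processStream are hypothetical, and processFrame is a JavaScript stand-in for the module's process export.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const frameLen = 8;
const framePtr = 16; // pretend alloc(frameLen) returned this once, before the loop

// JavaScript stand-in for the module's in-place export.
function processFrame(ptr, len) {
  const bytes = new Uint8Array(memory.buffer, ptr, len);
  for (let i = 0; i < len; i++) bytes[i] = 255 - bytes[i];
}

function processStream(frames) {
  const outputs = [];
  for (const frame of frames) {
    // Rebuilding the view costs O(1) and never touches the pixel bytes.
    const view = new Uint8Array(memory.buffer, framePtr, frameLen);
    view.set(frame);                  // one fill per frame
    processFrame(framePtr, frameLen); // no allocation inside the loop
    outputs.push(view[0]);            // consume in place; here we sample one byte
  }
  return outputs;
}

const sampled = processStream([
  new Uint8Array(frameLen).fill(10),
  new Uint8Array(frameLen).fill(200),
]);
console.log(sampled); // [245, 55]
```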
The alloc/free exports themselves are not magic — they are just functions the module exposes that
return and reclaim offsets into its own heap region of linear memory. A Rust build wires these up as
__wbindgen_malloc and __wbindgen_free; a hand-written module might export a bump allocator that
only ever moves a high-water pointer forward. Either way, the pointer you receive is an integer offset,
and the bytes at [ptr, ptr + len) are yours to use until you free them. The “allocate once” discipline
matters because every call into the allocator is a chance for it to call memory.grow, and a grow is
the one event that can detach every view you hold.
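For intuition, here is what a bump allocator of that kind could look like if it were written in JavaScript over a WebAssembly.Memory. This is purely illustrative: makeBumpAllocator is a hypothetical name, a real allocator lives inside the module, and this one never frees. It does show how an ordinary alloc call can trigger a grow and swap the backing buffer.

```javascript
const PAGE = 65536; // WebAssembly page size in bytes

// Hypothetical bump allocator: only ever moves a high-water mark forward.
function makeBumpAllocator(memory, start = 8) {
  let top = start;
  return function alloc(len) {
    const ptr = (top + 7) & ~7; // round up to an 8-byte boundary
    const end = ptr + len;
    const have = memory.buffer.byteLength;
    if (end > have) {
      memory.grow(Math.ceil((end - have) / PAGE)); // replaces memory.buffer
    }
    top = end;
    return ptr;
  };
}

const memory = new WebAssembly.Memory({ initial: 1 });
const alloc = makeBumpAllocator(memory);

const a = alloc(100);
const bufferBefore = memory.buffer;
const b = alloc(PAGE); // does not fit in the first page, so the allocator grows memory

console.log(a, b);                            // 8 112
console.log(memory.buffer.byteLength / PAGE); // 2
console.log(bufferBefore === memory.buffer);  // false: a new backing buffer was swapped in
```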
There is a subtlety in step 4 worth stating plainly. view.set(data) is itself a copy — it moves bytes
from your JavaScript source array into linear memory. True end-to-end zero-copy is only achievable when
the data originates in linear memory (the module decoded it, or a previous call produced it). When
the data starts life in a separate JavaScript buffer — a freshly fetched ArrayBuffer, a canvas
getImageData result — you pay exactly one copy to get it into the module’s memory, and the win is
eliminating the second copy on the way back out plus every redundant intermediate. For a transform
that previously copied in, copied to a scratch buffer, and copied out, collapsing three copies to one is
still a 3× reduction in data movement.
A complete binding example
This module exports an in-place transform. The JavaScript allocates once, aliases the region, runs the transform, and reads the output through the same view.
// ctx is a CanvasRenderingContext2D; width and height are the canvas size in pixels.
async function run(ctx, width, height) {
  const { instance } = await WebAssembly.instantiateStreaming(
    fetch("/invert.wasm"),
  );
  const { memory, alloc, free, invert } = instance.exports;
  // Source pixels from a canvas: straight (non-premultiplied) RGBA.
  const src = ctx.getImageData(0, 0, width, height); // ImageData; src.data is a Uint8ClampedArray
  const len = src.data.length;
  const ptr = alloc(len);
  try {
    // Alias the destination region in linear memory. No allocation happens here.
    const buf = new Uint8ClampedArray(memory.buffer, ptr, len);
    buf.set(src.data); // fill once
    invert(ptr, len);  // module writes in place
    // buf already reflects the inverted pixels. slice() snapshots them because
    // the region is released in the finally block below.
    ctx.putImageData(new ImageData(buf.slice(), width, height), 0, 0);
  } finally {
    free(ptr, len); // always release the region, even if invert() throws
  }
}
The only slice() here is in the final putImageData call: ImageData keeps a reference to the array it
is given rather than copying it, and the region is about to be freed, so this is a deliberate snapshot,
not an accidental copy. If you keep the region alive across frames you can skip even that.
Tradeoffs: speed versus ownership complexity
Zero-copy is not free of cost — it trades memory bandwidth for lifetime discipline.
- Win: no per-call memcpy; constant-time view construction; reusable resident buffers; lower GC pressure because you stop minting throwaway ArrayBuffers.
- Cost: you now own the pointer’s lifetime. A view is only valid while (a) the region is allocated and (b) the backing buffer has not been detached by memory.grow. A view that outlives its data reads stale or freed bytes with no error.
- Cost: aliasing means mutation is visible everywhere. If both JavaScript and the module hold views over the same region and run concurrently (Web Workers + SharedArrayBuffer), you need Atomics to avoid races; see SharedArrayBuffer, Atomics & threading.
The rule: prefer a view; reach for .slice() only when you specifically need an independent,
detach-proof snapshot.
The ownership story deserves a concrete framing. With a copy, the lifetime question never arises — once the bytes are duplicated into an independent JavaScript array, that array is garbage-collected like any other object and you cannot read freed module memory through it. Zero-copy removes that safety net. A view is a raw pointer dressed up as a typed array: it stays “valid” syntactically even after the underlying region is freed or the buffer is detached, and the failure is silent — stale bytes, zeros, or another allocation’s data — rather than an exception. The discipline that buys back safety is small but non-negotiable: build the view as late as possible (right before you read or write), never store it across a call that might grow memory, and drop it the moment you free its region. Teams that treat a view as a long-lived field on an object are the ones that hit intermittent “all zeros” bugs that only reproduce under memory pressure.
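One way to encode that discipline is to store the pointer and length rather than the view, and build the view on demand. The Region class below is a hypothetical sketch of the idea, not an API of any library; the pointer 32 and the no-op free callback are placeholders.

```javascript
// Hypothetical wrapper: keep the pointer and length, never the view itself.
class Region {
  constructor(memory, ptr, len) {
    this.memory = memory;
    this.ptr = ptr;
    this.len = len;
    this.freed = false;
  }
  // Build the view at the last moment, from the current memory.buffer.
  get bytes() {
    if (this.freed) throw new Error("region used after free");
    return new Uint8Array(this.memory.buffer, this.ptr, this.len);
  }
  release(free) {
    free(this.ptr, this.len);
    this.freed = true; // later access throws instead of reading reused bytes
  }
}

const memory = new WebAssembly.Memory({ initial: 1 });
const region = new Region(memory, 32, 4); // 32 is an illustrative pointer
region.bytes.set([1, 2, 3, 4]);

memory.grow(1); // would have detached any stored view
const afterGrow = Array.from(region.bytes);
console.log(afterGrow); // [1, 2, 3, 4]: rebuilt over the new buffer

region.release(() => {}); // no-op free, just for the sketch
let error = null;
try {
  region.bytes;
} catch (e) {
  error = e;
}
console.log(error.message); // "region used after free"
```

The getter turns the two silent failure modes (detached buffer, freed region) into either a fresh valid view or an exception.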
A second, quieter tradeoff is debuggability. A copied buffer shows up in a heap snapshot as its own
object with a clear size; a thousand transient views over linear memory are nearly invisible to the GC
profiler because they hold almost no JavaScript-side memory. That is good for GC pressure but means your
real memory usage lives inside the Wasm instance’s memory, where the JavaScript devtools heap view
will not account for it. Size your WebAssembly.Memory deliberately and watch memory.buffer.byteLength
rather than the JS heap when you reason about a zero-copy pipeline’s footprint.
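A tiny sketch of that kind of footprint check is shown below. The helper name wasmFootprint and the page counts (16 initial, 256 maximum) are illustrative assumptions; pick limits that suit your own workload.

```javascript
const PAGE = 65536; // bytes per WebAssembly page

// Hypothetical helper: report the linear-memory footprint, which JS heap tools do not show.
function wasmFootprint(memory) {
  const bytes = memory.buffer.byteLength;
  return { bytes, pages: bytes / PAGE, mib: bytes / (1024 * 1024) };
}

// Deliberate sizing: 16 pages (1 MiB) up front, capped at 256 pages (16 MiB).
const memory = new WebAssembly.Memory({ initial: 16, maximum: 256 });
const initial = wasmFootprint(memory);
console.log(initial); // { bytes: 1048576, pages: 16, mib: 1 }

memory.grow(16);
console.log(wasmFootprint(memory).mib); // 2
```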
Gotchas & failure modes
- Detached buffer after memory.grow. Growing linear memory can allocate a new backing buffer and detach the old one. Any view over the old memory.buffer becomes zero-length: reads return undefined, .length is 0, and writes silently no-op. Always re-create views from instance.exports.memory.buffer after any call that might allocate. This is detailed in why memory.grow invalidates pointers.
- RangeError on misaligned Float32Array/Float64Array. A Float32Array view requires byteOffset to be a multiple of 4; Float64Array requires a multiple of 8. If your allocator hands back an odd pointer, new Float32Array(memory.buffer, ptr, n) throws RangeError: start offset of Float32Array should be a multiple of 4. Align allocations, or read via DataView, which has no alignment constraint.
- View outliving the data. After free(ptr, len), the bytes may be reused by the next allocation. A view you kept around now reads someone else’s data. Drop the view when you free the region.
- .slice() where you meant a view. new Uint8Array(memory.buffer, ptr, len).slice() quietly reintroduces the copy you were trying to remove. Profile if your “zero-copy” path is still slow.
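The detached-buffer symptoms from the first gotcha can be reproduced in a few lines with a standalone WebAssembly.Memory, used here as a stand-in for a module's exported memory.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const stale = new Uint8Array(memory.buffer, 0, 4);
stale[0] = 7;

memory.grow(1); // the old ArrayBuffer is detached; memory.buffer is now a new object

console.log(stale.length);            // 0
console.log(stale[0]);                // undefined
stale[0] = 99;                        // silently ignored, no exception
console.log(stale.buffer.byteLength); // 0

const fresh = new Uint8Array(memory.buffer, 0, 4);
console.log(fresh[0]);                // 7: the data survived, only the old view died
```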
Alignment in practice
Alignment is the gotcha that turns a clean zero-copy path into an exception. A Float32Array view
demands a byteOffset divisible by 4; a Float64Array demands divisibility by 8. The bytes do not care
about alignment — the engine does, because it reads them with width-typed loads that assume aligned
access. When your allocator returns a pointer that is already a multiple of 8 (most do, to satisfy the
strictest primitive), you are fine. The trouble starts when you build a view at ptr + headerLen where
headerLen is, say, 5 bytes — now the float payload starts on an odd offset and the constructor throws.
There are two clean fixes. First, lay out your data so numeric payloads start on aligned offsets: pad the
header to a multiple of the largest element you will read, which costs a few bytes and keeps every view on
the fast path. Second, when you cannot control the layout — decoding someone else’s format — fall back to
DataView, which reads any width at any offset with no alignment constraint, at the cost of per-element
calls instead of a single bulk view. The first option is faster; the second is universal.
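Both fixes can be sketched briefly. The alignUp helper is a hypothetical name, and the offsets (a pointer of 64, a second unaligned offset of 133) are arbitrary values chosen to mirror the 5-byte header in the text.

```javascript
// Round an offset up to the next multiple of `align` (align must be a power of two).
function alignUp(offset, align) {
  return (offset + align - 1) & ~(align - 1);
}

const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 64;      // illustrative 8-byte-aligned pointer
const headerLen = 5; // the 5-byte header from the text

// Without padding, the constructor rejects the unaligned offset.
let threw = false;
try {
  new Float32Array(memory.buffer, ptr + headerLen, 2);
} catch (e) {
  threw = e instanceof RangeError;
}

// Fix 1: pad the header so the float payload starts on a 4-byte boundary.
const payload = alignUp(ptr + headerLen, 4); // 72
const floats = new Float32Array(memory.buffer, payload, 2);
floats[0] = 0.25;

// Fix 2: DataView reads any width at any offset (byte order given explicitly).
const unaligned = 128 + headerLen; // 133, a separate region in someone else's layout
const dv = new DataView(memory.buffer);
dv.setFloat32(unaligned, 0.75, true);

console.log(threw, payload, floats[0], dv.getFloat32(unaligned, true)); // true 72 0.25 0.75
```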
Verification
Confirm the copies are actually gone by timing both paths with performance.now():
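// Assumes memory, ptr, len, src, and invert from the binding example are in scope.
const N = 200;
// Copy path: copy in, transform, copy the result back out.
const t0 = performance.now();
for (let i = 0; i < N; i++) {
  const view = new Uint8ClampedArray(memory.buffer, ptr, len);
  view.set(src.data);       // copy in
  invert(ptr, len);
  const out = view.slice(); // copy out
}
const copyMs = (performance.now() - t0) / N;
// View path: the data is already resident; transform and read in place.
const t1 = performance.now();
for (let i = 0; i < N; i++) {
  const view = new Uint8ClampedArray(memory.buffer, ptr, len); // O(1), no bytes move
  invert(ptr, len);
}
const viewMs = (performance.now() - t1) / N;
console.log(`copy ${copyMs.toFixed(3)} ms vs view ${viewMs.toFixed(3)} ms`);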
For an 8 MB frame you should see the view path drop the ~1.6 ms of per-frame copy time. You can also confirm aliasing directly: write a sentinel through the view, call the export, and assert the module saw it — if a copy snuck in, the sentinel will not propagate.
When the numbers do not match your expectation, two checks isolate the cause. First, log
memory.buffer.byteLength before and after the hot path; if it changed, a grow detached your views and
the “zero-copy” reads were actually hitting a stale buffer. Second, wrap the view construction in a guard
that asserts view.length === len; a zero-length result is the unmistakable signature of a detached
buffer, and catching it at the source beats chasing blank output downstream. With both guards in place, a
zero-copy pipeline either runs correctly or fails loudly — never silently degrades back into copying.
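The two guards, plus the sentinel check mentioned earlier, might be sketched like this. The names checkedView and detectGrow are hypothetical, and the "module side" is simulated by reading linear memory through a separate full-buffer view, since no compiled export is available here.

```javascript
// Hypothetical guard: build a view and fail loudly if the region is out of bounds
// or the resulting view is not the expected length.
function checkedView(memory, ptr, len) {
  const buffer = memory.buffer; // read fresh on every call, never cached
  if (ptr + len > buffer.byteLength) {
    throw new RangeError(`region [${ptr}, ${ptr + len}) exceeds ${buffer.byteLength} bytes`);
  }
  const view = new Uint8Array(buffer, ptr, len);
  if (view.length !== len) throw new Error("view is detached or truncated");
  return view;
}

// Hypothetical wrapper: report whether memory grew while fn ran.
function detectGrow(memory, fn) {
  const before = memory.buffer.byteLength;
  fn();
  return memory.buffer.byteLength !== before;
}

const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 16;
const len = 4;

// Sentinel check: a write through the view must be visible on the "module" side.
checkedView(memory, ptr, len)[0] = 0xab;
const seenByModule = new Uint8Array(memory.buffer)[ptr]; // stand-in for an export reading memory
const grewDuringCall = detectGrow(memory, () => memory.grow(1));
const grewDuringNoop = detectGrow(memory, () => {});

console.log(seenByModule === 0xab, grewDuringCall, grewDuringNoop); // true true false
```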
In this guide
- Reading Wasm linear memory with typed arrays: build views at offsets, decode structs field-by-field, and use DataView for endian-specific reads.
- Avoiding copies when passing image buffers: process canvas ImageData in place, reuse a frame-sized buffer, and run it in an OffscreenCanvas worker.
Frequently Asked Questions
Is constructing a typed-array view ever expensive?
No. The three-argument constructor stores a buffer reference, a byte offset, and a length. It does not
read or copy the underlying bytes, so it is O(1) whether the region is 16 bytes or 16 MB. The cost
only appears if you call .slice(), a static from() such as Uint8Array.from(), or .set(), all of which move bytes.
Why does my view suddenly read all zeros after a few calls?
Almost always a detached buffer. A call grew linear memory, the engine swapped in a new backing
ArrayBuffer, and your old view points at the detached one. A detached view has length 0 and its reads
return undefined, which downstream code usually turns into zeros or blank output. Re-create the view
from instance.exports.memory.buffer after any allocation, and never cache memory.buffer across calls
that might grow.
Can I pass a JavaScript array directly without copying it in?
Not a plain Array, and not a typed array backed by a different buffer — those bytes do not live in
linear memory, so the module cannot reach them. The data must be inside the module’s memory. Zero-copy
means producing or placing the data in linear memory once, then aliasing it, rather than holding it in
a separate JavaScript buffer.
What alignment do I need for a Float32Array view?
The byteOffset must be a multiple of 4 for Float32Array and a multiple of 8 for Float64Array,
otherwise the constructor throws RangeError. Align your allocator’s returns, or read mixed/unaligned
data through a DataView, which imposes no alignment requirement.
Related
- Reading linear memory with typed arrays: construct views at offsets and decode structs.
- Avoiding copies when passing image buffers: in-place canvas pixel processing.
- Linear memory management & allocators: the alloc/free pair behind every pointer.
- Passing complex types across the boundary: ABI conventions for strings and structs.