Zero-Copy Data Transfer Patterns

A WebAssembly function runs at near-native speed, but the data it operates on still lives in a JavaScript ArrayBuffer until you move it. The naive move is a copy: serialize the bytes into the module’s linear memory, compute, then copy the result back out. For a 4 MB RGBA frame that copy costs real time on every call, and it happens twice — in and out. This guide shows how to skip the copies entirely by constructing a typed-array view directly over the module’s memory at a known pointer, writing in place, and reading the result through the same view. Done right, marshaling a megabyte buffer becomes a pointer hand-off instead of a memcpy.

Prerequisites

  • [ ] A built .wasm module that exports memory and an allocator pair (alloc / free), or a wasm-bindgen build that exposes __wbindgen_malloc/__wbindgen_free.
  • [ ] Node.js 18+ or any browser with WebAssembly.instantiateStreaming (Chrome 61+, Firefox 58+, Safari 15+).
  • [ ] performance.now() available (browser or Node perf_hooks) for timing comparisons.
  • [ ] Familiarity with TypedArray constructors and DataView, for example new Uint8Array(buffer, byteOffset, length).
  • [ ] For image work: a <canvas> (or OffscreenCanvas) and a 2D context.

Why copies cost, in numbers

A module's linear memory is a single resizable ArrayBuffer. Every byte the module reads must already be at some offset inside that buffer, and every result it produces lands there too. A copy across the boundary is a TypedArray.prototype.set call (or a structured clone via postMessage), and it moves bytes at memory-bandwidth speed, roughly 8–12 GB/s for a single core on current laptops.

Do the arithmetic for a 1080p RGBA frame: 1920 × 1080 × 4 = 8,294,400 bytes ≈ 8 MB. Copying that in at 10 GB/s costs about 0.83 ms; copying the processed frame back out costs another 0.83 ms. That is ~1.66 ms of pure data movement per frame before a single pixel is touched. At a 60 fps budget of 16.6 ms per frame, you have just spent 10% of the frame on copies that do no useful work. A 720p RGBA frame (1280 × 720 × 4 = 3,686,400 bytes ≈ 3.7 MB, the roughly 4 MB case) still costs ~0.4 ms each way. Eliminate both copies and that time goes straight back into either headroom or higher resolution.
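The same arithmetic can be wrapped in a small helper for trying other resolutions. This is only a sketch: `copyCostMs` is a name made up here, and the 10 GB/s default is the assumed mid-range bandwidth from the paragraph above, not a measurement.

```javascript
// Estimate the time to copy one RGBA frame across the JS/Wasm boundary.
// bandwidthGBps is an assumption (10 GB/s, inside the 8-12 GB/s range above).
function copyCostMs(width, height, bandwidthGBps = 10) {
  const bytes = width * height * 4;              // 4 bytes per RGBA pixel
  const seconds = bytes / (bandwidthGBps * 1e9);
  return seconds * 1000;
}

const oneWay = copyCostMs(1920, 1080);           // about 0.83 ms
const roundTrip = 2 * oneWay;                    // about 1.66 ms
const frameBudgetMs = 1000 / 60;                 // about 16.7 ms at 60 fps
const shareOfBudget = roundTrip / frameBudgetMs; // about 0.10

console.log(oneWay.toFixed(2), roundTrip.toFixed(2), Math.round(shareOfBudget * 100) + "%");
```

On real hardware the bandwidth figure is worth replacing with a measured value, since cache effects and concurrent memory traffic can move it noticeably.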

The view approach makes the cost zero. A typed-array view is a window onto bytes that already exist; constructing it is O(1) regardless of length, because it stores only a buffer reference, an offset, and a length — it never touches the bytes. The module writes through the same window. Nothing is duplicated.

How a view aliases linear memory

The core idea: a Uint8Array constructed with the three-argument form new Uint8Array(memory.buffer, ptr, len) does not allocate. It aliases the region [ptr, ptr + len) of the existing buffer. Reads and writes through it hit the same bytes the module’s load/store instructions hit. By contrast, view.slice() allocates a fresh buffer and copies — use it only when you deliberately want an independent snapshot.

Figure: Aliasing view versus copy. One linear-memory ArrayBuffer with a highlighted region [ptr, ptr+len). A Uint8Array view (buffer, ptr, len) aliases the same bytes with no copy; view.slice() makes a full copy into a separate new buffer.
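The difference between aliasing and copying can be observed directly. The sketch below uses a standalone `WebAssembly.Memory` as a stand-in for a module's exported memory, and the offset `256` is an arbitrary value chosen for illustration rather than something an allocator returned.

```javascript
// A standalone WebAssembly.Memory stands in for a module's exported memory.
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64 KiB
const ptr = 256;   // hypothetical offset an allocator might return
const len = 8;

const view = new Uint8Array(memory.buffer, ptr, len);  // aliases, no copy
const snapshot = view.slice();                         // independent copy

view[0] = 0xab;    // write through the view

// A second view over the same region sees the write immediately.
const other = new Uint8Array(memory.buffer, ptr, len);
console.log(other[0]);                          // 171 (0xab)
console.log(snapshot[0]);                       // 0: copied before the write
console.log(view.buffer === memory.buffer);     // true
console.log(snapshot.buffer === memory.buffer); // false
```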

The view comes in several element widths, each interpreting the same bytes differently:

  • Uint8Array / Uint8ClampedArray — one byte per element; the clamped variant is what canvas ImageData.data uses, saturating writes to [0, 255].
  • Float32Array — four bytes per element, native byte order; ideal for audio samples and tensors.
  • Float64Array — eight bytes; double-precision numeric arrays.
  • DataView — no fixed element width; you choose width and endianness per read, the right tool for mixed-layout structs.

The choice of view width is a contract with the module’s data layout, not a free decision. If the module wrote 32-bit floats, you read them through a Float32Array (or DataView.getFloat32); reading the same bytes through a Uint8Array gives you the raw IEEE-754 bytes, not the numbers. The bytes never change — only your interpretation of them does. This is why a single region of linear memory can be aliased by several views at once: a Uint8Array for a raw memcpy, a Float32Array for the sample values, and a DataView for a header, all pointing at overlapping byte ranges of the same buffer. None of them owns the bytes; the module’s allocator does.
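To make the "same bytes, different interpretation" point concrete, here is a minimal sketch with three views over one region. The offset is arbitrary, and the byte order shown assumes a little-endian host, which is what x86 and ARM machines provide.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 64;                       // hypothetical, 4-byte aligned offset

const floats = new Float32Array(memory.buffer, ptr, 2);  // 2 elements = 8 bytes
const bytes = new Uint8Array(memory.buffer, ptr, 8);
const dv = new DataView(memory.buffer, ptr, 8);

floats[0] = 1.0;                      // stored as IEEE-754 0x3f800000

// Raw bytes as seen through the Uint8Array. On a little-endian host
// (assumed here) the least significant byte comes first.
console.log(Array.from(bytes.subarray(0, 4)));   // [0, 0, 128, 63]

// DataView picks width and endianness per read; true = little-endian.
// Its offsets are relative to the DataView's own start (ptr).
console.log(dv.getFloat32(0, true));             // 1
console.log(dv.getUint32(0, true).toString(16)); // "3f800000"
```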

Media and numeric payloads

The three workloads where zero-copy pays off most all reduce to the same pattern — large contiguous buffers that the module reads or writes once per call:

  • Canvas pixels. ImageData.data is a Uint8ClampedArray of straight RGBA bytes. Place a frame in linear memory once, run an in-place filter, and read it back through the aliased view — detailed in avoiding copies when passing image buffers.
  • Audio channel data. AudioBuffer.getChannelData(n) returns a Float32Array of samples in [-1, 1]. A DSP module (reverb, EQ, resampling) processes that channel in place; aliasing the channel region as a Float32Array over linear memory avoids copying every audio block, which matters when the audio thread runs on a 128-sample quantum.
  • Tensors. Inference inputs and outputs are flat Float32Array (or quantized Int8Array) blocks. A model that runs in Wasm reads its input tensor and writes its output tensor directly in linear memory; the JavaScript side only ever sees pointers and shapes, never a copied tensor.

In every case the rule is identical: the bytes live in the module’s memory, and JavaScript holds a window onto them. The window is what makes the transfer free.
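As an illustration of the audio case, the sketch below aliases one 128-sample block as a Float32Array and processes it in place. `applyGain` is a hypothetical JavaScript stand-in for an exported DSP function, and the offset `1024` is assumed rather than allocated.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const QUANTUM = 128;                  // samples per render quantum
const ptr = 1024;                     // hypothetical, 4-byte aligned offset

// Stand-in for an exported DSP function: scales every sample in place.
// A real module would do this with its own loads and stores.
function applyGain(ptrBytes, sampleCount, gain) {
  const s = new Float32Array(memory.buffer, ptrBytes, sampleCount);
  for (let i = 0; i < sampleCount; i++) s[i] *= gain;
}

// The third constructor argument is an element count, not a byte count:
// 128 samples occupy 128 * 4 = 512 bytes of linear memory.
const block = new Float32Array(memory.buffer, ptr, QUANTUM);
block.fill(0.5);                      // e.g. samples copied from getChannelData
applyGain(ptr, QUANTUM, 0.5);

console.log(block[0], block[QUANTUM - 1]); // 0.25 0.25
console.log(block.byteLength);             // 512
```

Passing a byte length where an element count is expected is an easy mistake with the wider views; it makes the window four times too large for a Float32Array.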

Step-by-step workflow

The canonical pattern is allocate once, write in place, call, read in place, free. Each numbered step maps to one line of glue.

  1. Instantiate and grab memory. Hold a reference to instance.exports.memory — but never cache memory.buffer long-term, because it can change.

    const { instance } = await WebAssembly.instantiateStreaming(
      fetch("/process.wasm"),
    );
    const { memory, alloc, free, process } = instance.exports;
  2. Allocate a region in the module’s heap. The allocator returns a pointer — a byte offset into linear memory.

    const len = data.length;          // bytes you intend to transfer
    const ptr = alloc(len);           // offset into linear memory
  3. Build a view over that exact region. This is the zero-copy step — no bytes move.

    const view = new Uint8Array(memory.buffer, ptr, len);
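  4. Write your source data through the view. view.set(data) is the one copy into linear memory. To avoid even that copy, have the producer write directly into view instead of into a separate JavaScript buffer; when the source is already a JavaScript array (for example the result of getImageData), this single fill is unavoidable and the saving comes from the in-place readback in step 6.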

    view.set(data);                   // one fill of the destination region
  5. Call the export with the pointer and length. Only integers cross the boundary.

    process(ptr, len);                // mutates the bytes in place
  6. Read the result through the same view. Because the module wrote in place, view already reflects the output — no copy out.

    // view now holds the processed bytes; consume directly
    ctx.putImageData(new ImageData(new Uint8ClampedArray(memory.buffer, ptr, len), w, h), 0, 0);
  7. Free the region. Whoever allocated owns the free. Wrap in try/finally so a thrown error still releases the heap.

    free(ptr, len);

When the same buffer is processed every frame — video, audio, a live tensor — you hoist steps 2–3 out of the loop and reuse the pointer. That is the single biggest win for streaming workloads: one allocation amortized over thousands of frames. The allocator mechanics behind alloc/free are covered in linear memory management and allocators.
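A streaming loop built on that idea might look like the following. The `alloc`, `free`, and `process` functions here are hypothetical JavaScript stand-ins for module exports so that the sketch runs without a .wasm file; only the shape of the loop is the point.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical stand-ins for the module's exports.
let allocCalls = 0;
let nextFree = 16;
function alloc(n) { allocCalls++; const p = nextFree; nextFree += n; return p; }
function free(p, n) { /* this toy allocator has nothing to reclaim */ }
function process(p, n) {              // adds 1 to every byte in place
  const b = new Uint8Array(memory.buffer, p, n);
  for (let i = 0; i < n; i++) b[i] += 1;
}

const frameLen = 1024;
const ptr = alloc(frameLen);          // hoisted: one allocation for all frames
let lastByte = -1;
try {
  for (let frame = 0; frame < 100; frame++) {
    // Re-creating the view is O(1); doing it per frame keeps it valid
    // even if memory grew since the previous frame.
    const view = new Uint8Array(memory.buffer, ptr, frameLen);
    view.fill(frame);                 // stand-in for writing the new frame
    process(ptr, frameLen);
    lastByte = view[0];               // read the result in place
  }
} finally {
  free(ptr, frameLen);
}
console.log(allocCalls, lastByte);    // 1 100
```

This sketch hoists only the allocation and rebuilds the view each frame, which is the conservative variant. Hoisting the view as well is safe only when nothing inside the loop can grow memory.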

The alloc/free exports themselves are not magic — they are just functions the module exposes that return and reclaim offsets into its own heap region of linear memory. A Rust build wires these up as __wbindgen_malloc and __wbindgen_free; a hand-written module might export a bump allocator that only ever moves a high-water pointer forward. Either way, the pointer you receive is an integer offset, and the bytes at [ptr, ptr + len) are yours to use until you free them. The “allocate once” discipline matters because every call into the allocator is a chance for it to call memory.grow, and a grow is the one event that can detach every view you hold.
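For intuition, a bump allocator of the kind described can be sketched in a few lines. This one is written in JavaScript over a standalone `WebAssembly.Memory` purely for illustration; a real module would implement it inside the Wasm code, and the 8-byte alignment and starting offset are assumptions of the sketch.

```javascript
const PAGE = 65536;                   // WebAssembly page size in bytes
const memory = new WebAssembly.Memory({ initial: 1 });
let top = 8;                          // high-water mark; offset 0 left unused

// Hypothetical bump allocator: returns 8-byte-aligned offsets and grows
// memory when the request does not fit. It never reclaims space.
function alloc(n) {
  const ptr = (top + 7) & ~7;         // round up to a multiple of 8
  const end = ptr + n;
  if (end > memory.buffer.byteLength) {
    const extraPages = Math.ceil((end - memory.buffer.byteLength) / PAGE);
    memory.grow(extraPages);          // detaches the previous memory.buffer
  }
  top = end;
  return ptr;
}

const a = alloc(100);                 // fits in the first page
const before = memory.buffer;
const b = alloc(PAGE);                // does not fit: triggers a grow
console.log(a, b);                            // 8 112
console.log(before === memory.buffer);        // false
console.log(memory.buffer.byteLength / PAGE); // 2
```

The second call shows why allocation and view lifetime are linked: the ArrayBuffer captured in `before` is no longer the module's memory once `alloc` has grown it.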

There is a subtlety in step 4 worth stating plainly. view.set(data) is itself a copy — it moves bytes from your JavaScript source array into linear memory. True end-to-end zero-copy is only achievable when the data originates in linear memory (the module decoded it, or a previous call produced it). When the data starts life in a separate JavaScript buffer — a freshly fetched ArrayBuffer, a canvas getImageData result — you pay exactly one copy to get it into the module’s memory, and the win is eliminating the second copy on the way back out plus every redundant intermediate. For a transform that previously copied in, copied to a scratch buffer, and copied out, collapsing three copies to one is still a 3× reduction in data movement.

A complete binding example

This module exports an in-place transform. The JavaScript allocates once, aliases the region, runs the transform, and reads the output through the same view; the only copy out is one deliberate snapshot for ImageData.

async function run() {
  const { instance } = await WebAssembly.instantiateStreaming(
    fetch("/invert.wasm"),
  );
  const { memory, alloc, free, invert } = instance.exports;

  // Source pixels from a canvas: straight (non-premultiplied) RGBA.
  // ctx, width, and height are assumed to be in scope (a 2D context and its size).
  const src = ctx.getImageData(0, 0, width, height);   // src.data is a Uint8ClampedArray
  const len = src.data.length;

  const ptr = alloc(len);
  try {
    // Alias the destination region in linear memory — no allocation here.
    const buf = new Uint8ClampedArray(memory.buffer, ptr, len);
    buf.set(src.data);                                  // fill once

    invert(ptr, len);                                   // module writes in place

    // buf already reflects the inverted pixels. slice() takes a deliberate
    // snapshot because the region is freed in the finally block below.
    ctx.putImageData(new ImageData(buf.slice(), width, height), 0, 0);
  } finally {
    free(ptr, len);                                     // always release, even if invert throws
  }
}

The only slice() here feeds the final putImageData: ImageData keeps a reference to the array it is given rather than copying it, and the region is about to be freed, so the snapshot is deliberate, not an accidental copy. If you keep the region alive across frames you skip even that.

Tradeoffs: speed versus ownership complexity

Zero-copy is not free of cost — it trades memory bandwidth for lifetime discipline.

  • Win: no per-call memcpy; constant-time view construction; reusable resident buffers; lower GC pressure because you stop minting throwaway ArrayBuffers.
  • Cost: you now own the pointer’s lifetime. A view is only valid while (a) the region is allocated and (b) the backing buffer has not been detached by memory.grow. A view that outlives its data reads stale or freed bytes with no error.
  • Cost: aliasing means mutation is visible everywhere. If both JavaScript and the module hold views over the same region and run concurrently (Web Workers + SharedArrayBuffer), you need Atomics to avoid races — see SharedArrayBuffer, Atomics & threading.

The rule: prefer a view; reach for .slice() only when you specifically need an independent, detach-proof snapshot.

The ownership story deserves a concrete framing. With a copy, the lifetime question never arises — once the bytes are duplicated into an independent JavaScript array, that array is garbage-collected like any other object and you cannot read freed module memory through it. Zero-copy removes that safety net. A view is a raw pointer dressed up as a typed array: it stays “valid” syntactically even after the underlying region is freed or the buffer is detached, and the failure is silent — stale bytes, zeros, or another allocation’s data — rather than an exception. The discipline that buys back safety is small but non-negotiable: build the view as late as possible (right before you read or write), never store it across a call that might grow memory, and drop it the moment you free its region. Teams that treat a view as a long-lived field on an object are the ones that hit intermittent “all zeros” bugs that only reproduce under memory pressure.

A second, quieter tradeoff is debuggability. A copied buffer shows up in a heap snapshot as its own object with a clear size; a thousand transient views over linear memory are nearly invisible to the GC profiler because they hold almost no JavaScript-side memory. That is good for GC pressure but means your real memory usage lives inside the Wasm instance’s memory, where the JavaScript devtools heap view will not account for it. Size your WebAssembly.Memory deliberately and watch memory.buffer.byteLength rather than the JS heap when you reason about a zero-copy pipeline’s footprint.
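A tiny helper along these lines can report the footprint that the JS heap view misses. `wasmFootprint` is a hypothetical name, and the `initial` and `maximum` page counts are illustrative values, not recommendations.

```javascript
const PAGE = 65536;                   // WebAssembly page size in bytes
// initial and maximum are given in 64 KiB pages.
const memory = new WebAssembly.Memory({ initial: 2, maximum: 16 });

// Report the instance's linear-memory size in bytes and pages.
function wasmFootprint(mem) {
  const bytes = mem.buffer.byteLength;
  return { bytes, pages: bytes / PAGE };
}

const atStart = wasmFootprint(memory);
console.log(atStart);                 // { bytes: 131072, pages: 2 }

memory.grow(3);                       // stand-in for the module growing its heap
const afterGrow = wasmFootprint(memory);
console.log(afterGrow);               // { bytes: 327680, pages: 5 }
```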

Gotchas & failure modes

  • Detached buffer after memory.grow. Growing linear memory can allocate a new backing buffer and detach the old one. Any view over the old memory.buffer becomes zero-length — reads return undefined, .length is 0, writes silently no-op. Always re-create views from instance.exports.memory.buffer after any call that might allocate. This is detailed in why memory.grow invalidates pointers.

  • RangeError on misaligned Float32Array/Float64Array. A Float32Array view requires byteOffset to be a multiple of 4; Float64Array requires a multiple of 8. If your allocator hands back a pointer that is not a multiple of the element size, new Float32Array(memory.buffer, ptr, n) throws a RangeError (V8's message: "start offset of Float32Array should be a multiple of 4"). Align allocations, or read via DataView, which has no alignment constraint.

  • View outliving the data. After free(ptr, len), the bytes may be reused by the next allocation. A view you kept around now reads someone else’s data. Drop the view when you free the region.

  • .slice() where you meant a view. new Uint8Array(memory.buffer, ptr, len).slice() quietly reintroduces the copy you were trying to remove. Profile if your “zero-copy” path is still slow.
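The detached-buffer symptoms from the first item are easy to reproduce. The sketch below calls `memory.grow` directly as a stand-in for an allocation inside the module; the offset is arbitrary.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 128, len = 4;             // hypothetical region

const stale = new Uint8Array(memory.buffer, ptr, len);
stale.set([10, 20, 30, 40]);

memory.grow(1);                       // old backing buffer is detached

console.log(stale.length);            // 0
console.log(stale[0]);                // undefined
stale[0] = 99;                        // silently ignored, no exception

// A view rebuilt from the current memory.buffer sees the preserved bytes.
const fresh = new Uint8Array(memory.buffer, ptr, len);
console.log(Array.from(fresh));       // [10, 20, 30, 40]
```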

Alignment in practice

Alignment is the gotcha that turns a clean zero-copy path into an exception. A Float32Array view demands a byteOffset divisible by 4; a Float64Array demands divisibility by 8. The bytes do not care about alignment — the engine does, because it reads them with width-typed loads that assume aligned access. When your allocator returns a pointer that is already a multiple of 8 (most do, to satisfy the strictest primitive), you are fine. The trouble starts when you build a view at ptr + headerLen where headerLen is, say, 5 bytes — now the float payload starts on an odd offset and the constructor throws.

There are two clean fixes. First, lay out your data so numeric payloads start on aligned offsets: pad the header to a multiple of the largest element you will read, which costs a few bytes and keeps every view on the fast path. Second, when you cannot control the layout — decoding someone else’s format — fall back to DataView, which reads any width at any offset with no alignment constraint, at the cost of per-element calls instead of a single bulk view. The first option is faster; the second is universal.
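Both fixes can be sketched side by side. The offsets and the `alignUp` helper are assumptions made for illustration, and the two fixes use separate regions here so that their writes do not overlap.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const base = 64;                      // hypothetical aligned allocation
const headerLen = 5;                  // awkward header size from the text

// Round n up to the next multiple of align (align must be a power of two).
const alignUp = (n, align) => (n + align - 1) & ~(align - 1);

let threw = null;
try {
  new Float32Array(memory.buffer, base + headerLen, 4); // offset 69
} catch (e) {
  threw = e;                          // RangeError: offset not a multiple of 4
}

// Fix 1: pad the header so the payload starts on a 4-byte boundary.
const payload = base + alignUp(headerLen, 4);           // 64 + 8 = 72
const floats = new Float32Array(memory.buffer, payload, 4);
floats[0] = 2.5;

// Fix 2: DataView reads any width at any offset (true = little-endian).
// A second region is used so this write cannot touch the floats above.
const base2 = 256;
const dv = new DataView(memory.buffer);
dv.setFloat32(base2 + headerLen, 1.5, true);            // unaligned offset 261
const unaligned = dv.getFloat32(base2 + headerLen, true);

console.log(threw instanceof RangeError, payload, floats[0], unaligned);
```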

Verification

Confirm the copies are actually gone by timing both paths with performance.now():

const N = 200;

// Copy path: copy in, process, then copy the result back out.
const t0 = performance.now();
for (let i = 0; i < N; i++) {
  new Uint8ClampedArray(memory.buffer, ptr, len).set(src.data);       // copy in
  invert(ptr, len);
  const out = new Uint8ClampedArray(memory.buffer, ptr, len).slice(); // copy out
}
const copyMs = (performance.now() - t0) / N;

// View path: the data is already resident; process and read in place.
const t1 = performance.now();
for (let i = 0; i < N; i++) {
  invert(ptr, len);
  const view = new Uint8ClampedArray(memory.buffer, ptr, len);        // zero-copy
}
const viewMs = (performance.now() - t1) / N;
console.log(`copy ${copyMs.toFixed(3)} ms vs view ${viewMs.toFixed(3)} ms`);

For an 8 MB frame you should see the view path drop the ~1.6 ms of per-frame copy time. You can also confirm aliasing directly: write a sentinel through the view, call the export, and assert the module saw it — if a copy snuck in, the sentinel will not propagate.
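A sentinel check of that kind can be sketched end to end with a tiny hand-assembled module. The module bytes below are an assumption of this sketch, not the guide's invert.wasm: they define one page of exported memory and a hypothetical `peek(ptr)` export that returns the byte at `ptr`.

```javascript
// Hand-assembled module: exports "memory" (1 page) and "peek"(ptr) -> byte.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic, version
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,       // type: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // one function
  0x05, 0x03, 0x01, 0x00, 0x01,                         // memory, min 1 page
  0x07, 0x11, 0x02,                                     // two exports
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00, // "memory"
  0x04, 0x70, 0x65, 0x65, 0x6b, 0x00, 0x00,             // "peek"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code, one body
  0x20, 0x00, 0x2d, 0x00, 0x00, 0x0b,                   // local.get 0; i32.load8_u; end
]);
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
const { memory, peek } = instance.exports;

const ptr = 100;                      // hypothetical offset
const SENTINEL = 0xa5;
const view = new Uint8Array(memory.buffer, ptr, 1);
view[0] = SENTINEL;                   // write through the aliasing view

const seen = peek(ptr);               // module reads with its own load
console.log(seen === SENTINEL);       // true: the view aliases linear memory
```

With a real module, the same check works against any export that reads the region, as long as the sentinel value cannot occur by accident at that offset.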

When the numbers do not match your expectation, two checks isolate the cause. First, log memory.buffer.byteLength before and after the hot path; if it changed, a grow detached your views and the “zero-copy” reads were actually hitting a stale buffer. Second, wrap the view construction in a guard that asserts view.length === len; a zero-length result is the unmistakable signature of a detached buffer, and catching it at the source beats chasing blank output downstream. With both guards in place, a zero-copy pipeline either runs correctly or fails loudly — never silently degrades back into copying.
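Both guards fit in a few lines. `assertLive` is a hypothetical helper name, and the direct `memory.grow` call stands in for a hot path that allocates inside the module.

```javascript
// Hypothetical guard: throws instead of letting a detached view through.
function assertLive(view, expectedLen) {
  if (view.length !== expectedLen) {
    throw new Error(
      `view is detached or mis-sized: length ${view.length}, expected ${expectedLen}`,
    );
  }
  return view;
}

const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 0, len = 16;
const cached = new Uint8Array(memory.buffer, ptr, len);

// Guard 1: compare byteLength before and after the hot path.
const sizeBefore = memory.buffer.byteLength;
memory.grow(1);                       // stand-in for a call that allocates
const grew = memory.buffer.byteLength !== sizeBefore;

// Guard 2: check the view's length at the point of use.
let caught = null;
try {
  assertLive(cached, len);            // cached view was detached by the grow
} catch (e) {
  caught = e;
}
console.log(grew, caught !== null);   // true true

// A view built from the current buffer passes the same guard.
const live = assertLive(new Uint8Array(memory.buffer, ptr, len), len);
```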

Frequently Asked Questions

Is constructing a typed-array view ever expensive? No. The three-argument constructor stores a buffer reference, a byte offset, and a length. It does not read or copy the underlying bytes, so it is O(1) whether the region is 16 bytes or 16 MB. The cost only appears if you call .slice(), TypedArray.from(), or .set(), which move bytes.

Why does my view suddenly read all zeros after a few calls? Almost always a detached buffer. A call grew linear memory, the engine swapped in a new backing ArrayBuffer, and your old view points at the detached one, so its length is 0 and every read yields undefined, which downstream code usually renders as zeros or blank output. Re-create the view from instance.exports.memory.buffer after any allocation, and never cache memory.buffer across calls that might grow.

Can I pass a JavaScript array directly without copying it in? Not a plain Array, and not a typed array backed by a different buffer — those bytes do not live in linear memory, so the module cannot reach them. The data must be inside the module’s memory. Zero-copy means producing or placing the data in linear memory once, then aliasing it, rather than holding it in a separate JavaScript buffer.

What alignment do I need for a Float32Array view? The byteOffset must be a multiple of 4 for Float32Array and a multiple of 8 for Float64Array, otherwise the constructor throws RangeError. Align your allocator’s returns, or read mixed/unaligned data through a DataView, which imposes no alignment requirement.
