Passing Complex Types Across the Boundary

A WebAssembly function signature can only carry numbers. The moment you want to hand a module a string, a byte slice, or a struct — or get one back — you leave the type system behind and enter the realm of convention: an agreed encoding of richer data into the integers and bytes the boundary actually supports. This guide formalizes those conventions, from the universal (ptr, len) pattern through struct layout in linear memory to the canonical ABI that the Component Model is now standardizing.

The constraint that shapes everything

The core type signatures of a Wasm function are limited to i32, i64, f32, f64, and (with the reference-types proposal) opaque handles. There is no string, no array, no struct at the call boundary. Every complex type is therefore decomposed into two operations the boundary does support: passing an integer, and reading or writing bytes in the shared linear memory buffer. Master that decomposition once and every “how do I pass an X” question answers itself — you serialize X into memory and pass the integers that locate it. This is the same linear memory channel described in the parent JS/Wasm Interop & Memory Management area; here we pin down the exact byte-level contract.

Prerequisites

  • [ ] Rust 1.78+ with the wasm32-unknown-unknown target (rustup target add wasm32-unknown-unknown)
  • [ ] wasm-pack 0.12+ and wasm-bindgen-cli 0.2.92+ for the generated-glue comparison
  • [ ] The WebAssembly Binary Toolkit (wabt) for wat2wasm, wasm-objdump, and wasm-validate
  • [ ] A browser or Node 20+ with WebAssembly.instantiateStreaming and TextEncoder/TextDecoder
  • [ ] Comfort reading linear memory byte offsets — see the typed-array view section linked below

The (ptr, len) ABI in one diagram

Almost every non-primitive value crosses the boundary as a pair of i32 values: a pointer (a byte offset into the module’s linear memory) and a length (how many bytes or elements follow). The caller writes the bytes into memory, then passes the pointer and length as ordinary integer arguments. The string "Wasm" sitting at offset 0x10 looks like this in the buffer:

[Figure: String and struct layout in linear memory. Top row: the four-byte UTF-8 string "Wasm" at offsets 0x10 through 0x13 (0x57 'W', 0x61 'a', 0x73 's', 0x6D 'm'), passed as ptr=0x10, len=4. Bottom row: a #[repr(C)] struct { flag: u8, value: u32 } occupying 8 bytes, with flag at offset 0, padding at offsets 1..3, and value (little-endian u32) at offsets 4..7. value sits at offset 4, not 1, because a u32 must be 4-byte aligned, so the compiler inserts 3 padding bytes.]

For slices and typed arrays the convention is identical — (ptr, len) where len counts elements or bytes by prior agreement — which is why the same view machinery used for reading linear memory with typed arrays applies unchanged.
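As a minimal sketch of the two length conventions, the snippet below writes a byte payload and a u32 payload into a standalone WebAssembly.Memory and reads them back. The memory object stands in for a module's exported memory, and the offsets 0x10 and 0x20 are hypothetical; real code would obtain them from the module's allocator.

```javascript
// Stand-in for a module's exported memory (assumption: one 64 KiB page).
const memory = new WebAssembly.Memory({ initial: 1 });

// Byte payload: len counts bytes.
const text = new TextEncoder().encode("Wasm");
const textPtr = 0x10; // hypothetical offset; real code gets this from alloc
new Uint8Array(memory.buffer, textPtr, text.length).set(text);

// Element payload: len counts u32 elements, so the byte span is len * 4.
const nums = [1, 2, 3];
const numsPtr = 0x20; // must be a multiple of 4 for a Uint32Array view
new Uint32Array(memory.buffer, numsPtr, nums.length).set(nums);

// Read both back through fresh views built from (ptr, len).
const decoded = new TextDecoder().decode(
  new Uint8Array(memory.buffer, textPtr, text.length)
);
const roundTrip = Array.from(
  new Uint32Array(memory.buffer, numsPtr, nums.length)
);
console.log(decoded, roundTrip);
```

The typed-array constructor takes its third argument in elements, not bytes, which is why the same (ptr, len) pair works for both views as long as the unit is agreed in advance.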

Strings: UTF-8 in, UTF-8 out

Rust and most WASI-targeting languages store strings as UTF-8. JavaScript strings are UTF-16 internally, so crossing the boundary always involves a transcode. TextEncoder.encode() turns a JS string into a Uint8Array of UTF-8 bytes; TextDecoder.decode() reverses it. The byte length after encoding is what you pass as len — and it is frequently not equal to the JS .length, because non-ASCII code points expand to two, three, or four bytes. The dedicated guide on encoding strings across the wasm boundary walks the full round trip; the short version is: encode, allocate, copy, pass (ptr, len), and free.
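To make the length mismatch concrete, here is a small sketch comparing the UTF-16 unit count with the encoded byte count. The helper name lengths is made up for illustration; only TextEncoder is a real API.

```javascript
const encoder = new TextEncoder();

// Returns both counts so the difference is visible side by side.
function lengths(str) {
  return { utf16Units: str.length, utf8Bytes: encoder.encode(str).length };
}

console.log(lengths("Wasm"));        // ASCII only: 4 units, 4 bytes
console.log(lengths("na\u00EFve"));  // U+00EF takes 2 bytes: 5 units, 6 bytes
console.log(lengths("\u20AC"));      // euro sign: 1 unit, 3 bytes
console.log(lengths("\u{1F600}"));   // surrogate pair: 2 units, 4 bytes
```

The value to pass as len is always utf8Bytes, the length of the encoded Uint8Array.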

Step-by-step workflow

The manual ABI follows the same five steps regardless of the payload type. Here is the canonical JS → Wasm direction for a byte buffer.

  1. Export an allocator from the module. The host cannot safely pick offsets on its own, because the module’s allocator owns the heap. Expose alloc(size) -> ptr and dealloc(ptr, size), then build the module; for the WAT source used later in this guide that means running wat2wasm.

    wat2wasm strings.wat -o strings.wasm
  2. Allocate space in linear memory. Call alloc with the byte count you need; it returns a pointer into the heap region the module controls.

    const ptr = instance.exports.alloc(bytes.length);
  3. Copy the bytes into memory at that offset. Build a Uint8Array view over the current memory.buffer and set() the payload at ptr.

    new Uint8Array(instance.exports.memory.buffer, ptr, bytes.length).set(bytes);
  4. Call the function with (ptr, len). The module reads exactly len bytes starting at ptr.

    const result = instance.exports.process(ptr, bytes.length);
  5. Free the allocation. Whoever allocated must free. Call dealloc(ptr, len) once the module is done reading — typically in a finally block so an exception cannot leak the buffer.

    instance.exports.dealloc(ptr, bytes.length);
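Steps 2 through 5 can be folded into one helper so that no call site forgets the free. The sketch below is one possible shape, not a standard API: withBytes and makeMockExports are hypothetical names, and the mock object only imitates instance.exports so the example runs without a compiled module.

```javascript
// Hypothetical helper: runs fn(ptr, len) with `bytes` copied into module memory.
// `exports` is assumed to provide alloc, dealloc and memory as described above.
function withBytes(exports, bytes, fn) {
  const ptr = exports.alloc(bytes.length);                               // step 2
  try {
    // Build the view after alloc, in case alloc grew (and detached) the buffer.
    new Uint8Array(exports.memory.buffer, ptr, bytes.length).set(bytes); // step 3
    return fn(ptr, bytes.length);                                        // step 4
  } finally {
    exports.dealloc(ptr, bytes.length);                                  // step 5
  }
}

// Stand-in for instance.exports (assumption: a JS bump allocator over one page).
function makeMockExports() {
  const memory = new WebAssembly.Memory({ initial: 1 });
  let bump = 1024;
  let live = 0;
  return {
    memory,
    alloc(size) { const p = bump; bump += size; live += 1; return p; },
    dealloc(_ptr, _size) { live -= 1; },
    // Sums len bytes starting at ptr, the way a module export would read them.
    sum(ptr, len) {
      return new Uint8Array(memory.buffer, ptr, len).reduce((a, b) => a + b, 0);
    },
    liveAllocations() { return live; },
  };
}

const mock = makeMockExports();
const total = withBytes(mock, new Uint8Array([1, 2, 3]), (ptr, len) => mock.sum(ptr, len));
console.log(total, mock.liveAllocations());
```

With a real instance you would pass instance.exports in place of mock and call an exported function inside the callback.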

A concrete WAT + JS example

The module below exports a one-page linear memory, a trivial bump allocator (alloc), and a function count_nonzero(ptr, len) that walks len bytes from ptr. The allocator hands out successive offsets from a mutable global and never checks alignment or the memory limit; it is deliberately minimal so the ABI is visible with nothing hidden. Contrast it with the fuller treatment of bump allocation in the linear memory management & allocators guide.

(module
  (memory (export "memory") 1)                 ;; one 64 KiB page, exported to JS
  (global $bump (mut i32) (i32.const 1024))    ;; heap starts above a reserved low region

  ;; bump allocator: return current top, advance it; no free list
  (func (export "alloc") (param $size i32) (result i32)
    (local $p i32)
    (local.set $p (global.get $bump))
    (global.set $bump (i32.add (global.get $bump) (local.get $size)))
    (local.get $p))

  ;; no-op free for the bump strategy; present so the ABI is symmetric
  (func (export "dealloc") (param $ptr i32) (param $size i32))

  ;; count the non-zero bytes in [ptr, ptr+len) — reads exactly len bytes
  (func (export "count_nonzero") (param $ptr i32) (param $len i32) (result i32)
    (local $i i32) (local $acc i32)
    (block $done
      (loop $loop
        (br_if $done (i32.ge_u (local.get $i) (local.get $len)))
        (if (i32.load8_u (i32.add (local.get $ptr) (local.get $i)))
          (then (local.set $acc (i32.add (local.get $acc) (i32.const 1)))))
        (local.set $i (i32.add (local.get $i) (i32.const 1)))
        (br $loop)))
    (local.get $acc)))

const { instance } = await WebAssembly.instantiateStreaming(fetch("/strings.wasm"));
const { alloc, dealloc, count_nonzero, memory } = instance.exports;

const bytes = new TextEncoder().encode("Wasm\0ABI");   // 8 bytes, one embedded NUL
const ptr = alloc(bytes.length);
try {
  new Uint8Array(memory.buffer, ptr, bytes.length).set(bytes);
  const nonzero = count_nonzero(ptr, bytes.length);
  console.log(nonzero);                                 // 7 (the NUL is the only zero byte)
} finally {
  dealloc(ptr, bytes.length);
}

Returning multiple values

A function with one return slot cannot hand back a (ptr, len) pair directly. There are three standard escapes, in rough order of portability:

  • Out-pointer. The caller allocates a small scratch region and passes its pointer; the function writes the result fields there and returns nothing (or a status code). This is how a struct comes back, covered in returning structs from wasm to javascript.
  • Packed i64. Two i32 values fit in one 64-bit return: (ptr << 32) | len. The host unpacks with BigInt shifts. Compact, but limited to two 32-bit fields and awkward in JavaScript because of BigInt.
  • Multi-value return. The multi-value proposal, shipped in every current engine, lets a function declare (result i32 i32) and return both directly, with no packing. wat2wasm accepts this by default; whether a compiled language emits it for exported functions depends on the toolchain and its ABI settings, so check the generated signature.

;; multi-value: return both the pointer and the length of a result buffer
(func (export "make_result") (result i32 i32)
  (i32.const 2048)        ;; ptr
  (i32.const 16))         ;; len: both land on the value stack and return together

const [ptr, len] = instance.exports.make_result();   // a JS array of two numbers
const view = new Uint8Array(instance.exports.memory.buffer, ptr, len);
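For the packed i64 option above, the host receives the return value as a BigInt. The sketch below shows one way to unpack it; pack only simulates what the module would compute, and both function names are hypothetical. It assumes the (ptr << 32) | len layout given earlier.

```javascript
// Simulates the module side: (ptr << 32) | len, delivered to JS as a signed i64.
function pack(ptr, len) {
  return BigInt.asIntN(64, (BigInt(ptr) << 32n) | BigInt(len));
}

// Host side: split the BigInt back into two ordinary numbers.
function unpack(packed) {
  const u = BigInt.asUintN(64, packed); // reinterpret the signed value as unsigned
  return { ptr: Number(u >> 32n), len: Number(u & 0xffffffffn) };
}

console.log(unpack(pack(2048, 16)));
console.log(unpack(pack(0x80000000, 5))); // high ptr makes the i64 negative
```

The asUintN step matters once ptr reaches 2 GiB: the packed value is then negative as a signed i64, and shifting it without reinterpreting would yield a negative pointer.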

Struct layout in linear memory

A struct is just a fixed sequence of fields at known offsets. To read or write one from JavaScript you need the exact layout the module’s compiler chose, and that means controlling it. In Rust, the default repr(Rust) layout is unspecified (the compiler may reorder fields), so you annotate the type with #[repr(C)] to get the predictable C layout: fields in declaration order, each placed at a multiple of its type’s alignment (which equals its size for the primitive scalars used here), with padding inserted to satisfy alignment and the total size rounded up to the struct’s alignment.

#[repr(C)]
pub struct Particle {
    pub id: u32,      // offset 0, 4 bytes
    pub x: f32,       // offset 4, 4 bytes
    pub y: f32,       // offset 8, 4 bytes
    pub alive: u8,    // offset 12, 1 byte
    // 3 bytes tail padding → size 16, alignment 4
}

On the JavaScript side you read those fields with a DataView, passing the offset of each field and matching the endianness. Wasm linear memory is always little-endian, so pass true to every multi-byte DataView getter (getUint8 reads a single byte and takes no endianness argument). The struct above decodes as:

const dv = new DataView(memory.buffer, structPtr, 16);
const particle = {
  id:    dv.getUint32(0, true),
  x:     dv.getFloat32(4, true),
  y:     dv.getFloat32(8, true),
  alive: dv.getUint8(12) !== 0,
};

Get the offsets wrong — most often by forgetting alignment padding — and every field after the mistake is garbage. The returning structs guide shows how to derive offsets mechanically and verify them against the compiler.
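One way to avoid eyeballing offsets is to compute them with the same rule the C layout uses. The sketch below is an illustration under stated assumptions: layoutReprC and the PRIMS table are hypothetical, cover only a few primitive types, and assume wasm32 sizes and alignments. It does not replace checking against the compiler's own output.

```javascript
// Size and alignment of a few primitive field types on wasm32 (assumed subset).
const PRIMS = {
  u8:  { size: 1, align: 1 }, u16: { size: 2, align: 2 },
  u32: { size: 4, align: 4 }, f32: { size: 4, align: 4 },
  u64: { size: 8, align: 8 }, f64: { size: 8, align: 8 },
};

// Round n up to the next multiple of a.
const alignUp = (n, a) => Math.ceil(n / a) * a;

// fields: array of [name, type] pairs in declaration order.
function layoutReprC(fields) {
  let offset = 0;
  let maxAlign = 1;
  const offsets = {};
  for (const [name, type] of fields) {
    const { size, align } = PRIMS[type];
    offset = alignUp(offset, align); // interior padding before this field
    offsets[name] = offset;
    offset += size;
    maxAlign = Math.max(maxAlign, align);
  }
  // Tail padding: total size is a multiple of the largest field alignment.
  return { offsets, size: alignUp(offset, maxAlign), align: maxAlign };
}

const particle = layoutReprC([["id", "u32"], ["x", "f32"], ["y", "f32"], ["alive", "u8"]]);
console.log(particle);
```

For the Particle struct this reproduces the offsets in the comments above (0, 4, 8, 12, size 16), and for { flag: u8, value: u32 } it places value at offset 4 with a total size of 8.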

Manual ABI vs wasm-bindgen-generated

Everything above is the manual ABI: you own the encode, the allocation, the copy, and the free. The Rust ecosystem mostly hides it behind #[wasm_bindgen], which generates a JavaScript shim that does the identical dance — encode the string, call alloc, copy, pass (ptr, len), decode the return, and free — automatically. The tradeoff is control versus boilerplate.

Concern         | Manual ABI                                   | wasm-bindgen-generated
----------------+----------------------------------------------+--------------------------------
Lines you write | Many (alloc, copy, free per call)            | One annotation
Copy semantics  | Fully under your control; zero-copy possible | Copies in and out by default
Type safety     | None; offsets are by convention              | Generated .d.ts types
Debuggability   | Every byte is visible                        | Read the emitted glue to see it
Best for        | Hot paths, custom layouts, non-Rust modules  | Application code, fast iteration

The full account of the generated glue — JsValue, serde-wasm-bindgen, closures, and how it extends to whole objects — lives in the wasm-bindgen deep dive. Reach for manual ABI when a profiler shows marshaling on the hot path or when you need a layout wasm-bindgen will not emit; reach for the generated glue everywhere else.

Toward a standard: the canonical ABI

Hand-rolled (ptr, len) conventions do not compose: two modules built by different toolchains disagree on how a string is laid out, so they cannot call each other directly. The Component Model fixes this by defining a canonical ABI — a single, language-neutral lowering of high-level types (strings, lists, records, variants) into core Wasm. You describe the interface in WIT (the WebAssembly Interface Type language), and the toolchain generates the lifting and lowering glue so a Rust component and a JavaScript host agree on layout by specification rather than by convention.

// a WIT interface — the canonical ABI lowers these types for you
interface geometry {
  record point { x: f32, y: f32 }
  distance: func(a: point, b: point) -> f32;
}

The canonical ABI still bottoms out in pointers, lengths, and linear memory — it is the same machinery, standardized and code-generated — but it removes the per-project bespoke contract. Until components are ubiquitous, the manual ABI and wasm-bindgen remain how most production code crosses the boundary, which is why both companion guides below stay at the byte level.

Optimization flags & tradeoffs

The decisive axis is copy vs zero-copy. A manual ABI lets you allocate once and let the module write results in place, skipping the inbound and outbound copies that wasm-bindgen performs by default. For a 4 MB buffer at ~10 GB/s, each eliminated copy saves ~0.4 ms — often more than the computation itself. The catalogue of these patterns lives in zero-copy data transfer patterns.
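The copy-cost figure is plain arithmetic, sketched below so you can substitute your own numbers. The function name copyCostMs is hypothetical, and the 10 GB/s bandwidth is the assumed figure from the paragraph above, not a measurement.

```javascript
// Estimated time for one memcpy-style copy, in milliseconds.
// bytesPerSecond is an assumed sustained copy bandwidth, not a measured one.
function copyCostMs(byteCount, bytesPerSecond) {
  return (byteCount / bytesPerSecond) * 1000;
}

const perCopy = copyCostMs(4e6, 10e9); // 4 MB at 10 GB/s, roughly 0.4 ms
const roundTrip = 2 * perCopy;         // one inbound plus one outbound copy
console.log(perCopy.toFixed(2), roundTrip.toFixed(2));
```

Measure the real bandwidth on your target before relying on the estimate; it varies with buffer size and cache behavior.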

The second axis is batching. Per-call marshaling — the allocate/copy/free overhead — dominates at small payload sizes, so one call processing 10,000 elements beats 10,000 calls processing one. Enable the multi-value proposal (--enable-multi-value in older wat2wasm; on by default now) to return pairs without BigInt packing, and prefer i32 lengths over i64 unless you genuinely address more than 4 GiB of memory.

Gotchas & failure modes

  • Forgetting to free → leak. A bump allocator never reclaims; even a real malloc/free leaks if you skip the dealloc. Wrap every alloc in try/finally. A growing memory.buffer.byteLength across calls is the symptom.
  • Reading len bytes past the buffer → trap. If you pass a len larger than the bytes you actually wrote (or a stale ptr), the module’s loads walk off the allocation. While the access stays inside linear memory it silently reads garbage; once it passes memory.buffer.byteLength it traps with RuntimeError: memory access out of bounds. Validate every length you receive from JavaScript before using it.
  • Alignment / padding mismatch. Decoding a #[repr(C)] struct with the wrong offsets — usually by ignoring tail or interior padding — silently corrupts every field after the error. Derive offsets from the type, never by eyeballing field sizes.
  • Stale view after memory.grow. Any allocation that triggers a grow can detach memory.buffer, zero-lengthing your Uint8Array/DataView. Re-create views after every alloc — the mechanism is detailed in why memory.grow invalidates pointers.
  • UTF-8 vs UTF-16 length. Passing str.length (UTF-16 units) instead of the encoded byte length truncates or overruns multibyte strings. Always pass encoded.length.
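The stale-view failure in the list above is easy to reproduce without a module. This sketch grows a standalone WebAssembly.Memory directly, which stands in for an alloc call that had to grow the heap.

```javascript
// Stand-in for exported memory; maximum of 2 pages so the grow can succeed.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 2 });

const stale = new Uint8Array(memory.buffer, 0, 16);
stale[0] = 42;

memory.grow(1); // simulates an alloc that had to grow linear memory

const staleLength = stale.byteLength; // 0: the old ArrayBuffer is now detached
const fresh = new Uint8Array(memory.buffer, 0, 16);
const preserved = fresh[0];           // 42: the contents survive the grow

console.log(staleLength, preserved, memory.buffer.byteLength);
```

The data is intact; only the old view is dead, which is why re-reading memory.buffer after every call that might allocate is enough to recover.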

Verification

Confirm the static side with wasm-objdump. The -x section dump shows your exported allocator and functions; -s prints raw section contents, and combined with -j data it shows the data section so you can see embedded string constants at their offsets.

# list exports — confirm alloc, dealloc, count_nonzero, memory are present
wasm-objdump -x strings.wasm | grep -A6 "Export"

# dump the data section as hex — confirm any static strings land where you expect
wasm-objdump -s -j data strings.wasm

At runtime, validate before instantiating and inspect memory in DevTools. wasm-validate strings.wasm catches malformed multi-value signatures; the Memory inspector in Chrome DevTools lets you read the exact bytes at a ptr to confirm your encode wrote what you expected.
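When DevTools is not at hand, for example under Node, a few lines of JavaScript give the same byte-level check. The hexdump helper below is a hypothetical sketch; it also rejects spans that fall outside the buffer instead of letting the view constructor fail later.

```javascript
// Hypothetical helper: hex-dump len bytes at ptr, refusing out-of-range spans.
function hexdump(memory, ptr, len) {
  const total = memory.buffer.byteLength;
  const valid = Number.isInteger(ptr) && Number.isInteger(len) &&
                ptr >= 0 && len >= 0 && ptr + len <= total;
  if (!valid) {
    throw new RangeError(`span [${ptr}, ${ptr + len}) is outside ${total} bytes`);
  }
  const bytes = new Uint8Array(memory.buffer, ptr, len);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// Stand-in memory with "Wasm" written at the offset used in the earlier figure.
const memory = new WebAssembly.Memory({ initial: 1 });
new Uint8Array(memory.buffer, 0x10, 4).set(new TextEncoder().encode("Wasm"));
const dump = hexdump(memory, 0x10, 4);
console.log(dump);
```

Pass instance.exports.memory in place of the stand-in to inspect a real module.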


Frequently Asked Questions

Why must I export an allocator from the module instead of just picking an offset in JavaScript?
The module’s compiled code owns its heap: its own alloc/free track which regions are in use. If you write to an offset you chose yourself, you may stomp on the module’s stack, its static data, or a live allocation. Calling the module’s exported alloc gets you a pointer the module guarantees is free.

Is the (ptr, len) length in bytes or in elements?
Whatever you and the module agree on; the boundary has no built-in notion. For strings and byte buffers it is bytes; for a &[u32] slice it is usually the element count, with the module multiplying by 4 internally. Document the unit per function, because there is no type to enforce it.

Do I need #[repr(C)] if I only ever read the struct from another Rust function?
No. Within a single Rust program the compiler is consistent. You need #[repr(C)] precisely when an external reader (JavaScript via DataView, or another language) must know the layout, because default repr(Rust) layout is unspecified and may change between compiler versions.

When should I prefer the multi-value return over a packed i64?
Almost always: multi-value is supported in every current engine and avoids BigInt on the JavaScript side. Reach for the packed i64 only when targeting an old runtime without multi-value, or when an existing ABI you must match already packs.

Does the Component Model’s canonical ABI replace wasm-bindgen?
The two will overlap heavily over time, but today wasm-bindgen targets the browser import object model directly while components need a host that understands the Component Model. For shipping browser code in 2026, wasm-bindgen or manual ABI is still the path; WIT and the canonical ABI are where the ecosystem is heading.
