Passing Complex Types Across the Boundary
A WebAssembly function signature can only carry numbers. The moment you want to hand a module a string,
a byte slice, or a struct — or get one back — you leave the type system behind and enter the realm of
convention: an agreed encoding of richer data into the integers and bytes the boundary actually
supports. This guide formalizes those conventions, from the universal (ptr, len) pattern through
struct layout in linear memory to the canonical ABI that the Component Model is now standardizing.
The constraint that shapes everything
The core type signatures of a Wasm function are limited to i32, i64, f32, f64, v128 (with SIMD), and (with the
reference-types proposal) opaque reference handles. There is no string, no array, no struct at the call
boundary. Every complex type is therefore decomposed into two operations the boundary does support:
passing an integer, and reading or writing bytes in the shared linear memory buffer. Master that
decomposition once and every “how do I pass an X” question answers itself — you serialize X into
memory and pass the integers that locate it. This is the same linear memory channel described in the
parent JS/Wasm Interop & Memory Management area; here we pin
down the exact byte-level contract.
Prerequisites
- [ ] Rust 1.78+ with the wasm32-unknown-unknown target (rustup target add wasm32-unknown-unknown)
- [ ] wasm-pack 0.12+ and wasm-bindgen-cli 0.2.92+ for the generated-glue comparison
- [ ] The WebAssembly Binary Toolkit (wabt) for wat2wasm, wasm-objdump, and wasm-validate
- [ ] A browser or Node 20+ with WebAssembly.instantiateStreaming and TextEncoder/TextDecoder
- [ ] Comfort reading linear memory byte offsets; see the typed-array view section linked below
The (ptr, len) ABI in one diagram
Almost every non-primitive value crosses the boundary as a pair of i32 values: a pointer (a byte
offset into the module’s linear memory) and a length (how many bytes or elements follow). The
caller writes the bytes into memory, then passes the pointer and length as ordinary integer arguments.
The string "Wasm" sitting at offset 0x10 occupies four bytes, 0x57 0x61 0x73 0x6D at offsets 0x10 through 0x13, so it crosses the boundary as ptr = 0x10 (16) and len = 4.
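To see that layout concretely, the sketch below writes "Wasm" into a bare WebAssembly.Memory at offset 0x10 and reads the bytes back. The offset is picked by hand only because no module is involved here. With a real module, the pointer comes from its exported alloc.

```javascript
// A bare one-page memory stands in for a module's linear memory.
const memory = new WebAssembly.Memory({ initial: 1 });

// Offset picked by hand for illustration only; a real module's alloc supplies it.
const ptr = 0x10;
const bytes = new TextEncoder().encode("Wasm");
new Uint8Array(memory.buffer, ptr, bytes.length).set(bytes);
const len = bytes.length; // 4 bytes, all ASCII

// Render the bytes at [ptr, ptr + len) as hex.
const hex = Array.from(
  new Uint8Array(memory.buffer, ptr, len),
  (b) => "0x" + b.toString(16).padStart(2, "0"),
);
console.log(`ptr=${ptr} len=${len} bytes=${hex.join(" ")}`);
// ptr=16 len=4 bytes=0x57 0x61 0x73 0x6d
```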
For slices and typed arrays the convention is identical — (ptr, len) where len counts elements
or bytes by prior agreement — which is why the same view machinery used for
reading linear memory with typed arrays
applies unchanged.
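Here is a minimal sketch of a u32 slice crossing as (ptr, len), assuming the agreed unit is elements rather than bytes. The pointer value is hypothetical; it would normally come from alloc. The sketch also shows two host-side constraints. First, a Uint32Array view needs a 4-byte-aligned offset. Second, typed arrays use the host's byte order, which is little-endian on mainstream platforms. A DataView with littleEndian = true is the portable alternative.

```javascript
// Bare memory standing in for a module's linear memory.
const memory = new WebAssembly.Memory({ initial: 1 });

const values = new Uint32Array([1, 2, 3, 0xdeadbeef]);
const ptr = 64;            // hypothetical, as if returned by alloc(values.byteLength)
const len = values.length; // element count (4); the byte length is len * 4 = 16

// Write the slice: the view is len *elements* long, starting at byte offset ptr.
new Uint32Array(memory.buffer, ptr, len).set(values);

// Read it back from the same (ptr, len) pair.
const readBack = Array.from(new Uint32Array(memory.buffer, ptr, len));

// A misaligned pointer cannot back a Uint32Array view: the constructor throws RangeError.
let misaligned = null;
try {
  new Uint32Array(memory.buffer, ptr + 1, len);
} catch (err) {
  misaligned = err;
}
console.log(readBack, misaligned instanceof RangeError);
```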
Strings: UTF-8 in, UTF-8 out
Rust and most WASI-targeting languages store strings as UTF-8. JavaScript strings are UTF-16 internally,
so crossing the boundary always involves a transcode. TextEncoder.encode() turns a JS string into a
Uint8Array of UTF-8 bytes; TextDecoder.decode() reverses it. The byte length after encoding is what
you pass as len — and it is frequently not equal to the JS .length, because non-ASCII code points
expand to two, three, or four bytes. The dedicated guide on
encoding strings across the wasm boundary
walks the full round trip; the short version is: encode, allocate, copy, pass (ptr, len), and free.
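The sketch below shows the length mismatch and the encode/copy/decode round trip. Writing at a fixed pointer on a bare WebAssembly.Memory is an assumption that lets the snippet run without a module. The helper names writeUtf8 and readUtf8 are hypothetical.

```javascript
const encoder = new TextEncoder();
const decoder = new TextDecoder(); // UTF-8 by default

// .length counts UTF-16 code units; the Wasm side needs the UTF-8 byte count.
// "Wasm": 4 units, 4 bytes; "é": 1, 2; "€": 1, 3; "😀": 2, 4
for (const s of ["Wasm", "é", "€", "😀"]) {
  console.log(JSON.stringify(s), "units:", s.length, "bytes:", encoder.encode(s).length);
}

// Hypothetical helpers: encode and copy in, then decode exactly len bytes out.
function writeUtf8(memory, ptr, str) {
  const bytes = encoder.encode(str);
  new Uint8Array(memory.buffer, ptr, bytes.length).set(bytes);
  return bytes.length; // this, not str.length, is the len to pass
}

function readUtf8(memory, ptr, len) {
  return decoder.decode(new Uint8Array(memory.buffer, ptr, len));
}

const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 1024; // stands in for a pointer returned by the module's alloc
const len = writeUtf8(memory, ptr, "héllo 😀"); // 8 UTF-16 units, 11 UTF-8 bytes
const roundTrip = readUtf8(memory, ptr, len);
```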
Step-by-step workflow
The manual ABI follows the same five steps regardless of the payload type. Here is the canonical JS → Wasm direction for a byte buffer.
- Export an allocator from the module. The host cannot safely pick offsets on its own, because the module’s allocator owns the heap. Expose alloc(size) -> ptr and dealloc(ptr, size), then assemble the module:
  wat2wasm strings.wat -o strings.wasm
- Allocate space in linear memory. Call alloc with the byte count you need; it returns a pointer into the heap region the module controls.
  const ptr = instance.exports.alloc(bytes.length);
- Copy the bytes into memory at that offset. Build a Uint8Array view over the current memory.buffer and set() the payload at ptr.
  new Uint8Array(instance.exports.memory.buffer, ptr, bytes.length).set(bytes);
- Call the function with (ptr, len). The module reads exactly len bytes starting at ptr.
  const result = instance.exports.process(ptr, bytes.length);
- Free the allocation. Whoever allocated must free. Call dealloc(ptr, len) once the module is done reading, typically in a finally block so an exception cannot leak the buffer.
  instance.exports.dealloc(ptr, bytes.length);
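The five steps can be folded into one helper so the free can never be skipped. This is a sketch, and both names are hypothetical. callWithBytes is the helper itself. makeStubExports is a JavaScript imitation of the module's alloc, dealloc, and count_nonzero, used only so the snippet runs without a .wasm file. Step 1 lives on the module side.

```javascript
// Sketch of steps 2-5 as one helper. Assumes `exports` provides
// memory, alloc(size) -> ptr and dealloc(ptr, size), like the WAT module below.
function callWithBytes(exports, bytes, fn) {
  const ptr = exports.alloc(bytes.length); // step 2: allocate
  try {
    // Step 3: build the view *after* alloc, because alloc may have grown memory.
    new Uint8Array(exports.memory.buffer, ptr, bytes.length).set(bytes);
    return fn(ptr, bytes.length); // step 4: call with (ptr, len)
  } finally {
    exports.dealloc(ptr, bytes.length); // step 5: free, even if fn throws
  }
}

// Hypothetical JS stand-in for the compiled module: a bump allocator plus
// count_nonzero, with a counter of live allocations to expose leaks.
function makeStubExports() {
  const memory = new WebAssembly.Memory({ initial: 1 });
  let bump = 1024;
  let live = 0;
  return {
    memory,
    alloc(size) {
      const p = bump;
      bump += size;
      live += 1;
      return p;
    },
    dealloc(_ptr, _size) {
      live -= 1;
    },
    count_nonzero(ptr, len) {
      return new Uint8Array(memory.buffer, ptr, len).filter((b) => b !== 0).length;
    },
    liveAllocations: () => live,
  };
}

const stub = makeStubExports();
const bytes = new TextEncoder().encode("Wasm\0ABI");
const nonzero = callWithBytes(stub, bytes, (p, l) => stub.count_nonzero(p, l));
console.log(nonzero, stub.liveAllocations()); // 7 0
```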
A concrete WAT + JS example
The module below exports a one-page linear memory, a trivial bump alloc, and a (ptr, len)-style
function count_nonzero(ptr, len) that walks len bytes from ptr. The allocator hands out
successive offsets from a mutable global. It is deliberately minimal (no alignment, no reuse, no
memory.grow) so the ABI is visible with nothing hidden. Compare it with the allocators covered in the
linear memory management & allocators guide.
(module
  (memory (export "memory") 1)               ;; one 64 KiB page, exported to JS
  (global $bump (mut i32) (i32.const 1024))  ;; heap starts above a reserved low region
  ;; bump allocator: return current top, advance it; no free list
  (func (export "alloc") (param $size i32) (result i32)
    (local $p i32)
    (local.set $p (global.get $bump))
    (global.set $bump (i32.add (global.get $bump) (local.get $size)))
    (local.get $p))
  ;; no-op free for the bump strategy; present so the ABI is symmetric
  (func (export "dealloc") (param $ptr i32) (param $size i32))
  ;; count the non-zero bytes in [ptr, ptr+len); reads exactly len bytes
  (func (export "count_nonzero") (param $ptr i32) (param $len i32) (result i32)
    (local $i i32) (local $acc i32)
    (block $done
      (loop $loop
        (br_if $done (i32.ge_u (local.get $i) (local.get $len)))
        (if (i32.load8_u (i32.add (local.get $ptr) (local.get $i)))
          (then (local.set $acc (i32.add (local.get $acc) (i32.const 1)))))
        (local.set $i (i32.add (local.get $i) (i32.const 1)))
        (br $loop)))
    (local.get $acc)))
const { instance } = await WebAssembly.instantiateStreaming(fetch("/strings.wasm"));
const { alloc, dealloc, count_nonzero, memory } = instance.exports;
const bytes = new TextEncoder().encode("Wasm\0ABI"); // 8 bytes, one embedded NUL
const ptr = alloc(bytes.length);
try {
  new Uint8Array(memory.buffer, ptr, bytes.length).set(bytes);
  const nonzero = count_nonzero(ptr, bytes.length);
  console.log(nonzero); // 7 (the NUL is the only zero byte)
} finally {
  dealloc(ptr, bytes.length);
}
Returning multiple values
A function with one return slot cannot hand back a (ptr, len) pair directly. There are three standard
escapes, in rough order of portability:
- Out-pointer. The caller allocates a small scratch region and passes its pointer; the function writes the result fields there and returns nothing (or a status code). This is how a struct comes back, covered in returning structs from wasm to javascript.
- Packed i64. Two i32 values fit in one 64-bit return: (ptr << 32) | len. The host unpacks with BigInt shifts. Compact, but limited to two 32-bit fields and awkward in JavaScript, where an i64 arrives as a BigInt.
- Multi-value return. The multi-value proposal, shipped in every current engine, lets a function declare (result i32 i32) and return both directly, no packing. wat2wasm assembles such signatures directly. Whether a compiler toolchain such as wasm-bindgen emits them depends on its version and enabled target features.
;; multi-value: return both the pointer and the length of a result buffer
(func (export "make_result") (result i32 i32)
  (i32.const 2048)  ;; ptr
  (i32.const 16))   ;; len: both land on the value stack and return together
const [ptr, len] = instance.exports.make_result(); // a JS array of two numbers
const view = new Uint8Array(instance.exports.memory.buffer, ptr, len);
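The other two escapes from the list above can be sketched on the host side too. For the packed i64, one subtlety matters: an exported i64 reaches JavaScript as a signed BigInt. A pointer with its top bit set therefore makes the packed value negative, so BigInt.asUintN normalizes it before shifting. For the out-pointer, the field order (ptr at retptr, len at retptr + 4) is an assumption that caller and callee would have to agree on. packPtrLen, readOutPointer, and the scratch pointer are hypothetical.

```javascript
// Packed i64: normalize the signed BigInt, then split the two 32-bit halves.
function unpackPtrLen(packed) {
  const u = BigInt.asUintN(64, packed); // reinterpret as unsigned 64-bit
  return { ptr: Number(u >> 32n), len: Number(u & 0xffffffffn) };
}

// Hypothetical inverse, producing what an i64 return would look like in JS.
function packPtrLen(ptr, len) {
  return BigInt.asIntN(64, (BigInt(ptr >>> 0) << 32n) | BigInt(len >>> 0));
}

// Out-pointer: the callee wrote two little-endian i32 fields at retptr.
function readOutPointer(memory, retptr) {
  const dv = new DataView(memory.buffer, retptr, 8);
  return { ptr: dv.getUint32(0, true), len: dv.getUint32(4, true) };
}

// Demo: simulate a callee that writes (2048, 16) through the out-pointer.
const memory = new WebAssembly.Memory({ initial: 1 });
const retptr = 512; // hypothetical scratch pointer, as if from alloc(8)
const callee = new DataView(memory.buffer, retptr, 8);
callee.setUint32(0, 2048, true);
callee.setUint32(4, 16, true);

console.log(readOutPointer(memory, retptr));              // { ptr: 2048, len: 16 }
console.log(unpackPtrLen(packPtrLen(2048, 16)));          // { ptr: 2048, len: 16 }
console.log(unpackPtrLen(packPtrLen(0x80000000, 7)).ptr); // 2147483648
```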
Struct layout in linear memory
A struct is just a fixed sequence of fields at known offsets. To read or write one from JavaScript you
need the exact layout the module’s compiler chose, and that means controlling it. In Rust, default
repr(Rust) layout is unspecified — the compiler may reorder fields — so you annotate the type with
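#[repr(C)] to get the predictable C ABI: fields in declaration order, each at an offset that is a multiple
of its alignment (for the primitives used here, equal to their size), with padding inserted between fields
and at the end so the total size is a multiple of the struct's alignment.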
#[repr(C)]
pub struct Particle {
    pub id: u32,   // offset 0, 4 bytes
    pub x: f32,    // offset 4, 4 bytes
    pub y: f32,    // offset 8, 4 bytes
    pub alive: u8, // offset 12, 1 byte
    // 3 bytes tail padding → size 16, alignment 4
}
On the JavaScript side you read those fields with a DataView, passing the offset of each field and
matching the endianness. Wasm linear memory is always little-endian, so pass true as the littleEndian
argument to every multi-byte DataView getter (getUint8 takes none). The struct above decodes as:
const dv = new DataView(memory.buffer, structPtr, 16);
const particle = {
  id: dv.getUint32(0, true),
  x: dv.getFloat32(4, true),
  y: dv.getFloat32(8, true),
  alive: dv.getUint8(12) !== 0,
};
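Writing a Particle from JavaScript, for example to pass one in by pointer, mirrors the read with DataView setters. This is a sketch. structPtr is hypothetical and would come from the module's alloc(16). Zeroing the padding bytes is a choice made here so no stale data leaks through, not an ABI requirement.

```javascript
// Layout of the #[repr(C)] Particle: size 16, alignment 4.
const PARTICLE_SIZE = 16;

function writeParticle(memory, structPtr, p) {
  const dv = new DataView(memory.buffer, structPtr, PARTICLE_SIZE);
  dv.setUint32(0, p.id, true);      // id: u32 at 0
  dv.setFloat32(4, p.x, true);      // x: f32 at 4
  dv.setFloat32(8, p.y, true);      // y: f32 at 8
  dv.setUint8(12, p.alive ? 1 : 0); // alive: u8 at 12
  for (let i = 13; i < PARTICLE_SIZE; i++) dv.setUint8(i, 0); // tail padding
}

function readParticle(memory, structPtr) {
  const dv = new DataView(memory.buffer, structPtr, PARTICLE_SIZE);
  return {
    id: dv.getUint32(0, true),
    x: dv.getFloat32(4, true),
    y: dv.getFloat32(8, true),
    alive: dv.getUint8(12) !== 0,
  };
}

const memory = new WebAssembly.Memory({ initial: 1 });
const structPtr = 4096; // hypothetical, as if from alloc(PARTICLE_SIZE)
writeParticle(memory, structPtr, { id: 7, x: 1.5, y: -2.25, alive: true });
console.log(readParticle(memory, structPtr)); // { id: 7, x: 1.5, y: -2.25, alive: true }
```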
Get the offsets wrong — most often by forgetting alignment padding — and every field after the mistake is garbage. The returning structs guide shows how to derive offsets mechanically and verify them against the compiler.
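One way to derive offsets mechanically is to apply the C layout rule in code. The sketch below handles only primitive fields, using their wasm32 sizes and alignments (8 for u64/f64). Nested structs and arrays are out of scope, and layoutReprC is a hypothetical helper. On the Rust side, std::mem::offset_of! and std::mem::size_of give the compiler's own answer to check against.

```javascript
// [size, alignment] in bytes for primitive field types on wasm32.
const PRIMITIVES = {
  u8: [1, 1], i8: [1, 1], u16: [2, 2], i16: [2, 2],
  u32: [4, 4], i32: [4, 4], f32: [4, 4],
  u64: [8, 8], i64: [8, 8], f64: [8, 8],
};

const alignUp = (n, a) => Math.ceil(n / a) * a;

// #[repr(C)] rule: fields in declaration order, each at the next offset that is
// a multiple of its alignment; the struct's alignment is the largest field
// alignment, and its size is rounded up to that alignment (tail padding).
function layoutReprC(fields) {
  let offset = 0;
  let align = 1;
  const offsets = {};
  for (const [name, type] of fields) {
    if (!(type in PRIMITIVES)) throw new Error(`unsupported field type: ${type}`);
    const [size, fieldAlign] = PRIMITIVES[type];
    offset = alignUp(offset, fieldAlign); // interior padding
    offsets[name] = offset;
    offset += size;
    align = Math.max(align, fieldAlign);
  }
  return { offsets, size: alignUp(offset, align), align };
}

const particle = layoutReprC([["id", "u32"], ["x", "f32"], ["y", "f32"], ["alive", "u8"]]);
console.log(particle); // offsets id 0, x 4, y 8, alive 12; size 16; align 4

// Padding can dominate: u8, u64, u8 occupies 24 bytes, not 10.
const padded = layoutReprC([["a", "u8"], ["b", "u64"], ["c", "u8"]]);
console.log(padded.offsets, padded.size); // { a: 0, b: 8, c: 16 } 24
```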
Manual ABI vs wasm-bindgen-generated
Everything above is the manual ABI: you own the encode, the allocation, the copy, and the free. The
Rust ecosystem mostly hides it behind #[wasm_bindgen], which generates a JavaScript shim that does the
identical dance — encode the string, call alloc, copy, pass (ptr, len), decode the return, and
free — automatically. The tradeoff is control versus boilerplate.
| Concern | Manual ABI | wasm-bindgen-generated |
|---|---|---|
| Lines you write | Many (alloc, copy, free per call) | One annotation |
| Copy semantics | Fully under your control; zero-copy possible | Copies in and out by default |
| Type safety | None — offsets are by convention | Generated .d.ts types |
| Debuggability | Every byte is visible | Read the emitted glue to see it |
| Best for | Hot paths, custom layouts, non-Rust modules | Application code, fast iteration |
The full account of the generated glue — JsValue, serde-wasm-bindgen, closures, and how it extends
to whole objects — lives in the
wasm-bindgen deep dive. Reach for the manual
ABI when a profiler shows marshaling on the hot path or when you need a layout wasm-bindgen will not
emit; reach for the generated glue everywhere else.
Toward a standard: the canonical ABI
Hand-rolled (ptr, len) conventions do not compose: two modules built by different toolchains disagree
on how a string is laid out, so they cannot call each other directly. The Component Model fixes this by
defining a canonical ABI — a single, language-neutral lowering of high-level types (strings, lists,
records, variants) into core Wasm. You describe the interface in WIT (the WebAssembly Interface Type
language), and the toolchain generates the lifting and lowering glue so a Rust component and a JavaScript
host agree on layout by specification rather than by convention.
// a WIT interface; the canonical ABI lowers these types for you
interface geometry {
  record point { x: f32, y: f32 }
  distance: func(a: point, b: point) -> f32;
}
The canonical ABI still bottoms out in pointers, lengths, and linear memory — it is the same
machinery, standardized and code-generated — but it removes the per-project bespoke contract. Until
components are ubiquitous, the manual ABI and wasm-bindgen remain how most production code crosses the
boundary, which is why both companion guides below stay at the byte level.
Optimization flags & tradeoffs
The decisive axis is copy vs zero-copy. A manual ABI lets you allocate once and let the module write
results in place, skipping the inbound and outbound copies that wasm-bindgen performs by default. For a
4 MB buffer at ~10 GB/s, each eliminated copy saves ~0.4 ms — often more than the computation itself. The
catalogue of these patterns lives in
zero-copy data transfer patterns.
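The difference between a zero-copy read and a copy is visible in a few lines. A typed-array view over memory.buffer aliases linear memory, while slice() snapshots it into a new ArrayBuffer. The result pointer and the simulated module write below are assumptions for illustration.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 256; // hypothetical result location written by the module
const len = 4;
new Uint8Array(memory.buffer, ptr, len).set([1, 2, 3, 4]);

const view = new Uint8Array(memory.buffer, ptr, len); // zero-copy: aliases linear memory
const copy = view.slice();                            // one copy into a fresh ArrayBuffer

// Simulate the module overwriting its result buffer in place.
new Uint8Array(memory.buffer, ptr, len)[0] = 99;

console.log(view[0], copy[0]); // 99 1
```

The view costs nothing to create, but it stays valid only until the next memory.grow and until the module reuses that region. Copy when the data must outlive either.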
The second axis is batching. Per-call marshaling — the allocate/copy/free overhead — dominates at
small payload sizes, so one call processing 10,000 elements beats 10,000 calls processing one. Enable the
multi-value proposal (--enable-multi-value in older wat2wasm; on by default now) to return pairs
without BigInt packing, and prefer i32 lengths over i64 unless you genuinely address more than 4 GiB
of memory.
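Batching on the host side can be sketched as follows: interleave N points into one Float32Array so a single alloc, a single copy, and a single call carry them all. The export that would consume (ptr, count) is hypothetical. Here count means points (two f32 each), a unit that would have to be agreed with the module. The pointer 1024 is assumed to come from an alloc that returns 4-byte-aligned addresses.

```javascript
// Interleave [[x, y], ...] as x0, y0, x1, y1, ... in one contiguous buffer.
function packPoints(points) {
  const out = new Float32Array(points.length * 2);
  points.forEach(([x, y], i) => {
    out[2 * i] = x;
    out[2 * i + 1] = y;
  });
  return out;
}

// One copy into linear memory for the whole batch; ptr must be 4-byte aligned
// for the Float32Array view.
function writeBatch(memory, ptr, points) {
  const packed = packPoints(points);
  new Float32Array(memory.buffer, ptr, packed.length).set(packed);
  return { ptr, count: points.length, bytes: packed.byteLength };
}

// 10,000 points * 8 bytes = 80,000 bytes, so two 64 KiB pages are needed.
const memory = new WebAssembly.Memory({ initial: 2 });
const points = Array.from({ length: 10_000 }, (_, i) => [i, -i]);
const batch = writeBatch(memory, 1024, points);
console.log(batch); // { ptr: 1024, count: 10000, bytes: 80000 }
```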
Gotchas & failure modes
- Forgetting to free → leak. A bump allocator never reclaims; even a real malloc/free leaks if you skip the dealloc. Wrap every alloc in try/finally. A memory.buffer.byteLength that keeps growing across calls is the symptom.
- Reading len bytes past the buffer → trap. If you pass a len larger than the bytes you actually wrote (or a stale ptr), the module's loads walk off the allocation. Within memory they read garbage; past memory.buffer.byteLength they trap with RuntimeError: memory access out of bounds. Validate every length you receive from JavaScript before using it.
- Alignment / padding mismatch. Decoding a #[repr(C)] struct with the wrong offsets, usually by ignoring tail or interior padding, silently corrupts every field after the error. Derive offsets from the type, never by eyeballing field sizes.
- Stale view after memory.grow. For non-shared memory, any allocation that triggers a grow detaches the old memory.buffer. Existing Uint8Array views then report length 0, and DataView reads throw a TypeError. Re-create views after every alloc; the mechanism is detailed in why memory.grow invalidates pointers.
- UTF-8 vs UTF-16 length. Passing str.length (UTF-16 code units) instead of the encoded byte length truncates or overruns multibyte strings. Always pass encoded.length.
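The host-side counterpart to the stale-view and overrun gotchas is to read memory.buffer fresh every time and bounds-check (ptr, len) before building a view. freshBytes below is a hypothetical helper, and the grow is triggered directly from JavaScript to stand in for an allocation that grows memory.

```javascript
// Always read memory.buffer fresh and bounds-check before building a view.
function freshBytes(memory, ptr, len) {
  const buffer = memory.buffer; // current buffer, never a cached one
  const ok = Number.isInteger(ptr) && Number.isInteger(len) &&
    ptr >= 0 && len >= 0 && ptr + len <= buffer.byteLength;
  if (!ok) {
    throw new RangeError(`(ptr=${ptr}, len=${len}) outside ${buffer.byteLength}-byte memory`);
  }
  return new Uint8Array(buffer, ptr, len);
}

const memory = new WebAssembly.Memory({ initial: 1, maximum: 2 });
const stale = new Uint8Array(memory.buffer); // cached view over 64 KiB
memory.grow(1);                              // detaches the old ArrayBuffer

console.log(stale.length, memory.buffer.byteLength); // 0 131072
const fresh = freshBytes(memory, 65536, 16);         // region that exists only after the grow
console.log(fresh.length); // 16
```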
Verification
Confirm the static side with wasm-objdump. The -x section dump shows your exported allocator and
functions; -s dumps the data section so you can see embedded string constants at their offsets.
# list exports — confirm alloc, dealloc, count_nonzero, memory are present
wasm-objdump -x strings.wasm | grep -A6 "Export"
# dump the data section as hex — confirm any static strings land where you expect
wasm-objdump -s -j data strings.wasm
Before instantiating, run wasm-validate strings.wasm, which catches malformed multi-value signatures. At
runtime, the Memory inspector in Chrome DevTools lets you read the exact bytes at a ptr to confirm your
encode wrote what you expected.
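From JavaScript, WebAssembly.validate performs the same kind of static check before instantiation, and a small hex dump mirrors what the Memory inspector shows. In Node, the bytes to validate would come from reading strings.wasm. The inline module and the 0x10 offset below are stand-ins so the sketch runs on its own, and hexDump is a hypothetical helper.

```javascript
// The 8 bytes "\0asm" + version 1 form the smallest valid module.
const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
console.log(WebAssembly.validate(emptyModule));                // true
console.log(WebAssembly.validate(emptyModule.subarray(0, 4))); // false: version field missing

// Hex-dump len bytes at ptr, 16 per row, like a memory inspector.
function hexDump(memory, ptr, len) {
  const bytes = new Uint8Array(memory.buffer, ptr, len);
  const rows = [];
  for (let i = 0; i < bytes.length; i += 16) {
    const hex = Array.from(bytes.subarray(i, i + 16), (b) => b.toString(16).padStart(2, "0"));
    rows.push(`${(ptr + i).toString(16).padStart(8, "0")}  ${hex.join(" ")}`);
  }
  return rows.join("\n");
}

const memory = new WebAssembly.Memory({ initial: 1 });
const encoded = new TextEncoder().encode("Wasm");
new Uint8Array(memory.buffer, 0x10, encoded.length).set(encoded);
console.log(hexDump(memory, 0x10, encoded.length)); // 00000010  57 61 73 6d
```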
In this guide
- Encoding strings across the wasm boundary: the full UTF-8 round trip, raw vs wasm-bindgen, and the length and freeing gotchas.
- Returning structs from wasm to javascript: out-pointers, DataView field reads, #[repr(C)] offsets, and packing small structs into an i64.
Frequently Asked Questions
Why must I export an allocator from the module instead of just picking an offset in JavaScript?
The module’s compiled code owns its heap — its own alloc/free track which regions are in use. If you
write to an offset you chose yourself, you may stomp on the module’s stack, its static data, or a live
allocation. Calling the module’s exported alloc gets you a pointer the module guarantees is free.
Is the (ptr, len) length in bytes or in elements?
Whatever you and the module agree on — the boundary has no built-in notion. For strings and byte buffers
it is bytes; for a &[u32] slice it is usually the element count, with the module multiplying by 4
internally. Document the unit per function, because there is no type to enforce it.
Do I need #[repr(C)] if I only ever read the struct from another Rust function?
No — within a single Rust program the compiler is consistent. You need #[repr(C)] precisely when an
external reader (JavaScript via DataView, or another language) must know the layout, because default
repr(Rust) layout is unspecified and may change between compiler versions.
When should I prefer the multi-value return over a packed i64?
Almost always — multi-value is supported in every current engine and avoids BigInt on the JavaScript
side. Reach for the packed i64 only when targeting an old runtime without multi-value, or when an
existing ABI you must match already packs.
Does the Component Model’s canonical ABI replace wasm-bindgen?
Eventually it overlaps heavily, but today wasm-bindgen targets the browser import object model
directly while components need a host that understands the Component Model. For shipping browser code in
2026, wasm-bindgen or manual ABI is still the path; WIT and the canonical ABI are where the ecosystem is
heading.
Related
- wasm-bindgen deep dive: the generated glue that automates this ABI.
- Zero-copy data transfer patterns: skipping the copies this guide describes.
- Linear memory management & allocators: the alloc/free behind every pointer here.
- Encoding strings across the wasm boundary: the UTF-8 string round trip in detail.
- Returning structs from wasm to javascript: reading struct fields with DataView.