Migrating legacy C code to WebAssembly
This guide answers one concrete question: how do you take an existing C codebase that assumes a real
operating system and get it compiling, linking, and running correctly as a WebAssembly module under
Emscripten, without rewriting it from scratch? Legacy C code typically assumes unrestricted OS
access, raw file descriptors, and a contiguous virtual address space. WebAssembly gives you none of
those. Instead you get a sandboxed runtime, a virtual filesystem, and a single bounds-checked linear memory buffer.
The work is mechanical once you know which failure maps to which fix.
The migration breaks into three distinct problem classes, and it helps to keep them separate.
Compile-time failures are about missing headers and unsupported architecture flags — the easiest to
fix because the compiler points at the exact line. Link-time failures are about undefined symbols and
archive ordering, where wasm-ld cannot resolve a reference to a syscall that has no Wasm
implementation. Runtime failures are the subtle ones: a build that compiles and links cleanly but
traps with memory access out of bounds, hangs the page on a blocking call, or silently corrupts its
own heap. The procedure below walks the three classes in order, because fixing them out of order wastes
time — there is no point chasing a runtime trap in code that has not linked yet.
Prerequisites
- [ ] emsdk activated: ./emsdk activate latest && source ./emsdk_env.sh, with emcc --version ≥ 3.1.50 pinned in CI so behavior is reproducible.
- [ ] The code builds natively first: confirm a clean gcc -Wall build before porting, so you are debugging Wasm-specific issues, not pre-existing bugs.
- [ ] A list of the project’s syscalls: run nm or grep for fork, socket, mmap, pthread_create, and raw open/read so you know your surface area up front.
- [ ] Node ≥ 16 to run the output headlessly while iterating.
Procedure
- Establish a strict baseline build. Compile the entry translation unit with -sSTRICT=1 so deprecated, non-standard extensions fail loudly instead of compiling to something subtly wrong.
  emcc legacy.c -sSTRICT=1 -sENVIRONMENT=web,worker -o baseline.mjs -sEXPORT_ES6=1
- Surface every undefined symbol at once. Force the linker to report all unresolved references so you can triage them as a batch rather than one rebuild at a time.
  emcc legacy.c -sERROR_ON_UNDEFINED_SYMBOLS=1 -o test.mjs 2>&1 | grep "undefined symbol"
- Isolate unsupported POSIX calls behind a compile guard. Wrap process, socket, and raw-fd usage so the Wasm build takes a stubbed or async-JS path while the native build is untouched.
  #ifdef __EMSCRIPTEN__
  #include <emscripten.h>
  /* route to an async JS shim or a no-op stub */
  #else
  #include <unistd.h>
  #endif
- Map file access onto the virtual filesystem. Legacy code that opens files needs those files present in Emscripten’s in-memory FS. Preload them at build time, or mount a persistent backend.
  emcc legacy.c -o app.mjs --preload-file assets/ -sEXPORT_ES6=1
  # or, at runtime, persist to IndexedDB:
  # Module.FS.mount(Module.IDBFS, {}, '/data');
- Replace any custom heap manager with Emscripten’s allocator. Allocators that call brk/sbrk directly or assume a fixed address space corrupt linear memory. Defer to the built-in dlmalloc (or emmalloc for size).
  emcc legacy.c -sMALLOC=dlmalloc -sINITIAL_MEMORY=67108864 -o app.mjs
  The reason this matters is that legacy allocators frequently assume sbrk hands back an ever-growing flat address space. Under WebAssembly the heap is a bounded region of one linear memory buffer, and a custom allocator that walks past its region does not segfault into a clean crash; it scribbles over the stack or static data living in the same buffer. Routing allocation through Emscripten’s dlmalloc keeps every malloc/free inside the runtime’s managed region.
- Fix link-time ordering and emit a modern module. Place objects before the archives they consume, then produce ESM output for the front end. Traditional Unix linkers such as GNU ld resolve symbols left to right, so an archive listed before the object that needs it can contribute nothing; keeping objects first is the ordering that works however forgiving the linker in use happens to be.
  emcc main.o -L./libs -llegacy -sMODULARIZE=1 -sEXPORT_ES6=1 -o app.mjs
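The compile-guard step can be sketched in C as a single wrapper with two bodies. This is a minimal illustration, not the project's actual shim: run_in_child and example_task are hypothetical names, and the Wasm branch assumes that reporting ENOSYS is an acceptable stub for a process-based code path.

```c
/* Make the POSIX declarations visible on strict native builds. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifdef __EMSCRIPTEN__

/* Wasm build: there is no process model, so report "not supported". */
int run_in_child(int (*task)(void)) {
    assert(task != NULL);
    (void)task;
    errno = ENOSYS;
    return -1;
}

#else

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Native build: run task() in a forked child and return its exit status. */
int run_in_child(int (*task)(void)) {
    assert(task != NULL);
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                 /* fork failed, errno is set */
    }
    if (pid == 0) {
        _exit(task() & 0xff);      /* child: never returns to the caller */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

#endif

/* Hypothetical task used only to exercise the wrapper. */
int example_task(void) {
    return 3;
}
```

Callers see one signature on both targets and handle the -1/ENOSYS result on the Wasm side, which keeps the guard confined to a single translation unit instead of spreading #ifdef through the call sites.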
Expected output and verification
A successful build emits app.mjs and app.wasm. Validate the binary and confirm the runtime starts
without aborting:
wasm-validate app.wasm # exits 0 on a well-formed module
node -e "import('./app.mjs').then(f => f.default()).then(() => console.log('runtime ok'))"
Inspect the import section to confirm you have not pulled in an unexpected syscall:
wasm-objdump -x app.wasm | grep -i import
Seeing only the expected env/wasi_snapshot_preview1 imports means your shims caught everything; a
stray fd_write or environment import you did not anticipate points to an unhandled code path.
For a deeper smoke test, run the module’s main computation against a known input and compare the output
to the native build byte-for-byte. Differences usually trace to one of three sources: undefined
behavior that the native compiler happened to tolerate but the Wasm back end did not, integer-size
assumptions (long is 32-bit under wasm32, not 64-bit as on a typical 64-bit host), or endianness code
that is now dead because WebAssembly is always little-endian. Catching these at verification time, with
a reference output in hand, is far cheaper than diagnosing them later from a wrong result in the
browser.
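One way to make the native-versus-Wasm comparison cheap is to have both builds print a fingerprint of the output buffer and compare the two lines. The sketch below uses the 64-bit FNV-1a hash purely as a convenient checksum; fnv1a64 and print_fingerprint are hypothetical helper names, and a matching fingerprint is a strong hint rather than a proof of byte equality.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a over a byte buffer. Fixed-width types keep the
   arithmetic identical on a 64-bit host and under wasm32. */
uint64_t fnv1a64(const void *data, size_t len) {
    assert(data != NULL || len == 0);
    const unsigned char *bytes = data;
    uint64_t hash = UINT64_C(0xcbf29ce484222325);   /* FNV offset basis */
    for (size_t i = 0; i < len; ++i) {
        hash ^= bytes[i];
        hash *= UINT64_C(0x100000001b3);            /* FNV prime */
    }
    return hash;
}

/* Print one comparable line per output buffer. */
void print_fingerprint(const char *label, const void *data, size_t len) {
    printf("%s %016llx\n", label, (unsigned long long)fnv1a64(data, len));
}
```

Run the same input through the gcc build and the Node-hosted Wasm build, then diff the printed lines. When they differ, fall back to a full byte-for-byte comparison to locate the first diverging offset.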
Gotchas
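error: implicit declaration of function 'foo'. No declaration of foo is visible at the call site.
Either the POSIX header that declares it is absent from the Emscripten sysroot (in which case the
#include itself also fails with a file-not-found error), or the header exists but does not declare
that function for this target. Guard the include and the call with #ifdef __EMSCRIPTEN__ and provide
a stub, rather than forcing the header.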
wasm-ld: error: undefined symbol: pthread_create. Threading is off by default. Either remove the
threading path under the Emscripten guard, or opt in with -pthread -sPTHREAD_POOL_SIZE=4 — which
additionally requires cross-origin isolation (COOP/COEP) in the browser to enable SharedArrayBuffer.
RuntimeError: memory access out of bounds at runtime. A pointer escaped its allocation, often a
legacy allocator writing past the heap. Rebuild with -sSAFE_HEAP=1 -sASSERTIONS=2; it adds guard
checks that report the exact offending access with a stack trace instead of corrupting memory
silently.
Blocking I/O hangs the page. Synchronous sleep, fread, or recv cannot block the browser
event loop. Wrap the call graph with Asyncify so the C code keeps its synchronous shape while the
runtime yields to JavaScript:
emcc legacy.c -sASYNCIFY=1 -sASYNCIFY_IMPORTS='["js_sleep"]' -o app.mjs
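A minimal sketch of what "keeps its synchronous shape" can look like in C, assuming the blocking call is a sleep inside a polling loop. It uses emscripten_sleep, which Emscripten provides for Asyncify builds, in place of the js_sleep import named above; portable_sleep_ms and wait_for_flag are hypothetical names.

```c
/* Make nanosleep visible on strict native builds. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stddef.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#else
#include <time.h>
#endif

/* Sleep for ms milliseconds. Under Emscripten this yields to the event
   loop (build with -sASYNCIFY=1); natively it blocks in nanosleep. */
int portable_sleep_ms(unsigned int ms) {
#ifdef __EMSCRIPTEN__
    emscripten_sleep(ms);
    return 0;
#else
    struct timespec req;
    req.tv_sec = (time_t)(ms / 1000u);
    req.tv_nsec = (long)(ms % 1000u) * 1000000L;
    return nanosleep(&req, NULL);
#endif
}

/* Poll a flag up to max_polls times. Returns 1 if the flag was seen set,
   0 if the polls ran out, and -1 if sleeping failed. */
int wait_for_flag(volatile const int *flag, unsigned int max_polls,
                  unsigned int interval_ms) {
    assert(flag != NULL);
    for (unsigned int i = 0; i < max_polls; ++i) {
        if (*flag) {
            return 1;
        }
        if (portable_sleep_ms(interval_ms) != 0) {
            return -1;
        }
    }
    return *flag ? 1 : 0;
}
```

The loop body is unchanged between targets; only the sleep primitive differs, so the Asyncify transform has to cover just the call chain that reaches portable_sleep_ms.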
undefined symbol: __cxa_throw or longjmp aborts. Emscripten disables C++ exception catching by
default to keep code small and fast, and setjmp/longjmp support is governed by -sSUPPORT_LONGJMP,
whose effective mode depends on the exception scheme in use. State what you need explicitly with
-fexceptions (or -fwasm-exceptions for the newer, faster scheme on supporting runtimes) and
-sSUPPORT_LONGJMP=emscripten. Each adds binary size and some overhead, so enable only the one your
code actually uses:
emcc legacy.c -fexceptions -sSUPPORT_LONGJMP=emscripten -o app.mjs
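For reference, the kind of legacy C pattern that depends on longjmp support is an error-recovery jump out of nested parsing code. The sketch below is a hypothetical example (sum_digits and parse_digit are not from any real codebase) that should behave the same natively and in a Wasm build with longjmp support enabled.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Single global recovery point, as is common in legacy parsers.
   Not reentrant and not thread-safe. */
static jmp_buf parse_error_jmp;

/* Convert one character to a digit, or jump to the recovery point. */
static int parse_digit(char c) {
    if (c < '0' || c > '9') {
        longjmp(parse_error_jmp, 1);
    }
    return c - '0';
}

/* Sum the decimal digits in text. Returns -1 if any character is not a
   digit. No local is read after the longjmp, so none needs volatile. */
int sum_digits(const char *text) {
    assert(text != NULL);
    if (setjmp(parse_error_jmp) != 0) {
        return -1;                  /* reached via longjmp */
    }
    int total = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        total += parse_digit(*p);
    }
    return total;
}
```

If a build with longjmp support switched off aborts here, that confirms the flag is the missing piece rather than a logic error in the port.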
Performance note
Asyncify is convenient but not free: it instruments the call stack to support unwinding and rewinding,
which can inflate the binary by roughly 30–50% (the exact cost depends on how much of the call graph is instrumented) and adds overhead to every function in the async path. Scope
-sASYNCIFY_IMPORTS to the narrowest possible set of functions so the transform only touches the call
chains that actually suspend. For the dead-code and Binaryen size passes that claw some of that weight
back, see reducing Wasm bundle size with wasm-opt.
The other latency to budget for is instantiation. A large module compiled to a multi-megabyte binary
takes measurable time to download and compile; streaming instantiation overlaps the two by compiling
the bytes as they arrive, which is why serving the .wasm with Content-Type: application/wasm and
using WebAssembly.instantiateStreaming matters for a migrated codebase. Profile the cold start in the
DevTools Performance panel and read the current size of linear memory from Module.HEAPU8.length. That
figure is the reserved size, not the amount actually touched; legacy code that front-loads a large
static reservation often reserves far more linear memory than it ever uses, and trimming
-sINITIAL_MEMORY to the real high-water mark both shrinks the reservation and speeds the first paint.
Frequently Asked Questions
Do I have to rewrite my custom allocator?
Usually no — you replace it. Build with -sMALLOC=dlmalloc so calls to malloc/free route through
Emscripten’s allocator instead of your sbrk-based one. Keep your allocator only if it provides
domain-specific behavior, and then make it respect the linear memory model rather than assuming a
flat address space.
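If the allocator is worth keeping, one way to make it respect linear memory is to have it carve from a block obtained through malloc instead of growing the break with sbrk. The sketch below shows that shape for a simple bump arena; arena_t and its functions are hypothetical names, and a real domain-specific allocator would add whatever free-list or pooling policy it needs on top.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bump arena backed by one malloc'd block, so it never leaves the
   region the runtime's allocator manages. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} arena_t;

/* Reserve capacity bytes. Returns 0 on success, -1 on failure. */
int arena_init(arena_t *a, size_t capacity) {
    assert(a != NULL);
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Hand out size bytes aligned for any object type, or NULL when the
   arena is exhausted. Never writes or points past base + capacity. */
void *arena_alloc(arena_t *a, size_t size) {
    assert(a != NULL);
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;
    if (start > a->capacity || size > a->capacity - start) {
        return NULL;
    }
    a->used = start + size;
    return a->base + start;
}

/* Return the whole block to the underlying allocator. */
void arena_release(arena_t *a) {
    assert(a != NULL);
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

The explicit capacity check is the important part: exhaustion becomes a NULL return the caller can handle, rather than a silent write into neighbouring data in the same buffer.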
How do I handle a fork()/exec() model?
There is no process model in WebAssembly. Refactor the design to use Web Workers for parallelism (each
worker is a separate module instance) and message passing instead of shared process state. There is no
mechanical shim — fork is the one call you must restructure around.
Can I keep my Makefile?
Yes. Wrap it with emmake make, which overrides CC/CXX to point at emcc/em++ and injects the
Emscripten sysroot. For CMake projects, configure with emcmake cmake -B build instead. Avoid
hardcoding -march, -mtune, or host-specific flags in the build file, since the Wasm back end
rejects them — gate those behind a toolchain check or strip them in the Emscripten configuration.
Why does my ported code give the wrong answer instead of crashing?
The usual culprit is undefined behavior the native compiler tolerated and the Wasm back end did not, or
an integer-width assumption: long is 32-bit under the wasm32 target, not 64-bit as on most 64-bit
hosts. Audit for code that stores pointers in long, or that relies on int and long being the same
size, and switch to fixed-width types from <stdint.h>.
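The audit described above usually ends in changes like the following. This is an illustrative sketch with hypothetical function names, showing a widening multiply done in int64_t and a pointer stored in uintptr_t instead of long.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Checked at compile time on every target the file is built for. */
static_assert(sizeof(uintptr_t) >= sizeof(void *),
              "uintptr_t must be able to hold a pointer");

/* Width of long in bits: 64 on LP64 hosts, 32 under wasm32. */
int long_width_bits(void) {
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Legacy code often did this arithmetic in long, which overflows once
   long is 32-bit. int64_t keeps the product exact on both targets. */
int64_t ticks_to_nanos(int32_t ticks, int32_t nanos_per_tick) {
    assert(nanos_per_tick > 0);
    return (int64_t)ticks * (int64_t)nanos_per_tick;
}

/* Round-trip a pointer through an integer that is wide enough for it. */
uintptr_t pointer_to_handle(const void *p) {
    return (uintptr_t)p;
}

const void *handle_to_pointer(uintptr_t h) {
    return (const void *)h;
}
```

Logging long_width_bits() once at startup in both builds makes it obvious which data model a given wrong answer was produced under.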
Related
- Binding C++ libraries with embind: exposing the ported code’s API to JavaScript cleanly.
- ESM bindings & module generation: packaging the MODULARIZE factory for a bundler.
- JS/Wasm Interop & Memory Management: the pointer/length ABI behind every _malloc call.