Physics Series Part 1: Cross-Platform Compilation of PhysX Based on WebAssembly and PVD Joint Debugging

Introduction

Galacean Engine will introduce a physics component in milestone 0.6. After research, we chose PhysX, the most powerful open-source physics engine in the industry. This is the first article in the PhysX physics series. It focuses on compiling PhysX to WebAssembly and on joint debugging with the PhysX Visual Debugger. We hope it helps developers get started with WebAssembly compilation and makes it easier to build on our PhysX.js repository to add physics simulation to web applications.

Once browsers supported WebAssembly, front-end code no longer had to be written only in JavaScript. High-performance code written in C++ or Rust can be compiled into .wasm files and run cross-platform in browsers, which can significantly speed up front-end code. Initially, the main compilation toolchain was Emscripten (EMSDK). As WebAssembly matured, the WebAssembly System Interface (WASI) emerged. This draft proposes that, by implementing a set of external interfaces, .wasm binaries can be executed on any platform, not just in browsers.

Therefore, we initially planned to compile PhysX with WASI-SDK. The advantage of this approach is that compilation yields a single .wasm file with no heavy glue file, which keeps the output as small as possible. The process is no different from traditional C++ compilation: call the clang++ and wasm-ld shipped with the SDK to compile and link, and mark the functions to be exported:

typedef int pointer_t;
#define WASM_EXP __attribute__((visibility("default")))
 
pointer_t WASM_EXP PxTransform_create() {
	return (pointer_t) new physx::PxTransform();
}

However, we later discovered that PhysX has a multi-threaded architecture built on pthreads, which WASI does not support. WebAssembly in the browser does support pthreads, mainly through Web Workers and SharedArrayBuffer, but these are not yet part of the WASI standard.

Therefore, we had to fall back to the classic solution, Emscripten. Fortunately, newer versions of Emscripten (after 1.39.0) use the new upstream LLVM backend:

Fastcomp and upstream use very different LLVM and clang versions (fastcomp has been stuck on LLVM 6, upstream is many releases after). This affects optimizations, usually by making the upstream version faster and smaller.

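Therefore, we can use this newer SDK to compile smaller and better-optimized .wasm binaries. This article covers the compilation details. Interested readers are welcome to follow our GitHub repository PhysX.js, where we will add more features, including but not limited to NvCloth, the cloth simulation SDK that was split out into a separate toolkit in PhysX 4.x. If you run into other problems during compilation, please open an Issue. We will keep following WebAssembly developments and improving how PhysX compiles.

Compiling with Embind

The Emscripten toolchain (hereafter EMSDK) provides a tool called Embind for traditional cross-platform C++ projects. Compiling with it takes only three steps:

Cross-platform projects generally use Make and CMake as their build systems. With EMSDK, you only need to prefix the original build commands with em: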

emcmake cmake
emmake make

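EMSDK automatically invokes the emcc and em++ compilers to build the static libraries. These static libraries are later linked to produce the .wasm binary.

Next, and most importantly, export the C++ interfaces you need. Following the scaffolding template that Embind provides, write a C++ file, PxWebBindings.cpp, for example: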

function("PxCreateFoundation", &PxCreateFoundation, allow_raw_pointers());
function("PxCreatePhysics", &PxCreateBasePhysics, allow_raw_pointers());
function("PxCreatePlane", &PxCreatePlane, allow_raw_pointers());
 
value_object<PxVec3>("PxVec3")
        .field("x", &PxVec3::x)
        .field("y", &PxVec3::y)
        .field("z", &PxVec3::z);
 
enum_<PxForceMode::Enum>("PxForceMode")
        .value("eFORCE", PxForceMode::Enum::eFORCE)
        .value("eIMPULSE", PxForceMode::Enum::eIMPULSE)
        .value("eVELOCITY_CHANGE", PxForceMode::Enum::eVELOCITY_CHANGE)
        .value("eACCELERATION", PxForceMode::Enum::eACCELERATION);
 
class_<PxScene>("PxScene")
        .function("setGravity", &PxScene::setGravity)
        .function("getGravity", &PxScene::getGravity)
        .function("addActor", &PxScene::addActor, allow_raw_pointers())
        .function("removeActor", &PxScene::removeActor, allow_raw_pointers())
        .function("raycastSingle", optional_override(
            [](const PxScene &scene, const PxVec3 &origin, const PxVec3 &unitDir, const PxReal distance,
               PxRaycastHit &hit, const PxSceneQueryFilterData &filterData) {
                return PxSceneQueryExt::raycastSingle(scene, origin, unitDir, distance,
                                                      PxHitFlags(PxHitFlag::eDEFAULT), hit, filterData);
            }));

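Value types, enums, functions, and classes can all be exported in the style shown above. You can also use optional_override to add new methods to a type. See the Embind documentation for more usage.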
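From the JavaScript side, bindings like these are consumed as ordinary objects and methods. The following is a minimal sketch, assuming `PhysX` is the loaded module object and `scene` is a `PxScene` instance obtained elsewhere; `setSceneGravity` and `forceMode` are hypothetical helper names that are not part of PhysX.js. Under Embind, a `value_object` such as `PxVec3` is passed as a plain JavaScript object, and an `enum_` value is reached as a property of the exported enum.

```javascript
// Hypothetical helper: set gravity on a bound PxScene and read it back.
// `scene` is assumed to expose setGravity/getGravity as bound above.
function setSceneGravity(scene, y) {
  // value_object<PxVec3> maps to a plain object with x, y, z fields.
  scene.setGravity({ x: 0, y: y, z: 0 });
  return scene.getGravity();
}

// Hypothetical helper: look up a bound enum value by name, e.g. "eIMPULSE".
function forceMode(PhysX, name) {
  const mode = PhysX.PxForceMode[name];
  if (mode === undefined) {
    throw new Error("PxForceMode." + name + " is not exported");
  }
  return mode;
}
```

A lookup that fails loudly like this makes it easier to notice when an enum value was left out of PxWebBindings.cpp.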

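Finally, with this file describing the exported functions, compile it with em++ and link it against the static libraries built in the first step. In C++, header files (.h) declare function signatures and implementation files (.cpp) define the functions. After compilation, the headers let other programs call the functions, while the static library holds the compiled implementations, so an executable must be linked against the static libraries before it can run. Unlike an ordinary C++ program, however, the compiler ultimately produces a .wasm binary plus a JavaScript glue file that makes loading the .wasm binary easier.

For convenience, PhysX.js uses CMake to manage the dependencies for compiling PxWebBindings.cpp. The compilation parameters are written in PhysXWebBindings.cmake, and a build.sh script is provided to build the whole project in one step.

Asynchronously Loading the .wasm File

The JavaScript glue file that EMSDK provides is not small, but it comes with very convenient loading logic. We can simply call: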

PHYSX().then(function (PHYSX) {
  _cb(PHYSX);
});

Here the name PHYSX is specified by a compilation parameter:

SET(EMSCRIPTEN_BASE_OPTIONS "--bind -s EXPORT_ES6=1 -s MODULARIZE=1 -s EXPORT_NAME=PHYSX -s ALLOW_MEMORY_GROWTH=1")

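All logic that calls PhysX goes inside the callback function. For asynchronous loading of .wasm files, milestone 0.6 of Galacean Engine will provide a unified, general-purpose design, which a later article will explain.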
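Because every caller has to wait for the module, one common pattern is to cache the pending load so the .wasm file is instantiated only once. The following is a minimal sketch, not the engine's eventual design: `loadOnce` is a hypothetical helper, and `factory` stands in for the generated `PHYSX` function, which is assumed to return an object with a `then` method as in the snippet above.

```javascript
// Hypothetical helper: wrap a module factory (such as the generated PHYSX
// function) so that repeated callers share a single instantiation.
function loadOnce(factory) {
  let pending = null;
  return function () {
    if (pending === null) {
      // The factory runs on the first call only; later calls reuse its result.
      pending = factory();
    }
    return pending;
  };
}
```

With this helper, `const getPhysX = loadOnce(PHYSX)` could be shared across components, each calling `getPhysX().then(...)` without triggering a second download or instantiation.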

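Comparison of Build Targets

Depending on the build target, EMSDK produces one of four results: Release, Profile, Checked, and Debug, each with a different size of .wasm binary and JavaScript glue file. With the -O3 optimization flag, indentation whitespace is stripped from the glue file so that it is as small as possible.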

| | .wasm binary file | JavaScript glue file |
| --- | --- | --- |
| Debug (-g) | 54.2MB | 241K |
| Release (-O3) | 2.6MB | 161K |

We compiled Bullet with the same method and compared it with results that others in the industry have obtained using the WASI toolchain:

| | .wasm binary file | JavaScript glue file |
| --- | --- | --- |
| Embind (the solution we use) | 457K | 55K |
| WASI-SDK | 483K | 0 |

The glue file produced by EMSDK contains no wrapper code specific to PhysX. Therefore, as more APIs are exported, the glue file stays essentially the same size as long as no system-level APIs such as sockets are involved, and only the .wasm file grows. We therefore regard the current solution as the best trade-off between ease of use and package size.

PhysX Visual Debugger (PVD) Connection and Debugging

NVIDIA provides a debugging tool called PhysX Visual Debugger (PVD). It listens on a TCP port to receive data from the physics scene, then records and displays how the objects in it move, which helps locate and optimize bottlenecks in the physics simulation.

In this section we focus on this feature, showing how to modify PxWebBindings.cpp, recompile, and debug from JavaScript code. By following these steps, readers can add or remove the functions included in the .wasm file themselves.

Step 1: Research the usage of PVD in PhysX

From the PhysX Snippets, it can be seen that to use PVD, you need to pass in the PVD object when initializing PxPhysics:

PxPvd* gPvd = PxCreatePvd(*gFoundation);
PxPvdTransport* transport = PxDefaultPvdSocketTransportCreate(PVD_HOST, 5425, 10);
gPvd->connect(*transport,PxPvdInstrumentationFlag::eALL);
 
gPhysics = PxCreatePhysics(PX_PHYSICS_VERSION, *gFoundation, PxTolerancesScale(),true,gPvd);

Step 2: Initial solution: Directly add methods to PxWebBindings.cpp

In this step, we directly write some required types and methods into PxWebBindings.cpp, for example:

function("PxCreatePvd", &PxCreatePvd, allow_raw_pointers());
function("PxDefaultPvdSocketTransportCreate", optional_override(
        []() {
            return PxDefaultPvdSocketTransportCreate("127.0.0.1", 5426, 10);
        }), allow_raw_pointers());
 
class_<PxPvdInstrumentationFlags>("PxPvdInstrumentationFlags").constructor<int>();
enum_<PxPvdInstrumentationFlag::Enum>("PxPvdInstrumentationFlag")
        .value("eALL", PxPvdInstrumentationFlag::Enum::eALL)
        .value("eDEBUG", PxPvdInstrumentationFlag::Enum::eDEBUG)
        .value("ePROFILE", PxPvdInstrumentationFlag::Enum::ePROFILE)
        .value("eMEMORY", PxPvdInstrumentationFlag::Enum::eMEMORY);
 
class_<PxPvd>("PxPvd")
        .function("connect", &PxPvd::connect);
 
class_<PxPvdTransport>("PxPvdTransport");

Note that PVD listens on port 5425 by default, and that once the code is compiled to WebAssembly, all socket functions are converted to WebSocket functions. To avoid a conflict on port 5425, we chose a different port number. After compiling the .wasm file, we also found that the JavaScript glue file had nearly doubled in size, from 4000+ lines to 8000+. The main reason is that a series of WebSocket-related methods, such as connect and close, are written into the glue file.

Running this, however, produces errors. The main problem lies in the select function in the JavaScript glue file. In socket communication, select is used to wait on a set of descriptors without blocking on each one, but the generated glue file does not implement it completely. As the following code shows, the exception descriptor set exceptfds must be null, otherwise an error is reported.

function ___sys__newselect(nfds, readfds, writefds, exceptfds, timeout) {try {
    // readfds are supported,
    // writefds checks socket open status
    // exceptfds not supported
    // timeout is always 0 - fully async
    assert(nfds <= 64, 'nfds must be less than or equal to 64');  // fd sets have 64 bits // TODO: this could be 1024 based on current musl headers
    assert(!exceptfds, 'exceptfds not supported');
    ...
}

But from the C++ source code of PxDefaultPvdSocketTransportCreate, it can be seen that this method uses this descriptor, so the compiled code cannot run.

// Setup select function call to monitor the connect call.
fd_set writefs;
fd_set exceptfs;
FD_ZERO(&writefs);
FD_ZERO(&exceptfs);
FD_SET(mSocket, &writefs);
FD_SET(mSocket, &exceptfs);
timeval timeout_;
timeout_.tv_sec = timeout / 1000;
timeout_.tv_usec = (timeout % 1000) * 1000;
int selret = ::select(mSocket + 1, NULL, &writefs, &exceptfs, &timeout_);
int excepted = FD_ISSET(mSocket, &exceptfs);
int canWrite = FD_ISSET(mSocket, &writefs);
if (selret != 1 || excepted || !canWrite) {
  disconnect();
  return false;
}

In addition, even if every use of exceptfs is removed from the source code (exceptfs is an optional parameter), the console still shows the error "WebSocket is closed before the connection is established": the WebSocket is closed prematurely and cannot maintain the connection. Directly adding methods to PxWebBindings.cpp and compiling in the default way therefore causes problems in both the size and the behavior of the compiled output. This makes it necessary to understand the internal details of PhysX and find a new solution.

Step 3: New solution: Write a callback class for PxPvdTransport

In the PhysX code, it can be seen that PxPvdTransport is a pure virtual base class that defines a series of interfaces. The function PxDefaultPvdSocketTransportCreate constructs PvdDefaultSocketTransport, which is just one of its implementations. Therefore, we can manually construct a callback class with PxPvdTransport as the base class.

To avoid directly calling socket functions, one idea is to let "C++ call JavaScript". That is, create a WebSocket connection in the JavaScript code and send the data out through the WebSocket. Then forward the WebSocket port to the TCP port to achieve PVD data reception.

To enable "C++ calling JavaScript," Embind provides a convenient way. First, wrap the abstract base class in PxWebBindings.cpp and specify the corresponding JavaScript function interface:

struct PxPvdTransportWrapper : public wrapper<PxPvdTransport> {
    EMSCRIPTEN_WRAPPER(PxPvdTransportWrapper)
 
    void unlock() override {}
 
    void flush() override {}
 
    void release() override {}
 
    PxPvdTransport &lock() override { return *this; }
 
    uint64_t getWrittenDataSize() override { return 0; }
 
    bool connect() override { return call<bool>("connect"); }
 
    void disconnect() override { call<void>("disconnect"); }
 
    bool isConnected() override { return call<bool>("isConnected"); }
 
    bool write(const uint8_t *inBytes, uint32_t inLength) override {
        return call<bool>("write", int(inBytes), int(inLength));
    }
};
 
class_<PxPvdTransport>("PxPvdTransport")
        .allow_subclass<PxPvdTransportWrapper>("PxPvdTransportWrapper", constructor<>());

With the help of the template scaffold, the wrapper lets C++ call the callback functions that we implement later in JavaScript, and the write method sends the data out through the WebSocket. The JavaScript glue file compiled this way also no longer contains functions like connect, and it stays at 4000+ lines, similar to its original size.
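The `write` override hands JavaScript two integers: a byte offset into the module's linear memory and a length. The following sketch shows how those two numbers map to bytes. It is an illustration only: `heap` is a small stand-in for the module's `HEAPU8` view, and `copyBytes` is a hypothetical helper. It copies the bytes rather than keeping a view, on the assumption that C++ may reuse the buffer after `write` returns and that a build with `ALLOW_MEMORY_GROWTH=1` may replace the underlying buffer.

```javascript
// Hypothetical helper: copy `length` bytes starting at byte offset `ptr`
// out of a heap view (a stand-in for the module's HEAPU8).
function copyBytes(heap, ptr, length) {
  // slice() allocates a new Uint8Array, so later writes to the heap
  // do not affect the returned copy.
  return heap.slice(ptr, ptr + length);
}

// Stand-in heap for illustration only.
const heap = new Uint8Array([10, 20, 30, 40, 50]);
const copy = copyBytes(heap, 1, 3); // independent copy of offsets 1..3
const view = heap.subarray(1, 4);   // view that shares memory with heap
heap[1] = 99;                       // simulate C++ reusing the buffer
console.log(Array.from(copy));      // [20, 30, 40]
console.log(Array.from(view));      // [99, 30, 40]
```

The copy is what makes it safe to queue data for sending later, since a queued view could be overwritten before the socket opens.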

Step 4: Implement Callback Functions in JavaScript

The above code requires us to implement the connect, disconnect, isConnected, and write functions in JavaScript. Therefore, we can write the following code:

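// State shared by the callbacks below: the WebSocket and the data waiting to be sent.
let socket;
let queue = [];
const pvdTransport = PhysX.PxPvdTransport.implement({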
    connect: function () {
        socket = new WebSocket('ws://127.0.0.1:5426', ['binary'])
        socket.onopen = () => {
            console.log('Connected to PhysX Debugger');
            queue.forEach(data => socket.send(data));
            queue = []
        }
        socket.onclose = () => {
        }
        return true
    },
    disconnect: function () {
        console.log("Socket disconnect")
    },
    isConnected: function () {
    },
    write: function (inBytes, inLength) {
        const data = PhysX.HEAPU8.slice(inBytes, inBytes + inLength)
        if (socket.readyState === WebSocket.OPEN) {
            if (queue.length) {
                queue.forEach(data => socket.send(data));
                queue.length = 0;
            }
            socket.send(data);
        } else {
            queue.push(data);
        }
        return true;
    }
})
 
const gPvd = PhysX.PxCreatePvd(foundation);
gPvd.connect(pvdTransport, new PhysX.PxPvdInstrumentationFlags(PhysX.PxPvdInstrumentationFlag.eALL.value));
 
physics = PhysX.PxCreatePhysics(
    version,
    foundation,
    new PhysX.PxTolerancesScale(),
    true,
    gPvd
)

Our callback functions take only about thirty lines, far fewer than the roughly 4000 extra lines of glue code generated by exporting the socket functions directly.
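The buffering logic inside `write` can also be factored out so that it can be exercised without a browser or a running PVD. The following is a minimal sketch under stated assumptions: `createQueuedSender` is a hypothetical helper, and `openSocket` is an assumed factory returning a WebSocket-like object with `readyState`, `send`, and `onopen`. Unlike the listing above, `isConnected` here reports the real socket state. Whether that suits what PhysX expects during the PVD handshake is untested, so treat it as an assumption.

```javascript
// Hypothetical helper: buffers payloads until the socket is open, then
// flushes them in order. `openSocket` is an assumed factory returning a
// WebSocket-like object (readyState, send, onopen).
function createQueuedSender(openSocket) {
  const OPEN = 1; // numeric value of WebSocket.OPEN
  let socket = null;
  let queue = [];

  function isOpen() {
    return socket !== null && socket.readyState === OPEN;
  }

  function flush() {
    queue.forEach((data) => socket.send(data));
    queue = [];
  }

  return {
    connect() {
      socket = openSocket();
      socket.onopen = flush; // send anything queued while connecting
      return true;
    },
    isConnected: isOpen,
    write(data) {
      if (isOpen()) {
        flush(); // preserve ordering if anything is still queued
        socket.send(data);
      } else {
        queue.push(data);
      }
      return true;
    },
  };
}
```

In the browser, the factory could be `() => new WebSocket('ws://127.0.0.1:5426', ['binary'])`, matching the address used above.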

Step 5: Implement Joint Debugging

The final step in joint debugging is to forward the WebSocket to a TCP port on the operating system. We used websockify-js, which is also one of the tools mentioned by Emscripten. Since PVD can only be installed on Windows, we need to install the Windows version of Node and run the following command (it cannot be run in Windows Subsystem for Linux (WSL)):

node .\websockify.js 127.0.0.1:5426 127.0.0.1:5425

Step 6: Final Optimization

Through the above process, we can see how to start from the API of the official PhysX example and gradually choose the compilation scheme according to the requirements, minimizing the size of the .wasm file and JavaScript glue file while ensuring functionality. We noticed that sometimes introducing a function causes the JavaScript glue file to double in size. In fact, for different compilation targets, cmake sets different compilation parameters:

SET(PHYSX_EMSCRIPTEN_DEBUG_COMPILE_DEFS   "NDEBUG;PX_DEBUG=1;PX_CHECKED=1;${NVTX_FLAG};PX_SUPPORT_PVD=1"  CACHE INTERNAL "Debug PhysX preprocessor definitions")
SET(PHYSX_EMSCRIPTEN_CHECKED_COMPILE_DEFS "NDEBUG;PX_CHECKED=1;${NVTX_FLAG};PX_SUPPORT_PVD=1" CACHE INTERNAL "Checked PhysX preprocessor definitions")
SET(PHYSX_EMSCRIPTEN_PROFILE_COMPILE_DEFS "NDEBUG;PX_PROFILE=1;${NVTX_FLAG};PX_SUPPORT_PVD=1"  CACHE INTERNAL "Profile PhysX preprocessor definitions")
SET(PHYSX_EMSCRIPTEN_RELEASE_COMPILE_DEFS "NDEBUG;PX_SUPPORT_PVD=0" CACHE INTERNAL "Release PhysX preprocessor definitions")

That is to say, for the Release version, even if the above PVD functions are compiled, no data will be sent when called. Therefore, we can put all PVD-related functions in a specific macro environment and not compile them in the Release version, thereby minimizing the size of the compiled file:

#if PX_DEBUG || PX_PROFILE || PX_CHECKED
...
#endif

For methods added later, corresponding macros can be configured in the same way so that only the required interfaces are compiled, keeping the compiled files as small as possible.
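If the PVD bindings are compiled out of Release builds in this way, the JavaScript side has to tolerate their absence. The following is a minimal sketch, assuming that a binding excluded by the macro simply does not appear as a property on the module object; `createPvdIfAvailable` is a hypothetical helper, not part of PhysX.js.

```javascript
// Hypothetical helper: returns a connected PxPvd, or null when the PVD
// bindings were excluded from the build (for example a Release build).
function createPvdIfAvailable(PhysX, foundation, transport) {
  if (typeof PhysX.PxCreatePvd !== "function") {
    return null; // bindings compiled out
  }
  const pvd = PhysX.PxCreatePvd(foundation);
  const flags = new PhysX.PxPvdInstrumentationFlags(
    PhysX.PxPvdInstrumentationFlag.eALL.value
  );
  pvd.connect(transport, flags);
  return pvd;
}
```

The same application code can then run against both Debug and Release builds of the .wasm file.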

PhysX Architecture and Summary

The two sections above introduced how to choose an appropriate compilation scheme to export PhysX functions and compile them into a .wasm file. The overall scheme is very simple, but this simplicity comes from the design of the PhysX architecture. For example, types like PxPvdTransport and PxActor are abstract base classes, so specific methods can be implemented in JavaScript in the same way to extend their functionality. If system functions are involved during compilation, such as the sockets discussed in this article, bear in mind that pulling in their code may significantly increase the size of the compiled files.

In the future, based on this article, we will also introduce how to design the asynchronous loading logic of the engine, build dependencies between components, and design and implement physical components. Stay tuned.
