
InputManager Design and Implementation


Introduction

Interactive input is a core function of the engine's functional layer: it lets users interact with the application through devices, touch, or gestures. In the 0.6 milestone we built the first version of the Galacean Engine's interaction system, which currently supports click and keyboard input. This article shares the thinking behind the design and the shortcomings encountered during development.

Overall Design

Main Architecture

Input devices, touch, XR devices, and so on are all inputs to the interaction system. We gather all input logic in the InputManager and subdivide it by input type into specific managers such as PointerManager and KeyBoardManager. The InputManager manages all of these specific input managers, so during per-frame interaction processing each manager only needs to handle the logic of its own input type.

API Design

Frame Lifecycle

The lifecycle of frame processing is as follows:

The internal lifecycle of InputManager is as follows:

How to Use

Pointer

Add a collider to the object with a collision volume in the three-dimensional space.
Refer to the trigger conditions of the callback interfaces in the Script component and add the appropriate logic.

| Interface | Trigger Timing and Frequency |
| --- | --- |
| onPointerEnter | Triggered once when the touch point enters the collision volume of the Entity |
| onPointerExit | Triggered once when the touch point leaves the collision volume of the Entity |
| onPointerDown | Triggered once when the touch point is pressed within the collision volume of the Entity |
| onPointerUp | Triggered once when the touch point is released within the collision volume of the Entity |
| onPointerClick | Triggered once when the touch point is pressed and released within the collision volume of the Entity |
| onPointerDrag | Continuously triggered when the touch point is pressed within the collision volume of the Entity until the touch point is no longer pressed |

Keyboard

Directly call the methods provided by the InputManager to determine the key status.

Method Name	Method Explanation
isKeyHeldDown	Returns whether the key is being held down continuously
isKeyDown	Returns whether the key was pressed in the current frame
isKeyUp	Returns whether the key was released in the current frame

Mouse and Touch

Background

PointerEvent is the direction in which mouse and touch interaction in browsers is heading. A pointer is an abstraction over input hardware, so developers do not need to care whether the data comes from a mouse, touchpad, or touchscreen. It does have some compatibility issues, however: according to Can I use, PointerEvent covers 92.82% of devices, and the remainder has to be handled by importing a polyfill.

Requirement Research

Add hook functions that respond to Pointer in the script component. For entities with collision volume in three-dimensional space, developers can easily implement interactions such as click, drag, and select by supplementing the logic in the corresponding hook functions.

Hook Function	Trigger Timing and Frequency
onPointerEnter	Triggered once when the touch point enters the Entity's collider range
onPointerExit	Triggered once when the touch point leaves the Entity's collider range
onPointerDown	Triggered once when the touch point is pressed within the Entity's collider range
onPointerUp	Triggered once when the touch point is released within the Entity's collider range
onPointerClick	Triggered once when the touch point is pressed and released within the Entity's collider range
onPointerDrag	Continuously triggered when the touch point is pressed within the Entity's collider range until the touch point is no longer pressed

Native Events

Similar to MouseEvent and TouchEvent, PointerEvent can also be captured by listening. canvas.addEventListener('pointerXXX', callBack);

	MouseEvent	TouchEvent	PointerEvent
Press	mousedown	touchstart	pointerdown
Release	mouseup	touchend	pointerup
Move	mousemove	touchmove	pointermove
Leave	mouseout | mouseleave	touchend | touchcancel	pointerout | pointercancel | pointerleave

Flowchart

The general process of Pointer handling can be summarized, where the green boxes represent native events.

Raycasting

The biggest problem to solve in Pointer is how to perform raycasting in three-dimensional space based on the position information from native events. This part not only includes basic knowledge of spatial transformation but also the basic use of the physics system.

After capturing the PointerEvent, we need to:

Obtain valid screen position information from the native event.
Convert the position from screen space to three-dimensional space and get the detection ray.
Perform intersection detection between the ray and the collider.
Invoke the script callbacks.

Screen Position Information

We aim to get the position of the pointer relative to the target element, but there are many coordinate properties in native events, so we need to identify which coordinate information is valid.

Native Event Coordinate Properties	Property Explanation
clientX & clientY	Coordinates relative to the application area that triggered the event (viewport coordinates)
offsetX & offsetY	Coordinates relative to the target element
pageX & pageY	Coordinates relative to the entire Document (including scroll areas)
screenX & screenY	Coordinates relative to the top-left corner of the main display (rarely used)
x & y	Same as clientX & clientY

They have the following conversion relationships (assuming the native event is event and the clicked target element is canvas):

The conclusion is: most coordinate properties can provide the desired coordinate information, with offset being the most direct and convenient.
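As an illustration of how these properties relate, the sketch below derives element-relative and page-relative coordinates from clientX and clientY. The function names are hypothetical. It assumes the canvas has no border, padding, or CSS transform, in which case the element-relative result matches offsetX and offsetY.

```typescript
// Minimal stand-ins for the parts of a native event and a DOMRect we need.
interface ClientPoint { clientX: number; clientY: number; }
interface RectLike { left: number; top: number; }

// client (viewport) coordinates -> coordinates relative to the element.
// Assumes no border/padding/transform on the element (see lead-in).
function clientToElement(p: ClientPoint, rect: RectLike): { x: number; y: number } {
  return { x: p.clientX - rect.left, y: p.clientY - rect.top };
}

// client (viewport) coordinates -> page coordinates (adds the scroll offset).
function clientToPage(p: ClientPoint, scrollX: number, scrollY: number): { x: number; y: number } {
  return { x: p.clientX + scrollX, y: p.clientY + scrollY };
}
```

In a browser this would typically be called as `clientToElement(event, canvas.getBoundingClientRect())` and `clientToPage(event, window.scrollX, window.scrollY)`.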

Spatial Transformation

Raycasting can be reduced to two steps: obtain a ray in three-dimensional space from the coordinates clicked on the screen, then perform collision detection between that ray and the colliders in three-dimensional space.

Taking a perspective camera as an example, after obtaining the coordinates clicked on the screen, the following steps are needed to get the ray:

offset -> Screen Space
Screen Space -> Clip Space
Clip Space -> World Space

Those familiar with graphics engines know that during rendering, we undergo the following transformations:

Model Space -> World Space
World Space -> View Space -> Clip Space
Clip Space -> Screen Space

So it seems we only need to get the screen-space coordinates and then apply the inverses of these space transformations.

offset -> Clip Space

It helps to have a general understanding of physical pixels, device-independent pixels (DIPs), and the device pixel ratio (devicePixelRatio). The coordinates obtained from the offset properties of a click event are in device-independent pixels, so when calculating screen-space coordinates, make sure the numerator and denominator use the same unit.

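Clip space is a left-handed coordinate system in which X, Y, and Z all range from -1 to 1 (intuitively, anything rendered outside this range is clipped). Note the following when converting:

When computing the relative position of the touch point in screen space, the numerator and denominator must both be in pixels or both be in device-independent pixels.
The Y axis of clip space points up, whereas the Y axis of the offset coordinate system points down, so the Y axis must be flipped.
In clip space, depth increases with distance from the viewer; put simply, the near plane is -1 and the far plane is 1.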
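The notes above can be sketched as a small conversion function. The name `offsetToClip` and its parameters are hypothetical. `cssWidth` and `cssHeight` are assumed to be the canvas size in device-independent pixels (for example `canvas.clientWidth` and `canvas.clientHeight`), so they share a unit with the offset coordinates.

```typescript
// Convert offset coordinates (device-independent pixels, Y down)
// to clip-space X/Y in [-1, 1] (Y up). Depth is chosen separately:
// -1 for the near plane, 1 for the far plane.
function offsetToClip(
  offsetX: number,
  offsetY: number,
  cssWidth: number,
  cssHeight: number
): { x: number; y: number } {
  // Numerator and denominator are both in device-independent pixels.
  const x = (offsetX / cssWidth) * 2 - 1;
  // Flip Y: offset Y grows downward, clip-space Y grows upward.
  const y = 1 - (offsetY / cssHeight) * 2;
  return { x, y };
}
```

Under these assumptions the top-left corner of the canvas maps to (-1, 1), the center to (0, 0), and the bottom-right corner to (1, -1).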

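Screen-Space Point -> World-Space Ray

The matrices in the derivation are column-major.

Taking a perspective camera as an example, world space is converted to clip space by the View transform followed by the Projection transform, so converting from clip space back to world space only requires the inverses of these transforms.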
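Written out, the inverse relationship looks roughly as follows. This is a sketch with symbols introduced here rather than taken from the original: $V$ is the view matrix, $P$ the projection matrix, points are column vectors, $p_{clip} = (x_c, y_c, z_c, 1)^T$ is the point in the $[-1, 1]$ clip space described above, and $w_c$ is the homogeneous scale produced by the perspective projection.

$$
P \, V \begin{pmatrix} p_{world} \\ 1 \end{pmatrix} = w_c \, p_{clip}
$$

$$
q = (P \, V)^{-1} \, p_{clip} = V^{-1} P^{-1} \, p_{clip} = \frac{1}{w_c} \begin{pmatrix} p_{world} \\ 1 \end{pmatrix},
\qquad
p_{world} = \frac{q_{xyz}}{q_w}
$$

In other words, after multiplying by the inverse view-projection matrix, dividing by the resulting $w$ component recovers the world-space point without needing to know $w_c$ in advance.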

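Detection Ray

Applying this inverse transform with the near-plane depth and then with the far-plane depth yields the projections of the touch point onto the near plane and the far plane in world space. Connecting these two points gives the detection ray.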
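A minimal sketch of this step, assuming the inverse view-projection matrix is already available as a column-major array of 16 numbers. The function names are hypothetical, and the near and far depths of -1 and 1 follow the clip-space convention described earlier.

```typescript
type Vec3 = [number, number, number];

// Multiply a column-major 4x4 matrix by (x, y, z, 1), then divide by w.
function transformCoordinate(m: ArrayLike<number>, x: number, y: number, z: number): Vec3 {
  const w = m[3] * x + m[7] * y + m[11] * z + m[15];
  return [
    (m[0] * x + m[4] * y + m[8] * z + m[12]) / w,
    (m[1] * x + m[5] * y + m[9] * z + m[13]) / w,
    (m[2] * x + m[6] * y + m[10] * z + m[14]) / w,
  ];
}

// Unproject the clip-space point at the near plane (depth -1) and the far
// plane (depth 1), then connect the two world-space points into a ray.
function clipPointToRay(
  invViewProj: ArrayLike<number>,
  clipX: number,
  clipY: number
): { origin: Vec3; direction: Vec3 } {
  const near = transformCoordinate(invViewProj, clipX, clipY, -1);
  const far = transformCoordinate(invViewProj, clipX, clipY, 1);
  const dx = far[0] - near[0];
  const dy = far[1] - near[1];
  const dz = far[2] - near[2];
  const len = Math.hypot(dx, dy, dz);
  return { origin: near, direction: [dx / len, dy / len, dz / len] };
}
```

The ray starts on the near plane and its direction is normalized, which is the form most ray intersection routines expect.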

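Ray Intersection Detection

Colliders are composed of regular geometric shapes (boxes, spheres, and so on), so the relevant ray-geometry intersection algorithms can be looked up and applied.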
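As one example of such an algorithm, here is a minimal sketch of a standard ray-sphere intersection test. It is not the engine's physics implementation. The function name is hypothetical and the ray direction is assumed to be normalized.

```typescript
type Vec3 = [number, number, number];

// Returns the distance along the ray to the first hit, or null on a miss.
// Assumes `direction` is normalized.
function intersectRaySphere(origin: Vec3, direction: Vec3, center: Vec3, radius: number): number | null {
  // Vector from the sphere center to the ray origin.
  const ox = origin[0] - center[0];
  const oy = origin[1] - center[1];
  const oz = origin[2] - center[2];
  const b = ox * direction[0] + oy * direction[1] + oz * direction[2];
  const c = ox * ox + oy * oy + oz * oz - radius * radius;
  // Origin is outside the sphere and the ray points away from it.
  if (c > 0 && b > 0) return null;
  const discriminant = b * b - c;
  // A negative discriminant means the ray misses the sphere.
  if (discriminant < 0) return null;
  const t = -b - Math.sqrt(discriminant);
  // A negative t means the origin is inside the sphere; clamp to 0.
  return t < 0 ? 0 : t;
}
```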

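Script Callbacks

Once the physics engine returns the collider that was hit, its Entity can be regarded as the target of all onPointerXXX callbacks for the current frame. At this stage, the script callbacks only need to be invoked according to the native events that were collected.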
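One way to picture this step is the sketch below, which decides which callbacks to invoke from the entity hit this frame and the native events collected this frame. It is an illustration under assumptions, not the engine's actual dispatch logic: entities are represented by plain string names, and `PointerState` and `resolveCallbacks` are hypothetical.

```typescript
type PointerCallback =
  | "onPointerEnter" | "onPointerExit" | "onPointerDown"
  | "onPointerUp" | "onPointerClick" | "onPointerDrag";
type NativeEventType = "pointerdown" | "pointerup" | "pointermove";

interface PointerState {
  hovered: string | null; // entity hit in the previous frame
  pressed: string | null; // entity on which the pointer went down
}

// Returns [entity, callback] pairs in the order they would be invoked.
function resolveCallbacks(
  state: PointerState,
  hit: string | null,
  events: NativeEventType[]
): Array<[string, PointerCallback]> {
  const calls: Array<[string, PointerCallback]> = [];
  // Enter/exit fire once when the hit entity changes between frames.
  if (hit !== state.hovered) {
    if (state.hovered !== null) calls.push([state.hovered, "onPointerExit"]);
    if (hit !== null) calls.push([hit, "onPointerEnter"]);
    state.hovered = hit;
  }
  for (const type of events) {
    if (type === "pointerdown") {
      if (hit !== null) calls.push([hit, "onPointerDown"]);
      state.pressed = hit;
    } else if (type === "pointerup") {
      if (hit !== null) {
        calls.push([hit, "onPointerUp"]);
        // Click: pressed and released on the same entity.
        if (state.pressed === hit) calls.push([hit, "onPointerClick"]);
      }
      state.pressed = null;
    }
  }
  // Drag keeps firing while the pointer remains pressed.
  if (state.pressed !== null) calls.push([state.pressed, "onPointerDrag"]);
  return calls;
}
```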

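Performance Optimization

**Event buffering:** After a PointerEvent is captured, the native event is pushed into an array and processed in order when the interaction system's tick runs.
**Pointer merging:** Raycasting is relatively expensive, so when there are multiple touch points on the screen we merge them according to certain rules. As a result, the touch interaction logic performs at most one raycast per frame.
**Multi-camera scenes:** When there are multiple cameras, every camera whose render area contains the clicked point is checked in turn, sorted by render order (cameras rendered later take priority). If no collider is hit in the scene rendered by the current camera and that camera's background is transparent, the click event is passed on to the previously rendered camera, until a collider is hit or all cameras have been traversed.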
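The event buffering idea can be sketched as follows. The class and method names are hypothetical, and `PointerLike` is a stand-in for the fields of a native PointerEvent that the interaction system would read.

```typescript
// Stand-in for the parts of a native PointerEvent used here.
interface PointerLike {
  type: string;
  pointerId: number;
  offsetX: number;
  offsetY: number;
}

class PointerEventBuffer {
  private _events: PointerLike[] = [];

  // Called from the native listener: only records the event, no heavy work.
  push(event: PointerLike): void {
    this._events.push(event);
  }

  // Called once per frame from the interaction tick: handles the buffered
  // events in arrival order, then empties the buffer.
  flush(handle: (event: PointerLike) => void): void {
    const events = this._events;
    for (let i = 0; i < events.length; i++) {
      handle(events[i]);
    }
    events.length = 0;
  }
}
```

In a browser, the listener would be registered along the lines of `canvas.addEventListener('pointerdown', (e) => buffer.push(e))`, and `flush` would run inside the engine's per-frame update.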

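Notes

As with the compatibility issue mentioned at the beginning, if your project may run on devices with older system versions, you can import our customized PointerPolyFill: https://github.com/galacean/polyfill-pointer-event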

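Keyboard Input

Requirement Research

Get all keys that were pressed during the current frame
Get all keys that were released during the current frame
Get all keys that are currently held down
Determine whether a given key was pressed during the current frame
Determine whether a given key was released during the current frame
Determine whether a given key is currently held down

Native Events

KeyboardEvent can be captured by listening. canvas.addEventListener('keyXXX', callBack);

Event	Trigger Timing
keypress	Triggered when a character key is pressed
keydown	Triggered when any key is pressed
keyup	Triggered when any key is released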

Flowchart

The general process of keyboard handling can be summarized, where the green boxes represent native events.

Selection of Index Value

Regardless of different case states or keyboard layouts, a key is an enumerable value. If the key value can be stored as an enumeration, it will bring great convenience in terms of both performance and usage. Therefore, it is necessary to determine the appropriate property to use as the enumeration value.

The following are properties of KeyboardEvent that can be used as enumeration values:

Property	Description	Simple Example	Compatibility
code	The physical key that triggered the event, layout-independent	Regardless of case or layout, when you press the Y key, the return is always the physical key "KeyY"	Compatible
key	The key value that triggered the event	"y" when lowercase, "Y" when uppercase	Compatible
charCode	Deprecated
keyCode	Deprecated
char	Deprecated

It can be found that the most suitable property is code. Refer to https://w3c.github.io/uievents-code/.
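To illustrate using code as the enumeration key, the sketch below maps a KeyboardEvent code string to a numeric value. The `Keys` table here is a small illustrative subset with arbitrary numbers, not the engine's actual enumeration, and `keyFromCode` is a hypothetical helper.

```typescript
// Illustrative subset: names match KeyboardEvent.code values,
// numeric values are arbitrary.
const Keys = { Space: 0, KeyY: 1, ArrowLeft: 2 } as const;
type Keys = (typeof Keys)[keyof typeof Keys];

// Map a KeyboardEvent.code string to its numeric key, if it is known.
function keyFromCode(code: string): Keys | undefined {
  return Object.prototype.hasOwnProperty.call(Keys, code)
    ? Keys[code as keyof typeof Keys]
    : undefined;
}
```

Because code identifies the physical key, pressing the Y key yields "KeyY" and therefore the same numeric value whether or not Shift or Caps Lock is active.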

Performance Optimization

The interaction logic of each frame's key press is relatively simple. Maintaining three arrays for pressed, released, and held keys can meet all needs. The focus is on how to reduce the performance loss of frame-level add, delete, and search operations.

Optimize adding elements and resetting array length by using unordered arrays
Optimize search by adding an index

Unordered Arrays

Unordered arrays reduce performance loss when adding and deleting elements in most cases. The following diagram shows the composition of unordered arrays:

The following diagram shows how unordered arrays reduce performance loss:
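A minimal sketch of such an unordered array is shown below. The class and method names are hypothetical. Deleting overwrites the removed slot with the last element instead of shifting everything after it, and resetting only sets the length back to zero while keeping the allocated storage.

```typescript
class UnorderedArray<T> {
  // Number of valid elements; setting it to 0 resets the array in O(1).
  length = 0;
  private _elements: T[] = [];

  // Write at the current end; reuses storage left over from earlier frames.
  add(element: T): void {
    this._elements[this.length++] = element;
  }

  get(index: number): T {
    return this._elements[index] as T;
  }

  // Remove the element at `index` (assumed valid) by moving the last element
  // into its slot. Returns the moved element so callers can update any index
  // table, or undefined if the removed element was already the last one.
  deleteByIndex(index: number): T | undefined {
    const last = --this.length;
    if (index === last) return undefined;
    const moved = this._elements[last] as T;
    this._elements[index] = moved;
    return moved;
  }

  // Copy of the valid range, mainly for inspection.
  toArray(): T[] {
    return this._elements.slice(0, this.length);
  }
}
```

The trade-off is that element order is not preserved, which is acceptable here because the key lists are only queried for membership.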

Storage and Indexing

If only three unordered arrays are used, it still requires traversing the array to check the state of a specific key, which can cause significant performance loss in extreme cases. If the current key enumeration is used as the Key and whether it was pressed in the current frame is recorded as the Value, traversal can be avoided.

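Although this implementation makes querying faster, it adds a maintenance cost: the state of the mapping table needs to be reset at the beginning of each frame. However, if the Value stores the frame number in which the key was pressed rather than a boolean, this cost disappears. Only the current frame number needs to be updated at the beginning of each frame, and a key counts as pressed in the current frame when its stored frame number equals the current one.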

Similarly, when recording keys that are held down, an additional table is added to map the key to its index in the unordered array.
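The frame-number idea can be sketched as below. This is an illustration under assumptions rather than the engine's implementation: keys are identified by their code strings, a Set stands in for the held-key array and its index table, and the class and method names other than isKeyHeldDown, isKeyDown, and isKeyUp are hypothetical.

```typescript
class KeyFrameState {
  private _frame = 1;
  // key code -> frame number in which it was last pressed / released
  private _downFrame = new Map<string, number>();
  private _upFrame = new Map<string, number>();
  // Stand-in for the held-key unordered array plus its index table.
  private _held = new Set<string>();

  // Start of a new frame: no per-key reset is needed, only the counter moves.
  beginFrame(): void {
    this._frame++;
  }

  onKeyDown(code: string): void {
    // Ignore repeated keydown events fired while the key stays held.
    if (this._held.has(code)) return;
    this._held.add(code);
    this._downFrame.set(code, this._frame);
  }

  onKeyUp(code: string): void {
    // Ignore keyup for a key that was never recorded as held.
    if (!this._held.delete(code)) return;
    this._upFrame.set(code, this._frame);
  }

  isKeyHeldDown(code: string): boolean {
    return this._held.has(code);
  }

  // A stale frame number from an earlier frame simply fails the comparison.
  isKeyDown(code: string): boolean {
    return this._downFrame.get(code) === this._frame;
  }

  isKeyUp(code: string): boolean {
    return this._upFrame.get(code) === this._frame;
  }
}
```

Each query is a single lookup and comparison, with no array traversal and no per-frame clearing.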

Quick Start

Key State	isKeyHeldDown	isKeyDown	isKeyUp
The key has been held down since the previous frame	true	false	false
The key was pressed in the current frame and not released	true	true	false
The key was released and pressed again in the current frame	true	true	true
The key was pressed and released in the current frame	false	true	true
The key was released in the current frame	false	false	true
The key was not pressed and had no interaction	false	false	false
This situation will not occur	true	false	true
This situation will not occur	false	true	false

Notes

When a key is held down for a period of time, the native keydown event will be continuously triggered. We have considered and filtered this situation, so developers do not need to do any extra processing.
The native event behavior of some state keys (e.g., Caps Lock) can be peculiar, and the events they trigger may even be inconsistent between Firefox and Chrome.
