InputManager Design and Implementation

technical

Introduction

Interactive input is an essential part of the engine's functional layer. It lets users interact with the application through devices, touch, or gestures. In the 0.6 milestone, we built the first version of the Galacean Engine interaction system, which currently supports pointer (click) and keyboard input. This article shares the thinking behind the design and the shortcomings we found during development.

Overall Design

Main Architecture

Input devices, touch, XR devices, and so on are all inputs to the interaction system. We gather all input logic in the InputManager and subdivide it by input type into specific managers such as PointerManager and KeyBoardManager. The InputManager manages all of these specific managers, so during per-frame interaction processing each manager only needs to handle the logic of its own input type.
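The delegation described above can be sketched as follows. This is only an illustration of the structure, not the engine's actual source: the `InputSubManager` interface, the `update` method, and the `lastUpdatedFrame` field are assumptions introduced for this example; only the three class names come from the text.

```typescript
// Hypothetical common shape: every specific manager exposes a per-frame update.
interface InputSubManager {
  update(frameCount: number): void;
}

class PointerManager implements InputSubManager {
  lastUpdatedFrame = -1;
  update(frameCount: number): void {
    // Pointer-specific work (raycasting, script callbacks) would go here.
    this.lastUpdatedFrame = frameCount;
  }
}

class KeyBoardManager implements InputSubManager {
  lastUpdatedFrame = -1;
  update(frameCount: number): void {
    // Keyboard-specific work (key state bookkeeping) would go here.
    this.lastUpdatedFrame = frameCount;
  }
}

class InputManager {
  readonly pointerManager = new PointerManager();
  readonly keyBoardManager = new KeyBoardManager();
  private readonly managers: InputSubManager[] = [this.pointerManager, this.keyBoardManager];

  // Called once per frame; each specific manager handles only its own input type.
  update(frameCount: number): void {
    for (const manager of this.managers) {
      manager.update(frameCount);
    }
  }
}
```

With this shape, supporting a new input type would mean adding one more manager to the list rather than touching the frame loop.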

API Design

image.png

Frame Lifecycle

The lifecycle of frame processing is as follows:

image.png

The internal lifecycle of InputManager is as follows:

image.png

How to Use

Pointer

  1. Add a collider to the object with a collision volume in the three-dimensional space.
  2. Refer to the trigger conditions of the callback interface in the Script component to add appropriate logic.

| Interface | Trigger Timing and Frequency |
| --- | --- |
| onPointerEnter | Triggered once when the touch point enters the collision volume of the Entity |
| onPointerExit | Triggered once when the touch point leaves the collision volume of the Entity |
| onPointerDown | Triggered once when the touch point is pressed within the collision volume of the Entity |
| onPointerUp | Triggered once when the touch point is released within the collision volume of the Entity |
| onPointerClick | Triggered once when the touch point is pressed and released within the collision volume of the Entity |
| onPointerDrag | Continuously triggered when the touch point is pressed within the collision volume of the Entity until the touch point is no longer pressed |

KeyBoard

Directly call the methods provided by the InputManager to determine the key status.

| Method Name | Method Explanation |
| --- | --- |
| isKeyHeldDown | Returns whether the key is being held down continuously |
| isKeyDown | Returns whether the key was pressed in the current frame |
| isKeyUp | Returns whether the key was released in the current frame |

Mouse and Touch

Background

PointerEvent is the direction in which mouse and touch interaction in browsers is heading. A pointer is a hardware-agnostic abstraction of input devices, so developers do not need to care whether the data comes from a mouse, touchpad, or touchscreen. It does have some compatibility issues, however: according to Can I use, PointerEvent covers 92.82% of devices, and the remaining gap has to be closed by importing a polyfill.

image.png

Requirement Research

Add hook functions that respond to Pointer in the script component. For entities with collision volume in three-dimensional space, developers can easily implement interactions such as click, drag, and select by supplementing the logic in the corresponding hook functions.

| Hook Function | Trigger Timing and Frequency |
| --- | --- |
| onPointerEnter | Triggered once when the touch point enters the Entity's collider range |
| onPointerExit | Triggered once when the touch point leaves the Entity's collider range |
| onPointerDown | Triggered once when the touch point is pressed within the Entity's collider range |
| onPointerUp | Triggered once when the touch point is released within the Entity's collider range |
| onPointerClick | Triggered once when the touch point is pressed and released within the Entity's collider range |
| onPointerDrag | Continuously triggered when the touch point is pressed within the Entity's collider range until the touch point is no longer pressed |

Native Events

Similar to MouseEvent and TouchEvent, PointerEvent can also be captured by listening. canvas.addEventListener('pointerXXX', callBack);

| Action | MouseEvent | TouchEvent | PointerEvent |
| --- | --- | --- | --- |
| Press | mousedown | touchstart | pointerdown |
| Release | mouseup | touchend | pointerup |
| Move | mousemove | touchmove | pointermove |
| Leave | mouseout, mouseleave | touchend, touchcancel | pointerout, pointercancel, pointerleave |

Flowchart

The general process of Pointer handling can be summarized, where the green boxes represent native events.

未命名文件 (1).png

Raycasting

The biggest problem to solve in Pointer is how to perform raycasting in three-dimensional space based on the position information from native events. This part not only includes basic knowledge of spatial transformation but also the basic use of the physics system.

After capturing the PointerEvent, we need to:

  1. Obtain valid screen position information from the native event.
  2. Convert the position from screen space to three-dimensional space and get the detection ray.
  3. Perform intersection detection between the ray and the collider.
  4. Invoke the script callbacks.

Screen Position Information

We aim to get the position of the pointer relative to the target element, but there are many coordinate properties in native events, so we need to identify which coordinate information is valid.

| Native Event Coordinate Properties | Property Explanation |
| --- | --- |
| clientX & clientY | Coordinates relative to the application area that triggered the event (viewport coordinates) |
| offsetX & offsetY | Coordinates relative to the target element |
| pageX & pageY | Coordinates relative to the entire Document (including scroll areas) |
| screenX & screenY | Coordinates relative to the top-left corner of the main display (rarely used) |
| x & y | Same as clientX & clientY |

They have the following conversion relationships (assuming the native event is event and the clicked target element is canvas):

image.png

The conclusion is: most coordinate properties can provide the desired coordinate information, with offset being the most direct and convenient.
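As a rough illustration of how the other properties can be reduced to an element-relative position, here is a minimal sketch. It assumes the canvas has no border or padding, that `rect` comes from `canvas.getBoundingClientRect()`, and that the scroll offsets come from `window.scrollX` and `window.scrollY`. The `PointerLike` and `RectLike` shapes and both function names are stand-ins introduced for this example so that it runs without a DOM.

```typescript
// Minimal stand-ins for the DOM event and DOMRect (assumptions for this sketch).
interface PointerLike {
  clientX: number;
  clientY: number;
  pageX: number;
  pageY: number;
}
interface RectLike {
  left: number;
  top: number;
}

// Element-relative position derived from viewport (client) coordinates.
function offsetFromClient(e: PointerLike, rect: RectLike): { x: number; y: number } {
  return { x: e.clientX - rect.left, y: e.clientY - rect.top };
}

// The same position derived from document (page) coordinates:
// subtract the scroll offset first to get back to viewport coordinates.
function offsetFromPage(
  e: PointerLike,
  rect: RectLike,
  scrollX: number,
  scrollY: number
): { x: number; y: number } {
  return { x: e.pageX - scrollX - rect.left, y: e.pageY - scrollY - rect.top };
}
```

Both paths should agree with `offsetX` and `offsetY` under the stated assumptions, which is why reading `offset` directly is the most convenient option.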

Spatial Transformation

Raycasting can be reduced to two steps: obtain a ray in three-dimensional space from the coordinates clicked on the screen, then perform collision detection between that ray and the colliders in three-dimensional space.

Taking a perspective camera as an example, after obtaining the coordinates clicked on the screen, the following steps are needed to get the ray:

  1. offset -> Screen Space
  2. Screen Space -> Clip Space
  3. Clip Space -> World Space

Those familiar with graphics engines know that during rendering, we undergo the following transformations:

  1. Model Space -> World Space
  2. World Space -> View Space -> Clip Space
  3. Clip Space -> Screen Space

So we only need to get the screen space coordinates and then apply the inverses of these space transformations in reverse order.

offset -> Clip Space

It is necessary to have a general understanding of pixels (pixel), device-independent pixels (dips), and device pixel ratio (devicePixelRatio). The coordinate information obtained from the offset property in the click event is in device-independent pixels, so when calculating screen space coordinates, ensure that the units of the numerator and denominator are consistent.

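Clip space is a left-handed coordinate system in which X, Y, and Z all range from -1 to 1 (intuitively, anything rendered outside this range is clipped). Note the following when converting:

  1. When computing the relative position of the touch point in screen space, the numerator and denominator must both be in pixels or both be in device-independent pixels.
  2. The Y axis of clip space points up, while the Y axis of the offset reference frame points down, so the Y axis must be flipped.
  3. In clip space, depth increases with distance from the viewer. Simply put, the near plane is -1 and the far plane is 1.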
image.png
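The conversion notes above can be sketched as two small functions. This is a minimal sketch, not the engine's code: the function names and parameters are introduced here for illustration. The first variant assumes width and height are in device-independent (CSS) pixels, such as `canvas.clientWidth`. The second assumes they are in physical pixels, such as `canvas.width`, so the offset is scaled by `devicePixelRatio` first to keep the units consistent.

```typescript
// offset (device-independent pixels, Y down) -> clip space X/Y in [-1, 1] (Y up).
function offsetToClip(
  offsetX: number,
  offsetY: number,
  cssWidth: number,
  cssHeight: number
): { x: number; y: number } {
  return {
    x: (offsetX / cssWidth) * 2 - 1,
    // Flip Y: offset grows downward, clip space grows upward.
    y: 1 - (offsetY / cssHeight) * 2,
  };
}

// Same conversion when the canvas size is given in physical pixels:
// scale the offset by devicePixelRatio so numerator and denominator match.
function offsetToClipUsingPixels(
  offsetX: number,
  offsetY: number,
  pixelWidth: number,
  pixelHeight: number,
  devicePixelRatio: number
): { x: number; y: number } {
  return {
    x: ((offsetX * devicePixelRatio) / pixelWidth) * 2 - 1,
    y: 1 - ((offsetY * devicePixelRatio) / pixelHeight) * 2,
  };
}
```

The depth component is not derived from the pointer position. It is chosen later (-1 for the near plane, 1 for the far plane) when building the ray.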

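Point in Screen Space -> Ray in World Space

Matrices in the formula derivation are column-major.

Taking a perspective camera as an example, world space is converted to clip space by the View transformation followed by the Projection transformation, so converting from clip space back to world space only requires applying the inverse of these transformations.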

image.png
image.png

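Detection Ray

Substituting the near-plane depth and the far-plane depth into the formula above gives the projections of the touch point onto the near plane and the far plane in world space. Connecting these two points yields the detection ray.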

image.png
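A minimal sketch of this step follows. It assumes the caller already has the inverse of the combined view-projection matrix as a column-major array of 16 numbers (the parameter `invViewProj` is hypothetical), and it uses the clip-space depth convention of -1 for the near plane and 1 for the far plane. The type and function names are introduced for this example.

```typescript
type Vec3 = { x: number; y: number; z: number };

// Transform a point by a column-major 4x4 matrix, then apply the perspective divide.
function transformCoordinate(p: Vec3, m: ArrayLike<number>): Vec3 {
  const x = m[0] * p.x + m[4] * p.y + m[8] * p.z + m[12];
  const y = m[1] * p.x + m[5] * p.y + m[9] * p.z + m[13];
  const z = m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14];
  const w = m[3] * p.x + m[7] * p.y + m[11] * p.z + m[15];
  return { x: x / w, y: y / w, z: z / w };
}

// Unproject the clip-space point at the near and far planes and connect them.
function clipPointToRay(
  clipX: number,
  clipY: number,
  invViewProj: ArrayLike<number>
): { origin: Vec3; direction: Vec3 } {
  const near = transformCoordinate({ x: clipX, y: clipY, z: -1 }, invViewProj);
  const far = transformCoordinate({ x: clipX, y: clipY, z: 1 }, invViewProj);
  const dx = far.x - near.x;
  const dy = far.y - near.y;
  const dz = far.z - near.z;
  const length = Math.hypot(dx, dy, dz);
  return {
    origin: near,
    direction: { x: dx / length, y: dy / length, z: dz / length },
  };
}
```

The ray starts on the near plane, so colliders between the camera position and the near plane are not hit, which matches what is visible on screen.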

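Ray Intersection Detection

Colliders are composed of regular geometric shapes (boxes, spheres, and so on), so you can look up the corresponding ray-geometry intersection algorithms.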

image.png
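As one example of such an algorithm, here is a minimal sketch of a ray-sphere test. It is the standard geometric solution of the quadratic for a normalized ray direction, not necessarily what the physics engine uses internally. The function name and the convention of returning `null` on a miss are choices made for this example.

```typescript
type Vec3 = { x: number; y: number; z: number };

// Returns the distance along the ray to the first hit, or null if there is no hit.
// Assumes `direction` is normalized.
function intersectRaySphere(
  origin: Vec3,
  direction: Vec3,
  center: Vec3,
  radius: number
): number | null {
  // Vector from the sphere center to the ray origin.
  const mx = origin.x - center.x;
  const my = origin.y - center.y;
  const mz = origin.z - center.z;
  const b = mx * direction.x + my * direction.y + mz * direction.z;
  const c = mx * mx + my * my + mz * mz - radius * radius;
  // Origin is outside the sphere and the ray points away from it.
  if (c > 0 && b > 0) return null;
  const discriminant = b * b - c;
  // Negative discriminant: the ray misses the sphere.
  if (discriminant < 0) return null;
  const t = -b - Math.sqrt(discriminant);
  // A negative t means the origin is inside the sphere; clamp to 0.
  return t < 0 ? 0 : t;
}
```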

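Script Callback

Once the physics engine returns the hit collider, its Entity can be regarded as the target of all onPointerXXX callbacks in the current frame. At this stage, we only need to invoke the script callbacks according to the collected native events.

Performance Optimization

  • **Buffering:** After a PointerEvent is captured, the native event is pushed into an array. When the interaction system's tick runs, the corresponding logic is processed in order.
  • **Pointer merging:** Raycasting is expensive, so when there are multiple touch points on the screen, we merge them according to certain rules. As a result, raycasting in the touch interaction logic is performed at most once per frame.
  • **Multi-camera scenes:** When there are multiple cameras, all cameras whose render area contains the click point are checked in turn, sorted by render order (later-rendered first). If no collider is hit in the scene rendered by the current camera and that camera's background is transparent, the click event is passed on to the previously rendered camera, until a collider is hit or all cameras have been traversed.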
image.png
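The buffering idea in the first bullet can be sketched as follows. This is a simplified illustration rather than the engine's implementation: `BufferedEvent`, `PointerEventBuffer`, `push`, and `flush` are names introduced for this example. In a real page, `push` would be called from a `pointerdown`, `pointermove`, or `pointerup` listener.

```typescript
// Simplified stand-in for the data kept from a native PointerEvent.
interface BufferedEvent {
  type: string;
  x: number;
  y: number;
}

class PointerEventBuffer {
  private events: BufferedEvent[] = [];

  // Called from the native event listener: only records, does no heavy work.
  push(event: BufferedEvent): void {
    this.events.push(event);
  }

  // Called once per frame from the interaction system's tick:
  // handles the buffered events in arrival order, then empties the buffer.
  flush(handler: (event: BufferedEvent) => void): number {
    const events = this.events;
    const count = events.length;
    for (let i = 0; i < count; i++) {
      handler(events[i]);
    }
    events.length = 0;
    return count;
  }
}
```

Deferring the work to the tick keeps event handlers cheap and makes the processing order deterministic relative to the rest of the frame.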

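Notes

As mentioned at the beginning regarding compatibility, if your project may run on devices with older system versions, you can import our customized PointerPolyFill. https://github.com/galacean/polyfill-pointer-event

Keyboard Input

Requirement Research

  • Get all keys that were pressed during the current frame
  • Get all keys that were released during the current frame
  • Get all keys that are currently held down
  • Determine whether a given key was pressed during the current frame
  • Determine whether a given key was released during the current frame
  • Determine whether a given key is currently held down

Native Events

KeyboardEvent can be captured by listening. canvas.addEventListener('keyXXX', callBack);

| Event | Trigger Timing |
| --- | --- |
| keypress | Triggered when a character key is pressed |
| keydown | Triggered when any key is pressed |
| keyup | Triggered when any key is released |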

Flowchart

The general process of keyboard handling can be summarized, where the green boxes represent native events.

任务系统.png

Selection of Index Value

Regardless of different case states or keyboard layouts, a key is an enumerable value. If the key value can be stored as an enumeration, it will bring great convenience in terms of both performance and usage. Therefore, it is necessary to determine the appropriate property to use as the enumeration value.

The following KeyboardEvent properties are candidates for the enumeration value:

| Property | Description | Simple Example | Compatibility |
| --- | --- | --- | --- |
| code | The physical key that triggered the event, layout-independent | Regardless of case or layout, pressing the Y key always returns the physical key "KeyY" | Compatible |
| key | The key value that triggered the event | "y" when lowercase, "Y" when uppercase | Compatible |
| charCode | | | Deprecated |
| keyCode | | | Deprecated |
| char | | | Deprecated |

It can be found that the most suitable property is code. Refer to https://w3c.github.io/uievents-code/.
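To illustrate why `code` maps well onto an enumeration, here is a small sketch that looks up an enum member from the `code` string. The three-member `Keys` enum and the `codeToKey` helper are illustrative stand-ins written for this example. A real key enum would list every value from the specification linked above.

```typescript
// Illustrative subset; member names deliberately match KeyboardEvent.code values.
enum Keys {
  KeyA,
  KeyY,
  Space,
}

// Map a KeyboardEvent.code string to the enum, or undefined if it is not listed.
function codeToKey(code: string): Keys | undefined {
  const value = (Keys as unknown as Record<string, string | number | undefined>)[code];
  // Numeric enums also contain reverse mappings ("0" -> "KeyA"); accept numbers only.
  return typeof value === "number" ? (value as Keys) : undefined;
}
```

Because the result is a small integer, it can be used directly as an index or map key, which the optimizations in the following sections rely on.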

Performance Optimization

The interaction logic of each frame's key press is relatively simple. Maintaining three arrays for pressed, released, and held keys can meet all needs. The focus is on how to reduce the performance loss of frame-level add, delete, and search operations.

  • Optimize adding elements and resetting array length by using unordered arrays
  • Optimize search by adding an index

Unordered Arrays

Unordered arrays reduce performance loss when adding and deleting elements in most cases. The following diagram shows the composition of unordered arrays:

image.png

The following diagram shows how unordered arrays reduce performance loss:

image.png
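The idea behind the diagrams can be sketched as follows, assuming the usual "swap with last" technique: deleting moves the last element into the freed slot instead of shifting everything after it, and clearing only resets the logical length. The class name `UnorderedArray` and its methods are written for this example and are not taken from the engine source.

```typescript
class UnorderedArray<T> {
  private elements: T[] = [];
  // Logical length; the backing array may be longer and is reused after clear().
  length = 0;

  add(element: T): void {
    if (this.length === this.elements.length) {
      this.elements.push(element);
    } else {
      this.elements[this.length] = element;
    }
    this.length++;
  }

  get(index: number): T {
    if (index < 0 || index >= this.length) throw new RangeError("index out of range");
    return this.elements[index];
  }

  // O(1) delete: the last element fills the hole, so order is not preserved.
  // Returns the element that was moved (so an external index can be updated), if any.
  deleteByIndex(index: number): T | undefined {
    if (index < 0 || index >= this.length) throw new RangeError("index out of range");
    const last = this.length - 1;
    let moved: T | undefined;
    if (index !== last) {
      moved = this.elements[last];
      this.elements[index] = moved;
    }
    this.length = last;
    return moved;
  }

  // O(1) reset without reallocating the backing storage.
  clear(): void {
    this.length = 0;
  }
}
```

Returning the moved element matters once an index table is kept alongside the array, because that element's stored position changes on every delete.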

Storage and Indexing

If only three unordered arrays are used, it still requires traversing the array to check the state of a specific key, which can cause significant performance loss in extreme cases. If the current key enumeration is used as the Key and whether it was pressed in the current frame is recorded as the Value, traversal can be avoided.

image.png

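Although this implementation makes querying faster, it adds a maintenance cost: the state of the mapping table needs to be reset at the beginning of each frame. However, if the frame number in which the key was pressed is stored as the Value instead, this cost is avoided. Only the current frame number is updated at the beginning of each frame, and a key counts as pressed in the current frame only if its stored frame number equals the current one.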

image.png

Similarly, when recording keys that are held down, an additional table is added to map the key to its index in the unordered array.
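Putting the frame-number idea together, a minimal key state tracker might look like the sketch below. It is a simplification made for illustration: it uses a `Set` for held keys instead of the unordered array plus index table described above, it handles events immediately rather than buffering them until the tick, and the names `KeyStateTracker`, `beginFrame`, `onKeyDown`, and `onKeyUp` are hypothetical.

```typescript
class KeyStateTracker {
  private frame = 0;
  // key code -> frame number in which it was last pressed / released.
  private downFrame = new Map<string, number>();
  private upFrame = new Map<string, number>();
  private held = new Set<string>();

  // Called at the start of every frame; no per-key reset is needed.
  beginFrame(): void {
    this.frame++;
  }

  onKeyDown(code: string): void {
    // Ignore repeated keydown events while the key is already held.
    if (this.held.has(code)) return;
    this.held.add(code);
    this.downFrame.set(code, this.frame);
  }

  onKeyUp(code: string): void {
    if (!this.held.delete(code)) return;
    this.upFrame.set(code, this.frame);
  }

  isKeyDown(code: string): boolean {
    return this.downFrame.get(code) === this.frame;
  }

  isKeyUp(code: string): boolean {
    return this.upFrame.get(code) === this.frame;
  }

  isKeyHeldDown(code: string): boolean {
    return this.held.has(code);
  }
}
```

Stale entries from earlier frames never need clearing, because their stored frame number simply no longer matches the current one.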

Quick Reference

| Key State | isKeyHeldDown | isKeyDown | isKeyUp |
| --- | --- | --- | --- |
| The key has been held down since the previous frame | true | false | false |
| The key was pressed in the current frame and not released | true | true | false |
| The key was released and pressed again in the current frame | true | true | true |
| The key was pressed and released in the current frame | false | true | true |
| The key was released in the current frame | false | false | true |
| The key was not pressed and had no interaction | false | false | false |
| This situation will not occur | true | false | true |
| This situation will not occur | false | true | false |

Notes

  • When a key is held down for a period of time, the native keydown event will be continuously triggered. We have considered and filtered this situation, so developers do not need to do any extra processing.
  • The native event behavior of some state keys may be peculiar, and the events they trigger may even differ between Firefox and Chrome (e.g., Caps Lock).

Copyright © 2025 Galacean
All rights reserved.