The Journey of a Touch - Part II
Having explored how an SPI-based digitizer would be integrated into an iOS device, we can now examine the mechanisms through which the operating system transfers the touch event to the relevant application running on an Apple device.
Surfacing the Touch Event: From Kernel Space to User Space
For security reasons, direct access to hardware, from memory to peripherals, is only allowed in kernel space. To access hardware, a process must either run in kernel space (as is the case with drivers built as kernel extensions) or rely on a kernel-side delegate that does have such access (as is the case with driver extensions built on DriverKit).
Since IOKit drivers and their WorkLoops all execute in the kernel, in various contexts and with various restrictions, the events they work with remain within the kernel’s boundary. For those events to be visible to user-facing applications, they need to be surfaced to user space; in other words, they need to cross the kernel boundary. Surfacing these events is a complex, carefully coordinated process.
After the IOKit driver consumes the processed touch information from the Ping-Pong buffers, using the structures described in the chapter Understanding Apple’s drivers ecosystem, it persists the message in a dedicated shared-memory ring buffer, implemented as an IOSharedDataQueue. When data is added to this buffer, registered clients (in particular, the backboardd daemon) receive a Mach notification, which in turn wakes the daemon’s event dispatcher thread. Backboardd then dequeues events from the shared-memory buffer and processes them further.
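To make this wake-up pattern more concrete, here is a minimal Swift sketch of how a user-space daemon could block until a kernel driver signals that new data is available, then drain a shared queue. It does not reflect Apple’s private implementation: the notification port is assumed to have been obtained when the client registered with the driver, and drainSharedQueue is a hypothetical stand-in for dequeuing from the shared-memory ring buffer.

```swift
import Dispatch
import Darwin

// Hypothetical sketch: wake up when a Mach notification arrives on the port
// supplied by the driver, then drain the shared queue. Names are illustrative.
func listenForTouchData(on notificationPort: mach_port_t,
                        drainSharedQueue: @escaping () -> Void) -> DispatchSourceMachReceive {
    let source = DispatchSource.makeMachReceiveSource(port: notificationPort,
                                                      queue: DispatchQueue(label: "event-dispatcher"))
    source.setEventHandler {
        // The Mach notification only says "data is available"; the payload
        // itself is read directly from the shared-memory ring buffer.
        drainSharedQueue()
    }
    source.activate()
    return source
}
```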
The Mach notification model, with gated access to shared memory, ensures the lowest possible latency for high-throughput scenarios, as is the case with device-generated events. It works well because it is carefully synchronized by Apple, using low-level synchronization mechanisms such as IOCommandGates. And yet, as seen in CVE-2017-7162, even with Apple’s careful design, mistakes can happen.
In higher-level programming, communicating through shared memory is often discouraged. Even though it’s less common, it is still useful to understand this model, especially when optimizing for performance.
Next, backboardd forwards the event to multiple subsystems. First, it assumes system gestures (such as minimizing an application, lowering the status bar, and so on) are possible, so it sends the event (also via Mach ports) to the SpringBoard process, which manages the Home Screen and system UI. If it needs to, SpringBoard assumes control and handles the system events.
Second, since backboardd keeps a record of all application frames, together with the state of these applications (whether they are running, whether they are in the foreground or background, and so on), it also locates the application the event should be sent to, by determining whether the touch coordinates fall within the frame of an active application in the foreground. Once the hit test finds the frame that contains the touch point, backboardd identifies the process that owns that frame and forwards the touch event information to its associated listening Mach port.
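As an illustration of this routing step, the following sketch models the frame-based lookup in Swift. AppWindowRecord and its fields are hypothetical; the real bookkeeping inside backboardd is far richer (window ordering, scenes, system UI layers and so on).

```swift
import CoreGraphics

// Illustrative model, not backboardd's real data structures: given the frames
// of the running applications, find the one that should receive a touch.
struct AppWindowRecord {
    let bundleIdentifier: String
    let frame: CGRect
    let isForeground: Bool
}

func target(for touchPoint: CGPoint, in records: [AppWindowRecord]) -> AppWindowRecord? {
    // A simple containment test: the first foreground frame containing the point wins.
    records.first { $0.isForeground && $0.frame.contains(touchPoint) }
}
```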
Handling the Touch Events
When any application starts up, it is encapsulated within a process running on the operating system. This process spawns an initial thread which, on Apple systems, is named Thread 1 and is known as the Main Thread. It executes the instructions found in the executable’s top-level code (the entry point of the application, or the main function, in most languages) and, since a process ends when its main thread completes, the main thread usually runs in a loop. More accurately, it starts a CFRunLoop, executes its setup instructions, then blocks (sleeps), waiting for various events. Unlike the example in “From a CLI task to a run (main) loop”, though, SwiftUI (like most other UI frameworks designed for efficiency) does not trigger timer events, or any events, unless it needs to. Instead, when an application starts up, it executes an initial setup process to load the data it requires to display the first scene (a lot more on this in the SwiftUI sections) and then, if the application is well written, it blocks quickly, waiting for new events.
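The shape of this behavior can be sketched in a few lines of Swift: the process performs its one-time setup, then parks its main thread in a run loop until an event source needs servicing. The repeating timer below exists only so the example produces visible output; a real UI application is instead woken by input sources such as Mach ports.

```swift
import Foundation

// Minimal sketch of the "set up, then sleep until something happens" shape of
// a main thread. The timer is purely illustrative.
let timer = Timer(timeInterval: 1.0, repeats: true) { _ in
    print("Woke up for an event, handled it, going back to sleep…")
}
RunLoop.main.add(timer, forMode: .default)   // one-time setup work

RunLoop.main.run()                           // block (sleep) until an event source fires
```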
As shown in the previous section, when backboardd identifies the application that should receive the event, it sends the touch information, serialized into a Mach message, to the application’s dedicated Mach port. This, in turn, marks the application thread listening on the port as runnable and, in a mechanism similar to the one explored previously, the application’s event handling thread drains the port. Each event is dequeued in sequence and is then processed by the UI framework.
First, the framework performs hit testing to determine which UIResponder should receive the touch event (similar to how backboardd identified the application to route the event towards). Then, the touch information (packaged as a UITouch within a UIEvent in UIKit, or a Gesture in SwiftUI) triggers the execution of a few functions. The exact implementation is not particularly relevant at this point, but the general idea can serve as inspiration for your own implementations.
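In UIKit, this responder lookup is exposed through hitTest(_:with:), which walks the view hierarchy to find the deepest view under the touch point. The small sketch below overrides it purely to log the decision; the class itself is, of course, illustrative.

```swift
import UIKit

// Sketch of the hit-testing step inside the application: UIKit resolves the
// touch location to the deepest view (a UIResponder) that contains it.
final class InspectableView: UIView {
    override func hitTest(_ point: CGPoint, with event: UIEvent?) -> UIView? {
        let responder = super.hitTest(point, with: event)
        print("Touch at \(point) resolved to \(String(describing: responder))")
        return responder
    }
}
```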
Of the numerous events a control can react to, two types of touch events are particularly relevant for a button control. First, the touchDown event, which indicates that the control has been touched, triggers the execution of an activation animation (usually, this highlights the button). Then, the touchUpInside event (or touchUpOutside, if the finger has moved off the control), which indicates that the button has been released, triggers the execution of a deactivation animation; touchUpInside additionally signals the framework to execute the instructions found in the closure of the SwiftUI Button control view.
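In UIKit terms, these are ordinary UIControl.Event values that you can observe yourself. The wiring below is a hedged sketch (the class and handler names are mine, not part of any framework), but the .touchDown, .touchUpInside and .touchUpOutside constants are the real UIKit events.

```swift
import UIKit

// Illustrative wiring of the two control events discussed above.
final class ButtonWiring: NSObject {
    let button = UIButton(type: .system)

    func configure() {
        button.addTarget(self, action: #selector(touchDown), for: .touchDown)
        button.addTarget(self, action: #selector(touchUp), for: [.touchUpInside, .touchUpOutside])
    }

    @objc private func touchDown() {
        // Activation: highlight the control as soon as the finger lands on it.
    }

    @objc private func touchUp() {
        // Deactivation: remove the highlight (and, for touch-up inside, run the action).
    }
}
```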
You can easily check this separation (and how SwiftUI handles these events) by touching the “Please press the button” button in the example application at the start of this chapter and holding it pressed. You should notice how the button lights up (activates), but also how the message underneath the button remains “The button has been pressed 0 times”. This indicates that the Button’s action closure has not executed yet. Once you release the button, the control’s deactivation animation plays and the text updates to reflect the change.
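A minimal SwiftUI reconstruction of such a screen might look like the code below (the actual sample project may differ). The custom ButtonStyle makes the separation visible: configuration.isPressed tracks the activation state the moment the finger lands on the control, while the action closure, and therefore the counter, only runs on release.

```swift
import SwiftUI

struct CounterView: View {
    @State private var counter = 0

    var body: some View {
        VStack(spacing: 16) {
            Button("Please press the button") {
                counter += 1                      // runs on touch-up, not touch-down
            }
            .buttonStyle(HighlightingButtonStyle())

            Text("The button has been pressed \(counter) times")
        }
    }
}

// configuration.isPressed reflects the activation (touch-down) state,
// independently of when the action closure executes.
struct HighlightingButtonStyle: ButtonStyle {
    func makeBody(configuration: Configuration) -> some View {
        configuration.label
            .padding()
            .background(configuration.isPressed ? Color.blue.opacity(0.4) : Color.blue.opacity(0.15))
            .scaleEffect(configuration.isPressed ? 0.95 : 1.0)
    }
}
```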
In our example, since the end user just touched then released the button, the first event sent by backboardd is translated to a touchDown event. Since the touch occurred within the frame of a button, the UI framework runs the code associated with the control’s activation animation. Typically, this animation extends over several frames, depending on how it is configured. To prepare the first frame of the animation, the application (more specifically, the code that handles the UI, such as SwiftUI or UIKit) updates the characteristics of the Button control. For example, it changes the color to a different, lighter shade to act as a highlight, while also scaling the shape down and perhaps adjusting the button’s shadows and outlines. After all changes required for the first frame of the animation are processed, they result in an update to the underlying CALayer construct (within the Core Animation framework). Whenever a view needs to change its appearance, the underlying Core Animation CALayer structure triggers a setNeedsLayout call. This call invalidates the view (more on this in the SwiftUI sections), which results in an update to the application’s CALayer tree. The diagram below outlines this process.
The next event sent by backboardd is translated to a touchUpInside event, which marks the beginning of the button’s deactivation animation, as well as the execution of the action closure’s instructions (in this case, counter += 1).
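A rough UIKit-flavored sketch of the layer invalidation behind both the activation and deactivation animations is shown below. The property name is illustrative, but the idea matches the description above: toggling the highlight mutates the view’s backing CALayer and marks the view as needing layout, so the change is picked up on the next pass of the render loop.

```swift
import UIKit

// Illustrative sketch: a visual state change updates the backing CALayer and
// invalidates the view, not Apple's internal button implementation.
final class HighlightableView: UIView {
    var isHighlighted = false {
        didSet {
            // Mutations on the backing layer are recorded in the layer tree…
            layer.backgroundColor = (isHighlighted
                ? UIColor.systemBlue.withAlphaComponent(0.4)
                : UIColor.systemBlue.withAlphaComponent(0.15)).cgColor
            layer.setAffineTransform(isHighlighted ? CGAffineTransform(scaleX: 0.95, y: 0.95) : .identity)
            // …and the view is invalidated so the next frame reflects them.
            setNeedsLayout()
        }
    }
}
```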
Reacting to UI Update Events
When the visual content of an application needs to be updated, the UI framework schedules rendering updates as part of a mechanism commonly referred to as the Render Loop. This process is typically synchronized with the device’s display (or, if more than one display is connected, the fastest display) using a CADisplayLink object.
Internally, the CADisplayLink is tied to the system’s V-Sync (vertical synchronization) events, which occur during the display’s vertical blanking interval (VBLANK), the brief moment between refreshes when the display is not actively drawing. On a standard 60Hz display, these events occur roughly every 16.67 milliseconds (the system needs to render 60 frames per second, or one frame per interval). For a 120Hz ProMotion display, the V-Sync interval is even shorter, around 8.33 milliseconds.
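A small sketch of hooking into these V-Sync events from an application, using CADisplayLink, is shown below. The class is illustrative, but targetTimestamp - timestamp is exactly where the roughly 16.67 ms and 8.33 ms frame budgets mentioned above come from.

```swift
import UIKit

// Drives per-frame work from the display's V-Sync via CADisplayLink.
final class FrameDriver: NSObject {
    private var displayLink: CADisplayLink?

    func start() {
        let link = CADisplayLink(target: self, selector: #selector(step(_:)))
        link.add(to: .main, forMode: .common)
        displayLink = link
    }

    @objc private func step(_ link: CADisplayLink) {
        let frameBudget = link.targetTimestamp - link.timestamp   // ≈ 0.0167 s at 60 Hz
        // All event handling and layer-tree updates for this frame must fit
        // inside this budget before the commit to the render server.
        _ = frameBudget
    }
}
```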
To prevent screen tearing and other potential issues, the application is required to receive and process the event, then update the underlying CALayer tree (Apple calls this the event phase), all within a single V-Sync interval. Once this synchronous, blocking process is complete, the application moves to the commit phase, where it sends the new CALayer tree information to another process, known as the Render Server. Since this is an inter-process communication flow, it is also implemented through Mach ports.
On iOS, the Render Server functionality is implemented in the backboardd process, which is why it’s marked as the Render Server in the Instruments application.
On every V-Sync interval, the backboardd process needs to complete its own tasks, split into two phases: the render preparation phase, followed by the render execution phase. In the render preparation phase, which runs on the CPU portion of the SoC, backboardd collects the information submitted by all applications in the foreground, then composes a final image data structure in the form of GPU render instructions (for the Metal framework). The CPU then forwards the frame information to the GPU, which in turn draws the final image and saves it into a frame buffer.
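While backboardd’s compositor is private, the general Metal pattern it builds on, encoding GPU commands into a command buffer and committing it for execution, is public API. The sketch below encodes a single render pass that merely clears an off-screen texture; it stands in for the much richer compositing work the Render Server performs.

```swift
import Metal

// Illustrative only: one Metal render pass targeting an off-screen texture.
guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let commandBuffer = queue.makeCommandBuffer() else {
    fatalError("Metal is not available on this machine")
}

// An off-screen render target, standing in for the frame buffers discussed below.
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                                 width: 256,
                                                                 height: 256,
                                                                 mipmapped: false)
textureDescriptor.usage = [.renderTarget]
let target = device.makeTexture(descriptor: textureDescriptor)!

let passDescriptor = MTLRenderPassDescriptor()
passDescriptor.colorAttachments[0].texture = target
passDescriptor.colorAttachments[0].loadAction = .clear
passDescriptor.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
passDescriptor.colorAttachments[0].storeAction = .store

// A real compositor would encode one or more draw calls per visible layer here.
let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor)!
encoder.endEncoding()

commandBuffer.addCompletedHandler { _ in
    // At this point the GPU has finished writing the image into the target.
}
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
```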
By default, the render loop uses two frame buffers, in a setup known as double buffering. One buffer contains the image that is visible on the display (the front buffer), while the other contains the image that is being rendered (the back buffer). On every screen refresh, the two buffers are swapped. Whatever image is present in the front buffer at the time of the display’s VBLANK is what gets shown on the screen. This is why the GPU never renders directly into the front buffer (if the image were not complete, the display would simply show the incomplete image).
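The swap itself can be modeled with a toy Swift value type, purely for illustration; neither the types nor the logic belong to Apple’s implementation. The guard mirrors the constraint described above: the display only ever scans out a buffer whose contents are complete.

```swift
// Toy model of double buffering: the GPU draws into the back buffer, and the
// buffers are swapped at V-Sync only when the back buffer is finished.
struct FrameBuffer {
    var pixels: [UInt8]
    var isComplete = false
}

struct SwapChain {
    var front: FrameBuffer   // currently scanned out by the display
    var back: FrameBuffer    // currently being rendered by the GPU

    mutating func vsync() {
        // If the GPU did not finish in time, keep showing the previous frame
        // (a dropped frame, i.e. a hitch).
        guard back.isComplete else { return }
        let finished = back
        back = front
        front = finished
        back.isComplete = false
    }
}
```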
As a safety mechanism, if the image in the back buffer is not completely drawn in time, the system falls back to a triple buffering mechanism, where it uses two back buffers instead of one. It loads the image currently present in the front buffer into a second back buffer (usually called the spare buffer), while continuing to render the image in the initial back buffer. When the V-Sync event is triggered, the system swaps the front buffer with the spare buffer.
At the next V-Sync event, the front buffer should be swapped with the back buffer, and the spare buffer is usually destroyed. Any deviation from this process causes a hitch, or an issue with the animations. Apple describes this process in great detail in Tech Talks such as Explore UI animation hitches and the render loop. The linked talk contains detailed explanations and troubleshooting guidance that will become valuable to you later, so I am purposefully not diving into more detail here. Instead, I strongly recommend that you watch that talk. Together with the information presented in this section, it should give you a detailed understanding of how events are captured, processed and rendered on your device.