Starting in iOS 14 and macOS Big Sur, developers will be able to detect human body and hand poses in photos and videos within their apps using Apple's updated Vision framework, as explained in this WWDC 2020 session.
This functionality will allow apps to analyze the poses, movements, and gestures of people, enabling a wide variety of potential features. Apple provides some examples, including a fitness app that could automatically track the exercise a user performs, a safety-training app that could help employees use correct ergonomics, and a media-editing app that could find photos or videos based on pose similarity.
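As a rough illustration of how this looks in code, the sketch below runs the new body and hand pose requests on a single image and reads back the detected joints. The function name and the choice to handle only the first observation are illustrative assumptions rather than Apple's sample code.

```swift
import Vision

// Illustrative sketch: run the new pose requests on one image and inspect the joints.
func detectPoses(in cgImage: CGImage) throws {
    let bodyRequest = VNDetectHumanBodyPoseRequest()
    let handRequest = VNDetectHumanHandPoseRequest()

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([bodyRequest, handRequest])

    // Each observation exposes named joints as points in normalized image coordinates.
    if let body = bodyRequest.results?.first as? VNHumanBodyPoseObservation {
        let joints = try body.recognizedPoints(.all)
        print("Body joints detected: \(joints.count)")
    }
    if let hand = handRequest.results?.first as? VNHumanHandPoseObservation {
        let joints = try hand.recognizedPoints(.all)
        print("Hand joints detected: \(joints.count)")
    }
}
```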
Hand pose detection in particular promises to deliver a new form of interaction with apps. Apple's demonstration showed a person holding their thumb and index finger together and then being able to draw in an iPhone app without touching the display.
Additionally, apps could use the framework to overlay emoji or graphics that mirror a specific hand gesture, such as a peace sign, onto a user's hands.
Another example is a camera app that automatically triggers photo capture when it detects the user making a specific hand gesture in the air.
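A hedged sketch of how the pinch gesture from Apple's drawing demonstration might be recognized with a hand pose observation is below; the distance and confidence thresholds are assumptions chosen for illustration, not values Apple has published.

```swift
import Vision
import CoreGraphics

// Assumed sketch of pinch detection: the thumb tip and index finger tip are
// treated as pinched when they are close together in normalized image space.
func isPinching(_ hand: VNHumanHandPoseObservation) -> Bool {
    guard let thumbTip = try? hand.recognizedPoint(.thumbTip),
          let indexTip = try? hand.recognizedPoint(.indexTip),
          thumbTip.confidence > 0.3, indexTip.confidence > 0.3 else {
        return false
    }
    // 0.05 (5% of the image dimension) is an arbitrary illustrative threshold.
    let distance = hypot(thumbTip.location.x - indexTip.location.x,
                         thumbTip.location.y - indexTip.location.y)
    return distance < 0.05
}
```

An app could evaluate this on each frame's hand observation and begin drawing, or trigger the shutter, when the result changes.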
The framework is capable of detecting multiple hands or bodies in one scene, but the algorithms might not work as well with people who are wearing gloves, bent over, upside down, or wearing flowing or robe-like clothing. The algorithms can also have difficulty when a person is close to the edge of the frame or partially obstructed.
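For scenes with more than one hand in view, the hand pose request lets apps cap how many hands are reported, as in the minimal example below.

```swift
import Vision

// The request can return several hands per frame; maximumHandCount limits
// how many the framework reports (here, both of one user's hands).
let handRequest = VNDetectHumanHandPoseRequest()
handRequest.maximumHandCount = 2
```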
Similar functionality is already available through ARKit, but it is limited to augmented reality sessions and only works with the rear-facing camera on compatible iPhone and iPad models. With the updated Vision framework, developers have many more possibilities.