Follow us for regular updates on Australian and NZ innovation
Eye-tracking—the ability to quickly and precisely measure the direction a user is looking while inside of a VR headset—is often talked about within the context of foveated rendering, with the hopes that it could reduce the performance requirements of VR. And while foveated rendering is an exciting use-case for eye-tracking in VR headsets, eye-tracking stands to bring so much more to the table.
Eye-tracking has been talked about with regard to VR as a distant technology for many years, but developments from companies across the industry have shown promising progress in precision, latency, robustness, and cost. The hardware is becoming increasingly available to developers and researchers.
Companies like Tobii are offering eye-tracking hardware and software to manufacturers and developers; Qualcomm is now offering Tobii’s solution in their VRDK headset. 7invensun is selling the aGlass eye-tracking development kit for Vive headsets. Fove is selling a development kit of their VR headset with inbuilt eye-tracking. Oculus recently showed off a new prototype, seen for the first time with eye-tracking. Magic Leap has confirmed eye-tracking on their upcoming development headset. And even Apple is in the game, having reportedly acquired SMI, one of the former leaders in the eye-tracking space, and having drawn up patents implementing the technology.
With this momentum, in just a few years we could see eye-tracking become a standard part of consumer VR headsets. When that happens, there’s a wide range of features that the tech can enable that stand to drastically improve the VR experience.
Let’s first start with the one that many people are already familiar with. Foveated rendering aims to reduce the computational power required for displaying demanding VR scenes. The name comes from the ‘fovea’, a small pit at the center of the human retina which is densely packed with photoreceptors. It’s the fovea which gives us high resolution vision at the center of our field of view; meanwhile our peripheral vision is actually very poor at picking up detail and color, and is better tuned for spotting motion. You can think of it like a camera which has a large sensor with just a few megapixels, and another smaller sensor in the middle with lots of megapixels.
The region of your vision in which you can see in high detail is actually much smaller than most think: just a few degrees across the center of your view. The difference in resolving power between the fovea and the rest of the retina is so drastic that without your fovea you couldn’t make out the text on this page. You can see this easily for yourself: if you keep your eyes focused on this word and try to read just two sentences below, you’ll find it’s almost impossible to make out what the words say, even though you can see something resembling words. The reason that people overestimate the foveal region of their vision seems to be that the brain does a lot of unconscious interpretation and prediction to build a model of how we believe the world to be.
Foveated rendering aims to exploit this quirk of our vision by rendering the virtual scene in high resolution only in the region that the fovea sees, and drastically cutting down the complexity of the scene in our peripheral vision where the detail can’t be resolved anyway. Doing so allows us to focus most of the processing power where it contributes most to detail, while saving processing resources elsewhere. That may not sound like a huge deal, but as the display resolution and field of view of VR headsets increase, the power needed to render complex scenes grows steeply, since the number of pixels to shade scales with the product of horizontal and vertical resolution.
Eye-tracking of course comes into play because we need to know where the center of the user’s gaze is at all times, quickly and with high precision, in order to pull off foveated rendering. It’s believed that this illusion could be done in a way that’s completely invisible to the user; anecdotally, I’ve seen recent demos where this was the case.
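As a rough illustration of the idea, the sketch below picks a resolution scale for one region of the image based on its angular distance from the gaze direction. The function names, the 5 and 20 degree thresholds, and the 0.25 minimum scale are illustrative assumptions, not values from any particular headset or renderer.

```python
import math

def angle_between_deg(a, b):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def resolution_scale(gaze_dir, tile_dir, inner_deg=5.0, outer_deg=20.0, min_scale=0.25):
    """Full resolution near the gaze, falling linearly to min_scale in the periphery.

    All thresholds are assumed values chosen for illustration only.
    """
    eccentricity = angle_between_deg(gaze_dir, tile_dir)
    if eccentricity <= inner_deg:
        return 1.0
    if eccentricity >= outer_deg:
        return min_scale
    t = (eccentricity - inner_deg) / (outer_deg - inner_deg)
    return 1.0 + t * (min_scale - 1.0)
```

A renderer would call something like `resolution_scale` once per screen tile each frame, using the latest gaze direction from the eye-tracker; the lower the tracking latency, the smaller the full-resolution region can safely be.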
In addition to detecting movement, eye-tracking can also be used as a biometric identifier. That makes eye-tracking a great candidate for multiple user profiles across a single headset—when I put on the headset, the system can instantly identify me as a unique user and call up my customized environment, content library, game progress, and settings. When a friend puts on the headset, the system can load their preferences and saved data.
Eye-tracking can also be used to precisely measure IPD, the distance between one’s eyes. Knowing your IPD is important in VR because it’s required to move the lenses and displays into the optimal position for both comfort and visual quality. Unfortunately a lot of people don’t know what their IPD is (you can get a rough measurement if you ask someone to hold a ruler up to your eyes, or ask your eye doctor).
With eye-tracking it would be easy to instantly measure each user’s IPD and then have the headset’s software assist the user in adjusting the headset’s IPD to match, or warn users that their IPD is outside the range supported by the headset.
In more advanced headsets, this process could be invisible and automatic—the IPD could be measured invisibly, and the headset could have a motorized IPD adjustment which would automatically move the lenses into the correct position without the user needing to be aware of any of it.
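The measure-then-adjust flow can be sketched in a few lines. This assumes the eye-tracker reports 3D pupil centers in millimeters (a hypothetical data format) and uses a made-up supported range of 58 to 72 mm; real headsets will differ.

```python
import math
from statistics import median

def measure_ipd_mm(pupil_samples):
    """Median distance between pupil centers over several tracker frames.

    pupil_samples: list of (left_xyz_mm, right_xyz_mm) tuples (assumed format).
    The median makes the estimate robust to an occasional bad frame.
    """
    return median(math.dist(left, right) for left, right in pupil_samples)

def ipd_setting(ipd_mm, headset_min_mm=58.0, headset_max_mm=72.0):
    """Return (lens_separation_mm, in_range) for a measured IPD.

    The default range is an assumption for illustration only.
    """
    clamped = max(headset_min_mm, min(headset_max_mm, ipd_mm))
    return clamped, clamped == ipd_mm
```

When `in_range` is false, the software could warn the user; otherwise the returned separation could drive an on-screen adjustment guide or a motorized mechanism.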
A prototype varifocal headset | Image courtesy NVIDIA
The optical systems used in today’s VR headsets work pretty well but they’re actually rather simple and don’t support an important function of human vision: dynamic focus. This is because the display in a VR headset is always the same distance from our eyes, even when the stereoscopic depth suggests otherwise. This leads to an issue called vergence-accommodation conflict, where the eyes converge on a virtual object at one depth while focusing at another.
Varifocal displays, those which can dynamically alter their focal depth, are proposed as a solution to this problem. There are a number of approaches to varifocal displays, perhaps the simplest of which is an optical system where the display is physically moved back and forth from the lens in order to change focal depth on the fly.
Achieving such an actuated varifocal display requires eye-tracking because the system needs to know precisely where in the scene the user is looking. By tracing a path into the virtual scene from each of the user’s eyes, the system can find the point where those paths intersect, establishing the proper focal plane that the user is looking at. This information is then sent to the display to adjust accordingly, setting the focal depth to match the virtual distance from the user’s eye to the object.
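In practice two measured gaze rays rarely intersect exactly, so one common geometric approach is to take the midpoint of the shortest segment between them. The sketch below does that with plain Python; the function name and the choice to measure distance from the point midway between the eyes are assumptions made for illustration.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def vergence_distance(left_eye, left_dir, right_eye, right_dir, eps=1e-9):
    """Distance from the midpoint between the eyes to the estimated fixation point.

    Returns None when the gaze rays are parallel or diverge (focus at infinity).
    Directions are assumed to be roughly unit length.
    """
    w0 = tuple(l - r for l, r in zip(left_eye, right_eye))
    a, b, c = _dot(left_dir, left_dir), _dot(left_dir, right_dir), _dot(right_dir, right_dir)
    d, e = _dot(left_dir, w0), _dot(right_dir, w0)
    denom = a * c - b * b
    if denom < eps:
        return None  # rays are (nearly) parallel
    s = (b * e - c * d) / denom  # parameter along the left ray
    t = (a * e - b * d) / denom  # parameter along the right ray
    if s <= 0 or t <= 0:
        return None  # closest approach is behind the eyes
    p_left = tuple(p + s * v for p, v in zip(left_eye, left_dir))
    p_right = tuple(p + t * v for p, v in zip(right_eye, right_dir))
    fixation = tuple((l + r) / 2 for l, r in zip(p_left, p_right))
    eye_center = tuple((l + r) / 2 for l, r in zip(left_eye, right_eye))
    return math.dist(fixation, eye_center)
```

The returned distance (in the same units as the eye positions) is what a varifocal system would use to choose its focal depth.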
A well implemented varifocal display could not only eliminate the vergence-accommodation conflict, but also allow users to focus on virtual objects much nearer to them than in existing headsets.
And well before we’re putting varifocal displays into VR headsets, eye-tracking could be used for simulated depth-of-field, which could approximate the blurring of objects outside of the focal plane of the user’s eyes.
While foveated rendering aims to better distribute rendering power between the part of our vision where we can see sharply and our low-detail peripheral vision, something similar can be achieved for the actual pixel count.
Rather than just changing the detail of the rendering on certain parts of the display vs. others, foveated displays are those which are physically moved to stay in front of the user’s gaze no matter where they look.
Foveated displays open the door to achieving much higher resolution in VR headsets without brute-forcing the problem by trying to cram pixels at higher resolution across our entire field of view. Doing so would not only be costly, but also bump into challenging power constraints as the number of pixels approaches retinal-resolution. Instead, foveated displays would move a smaller, pixel-dense display to wherever the user is looking based on eye-tracking data. This approach could even lead to higher fields of view than could otherwise be achieved with a single flat display.
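To get a feel for the scale of the savings, here is a back-of-the-envelope pixel budget. Every number in it is an assumption chosen for illustration: 60 pixels per degree as a commonly cited figure for retinal resolution, a 100 by 100 degree field of view, a 20 by 20 degree high-density inset, and 15 pixels per degree for the wide background display. It also assumes pixels are spread evenly across the field of view, which real lenses do not do.

```python
def pixels_per_eye(fov_h_deg, fov_v_deg, pixels_per_degree):
    """Pixel count for one eye, assuming uniform angular pixel density."""
    return round(fov_h_deg * pixels_per_degree) * round(fov_v_deg * pixels_per_degree)

# Brute force: retinal density across the whole (assumed) field of view.
full = pixels_per_eye(100, 100, 60)

# Foveated display: small dense inset plus a coarse wide-field background.
inset = pixels_per_eye(20, 20, 60)
background = pixels_per_eye(100, 100, 15)
foveated = inset + background

print(f"brute force: {full:,} pixels per eye")
print(f"foveated:    {foveated:,} pixels per eye")
print(f"ratio:       {full / foveated:.1f}x")
```

Under these assumptions the brute-force panel needs 36 million pixels per eye, while the two-display arrangement needs under 4 million.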
A rough approximation of how a pixel-dense foveated display looks against a larger, much less pixel-dense display in Varjo’s prototype headset. | Photo by Road to VR, based on images courtesy Varjo
Varjo is one company working on a foveated display system. They use a typical display that covers a wide field of view (but isn’t very pixel dense), and then superimpose a microdisplay that’s much more pixel dense on top of it. The combination of the two means the user gets both a wide field of view for their peripheral vision, and a region of very high resolution for their foveal vision.
Varjo’s latest prototypes aren’t currently moving the smaller display (it just hangs out at the center of the lens) but the company has considered a number of methods for moving the display to ensure the high resolution area is always at the center of your gaze.
Many social VR applications today appear to show users with realistic eye movements, including blinking, saccades, and object focus, but all of it is faked using animations and programmed logic. This illusion is good for making avatars appear less robotic, but of course the actual nonverbal information that would be conveyed when truly face-to-face with someone is lost.
Accurate eye-tracking data can readily be applied to VR avatars to actually show when a user is blinking and where they’re looking. It can also unlock both conscious and unconscious nonverbal communication like winking, squinting, and pupil dilation, and could even be used to infer some emotions like sadness or surprise, which could be reflected on an avatar’s face.
A heat map shows the parts of the scene viewed most often by users. | Image courtesy SMI
Eye-tracking can also be very useful for passively understanding player intent and focus. Consider a developer who is making a horror game where a player wanders through a haunted house. Traditionally the developer might spend a long time crafting a scripted sequence where a monster pops out of a closet as the player enters a certain area, but if the player isn’t looking directly at the closet then they might miss the scare. Eye-tracking input could be used to trigger the event only at the precise moment that the user is looking in the right direction for the maximum scare. Or it could be used to make a shadowy figure pass perfectly by the player but only in their peripheral vision, and make the figure disappear when the user attempts to look directly at it.
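Both tricks come down to comparing the gaze direction with the direction to an object. A minimal sketch, with hypothetical function names and made-up angle thresholds (10 degrees for "looking at it", 30 degrees for "only in the periphery"):

```python
import math

def gaze_angle_deg(eye_pos, gaze_dir, target_pos):
    """Angle in degrees between the gaze direction and the direction to a target."""
    to_target = tuple(t - e for t, e in zip(target_pos, eye_pos))
    norm = math.hypot(*gaze_dir) * math.hypot(*to_target)
    if norm == 0.0:
        return 0.0  # degenerate input; treat as looking straight at the target
    cos_a = sum(g * t for g, t in zip(gaze_dir, to_target)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def should_trigger_scare(eye_pos, gaze_dir, closet_pos, max_angle_deg=10.0):
    """Fire the scripted event only while the player is looking at the closet."""
    return gaze_angle_deg(eye_pos, gaze_dir, closet_pos) <= max_angle_deg

def figure_visible(eye_pos, gaze_dir, figure_pos, min_angle_deg=30.0):
    """Keep the shadowy figure on screen only while it stays in peripheral vision."""
    return gaze_angle_deg(eye_pos, gaze_dir, figure_pos) >= min_angle_deg
```

A game would likely also require the gaze to stay within the threshold for a short dwell time, so that a quick glance sweeping past the closet does not set off the event.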
Beyond just using eye-tracking to maximize scares, such passive input can be used to help players achieve greater control over their virtual environment. Tobii, a maker of eye-tracking hardware and software, has a demo which helps users improve their aim when throwing objects in VR. By inferring where the user intends to throw an object based on their gaze, the system alters the trajectory of the thrown object to a perfectly accurate throw. While the demo shows the actual vs. the corrected trajectory for demonstration purposes, in actual usage the correction is completely invisible to the user, and feels very natural.
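How Tobii’s demo computes the correction isn’t described here, so the following is only one plausible way to do it, under stated assumptions: simple projectile motion with no drag, a y-up coordinate system, and a flight time estimated from the horizontal speed of the user’s actual throw. The `assist` parameter (hypothetical) blends between the raw throw and a throw that lands exactly on the gaze target.

```python
import math

GRAVITY = (0.0, -9.81, 0.0)  # m/s^2, assuming y is up

def flight_time(release_pos, thrown_vel, target_pos):
    """Estimate flight time from the throw's horizontal speed; None if undefined."""
    horiz_speed = math.hypot(thrown_vel[0], thrown_vel[2])
    horiz_dist = math.hypot(target_pos[0] - release_pos[0],
                            target_pos[2] - release_pos[2])
    if horiz_speed == 0.0 or horiz_dist == 0.0:
        return None
    return horiz_dist / horiz_speed

def assisted_velocity(release_pos, thrown_vel, target_pos, assist=1.0):
    """Blend the thrown velocity toward one that hits the gaze target.

    assist=0.0 leaves the throw untouched; assist=1.0 lands exactly on target.
    """
    t = flight_time(release_pos, thrown_vel, target_pos)
    if t is None:
        return tuple(thrown_vel)
    # Solve target = release + v*t + 0.5*g*t^2 for v.
    needed = tuple((target_pos[i] - release_pos[i]) / t - 0.5 * GRAVITY[i] * t
                   for i in range(3))
    return tuple(v + assist * (n - v) for v, n in zip(thrown_vel, needed))
```

Keeping the flight time tied to the user’s own throw means a hard throw still arrives fast and a gentle lob still arcs, which may be part of why such corrections can go unnoticed.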
Beyond this sort of real-time intent understanding, eye-tracking can also be very useful for analytics. By collecting data about what users are looking at and when, developers can achieve a much deeper understanding of how their applications are being used. For example, eye-tracking data could indicate whether or not users are discovering an important button or visual cue, if their attention is being caught by some unintended part of the environment, if an interface element is going unused, and much more.
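A simple form of this analysis is to total up how long the gaze rested on each object. The sketch below assumes the application logs gaze samples as `(timestamp_seconds, object_id)` pairs, with `None` when the user is looking at nothing of interest; that log format and the object names in the usage example are hypothetical.

```python
from collections import defaultdict

def dwell_times(samples):
    """Total seconds of gaze per object from time-ordered (timestamp, object_id) samples.

    Each sample's object is credited with the time until the next sample,
    so the final sample contributes no duration.
    """
    totals = defaultdict(float)
    for (t0, target), (t1, _) in zip(samples, samples[1:]):
        if target is not None:
            totals[target] += t1 - t0
    return dict(totals)

def never_viewed(all_ids, totals, min_seconds=0.0):
    """Objects whose total dwell time did not exceed min_seconds."""
    return sorted(i for i in all_ids if totals.get(i, 0.0) <= min_seconds)
```

For example, with a log such as `[(0.0, "button"), (0.1, "button"), (0.2, None), (0.3, "poster"), (0.5, "poster")]`, `dwell_times` credits roughly 0.2 seconds each to the button and the poster, and `never_viewed(["button", "poster", "exit_sign"], totals)` flags `exit_sign` as something users never looked at.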
Active Input
Image courtesy Tobii
Eye-tracking can also be useful for active input, allowing users to consciously take advantage of their gaze to make tasks faster and easier. While many VR applications today allow users to ‘force pull’ objects at a distance by pointing at them and initiating a grab, eye-tracking could make that quicker and more accurate, allowing users to simply look and grab. Using eye-tracking for this task can actually be much more accurate, because our eyes are much better at pointing at distant objects than our hands are with a laser pointer, since the natural shakiness of our hands is amplified over distance.
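A look-and-grab interaction needs some way to decide which object the user means. One simple option, sketched here with hypothetical names and an assumed 3 degree selection cone, is to pick the object closest to the gaze direction by angle:

```python
import math

def select_by_gaze(eye_pos, gaze_dir, objects, max_angle_deg=3.0):
    """Return the name of the object nearest the gaze ray by angle, or None.

    objects: dict mapping object name to its 3D position (assumed format).
    Only objects within max_angle_deg of the gaze direction are considered.
    """
    best_name, best_angle = None, max_angle_deg
    gaze_len = math.hypot(*gaze_dir)
    if gaze_len == 0.0:
        return None
    for name, pos in objects.items():
        to_obj = tuple(p - e for p, e in zip(pos, eye_pos))
        dist = math.hypot(*to_obj)
        if dist == 0.0:
            continue  # object sits exactly at the eye; skip it
        cos_a = sum(g * o for g, o in zip(gaze_dir, to_obj)) / (gaze_len * dist)
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name
```

Because the test is angular, the selection cone covers the same portion of the user’s view at any distance; the grab itself would still be confirmed with a button press or gesture.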
Similar to grabbing objects, eye-tracking input is likely to be helpful for making VR fast and productive, allowing users to press buttons and do other actions much more quickly than if they had to move their body or hands to achieve the same. You can bet that when it comes to VR as a truly productive general computing platform, eye-tracking input will play a major role.
Healthcare & Research
Image courtesy Tobii
And then there’s a broad range of use-cases for eye-tracking in healthcare and research. Companies like SyncThink are using headsets equipped with eye-tracking to detect concussions, purportedly increasing the efficacy of on-field diagnosis.
Researchers too can use eye-tracking for data collection and input, like getting a look at what role gaze plays in the performance of a professional pianist, better understanding autism’s influence on social eye contact, or bringing accessibility to more people.
Given the range of potential improvements, it’s clear why eye-tracking will be a game changer for VR. In the near future, built-in eye-tracking is likely to become a feature of premium headsets first, before eventually becoming the norm for VR (and later AR too).