YouTube Playlist for all papers
Proceedings on ACM DL

Paper Awards

All short and long papers were considered together when determining the Best Paper award and Best Paper Honorable Mention. Poster/Demo awards are listed on the poster/demo program page.

  • Best Paper: The Effect of Spatial Reference on Visual Attention and Workload during Viewpoint Guidance in Augmented Reality. Daniela Markov-Vetter, Martin Luboschik, ABM Tariqul Islam, Peter Gauger, and Oliver Staadt
  • Best Paper Honorable Mention: Mission Impossible Spaces: Using Challenge-Based Distractors to Reduce Noticeability of Self-Overlapping Virtual Architecture. Claudiu Ciumedean, Cristian Patras, Mantas Cibulskis, Norbert Váradi, and Niels C. Nilsson

Friday, 30 October

8:30 am ET Opening Remarks Presented live via Zoom
8:45 am ET Keynote (Presented live via Zoom)
Impossible outside virtual reality
by Mar Gonzalez Franco (Microsoft)

For Virtual Reality and Augmented Reality to become the primary interaction device with digital content, beyond the form factor, we need to understand what types of things we can do in VR that would be impossible with other technologies. That is, what does spatial computing bring to the table? For one, the spatialization of our senses. We can enhance audio or proprioception in completely new ways. We can grab and touch objects with new controllers like never before, even in the empty space between our hands. But how fast do we adapt to the new sensory experiences? Can we change how we perceive our own bodies with avatars? Avatars represent other humans but can also substitute our own bodies. And we perceive our world through our bodies. Hence avatars change our perception of our surroundings. In this keynote we will explore the uniqueness of VR, from perception to avatars, and how they can ultimately change our behavior and interactions with digital content.

Dr. Mar Gonzalez-Franco is a Researcher in the EPIC (Extended Perception Interaction and Cognition) team at Microsoft Research. In her research, she advances Spatial Computing by building new devices and experiences, all while studying human behavior, perception, and neuroscience.

Mar holds a BSc in Computer Science (URL, Barcelona) and MSc in Biomedical Engineering (Universitat de Barcelona and Tsinghua University). She earned her Ph.D. in Immersive Virtual Reality and Clinical Psychology under the supervision of Prof. Mel Slater at the EVENT-Lab, affiliated as a visiting student at the Massachusetts Institute of Technology, MediaLab. She completed her postdoctoral studies at University College London.

Despite her shift into industry -- first at Airbus Applied Maths laboratories in the UK and now at Microsoft Research -- she is still deeply involved in the scientific community, where she often acts as an expert advisor to governments (US NSF, Canada NSERC, European Commission). She has also published in, reviewed for, and served as program committee member, chair, and associate editor at multiple venues (IEEE & ACM conferences and transactions, Frontiers, Nature Publishing group, and Science Robotics). She is also very keen on disseminating her views on how technology companies and industrial labs should run. In that role, she has been invited to comment on her work for Scientific American, Bloomberg, and GEN summit, and has been recognized by different institutions as a technology leader to follow: Business Insider ES 2019 award, MAS Technology Award 2019. Since 2020 she has also been the Ethics and Diversity Chair of IEEE VGTC.

9:45 am ET Break
10:00am – 11:40 am ET Session I : Interaction (Presentations and Q&A performed live on Zoom; Presentations will be available for early viewing on YouTube)
Chaired by Peter Willemsen
YouTube Playlist
Hand with Sensing Sphere: Body-Centered Spatial Interactions with a Hand-Worn Spherical Camera (Long Paper)
Riku Arakawa, Azumi Maekawa, Zendai Kashino, Masahiko Inami

We propose a novel body-centered interaction system making use of a spherical camera attached to a hand. Its broad and unique field of view enables an all-in-one approach to sensing multiple pieces of contextual information in hand-based spatial interactions: (i) hand location on the body surface, (ii) hand posture, (iii) hand keypoints in certain postures, and (iv) the near-hand environment. The proposed system makes use of a deep-learning approach to perform hand location and posture recognition. The proposed system is capable of achieving high hand location and posture recognition accuracy, 85.0 % and 88.9 % respectively, after collecting sufficient data and training. Our result and example demonstrations show the potential of utilizing 360° cameras for vision-based sensing in context-aware body-centered spatial interactions.

Extend, Push, Pull: Smartphone Mediated Interaction in Spatial Augmented Reality via Intuitive Mode Switching (Long Paper)
Jeremy Hartmann, Aakar Gupta, Daniel Vogel

We investigate how smartphones can be used to mediate the manipulation of smartphone-based content in spatial augmented reality (SAR). A major challenge here is in seamlessly transitioning a phone between its use as a smartphone to its use as a controller for SAR. Most users are familiar with hand extension as a way for using a remote control for SAR. We therefore propose to use hand extension as an intuitive mode switching mechanism for switching back and forth between the mobile interaction mode and the spatial interaction mode. Based on this intuitive mode switch, our technique enables the user to push smartphone content to an external SAR environment, interact with the external content, rotate-scale-translate it, and pull the content back into the smartphone, all the while ensuring no conflict between mobile interaction and spatial interaction. To ensure feasibility of hand extension as mode switch, we evaluate the classification of extended and retracted states of the smartphone based on the phone’s relative 3D position with respect to the user’s head while varying user postures, surface distances, and target locations. Our results show that a random forest classifier can classify the extended and retracted states with a 96% accuracy on average.

TanGo: Exploring Expressive Tangible Interactions on Head-Mounted Displays (Long Paper)
Ruei-Che Chang, Chi-Huan Chiang, Shuo-wen Hsu, Chih-Yun Yang, Da-Yuan Huang, Bing-Yu Chen

We present TanGo, an always-available input modality on a VR headset, which can be complementary to current VR accessories. TanGo is an active mechanical structure symmetrically mounted on a Head-Mounted Display, enabling 3-dimensional bimanual sliding input, with each degree of freedom furnished with a brake system driven by a micro servo, generating a total of 6 passive resistive force profiles. TanGo is an all-in-one structure that possesses rich input and output while remaining compact, with trade-offs between size, weight, and usability. Users can actively gesture, such as pushing, shearing, or squeezing, with specific output provided, while allowing hands to rest in stationary experiences. TanGo also gives users the flexibility to switch seamlessly between virtual and real usage in Augmented Reality without additional effort or instruments. We demonstrate three applications to show the interaction space of TanGo, and then discuss its limitations and show future possibilities based on preliminary user feedback.

Eye Gaze-based Object Rotation for Head-mounted Displays (Long Paper)
Chang Liu, Jason Orlosky, Alexander Plopski

Hands-free manipulation of 3D objects has long been a challenge for augmented and virtual reality (AR/VR). While many methods use eye gaze to assist with hand-based manipulations, interfaces cannot yet provide completely gaze-based 6 degree-of-freedom (DoF) manipulations in an efficient manner. To address this problem, we implemented three methods to handle rotations of virtual objects using gaze, including RotBar: a method that maps line-of-sight eye gaze onto per-axis rotations, RotPlane: a method that makes use of orthogonal planes to achieve per-axis angular rotations, and RotBall: a method that combines a traditional arcball with an external ring to handle user-perspective roll manipulations. We validated the efficiency of each method by conducting a user study involving a series of orientation tasks along different axes with each method. Experimental results showed that users could accomplish single-axis orientation tasks with RotBar and RotPlane significantly faster and more accurately than with RotBall. On the other hand, for multi-axis orientation tasks, RotBall significantly outperformed RotBar and RotPlane in terms of speed and accuracy.

Evaluating Interaction Cue Purpose and Timing for Learning and Retaining Virtual Reality Training (Long Paper)
Xinyu Hu, Alec G Moore, James Coleman Eubanks, Afham Aiyaz, Ryan P. McMahan

Interaction cues inform users about potential actions to take. Tutorials, games, educational systems, and training applications often employ interaction cues to direct users to take specific actions at particular moments. Prior studies have investigated many aspects of interaction cues, such as the feedforward and perceived affordances that often accompany them. However, two less-researched aspects of interaction cues include the effects of their purpose (i.e., the type of task conveyed) and their timing (i.e., when they are presented). In this paper, we present a study that evaluates the effects of interaction cue purpose and timing on performance while learning and retaining tasks with a virtual reality (VR) training application. Our results indicate that participants retained manipulation tasks significantly better than travel or selection tasks, despite both being significantly easier to complete than the manipulation tasks. Our results also indicate that immediate interaction cues afforded significantly faster learning and better retention than delayed interaction cues.

11:40 am ET Break
12:00 pm – 1:00pm ET Poster Session 1 / Demos Presented via Discord
Schedule for posters and demos

Saturday, 31 October

8:30am – 9:50 am ET Session II : Situated Input (Presentations and Q&A performed live on Zoom; Presentations will be available for early viewing on YouTube)
Chaired by Courtney Hutton
YouTube Playlist
Exploring the Need and Design of Situated Video Analytics (Long Paper)
Fouad Alallah, Yumiko Sakamoto, Pourang Irani

Visual video analytics research, stemming from data captured by surveillance cameras, has mainly focused on traditional computing paradigms, despite emerging platforms including mobile devices. We investigate the potential for situated video analytics, which involves the inspection of video data in the actual environment where the video was captured. Our ultimate goal is to explore the means to visually explore video data effectively, in situated contexts. We first investigate the performance of visual analytic tasks in situated vs. non-situated settings. We find that participants largely benefit from environmental cues for many analytic tasks. We then pose the question of how best to represent situated video data. To answer this, in a design session we explore end-users’ views on how to capture such data. Through the process of sketching, participants leveraged being situated, and explored how being in-situ influenced the participants’ integration of their designs. Based on these two elements, our paper proposes the need to develop novel spatial analytic user interfaces to support situated video analysis.

BodySLAM: Opportunistic User Digitization in Multi-User AR/VR Experiences (Long Paper)
Karan Ahuja, Mayank Goel, Chris Harrison

Today’s augmented and virtual reality (AR/VR) systems do not provide body, hand or mouth tracking without special worn sensors or external infrastructure. Simultaneously, AR/VR systems are increasingly being used in co-located, multi-user experiences, opening the possibility for opportunistic capture of other users. This is the core idea behind BodySLAM, which uses disparate camera views from users to digitize the body, hands and mouth of other people, and then relay that information back to the respective users. If a user is seen by two or more people, 3D pose can be estimated via stereo reconstruction. Our system also maps the arrangement of users in real world coordinates. Our approach requires no additional hardware or sensors beyond what is already found in commercial AR/VR devices, such as Microsoft HoloLens or Oculus Quest.

Augmented Unlocking Techniques for Smartphones Using Pre-Touch Information (Short Paper)
Matthew Lakier, Dimcho Karakashev, Yixin Wang, Ian Goldberg

Smartphones secure a significant amount of personal and private information, and are playing an increasingly important role in people’s lives. However, current techniques to manually authenticate to smartphones have failed in both not-so-surprising (shoulder surfing) and quite surprising (smudge attacks) ways. In this work, we propose a new technique called 3D Pattern. Our 3D Pattern technique takes advantage of pre-touch sensing, which could soon allow smartphones to sense a user’s finger position at some distance from the screen. We describe and implement the technique, and evaluate it in a small pilot study (n=6) by comparing it to PIN and pattern locks. Our results show that although our prototype takes longer to authenticate, it is completely immune to smudge attacks and promises to be more resistant to shoulder surfing.

Arc-Type and Tilt-Type: Pen-based Immersive Text Input for Room-Scale VR (Long Paper)
Bret Jackson, Logan B Caraco, Zahara M Spilka

As immersive, room-scale virtual and augmented reality become more utilized in productive workflows, the need for fast, equally-immersive text input grows. Traditional keyboard interaction in these room-scale environments is limiting because of its predominantly-seated usage and the necessity for visual indicators of the hands and keys potentially breaking immersion. Pen-based VR/AR interaction presents an alternative immersive text input modality with high throughput. In this paper, we present two novel interfaces designed for a pen-shaped stylus that do not require the positioning of the controller within a region of space, but rather detect input from rotation and on-board buttons. Initial results show that compared with Controller Pointing text entry techniques, Tilt-Type was slower but produced fewer errors and was less physically demanding. Additional studies are needed to validate the results.

9:50 am ET Break
10:05am – 11:25 am ET Session III: Multimodality (Presentations and Q&A performed live on Zoom; Presentations will be available for early viewing on YouTube)
Chaired by Jerald Thomas
YouTube Playlist
RayGraphy: Aerial Volumetric Graphics Rendered Using Lasers in Fog (Long Paper)
Wataru Yamada, Hiroyuki Manabe, Daizo Ikeda, Jun Rekimoto

We present RayGraphy display technology that renders volumetric graphics by superimposing the trajectories of lights in indoor space filled with fog. Since the traditional FogScreen approach requires the shaping of a thin layer of fog, it can only show two-dimensional images in a narrow range that is close to the fog-emitting nozzle. Although a method that renders volumetric graphics with plasma generated using high-power laser was also proposed, its operation in a public space is considered quite dangerous. The proposed system mainly comprises dozens of laser projectors circularly arranged in a fog-filled space, and renders volumetric graphics in a fog by superimposing weak laser beams from the projectors. Compared to the conventional methods, this system employing weak laser beams and the non-shaped innocuous fog is more scalable and safer. We aim to construct a new spatial augmented reality platform where computer-generated images can be drawn directly in the real world. We implement a prototype that consists of 32 laser projectors and a fog machine. Moreover, we evaluate and discuss the system performance and characteristics in experiments.

Exploring the Use of Olfactory Stimuli Towards Reducing Visually Induced Motion Sickness in Virtual Reality (Long Paper)
Nimesha Ranasinghe, Pravar Jain, David Tolley, Shienny Karwita Tailan, Ching Chiuan Yen, Ellen Yi-Luen Do

Visually Induced Motion Sickness (VIMS) plagues a significant number of individuals who utilize Virtual Reality (VR) systems. Although several solutions have been proposed that aim to reduce the onset of VIMS, a reliable approach for moderating it within VR experiences has not yet been established. Here, we set the initial stage to explore the use of controlled olfactory stimuli towards reducing symptoms associated with VIMS. In this experimental study, participants perceived different olfactory stimuli while experiencing a first-person-view rollercoaster simulation using a VR Head-Mounted Display (HMD). The onsets of VIMS symptoms were analyzed using both the Simulator Sickness Questionnaire (SSQ) and the Fast Motion Sickness Scale (FMS). Notable reductions in overall SSQ and FMS scores suggest that providing a peppermint aroma reduces the severity of VIMS symptoms experienced in VR. Additional anecdotal feedback and potential future studies on using controlled olfactory stimuli to minimize the occurrence of VIMS symptoms are also discussed.

Automatic Generation of Spatial Tactile Effects by Analyzing Cross-modality Features of a Video (Long Paper)
Kai Zhang, Lawrence H Kim, Yipeng Guo, Sean Follmer

Tactile effects can enhance user experience of multimedia content. However, generating appropriate tactile stimuli without any human intervention remains a challenge. While visual or audio information has been used to automatically generate tactile effects, utilizing cross-modal information may further improve the spatiotemporal synchronization and user experience of the tactile effects. In this paper, we present a pipeline for automatic generation of vibrotactile effects through the extraction of both the visual and audio features from a video. Two neural network models are used to extract the diegetic audio content, and localize a sounding object in the scene. These models are then used to determine the spatial distribution and the intensity of the tactile effects. To evaluate the performance of our method, we conducted a user study to compare the videos with tactile effects generated by our method to both the original videos without any tactile stimuli and videos with tactile effects generated based on visual features only. The study results demonstrate that our cross-modal method creates tactile effects with better spatiotemporal synchronization than the existing visual-based method and provides a more immersive user experience.

Visuo-Motor Influence of Attached Robotic Neck Augmentation (Long Paper)
Lichao Shen, MHD Yamen Saraiji, Kai Kunze, Kouta Minamizawa, Roshan L Peiris

The combination of eye and head movements plays a major part in our visual process. The neck provides mobility for the head motion and also limits the range of visual motion in space. In this paper, a robotic neck augmentation system is designed to surmount the physical limitations of the neck. In essence, it applies a visuomotor modification to the vision-neck relationship. We conducted two experiments to measure and verify the performance of the neck alternation. The multiple test results indicate the system has a positive effect to augment the motions of vision. Specifically, the robotic neck can enlarge the range of neck motion to 200%, and influence the response motion, by overall 22% less in time and 28% faster in speed.

Sunday, 1 November

IMPORTANT: Daylight Saving Time ends in the United States early in the morning of Nov 1. If you are outside of the US, the offset between Eastern Time and your timezone may differ from what it was on Friday and Saturday.
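To check how the offset changes for your own location, a small conversion script can help. The sketch below is only an illustration: the helper name `eastern_to_local` is hypothetical, the year 2020 is an assumption inferred from the weekdays and the Nov 1 change date, and Tokyo is an arbitrary example timezone. It relies on Python's standard `zoneinfo` module (Python 3.9+), which needs a timezone database on the system (on some platforms, the `tzdata` package).

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# US Eastern Time, including the switch from EDT to EST.
EASTERN = ZoneInfo("America/New_York")


def eastern_to_local(month, day, hour, minute, tz_name, year=2020):
    """Convert a program time given in US Eastern Time to another timezone.

    The default year (2020) is an assumption, not stated in the program.
    """
    eastern_time = datetime(year, month, day, hour, minute, tzinfo=EASTERN)
    return eastern_time.astimezone(ZoneInfo(tz_name))


if __name__ == "__main__":
    # Compare the same 8:30 am ET start time on Saturday and Sunday.
    for month, day in [(10, 31), (11, 1)]:
        local = eastern_to_local(month, day, 8, 30, "Asia/Tokyo")
        print(f"8:30 am ET on {month}/{day} is {local:%a %H:%M} in Tokyo")
```

Replace `"Asia/Tokyo"` with your own IANA timezone name to see whether Sunday's sessions start an hour later for you than Saturday's.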
8:30 am – 9:30 am ET Poster Session 2 / Demos Presented via Discord
Schedule for posters and demos
9:30 am ET Break
9:45am – 11:25 am ET Session IV: Perception & Locomotion (Presentations and Q&A performed live on Zoom; Presentations will be available for early viewing on YouTube)
Chaired by Bret Jackson
YouTube Playlist
Mission Impossible Spaces: Using Challenge-Based Distractors to Reduce Noticeability of Self-Overlapping Virtual Architecture (Short Paper)
Claudiu-Bogdan Ciumedean, Cristian Patras, Mantas Cibulskis, Norbert Varadi, Niels Christian Nilsson

Impossible spaces make it possible to maximize the area of virtual environments that can be explored on foot through self-overlapping virtual architecture. This paper details a study exploring how users’ ability to detect overlapping virtual architecture is affected when the virtual environment includes distractors that impose additional cognitive load by challenging the users. The results indicate that such distractors both increase self-reported task load and reduce users’ ability to reliably detect overlaps between adjacent virtual rooms. That is, rooms could overlap by up to 68% when distractors were presented, compared to 40% when no distractors were present.

Methods for Evaluating Depth Perception in a Large-Screen Immersive Display (Short Paper)
Dylan Gaines, Scott Kuhl

We perform an experiment on distance perception in a large-screen display immersive virtual environment. Large-screen displays typically make direct blind walking tasks impossible, despite them being a popular distance response measure in the real world and in head-mounted displays. We use a movable large-screen display to compare direct blind walking and indirect triangulated pointing with monoscopic viewing. We find that participants judged distances to be 89.4% ± 28.7% and 108.5% ± 44.9% of their actual distances in the direct blind walking and triangulated pointing conditions, respectively. However, we find no statistically significant difference between these approaches. This work adds to the limited number of research studies on egocentric distance judgments with a large display wall for distances of 3-5 meters. It is the first, to our knowledge, to perform direct blind walking with a large display.

Rotational self-motion cues improve spatial learning when teleporting in virtual environments (Long Paper)
Alex Lim, Jonathan Kelly, Nathan Sepich, Lucia Cherep, Grace Freed, Stephen B. Gilbert

Teleporting interfaces are widely used in virtual reality applications to explore large virtual environments. When teleporting, the user indicates the intended location in the virtual environment and is instantly transported, typically without self-motion cues. This project explored the cost of teleporting on the acquisition of survey knowledge (i.e., a "cognitive map"). Two teleporting interfaces were compared, one with and one without visual and body-based rotational self-motion cues. Both interfaces lacked translational self-motion cues. Participants used one of the two teleporting interfaces to find and study the locations of six objects scattered throughout a large virtual environment. After learning, participants completed two measures of cognitive map fidelity: an object-to-object pointing task and a map drawing task. The results indicate superior spatial learning when rotational self-motion cues were available. Therefore, virtual reality developers should strongly consider the benefits of rotational self-motion cues when creating and choosing locomotion interfaces.

Exploring the Limitations of Environment Lighting on Optical See-Through Head-Mounted Displays (Long Paper)
Austin Erickson, Kangsoo Kim, Gerd Bruder, Greg Welch

Due to the additive light model employed by most optical see-through head-mounted displays (OST-HMDs), they provide the best augmented reality (AR) views in dark environments, where the added AR light does not have to compete against existing real-world lighting. AR imagery displayed on such devices loses a significant amount of contrast in well-lit environments such as outdoors in direct sunlight. To compensate for this, OST-HMDs often use a tinted visor to reduce the amount of environment light that reaches the user’s eyes, which in turn results in a loss of contrast in the user’s physical environment. While these effects are well known and grounded in existing literature, formal measurements of the illuminance and contrast of modern OST-HMDs are currently missing. In this paper, we provide illuminance measurements for both the Microsoft HoloLens 1 and its successor the HoloLens 2 under varying environment lighting conditions ranging from 0 to 20,000 lux. We evaluate how environment lighting impacts the user by calculating contrast ratios between rendered black (transparent) and white imagery displayed under these conditions, and evaluate how the intensity of environment lighting is impacted by donning and using the HMD. Our results indicate the further need for refinement in the design of future OST-HMDs to optimize contrast in environments with illuminance values greater than or equal to those found in indoor working environments.
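The additive light model described in this abstract can be illustrated with a short calculation. This is a simplified sketch, not the authors' measurement procedure. It assumes that rendered black adds no light, that rendered white adds a fixed amount of display light on top of the environment light passing through the visor, and that the visor scales environment light by a single transmittance factor. The function name, parameters, and example values are all hypothetical.

```python
import math


def contrast_ratio(env_lux, display_lux, visor_transmittance=1.0):
    """Contrast between rendered white and rendered black (transparent).

    Simplified additive model (an assumption, not the paper's method):
      black = environment light reaching the eye through the visor
      white = that same background plus the light added by the display
    """
    background = env_lux * visor_transmittance
    if background == 0:
        # In complete darkness, rendered black is truly black.
        return math.inf
    return (background + display_lux) / background


if __name__ == "__main__":
    # Hypothetical values: the ratio shrinks toward 1 as the environment brightens.
    for env in (0, 100, 1000, 20000):
        print(env, contrast_ratio(env, display_lux=100, visor_transmittance=0.5))
```

Under these assumptions the ratio equals 1 + display / (transmittance × environment). It falls toward 1 in bright surroundings, and a darker visor raises it at the cost of dimming the physical environment.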

The Effect of Spatial Reference on Visual Attention and Workload during Viewpoint Guidance in Augmented Reality (Long Paper)
Daniela Markov-Vetter, Martin Luboschik, ABM Tariqul Islam, Peter Gauger, Oliver Staadt

Considering human capability for spatial orientation and navigation, the visualization used to support the localization of off-screen targets inevitably influences the visual-spatial processing that relies on two frameworks. So far it is not proven which frame of reference, egocentric or exocentric, contributes most to efficient viewpoint guidance in a head-mounted Augmented Reality environment. This could be justified by the lack of objectively assessing the allocation of attention and mental workload demanded by the guidance method. This paper presents a user study investigating the effect of egocentric and exocentric viewpoint guidance on visual attention and mental workload. In parallel to a localization task, participants had to complete a divided attention task using the oddball paradigm. During task fulfilment, the heart rate variability was measured to determine the physiological stress level. The objective assessment of mental workload was supplemented by subjective ratings using the NASA TLX. The results show that egocentric viewpoint guidance leads to the most efficient target cueing in terms of faster localization, higher accuracy and lower self-reported workload. In addition, egocentric target cueing causes a slight decrease in physiological stress and enables faster recognition of simultaneous events, although visual attention seemed to be covertly oriented.

11:25 am ET Break
11:40 am – 12:00 pm ET Awards Presented via Zoom

General Chairs

Peter Willemsen

University of Minnesota Duluth, USA

Scott Kuhl

Michigan Technological University, USA