Accessible Realities

Making 3D Content More Accessible on the Web: “Semantic XR” Proof of Concept

Introduction

This blog post demonstrates an approach for making 3D scenes in interactive 3D environments, videos and images more accessible by using a semantic description of these scenes. For example, a list of objects’ names and locations can be used to generate spatial audio cues. A Proof of Concept that uses a concrete semantic 3D scene description is presented, including demo videos and a discussion of personalization and other opportunities.

This work could be extended to other contexts but in this post the focus is on accessibility for people who are blind or have low vision and on 3D content in the browser and integration with existing web standards.

Background

The idea of using a semantic scene description to support web accessibility is discussed in several places. A non-comprehensive list of resources: in the context of accessibility for WebXR, in the XRA Semantics Module document, in a summary of the W3C Workshop on Web Games, in a previous post on this blog and in an Immersive Web Proposal Issue which also references the potential usage of ARIA and AOM in this context.

This blog post and videos describe a Proof of Concept of this idea. It is a followup work to my presentation at the W3C Workshop on Inclusive Design for Immersive Web Standards (more details on the workshop provided below).

Demo Videos

Demo 1: Interactive 3D Environment

Improving accessibility of an interactive 3D environment with a screen reader and spatial audio cues.

The 3D environment is a small area with blocks. The user should reach a destination block while avoiding other blocks and obstacles.

Technically, the 3D rich internet application updates the web page with its semantic scene description. Then, JavaScript code which is part of the web page processes this semantic scene description and turns it into meaningful text for the user or into spatial audio.
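
The flow described above can be sketched roughly as follows. This is a minimal illustration written in Python rather than the JavaScript used in the demo. The JSON field names (`objects`, `name`, `x`, `distance`) and the left/right threshold are hypothetical placeholders, since the actual Semantic-XR format is not spelled out in this post.

```python
import json


def scene_to_text(scene_json: str) -> str:
    """Turn a (hypothetical) semantic scene description into one line of text."""
    scene = json.loads(scene_json)
    objects = scene.get("objects", [])
    if not objects:
        return "No objects in view."
    parts = []
    # Read objects from left to right; x < 0 means left of the user (assumed convention).
    for obj in sorted(objects, key=lambda o: o["x"]):
        if obj["x"] < -0.5:
            side = "left"
        elif obj["x"] > 0.5:
            side = "right"
        else:
            side = "center"
        parts.append(f"{obj['name']}: {side}, {obj['distance']} meters away")
    return "; ".join(parts)


# Example scene, as the 3D application might publish it to the page.
example = json.dumps({
    "objects": [
        {"name": "Destination block", "x": 2.0, "distance": 5},
        {"name": "Obstacle", "x": -1.0, "distance": 3},
    ]
})
print(scene_to_text(example))
```

In a web page, a string like this is what would be handed to the screen reader, while the same parsed data could drive the spatial audio cues.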

Semantic Interactive 3D Environment

Spatial audio which is part of competitive games could be implemented with code that is part of the game itself. This code could still use the semantic data but use it internally and not make it available to any external entities.

In addition to what was demonstrated, the semantic scene description also includes a high-level description of the 3D scene. This description is made available as regular text on the web page.

Demo 2: A Video of an Interactive 3D Environment

Improving accessibility of a video of an interactive 3D environment (the same environment from Demo 1 above) by using spatial audio cues for the objects that appear in the video.

Semantic Video of an Interactive 3D Environment

Demo 3: Real World Video and Photo

Visual augmentation using a semantic scene description of a video and a photo of the real world. The demo shows how a user interacts with specific objects “inside” the video. The video shows two coffee mugs on a wide plate which is being rotated around its center.

Semantic Real World Video and Photo

Personalization Opportunities

In many XR accessibility discussions a need for personalization arises (like during the W3C workshop mentioned below or in XR Access discussions or in the game accessibility guidelines). This was also my own personal experience with feedback on Accessible Realities.

Once 3D applications, XR experiences, videos and images provide access to their semantic data, this could open up fine-tuning and personalization opportunities. The availability of the entire raw semantic data of the 3D scene could enable accessibility solutions to make the data accessible in many different ways, for example by running the spatial audio scan from left to right, from closest to most distant, or by providing only a high-level description.
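
As a rough illustration of this kind of personalization, the sketch below lets the user pick the scan order and derives a stereo pan value for each object. It is written in Python for brevity. The `SceneObject` fields, the `plan_scan` and `stereo_pan` helpers and the 5 meter panning range are all assumptions made for this example and are not part of Semantic-XR.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    x: float         # lateral offset in meters, negative = left (assumed convention)
    distance: float  # distance from the user in meters


def stereo_pan(x: float, max_offset: float = 5.0) -> float:
    """Map a lateral offset to a pan value clamped to [-1.0, 1.0]."""
    return max(-1.0, min(1.0, x / max_offset))


# Each user preference is just a different sort key over the same raw data.
SCAN_ORDERS = {
    "left-to-right": lambda o: o.x,
    "closest-first": lambda o: o.distance,
}


def plan_scan(objects, order="left-to-right"):
    """Return (name, pan) pairs in the order chosen by the user."""
    key = SCAN_ORDERS[order]
    return [(o.name, stereo_pan(o.x)) for o in sorted(objects, key=key)]


scene = [
    SceneObject("couch", x=2.5, distance=1.0),
    SceneObject("table", x=-5.0, distance=6.0),
    SceneObject("knight", x=0.0, distance=4.0),
]
print(plan_scan(scene, "left-to-right"))
print(plan_scan(scene, "closest-first"))
```

The point of the design is that the raw data stays the same and only the presentation changes, so a new scan style is one more entry in the table.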

This would enable software developers who are blind, as well as accessibility-related companies and NGOs, to develop solutions that make the raw semantic data accessible in the way that works best for them or their users. Let accessibility solution providers use the data and do their own magic…

More Opportunities

Accessibility Solutions’ integration with different kinds of media types and immersive experiences as long as these support the semantic scene description format (Virtual Reality, Augmented Reality, Mixed Reality, 3D applications, videos, images etc.)
Accessibility for the Real World – in the future, AI-based methods might be fast, accurate and safe enough to generate the semantic data in real-time for the real world (in a similar manner to a previous video using an older version of the semantic data model)
Support different kinds of disabilities as part of the semantic format
Platform and language independent format – enabling exchange of semantic data between systems and a single authoritative defining standard
Extensibility – via versions or hooks for custom extensions like one for specific media types
Developer control and ease of use – authoring tools could include extensions to enable developers to control what specific parts of the semantic data should be generated. These tools would greatly assist developers in generating the semantic data in a valid form in real-time

Challenges

Privacy – enable users to control sharing of their own semantic data (like their name and location in a virtual world) with other users and entities
Security – similar considerations to those mentioned in the WebVTT documentation if using the Timed Text Track approach
Competitive Games – naturally, semantic data that would be shared externally in competitive environments would be extremely limited. Yet, there are XR applications which are not competitive, like ones for education, training, social VR communities etc. Non-interactive media like images and videos could benefit from a semantic metadata format as well. Note that the developer controls what parts of the semantic data are made available to their users. In addition, even if the semantic data is not made available to external entities it could still be made available to 3rd party accessibility libraries which are part of the code base. These libraries could enable accessibility without providing a distinct competitive advantage
Performance – this potential issue might be mitigated using different levels of details of the semantic data: temporal resolution (frequency of updates), spatial resolution and semantic level of detail (for example: how many objects are being described? only high level objects? all objects? etc.)
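
The performance mitigation mentioned in the last item could look something like the sketch below, which limits both the semantic level of detail and the frequency of updates. It is a Python illustration only. The `importance` field, the default limits and the `UpdateThrottle` helper are hypothetical and not taken from the Proof of Concept.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    distance: float  # meters from the user
    importance: int  # 0 = background detail, higher = more important (hypothetical field)


def reduce_detail(objects, max_objects=3, max_distance=10.0, min_importance=1):
    """Semantic and spatial level of detail: keep nearby, important objects, closest first."""
    kept = [o for o in objects
            if o.distance <= max_distance and o.importance >= min_importance]
    kept.sort(key=lambda o: o.distance)
    return kept[:max_objects]


class UpdateThrottle:
    """Temporal resolution: allow an update only after a minimum interval has passed."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = None

    def should_send(self, now: float) -> bool:
        if self._last is None or now - self._last >= self.min_interval:
            self._last = now
            return True
        return False


scene = [
    SceneObject("wall", 2.0, 0),
    SceneObject("door", 8.0, 2),
    SceneObject("key", 4.0, 3),
    SceneObject("tower", 50.0, 3),
    SceneObject("chair", 6.0, 1),
    SceneObject("lamp", 7.0, 1),
]
print([o.name for o in reduce_detail(scene)])
```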

Future Opportunities

Authoring Tools integration for Generation of Semantic Data

Authoring tools (such as: game engines, animation tools and 3D computer graphics software) could include plugins for generating the semantic data in real time. This would be added as metadata to 3D interactive and immersive XR content and to images and videos. Semantic data generation plugins could also support the rising trend of Virtual Production. Introducing semantic data into virtual production tools could be a powerful way for making videos and movies more accessible. AI-based methods to generate semantic data could also be used either in real-time or for existing media.

It could make a lot of sense to separate entirely and decouple the generation of the semantic data from its usage. Plugins for game engines and other authoring tools that focus only on generating the semantic data could be easier to develop, test and maintain. In turn, accessibility solutions developers such as research labs, NGOs, tool providers and better yet people with disabilities who are also software developers could write accessibility solutions that build upon the semantic data. This would enable integration with any content that supports the semantic data format.

Uses of Semantic Data (Web and Non-web)

Clients of the semantic data that could leverage it for accessibility could include: browsers and their extensions, screen readers and their extensions and open source software libraries (for example: JavaScript ones for web and C++, C# and others for game engines).

The semantic data could be made accessible in many different ways in addition to the ways demonstrated above.

One example could be a chat app that would answer questions like: “how many people are in front of me?”, “where is the closest couch?”, “please briefly describe the area to me” etc.
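
Questions like these map fairly directly onto simple queries over the semantic data. The Python sketch below shows two of them under assumed conventions: the `kind` field and the coordinate system (x to the right, z ahead of the user) are hypothetical, and a real chat app would still need a language layer on top.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SceneObject:
    name: str
    kind: str  # e.g. "person", "couch" (hypothetical category field)
    x: float   # meters to the right of the user, negative = left
    z: float   # meters ahead of the user, negative = behind


def count_in_front(objects, kind: str) -> int:
    """Answer 'how many <kind> are in front of me?'."""
    return sum(1 for o in objects if o.kind == kind and o.z > 0)


def closest(objects, kind: str) -> Optional[SceneObject]:
    """Answer 'where is the closest <kind>?'; None if there is no such object."""
    matches = [o for o in objects if o.kind == kind]
    return min(matches, key=lambda o: math.hypot(o.x, o.z), default=None)


scene = [
    SceneObject("Ann", "person", 1.0, 3.0),
    SceneObject("Bob", "person", -2.0, -1.0),
    SceneObject("Cy", "person", 0.0, 6.0),
    SceneObject("blue couch", "couch", 3.0, 4.0),
    SceneObject("red couch", "couch", -1.0, 1.0),
]
print(count_in_front(scene, "person"))
print(closest(scene, "couch").name)
```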

Other non-accessibility uses could include: creative semantic editing of images and videos and training Machine Learning models in 3D scenes (in virtual worlds and in the real world for robotics for example).

Technical Details

How It Was Done

The demos were implemented using a semantic data format (Semantic-XR). This format includes properties such as: objects’ names and locations, collision prediction details and more. It was defined based on a review of a variety of existing accessibility solutions as well as my experience with Accessible Realities
Semantic data was added as metadata to the videos using a Text Track format, specifically WebVTT (kind=“metadata”). Note that semantic data could be added retroactively to old videos in this way to make them more accessible
Screen reader support was implemented using JavaScript-based updates of text in an aria-live region (using “polite” updates). The text was recreated each time the Semantic-XR data got updated
Spatial audio augmentation was done using a JavaScript library with spatial audio support
Visual augmentation was done by leveraging a Semantic-XR metadata track that included depth map data. It was implemented using the Canvas API
Tools used: Unreal Engine 4.22 game engine, Visual Studio IDE, Howler.js audio library with spatial audio support, NVDA screen reader, Firefox web browser. The depth map data in both the real world video and the real world photo demos was created manually (with a lot of patience). This manual process could be replaced with AI-based methods
Semantic-XR in this context is defined as: a description of an XR experience as meaningful and well-defined data in order to make it more accessible for humans and machines. Note: the “XR” part of the name “Semantic-XR” originally had a more narrow meaning (its commonly used meaning of: Virtual Reality, Augmented Reality and the like). Currently, it is used in the broadest sense of the word “Reality” including 3D scenes in images and videos as the format was found to be useful for these as well
The Accessible Realities Unreal Engine 4 library was used to dynamically generate the Semantic-XR data in the 3D interactive environment demos, using a new Semantic-XR export feature of the library. In the future, if a semantic data format is standardized, a variety of tools could be developed or extended to generate the semantic data (for different authoring tools, using AI-based methods etc.)
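
To make the WebVTT metadata approach from the list above more concrete, here is a minimal Python sketch that writes timed scene descriptions as cues of a WebVTT file. Carrying one JSON object per cue is an assumption made for this example. The post does not state how the Semantic-XR payload was actually encoded, and the `build_metadata_track` helper and its field names are hypothetical.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_metadata_track(frames) -> str:
    """frames: list of (start_seconds, end_seconds, scene_dict) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, scene in frames:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        # One compact JSON line per cue: a cue payload cannot contain blank lines.
        # The payload should also avoid the literal "-->" sequence.
        lines.append(json.dumps(scene, separators=(",", ":")))
        lines.append("")  # a blank line ends the cue
    return "\n".join(lines)


scene = {"objects": [{"name": "mug", "x": -0.2}]}
track = build_metadata_track([(0.0, 0.5, scene)])
print(track)
```

A file like this can be attached to a video through a track element with kind=“metadata”, and a script on the page can then read each cue's text as it becomes active.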

Technical Suggestions For Consideration

The idea of a standard semantic scene description is a very powerful one for accessibility and other uses as described in many places linked above. Doing the Proof of Concept has strengthened my own belief that this could be a useful approach which is worth further evaluation
Integration of a semantic format with existing standards
- Consider adding kind=“semantic” to Timed Text Track
- Consider allowing Canvas (or any Rich Internet Applications’ container) to have Timed Text Track children. This could be a powerful way to deliver captions, subtitles, audio descriptions and metadata for Rich Internet Applications
- Consider adding standard semantic metadata as a standard metadata section for different media files (like: different image formats)
When defining the format, consider keeping it semantic and not tied only to, for example, 3D visual content formats. This would enable having semantic data which is not only visual, in a similar manner to what was requested in this WebXR issue. Non-visual semantic content could include:
- Cognitive related data, for example up-to-date instructions for an interactive experience could be made available in-game on demand, like suggested in the Games Accessibility Guidelines
- Audio related data supporting people who are deaf or hard of hearing. For example: who said what to whom, when and where in the 3D scene. This data could then be used to populate captions and subtitles tracks including spatial visual cues showing where the sound came from

W3C Workshop on Inclusive Design for Immersive Web Standards

This blog post is a followup work to a presentation I gave at the W3C Workshop on Inclusive Design for Immersive Web Standards which I was very fortunate to attend and present at. The workshop included a lot of highly interesting and useful presentations and the atmosphere felt very inclusive and welcoming. It was definitely one of the best workshops I have ever attended. Links to the presentations were made available online. Also available online is a great summary followup talk with many insights by Thomas Logan. I would like to thank the organizers of this W3C workshop and hope that this followup work would be in some way useful to advance the important goal of inclusive and accessible XR on the web and beyond.

Feedback

If there is interest, I would be happy to make the Semantic-XR format details available online.

Your feedback is very welcome, I’m always eager to improve (available here).

Accessible Realities: Making video games and XR accessible for people who are blind or have low vision

Summary

How can we make it easier for video game developers to add accessibility for people who are blind or have low vision?

In trying to help answer this question I have developed an accessibility software library for the Unreal Engine 4 game engine. This blog post shows the library in action and describes its capabilities.

The main feature of the library is to enable developers to easily provide automated audio cues and descriptions of a 3D scene in front of the user to support their orientation and mobility. The library focuses on scans of the elements (characters and objects) that are in front of the user. Additional features include: first-person view capability, collision prediction, game and area specific audio instructions and slow motion. All of these features are demonstrated in action in the videos below. Accessible Realities library is currently in Alpha stage.

Some of Accessible Realities’ features could potentially be reused in other non-game environments, for example in real world narrators like the Microsoft Seeing AI application (see Demo 4 video for details). Other quite intriguing future opportunities are listed in the Future section. My hope is that this library, after additional testing, combined with many other libraries (like ones that enable accessible menus) and best practices would help provide better video game and XR experiences for people who are disabled and specifically who are blind or have low vision.

Accessible Realities at its current state is a result of an intensive one year of full-time research and development. It was created without any external funding with my own savings. After 20 years in the high tech industry I decided to take time to contribute to the community via this social impact project.

Game accessibility for the blind is quite a challenge. I am trying hard to improve the library all the time and your helpful feedback is very welcome. I wanted to have something to show that is more than an idea: something one can hear and get a feel of, something a tester could try out, rather than waving my hands without a concrete implementation. The library is currently at Alpha stage, waiting for additional testers and more games. I thought that a year into the project, around the time of the GDC, GAConf and CSUN 2018 conferences, would be a good opportunity to share its current status with the community.

The goals of this post are to:

Share info with the community so others can evaluate, use and build and improve upon it
Call for people who are blind or have low vision to join Alpha testing
Call for video game companies and developers to join Alpha testing
Ask for feedback from game accessibility experts
Call for collaborations with social impact intentions (surprise me!)

I will be happy to hear from you: either via Twitter at @AccessibleXR or via this online form.

So let’s hear it in action…

Demo 1: Platformer Game

Platformer game integration: adding a radar that scans which objects are in front of the player.

This is Epic Games’ Platformer sample game. Integration took around 1.5 days. Note that playing successfully still requires a lot of skill. Clicking immediately as you hear the sound does not guarantee success in the game.

Demo 2: Library Walkthrough in 3D Fighting Game

This video walks through the capabilities of the library one by one. It is demonstrated with Epic Games’ Couch Knights sample multiplayer 3D fighting game.

Let’s go through the features in the demo:

The main functionality of Accessible Realities is to provide locational and instructional accessible audio.

3D Scene Scans (0:33, 1:37): as heard in the video the library enables the developer to easily provide automated audio-based runtime descriptions of the scene in front of the user to support their orientation and mobility. For cases where high accuracy is required there is a left-to-right scene elements 3D Scan that provides for each element, one by one, their detailed location. For example: “Table: far-left, mid-height, near; Couch: center, up, very-far”. Similarly the library includes another feature (not shown in the video) in which a natural language description is constructed along the lines of “Table is the leftmost element, further to the right and more distant there is a couch which is the most distant element, further to the right there is knight #1 directly in front of you, which is the closest element”
Axis Scene Scans: for actions that require fast response times or for experienced users there are fast audio scans of the scene elements. First, there is a horizontal left-to-right scan (0:45, 1:53). In addition, there is a vertical top-to-bottom scan and a closest-to-most-distant distance scan (1:09). The scans use stereophonic sound to convey the elements’ relative left-to-right location
First Person View (1:16): the user can also choose to use first-person point of view capability. This could be a great help where third-person or top-down views are too detailed or hard to imagine
Audio Instructions (0:17): can be made available for the entire duration of the game for the user by a single keystroke. Current granularity is: Game Instructions, Accessibility Instructions that describe accessibility specific features and Scene Instructions that require a bit more work to set up but provide the user with static audio descriptions of the area they are in (in the videos you can hear areas such as: “on table”, “on floor”, “on couch” etc.)
Radar Zoom (2:17): when there are many elements in the scene the user can focus using the radar’s zoom capability to zoom in and out. In addition, the developer at runtime can change the active radar tags (which elements are described for the user) for example based on the area the player is in
Collision Prediction (2:02): in many cases the user wants to know what they will bump into if they go straight ahead.
Slow Motion (2:42) control can enable the user to slow down the game in cases of cognitive overload
Audio Speed (1:04) and Volume controls are available.
Choice: the game developer has full control with regard to which features they want to enable for their game
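
The detailed 3D scan output quoted above (“Table: far-left, mid-height, near; Couch: center, up, very-far”) can be approximated by bucketing each element's direction and distance into labels. The sketch below is a Python illustration, not the library's implementation. All thresholds, the extra labels (such as “very-near” and “down”) and the angle-based inputs are assumptions made for this example.

```python
from bisect import bisect_right

# Illustrative thresholds only; the library's real bucket boundaries are not given in this post.
HORIZONTAL = ([-30, -10, 10, 30], ["far-left", "left", "center", "right", "far-right"])  # degrees
VERTICAL = ([-10, 10], ["down", "mid-height", "up"])  # degrees
DISTANCE = ([2, 5, 10], ["very-near", "near", "far", "very-far"])  # meters


def bucket(value, scale):
    """Pick the label of the interval that contains value."""
    bounds, labels = scale
    return labels[bisect_right(bounds, value)]


def describe(name, azimuth_deg, elevation_deg, distance_m):
    return (f"{name}: {bucket(azimuth_deg, HORIZONTAL)}, "
            f"{bucket(elevation_deg, VERTICAL)}, {bucket(distance_m, DISTANCE)}")


def scan(elements):
    """Describe elements one by one, ordered left to right by azimuth."""
    ordered = sorted(elements, key=lambda e: e[1])
    return "; ".join(describe(*e) for e in ordered)


# (name, azimuth in degrees, elevation in degrees, distance in meters)
print(scan([("Couch", 0, 20, 12), ("Table", -45, 0, 3)]))
```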

Demo 3: 3D Fighting Game

The following video “speaks” for itself.

Demo 4: Real World Audio Augmentation

What we see in the video below is a non game real world use case of Accessible Realities. The object recognition tags in the video are created by artificial intelligence in the cloud. Note: part of the video is played at 2x normal speed. It is recommended to watch this video with Closed Captions on (click on the CC icon in the video’s toolbar).

This is a video recording from a demo Android app I have developed using Accessible Realities library. Yet, this is just a proof of concept to demo the scene scan in the real world and not the main use case of the library. Real world audio description is a different problem domain than video games and should be carefully verified especially with regard to safety concerns. Nevertheless, adding mid-level audio description of all the elements in the scene one by one with their relative positions might be a great addition to complement other representation methods of visual content. It could complement methods such as low-level single element (like barcode or face) detection and high-level AI-based scene description. This capability might be useful as an additional channel to Microsoft Seeing AI or Soundscape mobile apps to enrich the experience of the user. Accessible Realities might also be very easily integrated with technologies that algorithmically scan a real world scene and translate it to one made of 3D models, as presented in this video by Resonai. These 3D models could include built-in tags and audio descriptions and so can immediately provide orientation and mobility information for people who are blind or have low vision. A more advanced usage could even provide AI-based navigation instructions for avoiding obstacles and reaching a target, something like a virtual guide dog.

To try out the effectiveness of this scene description method before implementing it, I asked a friend to carefully simulate the scan shown in the video while I was pointing a mobile phone with the camera on and trying (very carefully) to navigate to a certain object. It worked 🙂 .

Usage

Accessible Realities is an infrastructure and not a specific game. There is a very simple integration process that consists of five basic steps. These steps could be done in Unreal Engine 4 entirely using Blueprints without any coding, if desired:

Import the library
Drag and drop a single Blueprint object into the level
Tag your actors (either manually or programmatically)
Record (or use existing) audio assets to represent each of your tags
Override the default configuration parameters using a single key-value editor screen. One of the configuration parameters to edit is the dictionary that maps between actor tags and audio assets.
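
Conceptually, the last step amounts to a key-value configuration in which one entry is a dictionary from actor tags to audio assets. In the library this is done through an editor screen in Unreal Engine 4. The Python sketch below only illustrates the idea, and every parameter name, key binding and asset path in it is hypothetical.

```python
# Hypothetical defaults; the library's real parameter names are not listed in this post.
DEFAULT_CONFIG = {
    "scan_key": "F1",
    "audio_speed": 1.0,
    "tag_to_audio": {},  # actor tag -> audio asset path
}


def make_config(overrides):
    """Apply the developer's overrides on top of the defaults."""
    return {**DEFAULT_CONFIG, **overrides}


def audio_for_actor(actor_tags, config, fallback="Audio/Unknown"):
    """Return the audio asset of the first tag that has a mapping, else a fallback."""
    mapping = config["tag_to_audio"]
    for tag in actor_tags:
        if tag in mapping:
            return mapping[tag]
    return fallback


config = make_config({
    "audio_speed": 1.5,
    "tag_to_audio": {"table": "Audio/Table", "couch": "Audio/Couch"},
})
print(audio_for_actor(["furniture", "couch"], config))
```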

The library takes it from there…

Background and Motivation

Digital worlds, realities and games are becoming more and more part of our lives. Leveraging the high flexibility of the digital medium we should aim to provide disabled people with equal access to these worlds.

“there are 2.2 billion active gamers in the world” – Newzoo, 2017 report

“More than a billion people in the world today experience disability.” – World Health Organization

“An estimated 253 million people live with vision impairment: 36 million are blind and 217 million have moderate to severe vision impairment” – World Health Organization

“As well as the numbers making good business sense, there is human benefit. Games are entertainment, culture, socialising, things that mean the difference between existing and living.” – GameAccessibilityGuidelines.com

“Whenever a game adds an accessibility feature, it feels like it’s made just for me. If a game makes an attempt to reach out to me I am going to remember that for the rest of my life.” – Steve Saylor

The goal of this post is not to include a list of all available game accessibility resources but here are just a few links that can be useful if you want to learn more about the subject: many emotional stories, quotes and videos exist which are great inspiration and sources of knowledge and motivation for enabling accessibility in games. There are videos available discussing game accessibility for the blind and ones showing video games played by blind gamers. I have found Ian Hamilton and Adriane Kuzminski online resources especially useful for this project’s purposes.

Use Cases

As mentioned, the main functionality of Accessible Realities is to provide locational and instructional accessible audio. The library could be integrated with many types of games and applications and help in making them accessible for people who are blind or have low vision. For example: sidescrollers, interactive story games, point and click adventures, turn based strategy games, some 3D action games and more. It could serve as a research platform (as described in the Future section below). It could also be used for Virtual Reality and Mixed Reality where the computer generated content could be tagged or even for Augmented Reality with AI-based object recognition as shown in Demo 4 above.

In the game itself, the library could be used for many purposes:

Identification and navigation to 3D elements (both characters and objects) in space
Orientation and mobility in spaces like buildings using audio tags in places like doors, windows, corridors etc.
Orient and focus when there are a lot of elements using the radar zoom in feature
Exploration – use slow motion to enable the user to switch between regular play and slow or very slow motion for exploring the level
Very fast games or scenes could use slow motion
Instructions – game level instructions, instructions specific to an area and instructions describing the accessibility features. All available by one keystroke anytime
Player mood – use area specific scene descriptions for encouraging players as they proceed in the game
Translation to multiple languages by using multiple sets of audio recordings
Create multiple difficulty levels for blind gamers using different number and types of audio tags
Create 3D audio-only games

Quite a few of the guidelines from the comprehensive list at http://gameaccessibilityguidelines.com/ could be implemented this way. This could be a topic for a separate post in which a guideline is quoted and a short video demonstrates its realization using Accessible Realities.

Strengths

Semantic Content

One of the advantages of the library is its focus on semantic content. The developer tags the game content for what it is in terms familiar to the user (castle, chair etc.). In addition to tags, by integrating with the game engine the library has full knowledge of the game world that could be made accessible for the user: locations, velocities, colors, animation states (a very powerful future feature) and more. The library currently does not scan pixels one by one, which could take a lot of time and would require the user to identify the content. Nevertheless, there are use cases where this type of scan can be useful (for example a very detailed and realistic scan of landscapes or art, or a pixel-by-pixel scan of the depth of the scene). This capability would require additional development but could fit well and be easily added using the existing infrastructure provided by the library. I wrote about potential uses of semantic content in images back in a blog post in 2010. This project is sort of a follow-up to one of the ideas described in that post.

Hardware

No special hardware is required. There is no need for any special haptic device or even a mouse, only a keyboard is currently used. In cases where a keyboard is not accessible enough the standard keyboard replacements could fit. For mobile phones one could use special gestures or on-screen special buttons to trigger the scans or other actions.

Audio Recordings

Because of the lack of solid text to speech support in the game engine the library uses a practical approach of audio recordings. This is limited in some ways but could be very powerful in other ways like: reusing existing game assets instead of voice (like effects or short music clips), recording voice actors unique to the game for customized experiences or translation to other languages that are not well supported.

Weaknesses

Text to Speech

Arbitrary text, such as score, health and time indications, is currently not supported. Using recorded audio has advantages, as mentioned above, but it also has major limitations. The main reason for not having text to speech is the lack of solid universal support for it in the game engine. Text to speech in game engines is a challenge all by itself (even though some options do exist).

Testing

The library was designed in such a way that it can be easily modified. So once feedback from testers was given it took a short time to incorporate it into the library. But I must say it was quite difficult to find testers with low vision and it took a lot of time and effort. New testers who are blind or have low vision for additional testing are very welcome to join the Alpha. This is one of the main goals of this blog post.

UI

Accessibility for UI (UMG in Unreal Engine language) including menu items is not supported. An example for a plugin that supports this functionality (for Unity) is: UI Accessibility Plugin by MetalPop.

Future

While working on the project I have thought about a few future opportunities. Here is a list of some of them. I will be happy to discuss these and other collaborations.

Game Engine Built-In Accessibility

It would be a great advancement for accessibility in games and XR if the game engines themselves provided built-in accessibility features like colorblind modes, text to speech, an accessible subtitles library, accessible UI components etc. Hopefully, the new colorblind mode from Epic Games, which got a lot of excited comments, is a good sign of things to come. Even better would be to develop a standard, for example an addition to the OpenXR standard, that would handle accessibility in the same way for all major game engines and platforms. Personally, I would be more than happy to contribute to Epic Games’ Unreal Engine directly (via a coordinated Pull Request) or to work with many others, more capable than me, towards designing and implementing a standard accessibility solution.

Research

The features shown above are just one possible implementation. The data that is gathered by the library like an object’s identity, coordinates, etc. could be represented in many other ways as well. Off-the-shelf game engines could be a fantastic research environment with many advantages to try out and experiment with different approaches for accessibility.

XR Semantic Object Model

In the same way that HTML has Document Object Model, there could be advantages for developing a cross-platform semantic object model for Virtual and Real World Realities. One of the main advantages could be making this model accessible to tools such as screen readers.
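
One way to picture such an object model is as a tree of semantic nodes that a tool can traverse and query, much like a screen reader walks the DOM. The Python sketch below is purely illustrative. The `SemanticNode` class, its fields and the role names are hypothetical and do not describe any existing standard.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class SemanticNode:
    role: str   # e.g. "room", "furniture", "character" (hypothetical roles)
    label: str
    children: List["SemanticNode"] = field(default_factory=list)

    def walk(self) -> Iterator["SemanticNode"]:
        """Depth-first traversal, parent before children."""
        yield self
        for child in self.children:
            yield from child.walk()

    def find_by_role(self, role: str) -> List["SemanticNode"]:
        """Collect all nodes with the given role, in traversal order."""
        return [n for n in self.walk() if n.role == role]


room = SemanticNode("room", "Living room", [
    SemanticNode("furniture", "Couch", [SemanticNode("character", "Knight #1")]),
    SemanticNode("furniture", "Table"),
])
print([n.label for n in room.walk()])
print([n.label for n in room.find_by_role("furniture")])
```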

Audio-based Interfaces

“The number of mobile phone users in the world is expected to pass the five billion mark by 2019. […] the number of smartphone users in the world is expected to reach 2.7 billion by 2019.”

– statista.com

Unique opportunities exist in specifically solving the challenge of game accessibility for the blind. For example, a full or partial solution would (with additional work of course) open up the market of more than 2 billion (!) basic mobile phone (non-smartphone) users to the game industry via audio-only cloud-based interfaces (“dial to play” style). Many of these users are located in developing countries. Lower demands on CPU and GPU also mean games could have an option to run on cheaper computers. Audio-only interfaces could also be integrated into smartphones and voice-based digital assistants. An audio mode for a game should not be seen as a replacement for a high-end graphics display but it could complement it for certain use cases. Many times, providing accessibility to one specific group benefits a lot of other groups as well.

Call for Action

Accessible Realities is currently in Alpha stage. If you are a person who is blind or has low vision, or a video game company or developer, I invite you to join the Alpha of Accessible Realities.

In addition, feedback from game accessibility experts is very welcome. Accessibility for the blind is a complex task. I try hard to improve and your helpful advice is always welcome.

To join the Alpha or a future Beta or to provide feedback or for any other collaboration, I’m available for you on Twitter at @AccessibleXR and via this online form.

Finally, Thank You and Credits

Many people have helped and contributed a lot to this project so I will take this opportunity to say thank you. If the result is still a work in progress that is my full responsibility. It’s because of myself and my decisions (and I would hope also the non-trivial nature of the challenge 😉 ) and not in any way because of any of the extremely helpful people below. So without further ado, a very big “Thank You!” to:

Orit, my wife, for her huge support on so many levels.

To my family for their great support and continuous feedback – to my parents Ilana and Yosef, my sister Lilach and her husband Momi and the rest of the Shuti crew Dror, Eyal and Yoav. To Sarah and Hagit, Barak, Yonatan, Amit and Rotem. To my uncles, my aunts and their families and finally, to my late grandmothers who provided me with inspiration for this social impact project.

Migdal Or organization – very big thank you to Rita, Doron and Gabi – for contributing a lot of very valuable feedback from their vast experience in the field.

Ofir – for testing and providing valuable feedback.

Ian Hamilton and Adriane Kuzminski for their great online resources that they provide on game accessibility and specifically on accessibility for the blind community. Special “Thank You!” to Adriane for her awesome feedback and comments on an early draft of this post.

A very big thank you for their great advice, testing, words of wisdom and encouragement to the following friends, colleagues and awesome people: Moti, Barak, Amir, Adi, Eylon, Iris, Hilla, Roi, Yochai, Dror, Sharon, Dr. Lahav, Eyal, Amir, Omer, Ori, Michael, Ilan, Eldad.

And finally, a special thanks to Arie Croitoru and Amir Green and the Unreal Engine Israel Meetup group members.

Thank you all, the progress made could not have been done without you!

Zohar (@AccessibleXR)