Improving Multimedia Access for the Disabled
Morand Fachot, Communications Officer, International Electrotechnical Commission (IEC) | December 06, 2016
(Editor's note: This article is part of a series of articles on issues related to active assisted living and standards designed to help guide engineers and allied professionals. Future articles will look at wearables, in-home medical devices and sporting equipment for the disabled.)
Multimedia content, particularly on television, and information and communication technology (ICT) services are central to our lives.
Access to these information sources for people with visual or hearing impairments is important and is an internationally recognized right. The International Electrotechnical Commission (IEC), together with other organizations, works to develop international standards that enable such access, which is central to the idea of Active Assisted Living (AAL).
Hundreds of Millions Affected
The World Health Organization (WHO) estimates (2014) that more than 285 million people suffer from visual impairment; more than 39 million are blind and 246 million have low vision. About 90% of the world's visually impaired live in low-income settings.
The WHO also estimates that more than 5% of the world’s population – some 360 million people – has disabling hearing loss (including 328 million adults and 32 million children). The majority of people with disabling hearing loss live in low- and middle-income countries.
As might be expected, aging is a contributing factor to both visual and hearing impairment (82% of people living with blindness are aged 50 and above). Visual and hearing impairments have impacts on personal, emotional, social, societal and economic levels.
In particular, individuals with these impairments have difficulty communicating and interacting with their peers. This can lead to feelings of loneliness, isolation, and frustration, particularly among older people, according to the WHO. The academic performance and employment prospects of visually and hearing-impaired people are adversely affected, often forcing them into lower-paying jobs. The problems are often more severe in low-income countries and settings.
Internationally Recognized Right
Article 9 of the 2006 United Nations Convention on the Rights of Persons with Disabilities, which deals with accessibility issues, says that “States Parties shall take appropriate measures to ensure to persons with disabilities access, on an equal basis with others, to (…) information and communications, including information and communications technologies and systems.”
Provisions for the accessibility and usability of many information and communication technology (ICT) products and services are incorporated into national legislation in some countries. These include accessibility features such as subtitling, signing, or audio description for people with sensory disabilities. In many countries and regions, broadcasters are required to provide universal access to audiovisual content.
The IEC is actively developing international standards for AAL in a range of areas.
To achieve this, IEC Technical Committee (TC) 100: Audio, video and multimedia systems and equipment, and several of its Technical Areas (TAs), have developed a number of specific international standards.
However, in 2014, TC 100 found it needed to create a dedicated TA, TA 16: Active Assisted Living (AAL), accessibility and user interfaces, to “develop international publications addressing aspects of active assisted living (AAL), accessibility, usability and specific user interfaces related to audio, video and multimedia systems and equipment within the scope of TC 100.”
(Read this article on TA 16 in e-tech, December 2014.)
TA 16 is currently:
- Developing Edition 2 of IEC 62731:2013, Text to Speech for Television – General Requirements
- Finalizing IEC 62944 Ed. 1.0, Digital Television Accessibility – Functional specifications (publication was expected at the end of November 2016)
- Finalizing IEC 63080 Ed. 1.0, Accessibility terms and definitions (publication was expected in early 2017).
The IEC Standardization Management Board (SMB) established a strategy group, SG 5: Ambient Assisted Living (AAL), in 2011. SG 5 was later transformed into SEG 3, a Systems Evaluation Group on AAL. Following a recommendation by SEG 3, the SMB agreed to disband SEG 3 and to create a systems committee, IEC SyC AAL: Active Assisted Living (AAL), to help users of all ages live a meaningful, active and independent life.
Use cases related to accessibility have been collected in IEC SyC AAL as well as in the TC 100 study session on wearable technologies.
Access to broadcast content for the visually impaired is based predominantly on audio solutions. For TV broadcasts, this can be done through audio description of the on-screen setting/action that complements the audio content already available (for example, dialogue).
Where radio speech content is concerned, older listeners may have difficulty separating the narration from background music and sound effects, and then understanding it. This results from the degradation of inner-ear function as well as from the deterioration of processing ability in the auditory center.
Japan’s public broadcaster NHK has developed an adaptive speech-rate conversion technology using speech interval detection. Output speech can be delivered more slowly than the original input through a series of processes that delete non-speech intervals and scale the playback speed, while ensuring that the overall length remains unchanged and the pitch is unaffected.
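The timing arithmetic behind such a scheme can be sketched as follows. This is a minimal sketch: the function name and the pause-retention ratio are illustrative assumptions, and a real system would also apply pitch-preserving time-scale modification to the speech signal itself, which is omitted here.

```python
def adaptive_rate_plan(segments, pause_keep=0.3):
    """Plan timing for adaptive speech-rate conversion: shrink non-speech
    (pause) intervals and use the reclaimed time to stretch the speech,
    so output speech is slower while total duration stays unchanged.

    segments: list of ("speech" | "pause", duration_seconds) tuples.
    Returns (stretch_factor, new_segments).
    """
    speech = sum(d for kind, d in segments if kind == "speech")
    pause = sum(d for kind, d in segments if kind == "pause")
    total = speech + pause
    kept_pause = pause * pause_keep           # most of each pause is deleted
    stretch = (total - kept_pause) / speech   # speech expands into freed time
    new_segments = [
        (kind, d * stretch if kind == "speech" else d * pause_keep)
        for kind, d in segments
    ]
    return stretch, new_segments
```

With 8 s of speech and 2 s of pauses, keeping 30% of the pauses lets the speech be played at a stretch factor of 1.175 while the 10 s broadcast slot is preserved.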
Audiobooks, first introduced on long-playing records in the 1930s, were aimed initially at giving the visually impaired access to printed works. Audiocassettes later became their primary medium.
In the 1990s, audiobooks moved gradually from an analogue to a digital format (CD). The need to define the audiobook electronic file format structure to ensure compatibility with music industry and multimedia standards, as well as how to present and navigate an audiobook effectively, led TA 10: Multimedia e-publishing and e-book technologies to develop IEC 62571:2011, Digital audiobook file format and player requirements. This standard "defines requirements and provides recommendations to publishers, software developers, content providers, and hardware manufacturers for the data structure, usability requirements, playback systems and delivery systems for audiobooks in digital file format."
Access to ICT products and services for the visually impaired can be ensured through text enlarged via adjustable fonts and magnification or the conversion of written material into spoken text using optical character recognition (OCR) software.
International standards for OCR are being developed by ISO/IEC JTC 1/SC 31: Automatic identification and data capture techniques, a Subcommittee of the Joint Technical Committee for Information Technology set up by the IEC and the International Organization for Standardization (ISO).
At the 2016 International Broadcasting Convention (IBC), one of Europe's largest professional broadcast exhibitions, a number of R&D departments from public broadcasters, universities, and telecommunications companies presented solutions aimed at providing access to multimedia and ICT products and services for people with hearing, visual, or age-related impairments.
These solutions face challenges linked to the nature of the content, such as live or recorded broadcasts or archived material (analogue or digital, with or without metadata), as well as the language and/or writing structure and broadcasting system/format.
A Complex Process
Most people are used to seeing subtitles in films or in recorded television interviews when spoken words may be difficult to understand. Subtitlers of television programs face different kinds of challenges when the result is intended for live or for pre-recorded broadcasts, for large collections of video clips, or for a combination of subtitling and sign language.
These challenges and the solutions to address them were discussed by researchers from Ericsson and the University of Edinburgh (UK), BBC Research & Development and NHK at an IBC 2016 session on Novel Technologies for Assisting Sensory-Impaired Viewers.
Latency remains one of the most significant factors in the audience’s perception of quality in live-originated TV captions for the deaf and hard of hearing, according to joint Ericsson – University of Edinburgh research. Once all prepared script material has been shared between the production team and captioners, “pre-recorded video content remains a significant challenge – particularly ‘packages’ for transmission as part of a news broadcast,” says Ericsson’s Matt Simpson. These video clips are usually published just prior to, or even during, their intended broadcasting slot, providing little opportunity for thorough preparation.
Automatic speech recognition (ASR) based on context-tuned models and the application of machine learning across large volumes of data help meet some of these challenges. However, other aspects still need improvement, such as fidelity to the original spoken word, the textual accuracy of the transcript (optimal accuracy is 95%, the minimum acceptable threshold is 90%, and no more than 10% of the content should be missing) and the timeliness with which it is presented. ASR is set to play a growing role in support of captioning for live broadcasts, but the audio quality of the original video content (quiet or noisy background) remains important.
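As a rough illustration of how such accuracy thresholds can be checked, the sketch below scores a caption transcript against a reference using word-level edit distance. The function names and the simple missing-content estimate are my own illustration, not part of any broadcast toolchain.

```python
def caption_quality(reference, hypothesis):
    """Score a caption transcript against a reference transcript.

    Returns (word_accuracy, missing_fraction): accuracy is 1 minus the
    word-level Levenshtein distance divided by the reference length;
    missing_fraction crudely estimates how much content was dropped.
    """
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))          # DP row for edit distance
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    accuracy = 1.0 - prev[-1] / len(ref)
    missing = max(0, len(ref) - len(hyp)) / len(ref)
    return accuracy, missing

def meets_thresholds(accuracy, missing):
    # 90% minimum accuracy, no more than 10% of content missing
    return accuracy >= 0.90 and missing <= 0.10
```

Real caption scoring distinguishes substitutions, insertions, and deletions via an alignment backtrace; this sketch only shows the threshold logic.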
Re-using Archived Content
Broadcasters hold large archives of material produced years ago. This content is not always subtitled, but is often rebroadcast as viewers like to discover or rewatch classics, comedies, or history programs.
Mike Armstrong from BBC R&D told conference attendees that the BBC provides subtitles for 100% of its TV programs on all its main channels as well as on its video-on-demand (VOD) service and websites. Recent BBC audience research shows that subtitle use was not limited to the hearing impaired but that around 10% of the adult TV audience use subtitles daily. Overall, subtitle use is around 18% and even as high as 20% on tablets. Most interesting are the findings for children’s programs, where subtitle usage is around 30% and around 35% for content classified as “Learning.”
The BBC has thousands of hours of video content and until now subtitling has been a manual process, done either by retrieving subtitles from original content or by creating new ones.
The BBC tested a three-step system for video clips (not for full-length programs). It used 500 hours of content from some 7,500 audio and video files in the BBC Bitesize archives to assess automation of the subtitling process. The system comprises:
· Identifying the source program and retrieving assets using the BBC’s Programme Information Platform (PIPs) metadata system and off-air (Redux) archives, locating the relevant section of the program by searching within programs and creating search strings;
· Matching audio and text using “Chromaprint” open source audio fingerprinting;
· Retiming subtitles and verifying output.
Trials resulted in a 46.7% success rate. Although a significant amount of work is still needed to obtain a broadcast-ready product, this experimental project is promising and paves the way for the automated production of subtitle files for video clips.
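The fingerprint-matching and retiming steps of such a pipeline can be sketched as follows. This is a simplifying assumption-laden stand-in: real Chromaprint fingerprints are streams of 32-bit integers derived from audio, not hand-written lists, and the frame duration used here is only approximate.

```python
def best_offset(clip_fp, programme_fp):
    """Slide the clip's fingerprint along the full programme's fingerprint
    and return the frame offset with the fewest mismatching bits, a
    simplified stand-in for Chromaprint-style matching."""
    def bit_errors(a, b):
        return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    n = len(clip_fp)
    return min(range(len(programme_fp) - n + 1),
               key=lambda off: bit_errors(clip_fp, programme_fp[off:off + n]))

def retime(cues, offset_frames, frame_sec=0.124):
    """Shift subtitle cues (start, end, text) by the matched offset.
    frame_sec approximates the roughly 0.12 s per fingerprint frame."""
    shift = offset_frames * frame_sec
    return [(start + shift, end + shift, text) for start, end, text in cues]
```

Once the clip is located inside the programme, the programme's existing subtitle cues can be shifted into the clip's timeline and verified against the clip's audio.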
At the same IBC 2016 session, Shuichi Umeda from NHK outlined the particular challenges faced by the Japanese broadcaster in offering services for hearing-impaired viewers. NHK is developing a system for computer generation (CG) of Japanese Sign Language (JSL) graphics, currently being tested with online meteorological information.
People whose first language is JSL, which is a distinct language from Japanese, have been demanding more TV programs with sign language in addition to closed captions, as they may not be fully familiar with Japanese characters.
As sign-language interpreters are not available in sufficient numbers, CG production of JSL graphics is seen as a possible solution. However, it must be capable of generating realistic avatars (characters) that can reproduce facial expressions as well as hand signs.
NHK production of CG JSL graphics is based on templates using fixed phrases translated into strings of sign language animations, on 3-D models of characters, and on optical motion capture of markers attached to the joints and faces of signers.
NHK is currently testing this automatic system to generate weather forecasts for all of the 47 prefectural capitals of Japan so that users can see the latest weather forecast via the Internet for any of these cities in the form of CG sign language.
CG of JSL graphics for weather reports is a relatively simple process as it relies on the set sentences, phrases, and signs most commonly used in weather reports. The same does not yet apply to most other forms of Japanese TV content, although NHK has announced plans to provide these CG descriptions for the 2020 Tokyo Olympics by generating them automatically from Olympic Data Feed messages, which are play-by-play event records.
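A template-driven pipeline of this kind can be sketched as a lookup from a fixed phrase structure to pre-captured sign animations. All identifiers below are hypothetical, since NHK's actual template tables and motion-capture assets are not public.

```python
# Illustrative sign-animation IDs; these names are assumptions, not
# NHK's actual asset identifiers.
SIGN_FOR = {
    "tomorrow": "JSL_TOMORROW",
    "tokyo": "JSL_TOKYO",
    "osaka": "JSL_OSAKA",
    "sunny": "JSL_SUN",
    "rain": "JSL_RAIN",
}

def forecast_signs(city, condition):
    """Fill the fixed template 'tomorrow <city> <condition>' with
    sign-animation IDs. A real system would fall back to fingerspelling
    for words outside its template vocabulary, as mimicked here."""
    words = ["tomorrow", city.lower(), condition.lower()]
    return [SIGN_FOR.get(w, "FINGERSPELL:" + w) for w in words]
```

Because weather reports reuse a small, fixed vocabulary, this table-lookup approach works; open-domain content would require full machine translation into sign language, which is why broader coverage remains a research problem.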
The IEC also liaises and works with other international and professional organizations, such as the International Telecommunication Union (ITU) and the European Broadcasting Union (EBU), which develop solutions to give people with visual or hearing impairments access to broadcast and ICT products and services. IEC TC 100 maintains a Category A Liaison with both.
As for IEC SyC AAL, it maintains a Category A Liaison with ITU-T/JCA-AHF: Joint Coordination Activity on Accessibility and Human Factors.
Work by the IEC, these organizations, and others improves access to multimedia content significantly for persons with visual or hearing impairments.