


Multimodal interaction provides the user with multiple modes of interacting with a system. A multimodal interface provides several distinct tools for the input and output of data. For example, a multimodal question answering system employs multiple modalities (such as text and photos) at both the question (input) and the answer (output) level.



Introduction

Multimodal human-computer interaction refers to "interaction with the virtual and physical environment through natural modes of communication". This implies that multimodal interaction enables freer and more natural communication, interfacing users with automated systems in both input and output. Specifically, multimodal systems can offer a flexible, efficient and usable environment that allows users to interact through input modalities, such as speech, handwriting, hand gesture and gaze, and to receive information from the system through output modalities, such as speech synthesis, smart graphics and other modalities, opportunely combined.

A multimodal system then has to recognize the inputs from the different modalities, combining them according to temporal and contextual constraints in order to allow their interpretation. This process is known as multimodal fusion, and it has been the object of several research works from the nineties to the present. The fused inputs are interpreted by the system. Naturalness and flexibility can produce more than one interpretation for each different modality (channel) and for their simultaneous use, and they can consequently produce multimodal ambiguity, generally due to imprecision, noise or other similar factors. Several methods have been proposed for resolving such ambiguities. Finally, the system returns outputs to the user through the various modal channels (disaggregated), arranged according to a consistent feedback (fission).

The pervasive use of mobile devices, sensors and web technologies can offer adequate computational resources to manage the complexity implied by multimodal interaction. "Using cloud for involving shared computational resources in managing the complexity of multimodal interaction represents an opportunity. In fact, cloud computing allows delivering shared scalable, configurable computing resources that can be dynamically and automatically provisioned and released."
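
As a rough sketch of this fusion-interpretation-fission pipeline, the following Python outline shows how recognized events from several modalities might be combined, interpreted, and then rendered back across output channels. The class and function names (ModalityEvent, fuse, interpret, fission) and the event format are illustrative assumptions, not part of any particular system.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ModalityEvent:
    """A recognized input from one modality (speech, gesture, gaze, ...)."""
    modality: str     # e.g. "speech" or "gesture"
    content: str      # recognizer output, e.g. "delete this" or "point(file_3)"
    timestamp: float  # needed to apply temporal constraints during fusion

def fuse(events: List[ModalityEvent]) -> List[ModalityEvent]:
    """Multimodal fusion: combine events from different modalities, here simply
    by ordering them in time (a real system applies temporal and contextual
    constraints)."""
    return sorted(events, key=lambda e: e.timestamp)

def interpret(fused: List[ModalityEvent]) -> str:
    """Interpretation: map the fused input onto a single command.
    Ambiguity resolution would happen at this stage."""
    return " + ".join(e.content for e in fused)

def fission(command: str) -> Dict[str, str]:
    """Fission: distribute the system's answer over the output modalities."""
    return {"speech_synthesis": f"Executing {command}",
            "graphics": f"highlight({command})"}

inputs = [ModalityEvent("gesture", "point(file_3)", 0.15),
          ModalityEvent("speech", "delete this", 0.10)]
print(fission(interpret(fuse(inputs))))
```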


Multimodal input

Two major groups of multimodal interfaces have emerged, one concerned with alternate input methods and the other with combined input/output. The first group of interfaces combines various user input modes beyond the traditional keyboard and mouse input/output, such as speech, pen, touch, manual gestures, gaze, and head and body movements. The most common such interface combines a visual modality (e.g. a display, keyboard, and mouse) with a voice modality (speech recognition for input, speech synthesis and recorded audio for output). However, other modalities, such as pen-based input or haptic input/output, may be used. Multimodal user interfaces are a research area in human-computer interaction (HCI).

The advantage of multiple input modalities is increased usability: the weaknesses of one modality are offset by the strengths of another. On a mobile device with a small visual interface and keypad, a word may be quite difficult to type but very easy to say (e.g. Poughkeepsie). Consider how you would access and search through digital media catalogs from the same device or a set-top box. In one real-world example, patient information in an operating room environment is accessed verbally by members of the surgical team to maintain an antiseptic environment, and presented in near real time aurally and visually to maximize comprehension.

Multimodal input user interfaces have implications for accessibility. A well-designed multimodal application can be used by people with a wide variety of impairments. Visually impaired users rely on the voice modality with some keypad input. Hearing-impaired users rely on the visual modality with some speech input. Other users will be "situationally impaired" (e.g. wearing gloves in a very noisy environment, driving, or needing to enter a credit card number in a public place) and will simply use the appropriate modalities as desired. On the other hand, a multimodal application that requires users to be able to operate all modalities is very poorly designed.

The most common form of input multimodality on the market makes use of the XHTML+Voice (also known as X+V) web markup language, an open specification developed by IBM, Motorola, and Opera Software. X+V is currently under consideration by the W3C and combines several W3C Recommendations, including XHTML for visual markup, VoiceXML for voice markup, and XML Events, a standard for integrating XML languages. Multimodal browsers supporting X+V include IBM WebSphere Everyplace Multimodal Environment, Opera for Embedded Linux and Windows, and ACCESS Systems' NetFront for Windows Mobile. To develop multimodal applications, software developers may use a software development kit, such as the IBM WebSphere Multimodal Toolkit, based on the open-source Eclipse framework, which includes an X+V debugger, editor, and simulator.


Multimodal input and output

The second group of multimodal systems presents users with multimedia displays and multimodal output, primarily in the form of visual and auditory cues. Interface designers have also started to make use of other modalities, such as touch and olfaction. Proposed benefits of multimodal output systems include synergy and redundancy. The information presented via several modalities is merged and refers to various aspects of the same process. The use of several modalities for processing exactly the same information provides an increased bandwidth of information transfer. Currently, multimodal output is used mainly to improve the mapping between the communication medium and the content, and to support attention management in data-rich environments where operators face considerable demands on visual attention.

An important step in multimodal interface design is the creation of natural mappings between modalities and the information and tasks. The auditory channel differs from vision in several aspects: it is omnidirectional, transient, and always reserved. Speech output, one form of auditory information, has received considerable attention, and several guidelines have been developed for the use of speech. Michaelis and Wiggins (1982) suggested that speech output should be used for simple, short messages that will not be referred to later. It was also recommended that speech should be generated in time and require an immediate response.

The sense of touch was first utilized as a medium of communication in the late 1950s. It is not only a promising but also a unique communication channel. In contrast to vision and hearing, the two traditional senses employed in HCI, the sense of touch is proximal: it senses objects that are in contact with the body, and it is bidirectional in that it supports both perception and acting on the environment.

Examples of auditory feedback include auditory icons in computer operating systems that indicate user actions (e.g. deleting a file, opening a folder, an error), speech output for presenting navigational guidance in vehicles, and speech output for warning pilots in modern aircraft cockpits. Examples of tactile signals include vibration of the turn-signal lever to alert drivers to a car in their blind spot, vibration of the car seat as a warning to drivers, and the stick shaker on modern aircraft that alerts pilots to an impending stall.

Invisible interface spaces have become available through sensor technology; infrared, ultrasound, and cameras are all now commonly used. Transparency of interfacing with content is enhanced when a meaningful mapping provides an immediate and direct link, so that the user receives direct and immediate feedback on input and the content response becomes an interface affordance (Gibson 1979).


Multimodal fusion

The process of integrating information from various input modalities and combining them into a complete command is referred to as multimodal fusion. In the literature, three main approaches to the fusion process have been proposed, according to the main architectural levels (recognition and decision) at which the fusion of the input signals can be performed: recognition-based, decision-based, and hybrid multi-level fusion.

Recognition-based fusion (also known as early fusion) consists in merging the outcomes of each modal recognizer by using integration mechanisms such as statistical integration techniques, agent theory, hidden Markov models, artificial neural networks, etc. Examples of recognition-based fusion strategies are action frames, input vectors, and slots.
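
As a minimal illustration of the "input vector" strategy, early fusion can be sketched as the concatenation of per-modality feature vectors before a single recognizer sees them. The feature dimensions and the choice of NumPy below are arbitrary assumptions made only for the example.

```python
import numpy as np

def early_fusion(speech_features: np.ndarray, gesture_features: np.ndarray) -> np.ndarray:
    """Recognition-based (early) fusion: build one joint input vector
    before any recognition decision is taken."""
    return np.concatenate([speech_features, gesture_features])

# Hypothetical per-modality feature vectors.
speech_features = np.random.rand(13)   # e.g. acoustic features of an utterance
gesture_features = np.random.rand(6)   # e.g. hand-trajectory features

joint_vector = early_fusion(speech_features, gesture_features)
print(joint_vector.shape)  # (19,) -- fed to a single recognizer (HMM, neural network, ...)
```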

Decision-based fusion (also known as late fusion) merges the semantic information extracted by each modality using specific dialogue-driven fusion procedures to yield the complete interpretation. Examples of decision-based fusion strategies are typed feature structures, melting pots, semantic frames, and time-stamped lattices.
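
A minimal sketch of late fusion, assuming each recognizer has already produced a partial semantic frame; the slot names and the fill-the-empty-slot merge rule are invented for illustration.

```python
from typing import Dict, Optional

SemanticFrame = Dict[str, Optional[str]]

def late_fusion(speech_frame: SemanticFrame, gesture_frame: SemanticFrame) -> SemanticFrame:
    """Decision-based (late) fusion: merge per-modality semantic frames
    slot by slot, letting each modality fill what the other left empty."""
    merged = dict(speech_frame)
    for slot, value in gesture_frame.items():
        if merged.get(slot) is None:
            merged[slot] = value
    return merged

# Speech supplies the action, the pointing gesture supplies the object.
print(late_fusion({"action": "delete", "object": None},
                  {"action": None, "object": "file_3"}))
# {'action': 'delete', 'object': 'file_3'}
```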

Potential applications for multimodal fusion include learning environments, consumer relations, security/surveillance, computer animation, etc. Individually, modes are easily defined, but difficulty arises in having the technology consider them as a combined fusion. It is difficult for algorithms to factor in dimensionality; there exist variables outside of current computational abilities. For example, semantic meaning: two sentences can have the same lexical meaning but different emotional information.

In hybrid multi-level fusion, the integration of input modalities is distributed between the recognition and decision levels. Hybrid multi-level fusion includes the following three methodologies: finite-state transducers, multimodal grammars, and dialogue moves.
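
The finite-state idea can be illustrated with a toy "put that there"-style machine in which speech tokens and gesture events jointly drive the transitions. The states, tokens, and transition table below are invented for illustration and carry no outputs, so it is really a finite-state acceptor rather than a full transducer.

```python
# Each transition is keyed by (current state, (modality, token)).
TRANSITIONS = {
    ("start",            ("speech", "put")):    "need_object",
    ("need_object",      ("gesture", "point")): "need_target_word",
    ("need_target_word", ("speech", "there")):  "need_location",
    ("need_location",    ("gesture", "point")): "done",
}

def run(events):
    """Advance the machine with events coming from either recognizer."""
    state = "start"
    for event in events:
        state = TRANSITIONS.get((state, event), state)  # ignore unexpected events
    return state

print(run([("speech", "put"), ("gesture", "point"),
           ("speech", "there"), ("gesture", "point")]))  # 'done'
```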


Ambiguity

User actions or commands produce multimodal inputs (multimodal messages), which have to be interpreted by the system. The multimodal message is the medium that enables communication between users and multimodal systems. It is obtained by merging the information conveyed via several modalities, taking into account the different types of cooperation between the modalities, the temporal relationships among the modalities involved, and the relationships between the chunks of information connected with these modalities.
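
The temporal relationship mentioned above is often handled with a simple time window: a chunk from one modality is combined with a chunk from another only if the two occurred close enough together. The window size and timestamps below are arbitrary assumptions, shown only to make the idea concrete.

```python
def same_multimodal_message(speech_time: float, gesture_time: float,
                            window_s: float = 1.0) -> bool:
    """Combine two chunks of information only if their timestamps are
    close enough to belong to the same multimodal message."""
    return abs(speech_time - gesture_time) <= window_s

print(same_multimodal_message(2.0, 2.4))  # True: "delete this" + pointing gesture
print(same_multimodal_message(2.0, 7.0))  # False: treated as separate messages
```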

The natural mapping between the multimodal input, which is provided by several interaction modalities (the visual and auditory channels and the sense of touch), and information and tasks implies managing the typical problems of human-human communication, such as ambiguity. An ambiguity arises when more than one interpretation of the input is possible. A multimodal ambiguity arises if an element provided by one modality has more than one interpretation (i.e. ambiguities are propagated to the multimodal level), and/or if elements connected with each modality are univocally interpreted but the information referred to by the different modalities is incoherent at the syntactic or semantic level (i.e. a multimodal sentence with different meanings or different syntactic structures).

In "The Management of Ambiguities", methods to solve ambiguity and to provide a correct interpretation of user input are organized into three main classes: prevention, posterior resolution and approximate approximation approach.

Prevention methods force users to follow a predefined interaction behaviour according to a set of transitions between the different allowed states of the interaction process. Examples of prevention methods are: the procedural method, reduction of the expressive power of the language grammar, and improvement of the expressive power of the language grammar.
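
As a toy illustration of "reduction of the expressive power of the language grammar", the sketch below accepts only commands drawn from a tiny predefined grammar, so inputs that could become ambiguous are rejected before interpretation. The verbs and objects are invented examples.

```python
# The user may only utter <verb> <object> commands from this small grammar.
ALLOWED_VERBS = {"open", "close", "delete"}
ALLOWED_OBJECTS = {"file", "folder"}

def accepted(utterance: str) -> bool:
    """Accept only commands that match the restricted grammar."""
    parts = utterance.lower().split()
    return (len(parts) == 2
            and parts[0] in ALLOWED_VERBS
            and parts[1] in ALLOWED_OBJECTS)

print(accepted("delete file"))      # True: inside the restricted grammar
print(accepted("get rid of that"))  # False: never reaches the interpreter
```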

The a-posteriori resolution of ambiguities uses a mediation approach. Examples of mediation techniques are: repetition (e.g. repetition by modality), granularity of repair and undo, and choice.

Approximation resolution methods do not require any user involvement in the disambiguation process. They may rely on theories such as fuzzy logic, Markov random fields, Bayesian networks, and hidden Markov models.
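
A minimal sketch of approximation resolution: without involving the user, the system simply selects the interpretation with the highest probability. The candidate interpretations and scores are invented; a real system would obtain them from, for example, a Bayesian network or hidden Markov model.

```python
# Scores for the competing interpretations of one ambiguous multimodal input.
candidates = {
    "delete file_3":   0.72,
    "delete folder_1": 0.21,
    "select file_3":   0.07,
}

best = max(candidates, key=candidates.get)
print(best)  # 'delete file_3' -- chosen without asking the user
```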


See also

  • Device independence
  • Modality (human-computer interaction)
  • Speech recognition
  • W3C Multimodal Interaction Activity - an initiative of the W3C that aims to provide the means (mostly XML) to support multimodal interaction scenarios on the Web.
  • Web accessibility
  • Wired gloves
  • XHTML+Voice






External links

  • W3C Multimodal Interaction Activity
  • XHTML+Voice Profile 1.0, W3C Note, 21 December 2001
  • Hoste, Lode, Dumas, Bruno, and Signer, Beat: Mudra: A Unified Multimodal Interaction Framework. In Proceedings of the 13th International Conference on Multimodal Interaction (ICMI 2011), Alicante, Spain, November 2011.
  • Toselli, Alejandro Héctor, Vidal, Enrique, Casacuberta, Francisco: Multimodal Interactive Pattern Recognition and Applications, Springer, 2011.

