Sound Control of the PC (summary from the pilot project)

Can people who have difficulty creating and articulating speech use sounds to control technology? The pilot project Sound Control looked at this topic and considered the possibilities for a development project.

Story by: Morten Tollefsen - 12.01.2011

MediaLT received several questions from communities and users about how people with speech disabilities could use sounds and/or indistinct speech to control technology. We found little previous research or experience relating to this, and took the initiative for a pilot project on Sound Control. IT Funk supported the project, which ended on December 31st, 2010.

Goals, objectives and target group

The main objective of the project was to:

"Examine the possibilities for sound control of the PC, and to lay the foundation for a development project if possible and appropriate."

Milestones:

1. Analyze international R&D status in the field of sound control.
2. Define a functionality matrix for sound control of the PC.
3. Determine basic profiles with key functionality.
4. Evaluate possible technologies for sound control.
5. Establish international cooperation.
6. Lay the foundation for a main project if appropriate.

The main target group of the project was people with speech difficulties and problems using the PC with a standard user interface. In other words, the target group included all people who can benefit from sound control as a substitute for or in combination with other forms of interaction.

Brief description of project results

Analyze international R&D: References were collected and published on the project website:
http://www.medialt.no/lenkerreferanser/847.aspx 

Furthermore, the R&D situation is summarized in a separate status report:
http://www.medialt.no/statusrapport/1001.aspx  (Norwegian only)

Define a functionality matrix for sound control of the PC: Personas were used as a method to identify the functionality typically sought after in a sound control system. Personas are detailed descriptions of fictitious persons, which serve as good examples of the characteristics of the user group. Based on knowledge from Sikte, Sunnaas and the Cerebral Palsy Association, four personas were created. This work made it clear that a wide range of accessible functionality was desirable, that the users would have very different needs and abilities, and that adaptation at the individual level should therefore be possible. This is consistent with other research findings.

Our assessment is that defining a functionality matrix is not appropriate, and that the system should instead be developed to allow the greatest possible degree of individual customization.

Determine basic profiles with key functionality: At the start of the project we thought that we could define different "standard packages" of functions that could be controlled by sounds: for example, move the mouse pointer, click, drag and drop, etc. During the project we saw that it would be better to develop a system that allows for as much individual customization as possible. Based on the work with personas, the project group agreed that the most desirable solution could be described as follows:

  • The user is given access to a large "toolbox" with different functions that the system is able to perform; an example of such a "toolbox" is the functionality used in the speech recognition product VOMOTE.
  • The user is able to use sound/voice commands instead of or in addition to the PC control he/she currently uses, i.e. the solution can be used together with different switch solutions and eye control.
  • The user, together with specialists/guardians/contact persons, may choose which sounds/commands he/she will say to the PC.
  • The user, together with specialists/guardians/contact persons, may choose which functions are most appropriate and relate a sound/voice command to each of these.
  • The extent of the functionality depends on how many sounds the user can create. If he/she is able to make ten different sounds, then he/she may choose the ten most desirable functions and relate sounds to these (a configuration of this kind is sketched after this list).
  • If the user wants new functions, he/she may easily add them by relating one or more sounds to the new function (a previously selected function would be deselected).
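
To make the desired solution more concrete, the following sketch (in Python) illustrates the idea of an individually configured mapping from user-producible sounds to functions picked from a larger toolbox. All function names, sound labels and the recognizer hook are hypothetical examples, not part of any existing system:

    # Minimal sketch of the "toolbox" idea described above. All names are
    # hypothetical; a real system would call actual PC control functions
    # and receive labels from a dedicated sound recognizer.

    TOOLBOX = {
        "left_click":   lambda: print("left click"),
        "right_click":  lambda: print("right click"),
        "move_up":      lambda: print("move pointer up"),
        "move_down":    lambda: print("move pointer down"),
        "drag_drop":    lambda: print("drag and drop"),
        "open_browser": lambda: print("open browser"),
        # ... the toolbox may contain many more functions than any one user activates
    }

    class SoundProfile:
        """One user's mapping from recognizable sound labels to chosen toolbox functions."""

        def __init__(self):
            self.bindings = {}  # sound label -> name of a function in TOOLBOX

        def bind(self, sound_label, function_name):
            # Relate a sound the user can produce to a chosen function;
            # binding the same label again deselects the previously chosen function.
            if function_name not in TOOLBOX:
                raise KeyError(f"unknown function: {function_name}")
            self.bindings[sound_label] = function_name

        def on_sound_recognized(self, sound_label):
            # Called by the (hypothetical) sound recognizer with a detected label.
            name = self.bindings.get(sound_label)
            if name is not None:
                TOOLBOX[name]()

    # Example: a user who can reliably produce three distinct sounds
    profile = SoundProfile()
    profile.bind("ah", "left_click")
    profile.bind("mm", "move_down")
    profile.bind("ee", "open_browser")

    profile.on_sound_recognized("ah")   # performs "left click"
    profile.bind("ee", "right_click")   # "open_browser" is deselected
    profile.on_sound_recognized("ee")   # performs "right click"

The point of the design is that the toolbox can be large while each user's active vocabulary stays as small as the number of sounds he/she can reliably reproduce.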

Evaluate possible technologies for sound control: One of the key questions relating to the value of a sound control system is how many sounds a typical user is able to make, and whether he/she will be able to reproduce these sounds well enough to allow the system to distinguish them. Since the user group is very heterogeneous, it is not possible to give a clear answer to this question, but the project group agreed to record the sounds of typical users in order to assess the technological possibilities for sound control.

We then surveyed international work in this field, and as part of this work Miriam Nes Begnum attended the International Conference on Computers Helping People with Special Needs (ICCHP) in July 2010. Contact was established with Foad Hamidi from York University in Canada. Hamidi presented the paper "CanSpeak: A Customizable Speech Interface for People with Dysarthric Speech". The recordings made by Hamidi and his colleagues were considered sufficient for our purposes, and our scheduled recordings were not carried out.

Testing at York University was done with four people and a vocabulary of 47 words. Without adaptation, vocabulary recognition was between 30 and 56%. Among people without speech difficulties, the result was 94%. With adaptation, the recognition rate increased markedly to 84.3%. The very best results were achieved when family, teachers, nursing staff or speech specialists were involved. Relying only on the user to define appropriate phrases gave minimal improvement. Individual adjustment of the system, based on a speech specialist's assessment of pronunciation difficulties, is therefore valuable.

User testing in the SMUDI project showed that the challenges associated with the use of microphones were far greater than we had foreseen. A sound control system for those with speech disabilities would exacerbate this challenge even further, and we found it necessary to include microphone testing in the technological evaluation of the solution.

The testing clearly showed that there was a need for further work in this field, and a project on microphone and switch solutions was started on 1 September 2010.
Within the pilot project we also tried to find an existing sound recognizer, but national and international investigation did not identify an appropriate solution.

Our conclusion is that a dedicated sound recognizer must be developed in order to create a sound control solution. We believe such a sound recognizer can be realized, but research is needed to measure the quality of recognition within the target group and the value of sound control.

Establish international cooperation: Based on our analysis of the current international R&D situation, two communities emerged: York University and Hearing Bridge (Hørselsbroen). Cooperation was established with both: York University as a research partner and Hearing Bridge as a development partner. A good foundation for further international cooperation has been laid.

Lay the foundation for a main project: In autumn 2010, most of the building blocks for a main project were in place, but signals from professional communities made us uncertain about how great the need for sound control of technology actually is. We therefore decided to conduct a targeted survey of communities and users. Based on the results of the survey, the project group agreed that there was currently no basis to proceed with a main project, as the number of users who would benefit is relatively small.

R&D tasks and key communities

The project was led by MediaLT. In addition, the most important participating communities were NAV, Sikte, Frambu, the University of Oslo, the Cerebral Palsy Association and the Sunnaas special education center.

The key tasks of the project have been to analyze what has previously been done in this field, investigate the central R&D challenges, explore desired functionality, clarify technological challenges and opportunities, and examine how great the need is.

Meaning / value of further work

This project has brought new knowledge about what this development would entail, which groups would benefit and how great the need is.

Professional communities, users and MediaLT will benefit from this knowledge in their future work with technology for the disabled.

The preliminary conclusion is that the target group is not large enough to justify a larger investment in sound control, but this should be continuously reassessed. Continued research is interesting because we do not yet know what benefit the target group would have from such a solution. Furthermore, it is interesting to look at other applications of sound control, for example in relation to alerting the hearing impaired and communication with animals. If these application areas could be combined with sound control for people with speech difficulties, such a wider application might justify a larger development project.
