Usability and Accessibility Evaluation of the Upcoming Norwegian E-Vote Solution

Till Halbach, Kristin Skeide Fuglerud, Øystein Dale, Ivar Solheim, Trenton Schulz

Norwegian Computing Center

Abstract

During preparations for a partly electronic solution (called e-Vote/e-Valg) for the municipal elections in 2011, the Norwegian authorities have placed a strong focus on universal design. As a consequence, the usability and accessibility (UA) of all prototypes had to be evaluated. Our technical tests of UA standard compliance, together with personas and user tests, reveal that, despite a fair effort by the prototype providers, many rather basic universal design principles are either not fully understood or not prioritized for implementation. Despite these technical shortcomings, users in general show a positive attitude towards e-elections. A final e-voting solution with a high degree of UA is likely to strengthen democratic principles and lead to higher participation of the population than currently experienced.

1. Introduction

The Norwegian authorities are preparing for e-voting trials as part of the municipal elections in 2011 (Kommunal- og regionaldepartementet, 2010). One objective of the ongoing project is increased availability of voting to the electorate, which is a logical consequence of the requirements concerning universal design of environments, as stipulated in the Norwegian Discrimination and Accessibility Act of January 2009. The project participants translated this requirement into a more detailed list of testing topics, such as WCAG (Web Content Accessibility Guidelines) conformance and usability tests.

In this paper, we share our experiences from the final round of evaluating the participating technical solutions (prototypes). The paper is organized as follows. After a discussion of previous and related work, the prototype testing is detailed in the Prototype testing section, with separate sections on the technical evaluation and on the personas and user evaluation, each including the respective findings. We finish with the conclusion and a brief outlook.

2. Related work

Researchers have argued that Internet voting could increase voter participation and help strengthen democracy because such a solution makes voting more accessible for large parts of the population. At the same time, there have been only a few examples showing that usability can actually influence the ability of voters to vote as they wish. Problems with usability were a central issue in the controversy surrounding the United States presidential election in 2000 (Bederson, 2003). There have also been demands for re-elections in some Finnish municipalities after the last municipal election there (Felten, 2009).

Various surveys of citizens' attitudes to online voting show that people are in fact concerned about the security and integrity of such systems (Kenski, 2005). These studies also point out that web-based voting can change the socio-demographic and ideological composition of the voters, because those who are most positive about such systems are typically young, well-educated, and liberal citizens (Kenski, 2005; Herrnson, 2006). Kenski (2005) also shows that these users make fewer mistakes and need less help during the electronic voting process. Smith (2008) conducted a survey among 100 fully employed, (semi-)professional, and well-educated Internet users. By means of hypothesis testing and statistical analysis, various factors that influence the willingness to vote online were identified, the most important being ease of access and confidence in the technologies involved.

Although there is general agreement that usability is a very important aspect of online voting systems (Bederson, 2003; Herrnson, 2006; Conrad, 2009), there are few studies in this area, and even fewer when it comes to accessibility and usability for disabled and other vulnerable user groups. Bederson (2003) reviews some studies in the area, but these refer primarily to voting machines at public polling locations. Conrad (2009) argues that previous studies devote very little attention to usability. In that study of the usability of several electronic voting systems, many usability issues were observed. It is argued, though, that most of these problems are easy to fix, provided that the manufacturers incorporate usability design and testing in the development process.

Little (2008) conducted focus group interviews to uncover the social aspects of the use of ubiquitous technology. Before the interviews, participants were shown a video based on an adapted e-election scenario. Findings from this study show that attitudes towards electronic voting are influenced by trust, privacy, and convenience, but also by other aspects such as context, type of device used, and individual factors for each user.

To summarize, voting technology and ballot design can influence the ability of voters to vote as they wish. It is therefore crucial to study e-voting systems from different angles.

3. Prototype testing

A number of requirements were formulated in the e-Vote 2011 project to ensure that the final solution becomes as accessible and usable as possible. This resulted in a specification consisting of usability and accessibility (UA) requirements. For each requirement, one or several tests were defined that had to be passed to satisfy the relevant part of the requirements specification.

Testing was conducted in three different ways. Technical tests could largely be carried out with appropriate testing tools. A second part of the testing was done by means of personas, while the last part consisted of testing with real users.

Two rounds of testing were conducted. In the first iteration, five prototypes from different providers were evaluated. In the second iteration, there was a shortlist of only three, partly improved prototypes.

The assessment of the prototypes was aided by a credit point system defined by the project's decision makers, as follows.

Excellent (3 points): Outstanding solution to the requirement.
Adequate (2 points): A plain pass.
Poor (1 point): Requirement only partly met.
Fail (0 points): Requirement not met or not addressed at all.

The average score per testing topic (WCAG, ELMER, etc., as described below) was determined as the weighted average of the credit points of the individual tests. A prototype's overall score was then calculated as the average over the weighted testing topic scores.
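
As a concrete illustration of this two-level aggregation, the following minimal Python sketch computes topic scores and an overall score; the test names, weights, and credit values shown are hypothetical examples, not the actual figures from the requirements specification.

    # Sketch of the two-level credit point aggregation described above.
    # Test names, weights, and scores are hypothetical examples.
    CREDIT = {"excellent": 3, "adequate": 2, "poor": 1, "fail": 0}

    def weighted_average(scores, weights):
        # Weighted average of credit points for the tests within one topic.
        return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

    topics = {
        "WCAG":  ([CREDIT["adequate"], CREDIT["excellent"], CREDIT["poor"]], [1.0, 1.0, 2.0]),
        "ELMER": ([CREDIT["adequate"], CREDIT["fail"]], [1.0, 1.0]),
    }

    topic_scores = {name: weighted_average(scores, weights)
                    for name, (scores, weights) in topics.items()}

    # Overall prototype score: average over the per-topic scores.
    overall = sum(topic_scores.values()) / len(topic_scores)
    print(topic_scores, round(overall, 2))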



4. Technical evaluation

Partly automated testing tools were the key to a successful technical evaluation. The tools allowed us to assess the prototypes' degree of compliance with standards such as WCAG, HTML, and CSS.

The tools used were various add-ons to the Firefox browser, such as Web Developer, Contrast checker, and Firebug. The degree of WCAG conformance was checked with AChecker, while HTML and CSS conformance were checked with the W3C validator tools. As described in the Personas and user evaluation section, all testing was based on scenarios.
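
To give a flavour of the kind of checks these tools perform, the following is a minimal, illustrative Python sketch (not one of the tools actually used) that scans a page for two common WCAG failures: images without a text alternative and form fields without an associated label. It assumes the requests and BeautifulSoup libraries and a publicly reachable page URL.

    # Illustrative sketch only; the actual evaluation relied on Firefox add-ons,
    # AChecker, and the W3C validators.
    import requests
    from bs4 import BeautifulSoup

    def simple_wcag_checks(url):
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        issues = []

        # WCAG 1.1.1: every img element should carry a text alternative.
        for img in soup.find_all("img"):
            if not img.get("alt"):
                issues.append("img without alt text: %s" % img.get("src"))

        # WCAG 1.3.1 / 3.3.2: form controls should have an associated label.
        labelled_ids = {label.get("for") for label in soup.find_all("label")}
        for field in soup.find_all(["input", "select", "textarea"]):
            if field.get("type") in ("hidden", "submit", "button"):
                continue
            if field.get("id") not in labelled_ids and not field.get("aria-label"):
                issues.append("form field without label: %s" % field.name)

        return issues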

4.1 Findings

We found that basic technologies like HTML and CSS appeared to be well understood and implemented. The same applied to JavaScript: none of the prototypes exhibited incompatibilities.

One interesting result is that all prototypes had been designed with a static page layout in mind. Given page widths ranging from 966 to 1008 pixels, none of the providers apparently gave thought to people accessing their solutions from devices with small screens, such as smartphones or netbooks. This is in strong contrast to the popularity these consumer products enjoy in the market. When a page exceeds the space available in the browser window, which in turn is limited by the screen resolution, content is hidden and scrolling is needed to access it. This may confuse users, as discussed in the Findings section. Note also that with a static page layout, zooming into a page leads to wide pages and hence a greater need for horizontal scrolling.

Concerning WCAG, the requirements specification defines 82 different success criteria from all conformance levels, i.e. A, AA, and AAA. While this gives a theoretical maximum credit score of 164 points under the plain-pass condition, the prototypes scored between 138 and 142 points. In other words, all solutions fell 13 to 16% short of full standard compliance (138/164 is roughly 84%, 142/164 roughly 87%) and hence have potential for improvement. While this shortfall may not seem dramatic, we stress that the prototypes were incomplete, in particular with regard to multimedia content, which remains scheduled for implementation.

Errors we view as particularly severe include the following.

Another interesting result concerns the use of particular web technologies. While neither Java applets nor plugins, both known for their accessibility implications, are deployed in the prototypes evaluated, HTTP cookies appear to be mandatory (i.e. without a proper fallback) in the majority of prototypes. Iframes are used in one prototype as an alternative to HTTP cookies. However, neither appears to have any accessibility implications (the iframes are invisible). One prototype used XHR, also known as AJAX, extensively, which caused conflicts with certain assistive technologies, as detailed in the Findings section.

ELMER is a collection of user interface guidelines to be followed by all Norwegian governmental forms on the Internet (Norwegian Ministry of Trade and Industry, 2007). A total of 36 checkpoints was tested to assess the degree of ELMER 2 conformance. This is fewer than the number of checkpoints in the ELMER specification, a reduction necessitated by tight time constraints. We found that the best-performing prototype achieved roughly 70% of the theoretical maximum credit score of 72 points under the plain-pass condition. As the performance of the other solutions was below 50%, we can say with confidence that ELMER is not well understood, and that it is the technical recommendation for which the implementations were most incomplete. However, it is debatable whether ELMER, as a specification for the layout of online forms, is appropriate for the e-election process. This is also acknowledged by the e-Vote decision makers.

Regarding ELMER conformance, the most important issues were related to page structure, help text, and concluding messages.

5. Personas and user evaluation

Personas testing is a method in which usability experts play the role of particular imaginary users with well-defined impairments in well-defined scenarios, as detailed below.

Through previous projects we had done extensive user testing with various user groups, such as visually impaired people, elderly people, and people with cognitive disabilities. Based on these experiences we developed six personas with different impairments (vision, hearing, movement, cognition, and a combination of impairments). One persona was of foreign origin.

For each persona test, all UA issues were registered and categorized. These issues were analyzed together with the results from the user tests. We did not include a ranking of the prototypes in the personas tests, because such a ranking would likely only reflect the opinions of the researchers performing the personas evaluations.

Three scenarios were defined for both personas and user tests, in which a user would walk through an entire election process in three different ways: municipal voting without changes, county voting with additional personal votes, and municipal voting with added candidates from other election lists. To make the scenarios more realistic and at the same time test the requirement of interoperability (in a cross-browser and cross-platform manner), we tested the combinations of browsers and platforms specified in Table 1.

Table 1: Overview of browser and platform combinations
Browser Platform
Windows Mac Linux
Internet Explorer x
Firefox x
Chrome x
Safari x
Opera x
Lynx x

Six tests with different personas and 15 user tests were conducted. The users were recruited through different Norwegian non-governmental user organizations, such as Norges Blindeforbund (Norwegian Association of the Blind and Partially Sighted), Dysleksiforbundet (Dyslexia Association), CP-foreningen (Cerebral Palsy Association), and Funksjonshemmedes Fellesorganisasjon (Norwegian Federation of Organisations of Disabled People), as well as senior centers. They were able to make informed decisions, as they were provided with brief information about the aims of the project and the testing.

Participants took part in the accessibility and usability test at their preferred location, e.g. at home, at their workplace, at a senior center, or at another suitable location. They were encouraged to use their own or familiar PC and equipment, such as assistive technology. The participants were given a monetary compensation for taking part. We had a budget for conducting 15 user tests. Because we got more than 15 volunteers, we were able to select participants with a varied background with regard to ability/disability, age, gender, ICT experience, and voting experience. All users were Norwegian. Impaired users made use of a variety of assistive technologies, such as head mouse, screen reader, hearing aid headset, braille display, screen magnifier, and others.

5.1 Testing setup

We conducted the user tests in the user's natural environment. For this purpose we used portable test equipment (camera on a tripod and notebook for the researcher).

We argue that testing in the user's own environment provides the most realistic setting for an Internet election. Also, testing in the field has the potential to bring up a wider range of issues than a laboratory test. Especially when it comes to user testing with people using assistive technology, we consider field testing to be by far the best solution.

There are multiple reasons for this. First, there are many different types and versions of software, platforms, and setups with assistive technology. Each type of equipment typically has many possible settings and adjustment options optimized for the particular user. It is hence time-consuming to replicate the same settings on lab equipment. Besides, sometimes users do not know or remember exactly what settings they actually use, leading to a trial-and-error process to approximate the correct settings. Also, in some cases it is not possible for the user to work with unfamiliar equipment and settings, and it will in all cases require adaptation and take attention away from the test application and scenario. By visiting the user, the tests are not limited to the setups available in a test lab.

Second, many users will be reluctant to bring their own equipment to a test lab, at least those who mainly use stationary equipment. Traveling to a testing laboratory is likely perceived as a barrier in itself for many users, in particular for the less resourceful users. Thus, we believe that by visiting the user, the testing process itself is not only easier, but it also produces more reliable results.

Figure 1: Screenshot of the winning prototype's page for selection of candidates (step 2 of 4, where candidates for the chosen party can be selected). This prototype version provided Norwegian as the only language.

5.2 Testing procedure and data collection

The researcher was seated next to the user and tried as far as possible to act as a silent observer. In the beginning, the participants were presented with an overview of the UA test procedure, and they were informed about their rights, such as anonymity, voluntary participation, and the possibility of withdrawing at any point without further explanation. After that, the participants had to answer a few questions about their background (age, gender, occupation, ICT experience, voting experience, impairment, use of assistive technology, etc.). Depending on their voting experience, the users were briefed on the voting procedure with regard to casting personal votes and adding candidates from other lists. If the user consented, the session was recorded on video.

The prototypes were presented to the different participants in varying order. Each participant had to repeat the tasks specified in the respective scenarios, as explained in the Personas and user evaluation section, with each of the prototypes. While doing the tasks, the participants were asked to think aloud in order to make their task-solving process visible.

The researcher would try to identify problems, concerns, bugs, and procedural errors, and take notes on the participant's actions and comments. For each prototype, the necessary information, such as a virtual username, password, and voting decision, was provided. In case a participant was unable to continue on their own, they would get hints from the researcher, which was then noted down. Both notes and video recordings were used when summarizing each user session. A fairly detailed set of minutes from each session was written, although not all parts of each session were transcribed in detail.

After the main testing procedure, the participant was asked to provide an honest opinion regarding the UA of each prototype. While doing this, the user was encouraged to elaborate on the tasks and to explore the prototypes again. The participant was also provided with a sheet with screenshots of each prototype as an aid in the discussion. Figure 1 shows such a screenshot.

5.3 Findings

We were able to derive the following results from users' general comments.

Among the results of a technical nature, we want to mention the following.

In addition to the issues mentioned above, the user and persona tests also revealed UA issues related to too low resolution and poor contrast of user interface elements, difficult content sequences and navigation, and poor screen reader usability and use of unfamiliar terms.

Since the number of prototypes to evaluate was small (five in the first and three in the second testing iteration), we asked the users to rank the solutions according to subjective preference. This gave a clear distribution of prototype votes for each ranking place, such that we were able to recommend a particular prototype to the e-Vote 2011 decision makers as the final solution.
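
We do not prescribe a particular method for aggregating the individual rankings here; as a purely hypothetical illustration, the sketch below tallies ranked preferences with a simple Borda-style count, with made-up prototype names, to show how per-user rankings can be turned into a single recommendation.

    # Hypothetical illustration: aggregate per-user prototype rankings with a
    # Borda-style count. The aggregation method actually used is not specified here.
    from collections import defaultdict

    # Each inner list is one user's ranking, best first (prototype names are made up).
    rankings = [
        ["A", "B", "C"],
        ["A", "C", "B"],
        ["B", "A", "C"],
    ]

    scores = defaultdict(int)
    for ranking in rankings:
        for place, prototype in enumerate(ranking):
            scores[prototype] += len(ranking) - place  # 3 points for 1st place, etc.

    recommended = max(scores, key=scores.get)
    print(dict(scores), "recommended:", recommended)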

6. Conclusion and outlook

We have conducted extensive testing of the prototype candidates for the upcoming e-Vote 2011 solution. The evaluation involved technical aspects, as well as personas and user testing.

Regarding technical matters, we conclude that all prototypes still have gaps to fill towards full WCAG compliance. We also recommend dropping the requirement of ELMER compliance. The testing itself was somewhat cumbersome, as the tools involved did not offer a high enough degree of automation; this is especially true for the testing of ELMER. Also, particularly with XHR in mind, a document tree (DOM) validator would be needed, as the existing tools are unable to test dynamically generated documents. Hence, better testing tools are needed.

Considering personas and user testing, we can conclude that the majority of users have a positive attitude towards the upcoming e-election solution. People are willing to try the new technology and mainly associate advantages with it. It is therefore vital that the technical obstacles mentioned above do not become burdens during any election for any user. Otherwise, public opinion might turn and reject the introduction of such a system.

We are aware that the prototypes tested were incomplete. In fact, the assessment took place at the beginning of the implementation process in order to determine which prototype performed best with regard to the UA requirements put forward. Based on our evaluation, the provider of the final prototype was chosen in December 2009. It remains to be seen how future development versions of the solution close the gaps towards the highest possible degree of accessibility and usability for the e-elections in 2011, and how these qualities are balanced against security and privacy concerns.

References

Glossary