What I learned from a year of usability testing

When I became Design Lead of the Gutenberg block editor for WordPress, I wanted to strengthen our communication with the open source community by encouraging others to voice their opinions and to feel confident that the team was listening to their feedback. The initial release of Gutenberg did not go as well as many had hoped, so we spent a good year improving how we communicated our progress and direction. During that time, I helped conduct moderated tests with colleagues to explore how small business owners used WordPress, and I wrote and cataloged many unmoderated tests with new users to understand their expectations.

Moderated testing

This was, by far, the best way to go for usability testing. Sarah Semark and Sarah James worked to establish a group of people to interview and observe as they clicked through prototypes. The testing was segmented into three groups: bloggers, small businesses, and site builders. Two things stood out across all three segments.

  1. Participants did not have enough time to do everything they wanted with their sites.
  2. Anything new had to fit within their already established processes.

With these in mind, we set out to build an interface that would improve their experience without imposing a steep learning curve. This meant that although we were building a new interface for WordPress, we needed to preserve their current flows of writing content and building patterns. The editor needed to be both a layout platform built with blocks AND a clean canvas for those who just wanted to type words.

We struggled early on to do both. The UI was overwhelming for bloggers, but well suited to site builders. We are still working to find that balance today. Introducing modes into the UI has helped, as has pulling back the amount of UI surrounding each block.

Build a robust prototype – Prototypes were designed and built in Figma. Interviews were scheduled, and we connected through video conferencing software. After a brief explanation of the usability test and its purpose, we pulled up the prototype. It was immediately obvious that some people liked clicking on everything, while others picked through the prototype with clear intention and defined tasks.

Those who clicked around would become lost and end up requiring more guidance. The solution to this problem was to build more robust prototypes. Every link within the interface needed to be clickable to eliminate any perceived broken elements, and every pathway a person might venture down needed a way back to the initial screen. Once these two improvements were implemented, it became much easier to guide the clickers through the prototype.

The pickers, the ones who relied on predefined tasks, were much easier to work with. They waited for instruction from the moderator and then went about accomplishing their goal. But because they relied heavily on the moderator, the moderator had to be careful not to lead the tester in a direction of the moderator's own choosing. It was important to let the picker figure out how to solve a task on their own.

Allow time for design iterations – Every interview revealed a new design problem. We quickly found that scheduling interviews back-to-back did not provide time to fix some of the problems before the next test. This caused the same problem to arise time and again. At first, I thought this was a good thing: the more people who struggled with an element, the more we understood it to be a significant problem. But if the problem was not part of the flow we set before them (e.g., a broken link), it became more of a distraction.

Distractions are time-consuming – Whenever a distraction or unforeseen technical problem arose, we spent a good portion of the interview talking about something that was not important to our research. Navigating the tester away from the irrelevant problem took finesse. Anticipating these distractions ahead of time always helped us stay within the test's time constraints.

Unmoderated testing

While we loved moderated testing, we also needed a way of testing that did not consume all of our time. This brought us to unmoderated testing through an online service.

To be specific or not to be specific? – This was one of my biggest questions when writing a script that had to lead a person down a path without a moderator present. If I was testing a specific action, I needed to write the script so that there was no question about how to reach that action. But once there, I wanted the user to figure out how to perform the action however they hoped. For example, I wanted to see how a user might try to add a block and then make it full-width, but I did not want them to waste time figuring out which block to add (some blocks do not allow full-width). So the script would read like this:

Keep in mind that the way to add various content to your page is by adding blocks or block patterns. Explore where you might go to add a block or pattern, and add the side-by-side image pattern to your page.

Make that pattern full width.
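For context on that full-width caveat: whether a WordPress block offers wide or full alignment is declared through block supports in its block.json file. A minimal sketch, assuming a hypothetical block name and title (the side-by-side image pattern in the script is built from blocks like this):

```json
{
  "apiVersion": 3,
  "name": "example/side-by-side-images",
  "title": "Side-by-side Images",
  "category": "media",
  "supports": {
    "align": [ "wide", "full" ]
  }
}
```

A block whose block.json omits the `align` support (or lists only some options) simply never shows the full-width control, which is why the script steered testers toward a pattern known to support it.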

Ultimately, it was a balance that took careful writing. The scripts would be read by many testers who had never used WordPress before, so getting them into the software and to the right screen required specifics. Once there, it was delightful to watch their recordings as they explored how to perform each task I provided.

Test the test – Because these scripts were run without any moderation, ensuring they worked as expected was very important. I would often run through the test myself and with a colleague, edit the script further, and make sure the site I was testing on was free of any hassles.

Share everything – The service allowed us to share the video recordings, which was exactly what I wanted. Working in an open source community, it is important to share how decisions are being made and what evidence supports a certain direction. I would post the recordings monthly on the WordPress.org blog and quote a few of the user's comments below each one to highlight interesting findings.

Well, not everything – One important reminder for myself was not to share my take on each video too quickly. I noticed that if I wrote up my understanding and perspective, it could unknowingly influence others' perspectives as well. I wanted to gather feedback about each video session before imparting my own thoughts. It was a way to get feedback on the feedback. Just the right amount of meta before going too far down the rabbit hole.

Bring it back – The posts were great, but rarely did everyone on the team have an opportunity to watch them. That meant I needed to bring the findings back to the team. One way to do this was to write an internal post focused on the findings and the immediate problems that surfaced. Another was to jump in and create issues in GitHub. The latter was more effective because the issues surfaced in the developers' own environment.

Talk doesn’t cook rice

One of my favorite sayings is “Talk doesn’t cook rice.” It means that just talking about something never actually makes the thing happen. This applied to testing as well. I could test all day, write all I wanted about the tests, read the feedback from the community, and so on, but if I did not surface the problems to the team and ensure those problems had time, scope, and people working on them, then it was all moot.