Google Translate
This project showcases my skills as a hands-on team lead for design and research (I did all the design & research) and my ability to collaborate closely with engineering (through the acquisition of the WordLens team) to develop best-in-class product solutions.
DESIGN CHALLENGE
Create an instant, magical experience of holding your camera up to writing in a foreign language and having it immediately appear in your language. Seamlessly enable improving the translation without having to explain that the image needs to be sent to the server for sentence-level translation (vs. the lower-quality, immediate word-by-word translation users just saw).
MY ROLE
I spent two years leading UX design and research for Google’s highest-rated app. During that time I was responsible for the UX of major releases for speech, camera, and handwriting. When I started, the vast majority of traffic came from users typing single words into their phone, pasting text into Google Translate on the desktop, or translating entire pages using Google Translate's Chrome browser plug-in.
In 2014, Google acquired WordLens, the app that let you instantly translate the text in images on your phone, word by word. Google Translate already had superior server-side capabilities to translate images containing text. The question was how to enable the magic while tapping into the server-side quality. This required close collaboration between engineering and UX.
I had been an engineer for 10 years right after earning a double major in Computer Science & Spanish from Dartmouth College. Rapid prototyping, user testing, and brainstorming UX and technical solutions are things I do in my sleep. This collaboration resulted in a successful integration of the WordLens product and a set of innovations and insights that we patented to help drive the roadmap for the next several years.
ACCOMPLISHMENTS
- Launched instant camera translation, making it possible to hold up your phone and instantly translate signs, menus, and other text, and to seamlessly select specific sections of an image for more accurate translation.
- Led patent filings for 15+ patents combining technical and UX innovations. See my LinkedIn profile for a list of patents.
- Contributed to launches of Google Translate for Google Glass, Google Now, and other Google products.
- See the Google Translate speech page for more details.
To understand how this works, check out the video the team put together here:
Details Behind The Design
You've seen the magic with big street signs. This is what I call instant wow. It turns out that this can also be called a "one-hit wonder" since it's great for demos but represents a small portion of what actually happens in the real world (see a similar experience with magic filters at PicsArt).
The problem is that most of the time users are trying to read small text on menus, bottles in stores, labels in pharmacies, and cursive letters in magazines...in poor light, with warping due to the surfaces. Instant translation translates word by word. The typical result is something like what you see on the French magazine in the first mock below, or in the parodies of songs run through Google Translate. (You know your product is part of popular culture when this happens.)
Camera languages are the reverse of speech & text: The other issue we found is that for both text and speech, people typically translate from the language they speak into the foreign language. But for camera translation it's the reverse: users point the camera at foreign text and want to read it in their own language.
So when users first open the camera, the language settings are backwards (e.g., English to French when they are trying to read French text). To fix this we made the camera translation bar visually prominent (green), showed a tooltip when nothing is being recognized, and enabled swapping languages by tapping on the green bar. This generally worked in user testing.
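As a rough sketch of that defaulting behavior (the class and function names here are illustrative, not the shipped code), the camera screen can simply flip the text-mode language pair when it opens, and again whenever the green bar is tapped:

```kotlin
// Hypothetical types: LanguagePair and CameraTranslateSession are assumptions
// for illustration, not the production Translate code.
data class LanguagePair(val source: String, val target: String) {
    fun swapped() = LanguagePair(source = target, target = source)
}

class CameraTranslateSession(private var pair: LanguagePair) {
    // Called when the camera screen opens from the text-translation screen.
    fun onCameraOpened(deviceLanguage: String) {
        // Text and speech users usually translate *from* their own language;
        // the camera is the reverse, so default to reading the foreign language.
        if (pair.source == deviceLanguage) pair = pair.swapped()
    }

    // Tapping the green language bar swaps source and target explicitly.
    fun onLanguageBarTapped() {
        pair = pair.swapped()
    }
}
```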
Cloud-based OCR & translation: The best translation happens when the image gets sent to the cloud. Google's cloud-based servers have higher-accuracy OCR (optical character recognition) for parsing the text, and the entire sentence gets translated using its context, as opposed to word-by-word translation. The big red camera button is what triggers this action. Just getting users to tap it was challenging because they were focused on the highly visual image on the canvas.
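The flow behind that red button looks roughly like the following. This is a hedged sketch: CloudTranslator, RecognizedBlock, and onShutterTapped are hypothetical names standing in for the real backend API.

```kotlin
// Hypothetical data types for the server's sentence-level result.
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)
data class RecognizedBlock(val sourceText: String, val translation: String, val box: Box)

interface CloudTranslator {
    // The server runs OCR on the uploaded frame and translates whole sentences
    // in context, instead of the on-device word-by-word overlay.
    suspend fun recognizeAndTranslate(
        jpeg: ByteArray,
        sourceLang: String,
        targetLang: String,
    ): List<RecognizedBlock>
}

// Triggered by the big red camera button: freeze the frame, upload it, and
// render the higher-quality sentence-level result when it comes back.
suspend fun onShutterTapped(
    frame: ByteArray,
    sourceLang: String,
    targetLang: String,
    translator: CloudTranslator,
): List<RecognizedBlock> = translator.recognizeAndTranslate(frame, sourceLang, targetLang)
```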
To succeed, they need to take a picture that is in focus. However, the instant camera translation can be flashing different (sometimes hilarious) words onto the image, making it hard to focus on the specific text (see the first image below left).
Instant OFF: Ultimately user feedback was so strong that we enabled turning off instant camera translation so that higher-quality photos could be taken for cloud-based translation. We introduced green indicators when instant translation was ON (first mock below): the Instant ON text and the green eye icon. Tapping the green eye icon turns Instant off and displays a similar toast that says Instant OFF, but in gray (not shown).
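A minimal sketch of that toggle (again, hypothetical names rather than production code): the toggle owns the state, and the caller shows the toast and pauses or resumes the live overlay.

```kotlin
// Hypothetical view-model for the Instant ON/OFF control.
class InstantToggle(private var instantOn: Boolean = true) {
    val isOn: Boolean get() = instantOn

    // Tapping the eye icon flips the state; the caller shows a green
    // "Instant ON" or gray "Instant OFF" toast and pauses/resumes the overlay.
    fun onEyeIconTapped(): String {
        instantOn = !instantOn
        return if (instantOn) "Instant ON" else "Instant OFF"
    }
}
```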
Masking server delays: The time required to send an image to the server, parse it, and return an overlay with the recognized character blocks can be several seconds or longer depending on the connection speed. If you have the wrong order for source and target (which most people do initially), then you'll be waiting for the system to come back with no results. We tested a lot of different solutions, and the only one that really worked mapped onto users' mental model of a desktop scanner that moves back and forth over an image. So we introduced a blue bar that animates back and forth across the screen while scanning for text. The text indicates which language the system is looking for, helping users recognize when the language order is incorrect.
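One way to express that scanner animation in code, assuming a hypothetical render callback and Kotlin coroutines for the animation loop (the shipped app may well use a platform animator instead):

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.delay
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch

// A blue bar sweeps up and down the preview while the label names the
// language being looked for; cancel the returned Job when the server responds.
fun CoroutineScope.startScanAnimation(
    screenHeight: Float,
    sourceLanguage: String,
    render: (barY: Float, label: String) -> Unit,
): Job = launch {
    val label = "Scanning for $sourceLanguage"
    var y = 0f
    var direction = 1f
    while (isActive) {
        render(y, label)
        y += direction * screenHeight / 60f          // full sweep in about a second
        if (y <= 0f || y >= screenHeight) direction = -direction
        delay(16)                                    // ~60 fps, like a flatbed scanner head
    }
}
```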
Primary Use Cases: For the high-density text typical of menus and labels, translating the entire block of text is not necessarily helpful since it's hard to map the source text to the target. Users typically want to select one line or one block of text for translation: a line on a menu, or a block of text on a label, in a magazine, or in a book.
Indicating Recognized Text: As a first step, we tried several approaches to indicating which elements on the page were recognized. The bounding box for each word block performed the best since the system recognized phrases, not just individual words.
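A sketch of that overlay as a custom Android view (the class and field names are assumptions; only the standard Canvas/Paint APIs are real): one rectangle is drawn per recognized word block so users can see what the OCR actually picked up.

```kotlin
import android.content.Context
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.RectF
import android.view.View

class RecognizedTextOverlay(context: Context) : View(context) {
    // One rectangle per recognized word block, in view coordinates.
    var blocks: List<RectF> = emptyList()
        set(value) { field = value; invalidate() }   // redraw when OCR results arrive

    private val boxPaint = Paint(Paint.ANTI_ALIAS_FLAG).apply {
        style = Paint.Style.STROKE
        strokeWidth = 4f
        color = Color.WHITE
    }

    override fun onDraw(canvas: Canvas) {
        super.onDraw(canvas)
        for (box in blocks) canvas.drawRoundRect(box, 8f, 8f, boxPaint)
    }
}
```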
Selecting Text For Translation: Then the challenge was to get users to select the text they wanted to translate, which was not a typical action for the majority of users. We ran repeated user studies and determined that the prompt "Use your finger to highlight text" was something the majority of users understood. People used a variety of gestures to indicate the text to select, including drawing containers, smudging to fill in areas, and drawing lines. The selected text was instantly translated, and users could highlight additional sections to correct the initial selection. This immediate feedback and interaction facilitated discoverability and increased completion rates.
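Because all of those gestures reduce to a stream of touch points, selection can be modeled as hit-testing the recognized word blocks against that stream. A hedged sketch, with hypothetical types:

```kotlin
// Hypothetical word block with its bounding box in view coordinates.
data class WordBlock(
    val text: String,
    val left: Float,
    val top: Float,
    val right: Float,
    val bottom: Float,
) {
    fun contains(x: Float, y: Float) = x in left..right && y in top..bottom
}

class HighlightSelection(private val blocks: List<WordBlock>) {
    private val selected = linkedSetOf<WordBlock>()   // preserves the order blocks were touched

    // Called for every point of the drag gesture; lines, smudges, and lassos
    // all reduce to a stream of touch points.
    fun onTouchPoint(x: Float, y: Float): String {
        blocks.firstOrNull { it.contains(x, y) }?.let { selected += it }
        return selected.joinToString(" ") { it.text } // text handed off for instant translation
    }

    fun clear() = selected.clear()
}
```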
Advanced Features: Optical character recognition and instant text translation became a high-intensity, high-retention feature for the people who used it. Through a deep understanding of user behavior, the functionality was extended to include the ability to 1) select and translate images from your gallery (the picture icon in the bottom left of the first mock), 2) select and translate all the text at once, and 3) keep the flash on permanently to illuminate text for translation.
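A compact way to think about those three extensions (the names here are purely illustrative, not the app's internal actions):

```kotlin
sealed interface CameraAction
object PickFromGallery : CameraAction               // picture icon, bottom left
object SelectAllText : CameraAction                 // translate everything recognized
data class ToggleFlash(val on: Boolean) : CameraAction

fun describe(action: CameraAction): String = when (action) {
    PickFromGallery -> "Run OCR and translation on an image chosen from the gallery"
    SelectAllText   -> "Select every recognized block and translate the full text"
    is ToggleFlash  -> "Keep the camera torch ${if (action.on) "on" else "off"} while framing text"
}
```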