WorldScribe: Real-Time Surrounding Descriptions for the Visually Impaired

October 11, 2024

Issue: Visual aids for the disabled
Author: University of Michigan
Published: 2024/10/11
Post type: Reports and minutes

Synopsis: A world of color and texture could soon become more accessible to people who are blind or have low vision through new software called WorldScribe. The tool uses generative AI (GenAI) language models to interpret camera images and produce text and audio descriptions in real time to help users become aware of their surroundings more quickly.

Why it is important: This article introduces WorldScribe, a tool developed by researchers at the University of Michigan that could substantially improve the daily lives of people who are blind or have low vision. WorldScribe uses generative AI to provide real-time audio descriptions of surroundings captured by a camera, giving people with visual impairments access to visual information that is otherwise out of reach [2]. The tool's ability to adjust levels of detail, adapt to noisy environments, and respond to user queries demonstrates its potential to improve spatial awareness and independence for blind people. By providing immediate and complete descriptions of the environment, WorldScribe could reduce the mental effort required to understand it, allowing users to focus more on interacting with the world around them. - Disabled World

Introduction

A world of color and texture could soon become more accessible to people who are blind or have low vision through new software that narrates what a camera records. The tool, called WorldScribe, was designed by researchers at the University of Michigan and will be presented at the ACM Symposium on User Interface Software and Technology in Pittsburgh next week.

Main article

The tool uses generative AI (GenAI) language models to interpret camera images and produce text and audio descriptions in real time to help users become aware of their surroundings more quickly. It can adjust the level of detail based on the user's commands or the length of time an object remains in the camera frame, and the volume automatically adapts to noisy environments such as crowded rooms, busy streets, and loud music.
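
As a rough illustration of how playback volume might track ambient noise, the sketch below maps a measured noise level to a volume setting by linear interpolation. It is not taken from WorldScribe: the function name `playback_volume`, the decibel thresholds, and the volume range are all assumptions chosen for the example.

```python
def playback_volume(
    noise_db: float,
    quiet_db: float = 40.0,   # assumed level of a quiet room
    loud_db: float = 80.0,    # assumed level of a busy street
    min_volume: float = 0.3,  # assumed volume floor (0.0 to 1.0 scale)
    max_volume: float = 1.0,  # assumed volume ceiling
) -> float:
    """Map an ambient noise level in dB to a playback volume."""
    # Clamp the reading so values outside the expected range
    # cannot push the volume below the floor or above the ceiling.
    clamped = max(quiet_db, min(loud_db, noise_db))
    # Fraction of the way from "quiet" to "loud", between 0.0 and 1.0.
    fraction = (clamped - quiet_db) / (loud_db - quiet_db)
    # Linear interpolation between the minimum and maximum volume.
    return min_volume + fraction * (max_volume - min_volume)
```

With these assumed defaults, a quiet room stays at the volume floor, a very loud street reaches full volume, and readings in between scale proportionally.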

A hand holds a smartphone towards a desk with a laptop on top of it. A television hangs on the wall above the desk and a bookshelf leans against the nearby wall. Text descriptions on the phone screen say workbenches, shelves, cabinets and TV. — When a user scans their phone’s camera around a room, WorldScribe will create short audio descriptions of the objects recorded by the camera. Illustration credit: Shen-Yun Lai, used with permission.

The tool will be demonstrated at 6 pm EST on October 14. A study of the tool will be presented at 3:15 pm EST on October 16; organizers have identified the study as one of the best at the conference.

“For us blind people, this could really revolutionize the way we work with the world in everyday life,” said Sam Rau, who was born blind and participated in the WorldScribe trial study.

“I have no concept of vision, but when I tried the tool, I got a picture of the real world and was excited by all the color and texture that I wouldn’t otherwise have access to,” Rau said. “As a blind person, we are filling in the picture of what is happening around us piece by piece, and it can take a lot of mental effort to create a bigger picture. But this tool can help us get the information right away, and in my opinion, it helps us focus on being human instead of figuring out what’s going on. I don’t know if I can put into words what a miracle this really is for us.”

During the trial study, Rau put on a headset equipped with a smartphone and walked around the research laboratory. The phone’s camera wirelessly transferred the images to a server, which almost instantly generated text and audio descriptions of the objects in the camera frame: a laptop on a desk, a stack of papers, a television, and paintings mounted on the nearby wall.

The descriptions constantly changed to match what was in the camera’s view, prioritizing the objects closest to Rau. A brief glance at a desk produced a simple one-word description, but a longer inspection yielded information about the folders and papers arranged on top.
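
The proximity-first behavior described above can be sketched as a simple sort. This is only an illustration under stated assumptions: the `DetectedObject` type, its `distance_m` field, and the function `order_by_proximity` are hypothetical names, and the article does not say how WorldScribe estimates distance.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    label: str
    distance_m: float  # hypothetical estimated distance from the camera, in meters


def order_by_proximity(objects: list[DetectedObject]) -> list[DetectedObject]:
    """Return detected objects with the nearest one first."""
    # sorted() leaves the input list untouched and returns a new list.
    return sorted(objects, key=lambda obj: obj.distance_m)


scene = [
    DetectedObject("television", 3.2),
    DetectedObject("desk", 0.8),
    DetectedObject("bookshelf", 2.1),
]
for obj in order_by_proximity(scene):
    print(obj.label)
```

In this example the desk is described first, followed by the bookshelf and then the television.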

When the user moves slowly around the room, WorldScribe will use GPT-4 to create colorful descriptions of objects. When asked for help finding a laptop, the tool will prioritize detailed descriptions of any laptops in the room. Illustration credit: Shen-Yun Lai, used with permission.

The tool can adjust the level of detail in its descriptions by switching between three different AI language models. The YOLO World model quickly generates very simple descriptions of objects that appear briefly in the camera frame. GPT-4, the model behind ChatGPT, handles detailed descriptions of objects that remain in the frame for a longer period of time. Another model, Moondream, provides an intermediate level of detail.
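
One way to picture the switching between the three models is a lookup keyed on how long an object has stayed in the frame. The sketch below is an assumption-laden illustration, not the researchers' implementation: the function `choose_model` and the one-second and three-second thresholds are invented for the example, and the article does not state the actual cutoffs.

```python
def choose_model(
    seconds_in_frame: float,
    brief_s: float = 1.0,  # assumed cutoff for a "brief" appearance
    long_s: float = 3.0,   # assumed cutoff for a "longer" inspection
) -> str:
    """Pick a description model based on how long an object has been in view."""
    if seconds_in_frame < brief_s:
        # Fast, very simple descriptions for objects seen only briefly.
        return "YOLO World"
    if seconds_in_frame < long_s:
        # Intermediate level of detail.
        return "Moondream"
    # Detailed descriptions for objects that remain in the frame.
    return "GPT-4"
```

Under these assumed thresholds, a passing glance of 0.4 seconds selects YOLO World, a two-second look selects Moondream, and anything held for three seconds or more selects GPT-4.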

“Many of the existing assistive technologies that leverage AI focus on specific tasks or require some kind of step-by-step interaction. For example, you take a photo and then you get some result,” said Anhong Guo, assistant professor of computer science and engineering and corresponding author of the study.

“Providing rich, detailed descriptions for a live experience is a big challenge for accessibility tools,” Guo said. “We saw an opportunity to use increasingly capable AI models to create automated, adaptive descriptions in real time.”

Because it is based on GenAI, WorldScribe can also respond to user-provided tasks or queries, such as prioritizing descriptions of any objects that the user asked the tool to find. However, some study participants noticed that the tool had trouble detecting certain objects, such as a dropper bottle.
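
Prioritizing objects the user asked about could look something like the following minimal sketch. It assumes a plain substring match between the query and each object label; the function name `prioritize_for_query` is hypothetical, and the real tool presumably relies on its language models rather than string matching.

```python
def prioritize_for_query(labels: list[str], query: str) -> list[str]:
    """Move labels that match the user's query to the front of the list."""
    wanted = query.lower()
    # The key is False for matches and True otherwise, and False sorts first.
    # Python's sort is stable, so the original order is kept within each group.
    return sorted(labels, key=lambda label: wanted not in label.lower())


print(prioritize_for_query(["desk", "laptop", "television"], "laptop"))
```

Here the laptop moves to the front, while the desk and television keep their original relative order.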

Rau says the tool is still a little clunky for everyday use in its current state, but says he would use it every day if it could be integrated into smart glasses or another wearable device.

The investigation

The researchers have applied for patent protection with the help of UM Innovation Partnerships and are seeking partners to help refine the technology and bring it to market.

The research was funded by UM.

Guo is also an assistant professor of information at the UM School of Information.

Study: WorldScribe: Towards Live Context-Aware Visual Descriptions

Attribution/Source(s):

This quality-reviewed publication was selected for publication by the editors of Disabled World due to its relevance to the disability community. Originally written by the University of Michigan and published on 10/11/2024, the content may have been edited for style, clarity, or brevity. For more details or clarifications, the University of Michigan can be contacted at umich.edu. NOTE: Disabled World does not provide any warranty or endorsement related to this item.

Page information, citations and disclaimer

Cite this page (APA): University of Michigan. (2024, October 11). WorldScribe: Real-time environment descriptions for the visually impaired. Disabled World. Retrieved October 11, 2024 from www.disabled-world.com/assistivedevices/visual/worldscribe.php
