The Talking (Speech-Technology-Enabled) Form Services for Blind, Visually Impaired or Agraphic Persons of Nepal
We will allow visually disabled people to access government information and services through audio content and to fill in forms using their voices.
THE TALKING FORM POSTER, showing the kiosk, the special input device and the companion app.
Page 1 of Story
Page 2 of Story
Cover Page of The Talking Form Services
What problem does your idea solve?
Visually disabled people in Nepal live with stigma and depend on others to access government information and services. So far the government offers no support even for filing a request form for services such as a citizenship certificate. Privacy is a serious issue, as they have to rely on a mediator to answer even private form questions. Our idea solves this problem by providing the information and form contents in audio and allowing people to fill in and file the forms orally.
Explain your idea
Government offices in Nepal do not yet support blind or visually impaired citizens. Most information and request forms for government services are paper based and have no braille support. There is a growing trend to digitize these forms, but the digital versions still lack accessibility. Filling in these forms is extremely time consuming if one relies on the accessibility support in computers, and most blind people have no computer skills. In both cases, a disabled person has to rely on someone else to get information and fill in the forms.
Our idea is to solve this problem by building an enclosed kiosk that provides information as richly formatted (easy to navigate) audio content and allows users to fill in a form by voice. The kiosk will contain a headphone, a microphone, a special remote controller with a biometric sensor, a printer, Automatic Speech Recognition (ASR) software and a touch screen (optional, for sighted users). We will also make a companion mobile app which has the same functionality and can be connected to a printer.
When audio input is given to a form, the ASR software automatically converts the voice into text. Screen readers in the form will help the user review the inputs. The user can then confirm the submission by providing a fingerprint. The system will respond with a confirmation code in audio and as a braille printout on a card with a barcode. The card or the confirmation code can be used to check the status later at the same kiosk or in the mobile app.
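The flow described above (voice answer, review, fingerprint confirmation, confirmation code) can be sketched roughly as follows. This is only an illustrative outline, not our implementation: the class `FormSession`, its method names, the `transcribe` callback standing in for the ASR engine, and the 8-character code format are all assumptions made for this example.

```python
import secrets
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FormSession:
    """Hypothetical model of one user's session with a talking form."""
    questions: List[str]
    answers: Dict[str, str] = field(default_factory=dict)
    confirmation_code: Optional[str] = None

    def record_answer(self, question: str, audio: bytes,
                      transcribe: Callable[[bytes], str]) -> None:
        # `transcribe` stands in for the ASR engine (audio in, text out).
        if question not in self.questions:
            raise KeyError(f"unknown question: {question}")
        self.answers[question] = transcribe(audio)

    def review(self) -> List[str]:
        # Lines that would be read back to the user before submission.
        return [f"{q}: {self.answers.get(q, '(no answer yet)')}"
                for q in self.questions]

    def confirm(self, fingerprint_ok: bool) -> str:
        # Submission needs a verified fingerprint and a complete form.
        if not fingerprint_ok:
            raise PermissionError("fingerprint not verified")
        missing = [q for q in self.questions if q not in self.answers]
        if missing:
            raise ValueError(f"unanswered questions: {missing}")
        # Assumed code format: 8 random hexadecimal characters.
        self.confirmation_code = secrets.token_hex(4).upper()
        return self.confirmation_code
```

In this sketch the code returned by `confirm` is what would be spoken to the user and printed on the card; how it is rendered in braille and as a barcode is left out.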
User Experience Map: THE IDEA.
We will install tactile pavements in all government offices to guide blind and visually impaired users to The Talking Form Kiosk.
The kiosks will help blind users fill in application forms in privacy, without help from anyone.
User Experience Map: Information and Find the Kiosk. Differently abled service seekers will get information about the service from the reception. We also plan to use mass media such as FM radio, TV and the Internet to publicize the Talking Form service. In addition, we will use the network of the Association of Blind Nepal to spread information about the service at a personal level.
User Experience Map: The Greeting and the remote controller.
The kiosk will greet users as soon as they approach. When no one is in the kiosk, this will simply be an audio message repeated every minute. When someone is in the kiosk, it will ask new users to wait until the earlier person completes her session. Please note that the screen will be turned off by default; users can turn it on with a manual button.
User Experience Map: Getting acquainted and the welcome page. We will develop a small exercise, which experienced users can skip, to acquaint users with the system. The User Experience Map does not cover this exercise. We want to make the kiosk usable for people with no literacy or computer experience. Hence we will make the exercise suitable for users with no literacy.
User Experience Map: Question and Audio Input. Speech recognition is the most important feature here, and it needs to be trained for Nepali. We plan to use CMU Sphinx for Nepali ASR. Luckily, Nepali is written the way it is spoken, which means there will be no difference between spelling and pronunciation. However, to demonstrate our idea, we have included a spell-it feature for English speakers.
User Experience Map: Confirmation. Confirming an audio recording with a digital fingerprint is a very new approach. We are not sure whether it is legally recognized in Nepal yet. Our challenge will be to get it approved, and the best way to do that is to make a system that works perfectly. This is our goal!
This prototype was built in the second iteration to get user feedback. We built a very basic HTML page with React containing a very simple introduction, a sample question and a feedback step. We tested it with blind users from Blind Association of Nepal. This video is just a screen capture of the prototype.
We tested the prototype to get the very first feedback from real users on the interaction during an oral question-and-answer session. We wanted to see how users reacted to the system.
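As a rough illustration of the greeting behaviour, the decision about which audio prompt to play could look like the sketch below. The function name, the prompt labels and the idea that the kiosk can sense an approaching visitor are assumptions for this example; the one-minute interval comes from the description above.

```python
from typing import Optional

GREETING_INTERVAL_SECONDS = 60  # the greeting repeats every minute


def next_prompt(occupied: bool, visitor_waiting: bool,
                seconds_since_greeting: float) -> Optional[str]:
    """Return the label of the audio prompt to play now, or None."""
    if occupied:
        # Someone is using the kiosk: only newcomers are addressed.
        return "please_wait" if visitor_waiting else None
    if seconds_since_greeting >= GREETING_INTERVAL_SECONDS:
        # The kiosk is free and a minute has passed: greet again.
        return "greeting"
    return None
```

A kiosk loop would call `next_prompt` periodically and reset its timer each time the greeting is played.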
In the 2011 Nepal Census, 513,321 individuals reported having “some kind of disability,” of whom almost 100,000 were blind. There has been no more recent census in the country, but with annual population growth of 1.24%, we can expect at least 107,674 people to have partial or complete blindness by 2017. If we implement the Audio Form Services at all federal, state and local levels, all of them will benefit directly; young people will benefit most, as they seek services more often.
Nepal Association of Blind is the apex association of blind people in Nepal. It represents more than 100,000 blind people from 75 districts of the country. The association believes that many blind people in rural areas are yet to become members due to accessibility issues.
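The 2017 estimate follows from compound growth. The short calculation below reproduces it, assuming a round base of 100,000 blind persons in 2011 (the census figure is "almost 100,000") and that the blind population grows at the same 1.24% annual rate as the general population; the function name is ours.

```python
def project_population(base: float, annual_growth_rate: float, years: int) -> float:
    """Compound growth: base * (1 + rate) ** years."""
    return base * (1 + annual_growth_rate) ** years


# Six years of growth, from the 2011 census to 2017.
estimate_2017 = project_population(100_000, 0.0124, 2017 - 2011)
print(int(estimate_2017))  # prints 107674
```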
Nepal Association of Blind has been working in Nepal since 1993. Since then they have established networks in all the 75 districts of the country.
She is a visually impaired girl, a member of Lalitpur Association of Blind. We took consent from her to use her picture.
One of the blind users testing our prototype on his laptop and giving feedback. We prepared the prototype using basic HTML and React. We took consent from our test user to use his picture.
Getting feedback from a young blind man who had to leave his hometown for Kathmandu because there is no school for the blind in his town. We took consent from him to use his picture.
How is your idea unique?
We have no information on similar technologies being used elsewhere; the nearest competitor would be a company that produces audio books and audio content. There are also applications like Siri on iPhones, but they are not available in Nepali. In 2007, I worked on the Access to Computers for Non-Literate People (ACNLP) project, which dealt with audio-enabled user interfaces and the visualization of speech audio as sentences and paragraphs in Nepali. The proposed speech-technology-enabled forms will, however, be quite different from ACNLP, as they will deal with formal government forms and audio inputs. Speech API support in most modern browsers will also make development faster and implementation straightforward. Neither a speech API nor commercial speech technology is yet available for Nepali, so we believe that the success of this project will open a new paradigm for digital content and digital forms globally.
Tell us more about you
We are an IT research and development company comprising individuals with extensive experience in software and mobile app development, interaction design and speech processing. We work a lot with the government, universities and the tourism sector. Apart from commercial projects, we also do many community-based projects. We were recently awarded a grant by the World Bank to work on urban poverty alleviation in Patan City using IT and local tourism at the UNESCO World Heritage sites of Patan.
We are an IT consulting firm established in 2010, working extensively on government projects, management information systems, web portals and similar systems. We also work a lot with the child education sector, universities and non-profits working in education. We sometimes work on product development projects with research institutes in Scandinavia and the Far East. Recently we were awarded a grant by the World Bank to design a product which can alleviate urban poverty in the city of Lalitpur (Patan).
We are active contributors to the Natural Language Processing and Speech Database community. In 2014, we organized the CICLing conference (CICLING.org) in Kathmandu along with the Centre For Communication and Development Studies. This picture is from a panel discussion on NLP in endangered languages of the world.
We designed and sold T-shirts related to the big earthquake of 2015 in order to raise money to support its victims. We distributed clothing to the villagers of Battar, Nuwakot. We are also supporting the education of 2 girls, aged 7 and 13, as part of our company's CSR. Both girls' houses were destroyed during the tremors.
This is a picture of a community engagement program for data collection on the tangible and intangible heritage of Patan City, as part of the Patan Heritage Walk Mobile App Project.
What are some of your unanswered questions about the idea?
We feel that we will need to test Automatic Speech Recognition (ASR) in Nepali using the Sphinx4 library, which is the most promising library for ASR. Though we did some experiments with Nepali in Sphinx3 a few years ago, Sphinx4 has changed a lot, and we will need to rigorously train and test the ASR system before the speech-technology-enabled forms can be used by the public. Also, though we work a lot with the Government of Nepal, we will still need to convince many officials before we can deploy the system in government offices. The system may also be applied in the private sector; however, we are unsure who would pay the costs in that case.
Where will your idea be implemented?
Experience in Implementation Country(ies)
Yes, for more than one year.
Expertise in Sector
I've worked in a sector related to my idea for more than a year.
We are a registered for-profit company (including social enterprises).
Prototyping: I have done some small tests with prospective users to continue developing my idea.
How has your idea changed based on feedback?
After the feedback from OpenIDEO, we decided to follow the WCAG 2.0 guidelines as far as possible. However, we want to focus on the usability of the system, even for users who are new to digital machines. Hence, we will not use text-to-speech (TTS), as it does not sound natural, and will instead produce the audio content in natural speech.
From the user feedback, we found that one of the users' biggest concerns was privacy when answering the machine. They said that they were embarrassed to share their personal details in front of a third person, even a friend. We have thus decided to enclose the kiosk in a soundproof booth so that users can feel free to answer personal questions.
We realized that making users aware of a technology is more important than building it. Hence we have decided to promote and advocate for the solution in every way possible. Inside the offices, we have decided to use tactile pavements to give directions to the kiosks.
We were very touched to hear about the privacy issues and how embarrassed users felt when getting help from a mediator to fill in forms with very personal information in order to get services from the authorities. We therefore made the design decision to enclose the kiosks in a soundproof chamber. Initially, we had only thought of providing a headphone with a kiosk located in an open space.
A design of the remote controller with a biometric fingerprint sensor. This design is also a result of the privacy-related stories shared by the users. The audio contents of the filled-in forms can now be confirmed and submitted using a fingerprint. Users no longer have to share the contents of the form with anyone, and they can enjoy their freedom and independence to the fullest.
Initially we had assumed that the blind and visually impaired users would just find the kiosk in the Citizenship Office. However the feedback sessions with the real blind users changed the idea dramatically.
When interacting with the blind users at the Association of Blind Nepal, our hearts were touched to hear that it took them hours to reach a cabin inside an office building because they couldn't read the signs. This gave us the idea of using tactile pavements to take them directly to our kiosk. We also need tactile pavement patterns that signal when they have reached the kiosk, the reception or a toilet.
Who will implement this idea?
Sagun Dhakhwa will be responsible for managing the team and for designing and creating the solution. Tina Rajbhandari (Resource Manager) will be responsible for outreach, management of resources and logistics.
There will be 5 supporting staff: 1 speech technology expert, 2 software engineers, 1 hardware engineer and 1 UX/sound engineer. The team will be located in the Kathmandu Valley and will travel when needed.
The blind association of Nepal will be responsible for advocacy and lobbying with the government.
Using a human-centered design approach, you may uncover insights that lead to small or foundational changes to your organization’s existing strategy or processes in order to unlock the potential of your idea. How would your organization go about making such changes?
In our organization, we focus mainly on the usability of our products, and we have found that human-centered design improves usability. We saw the potential of the human-centered design approach after using it for this challenge. Using the prototypes and the User Experience Map, we uncovered many features and design decisions which we wouldn't have found otherwise.
Our organization has already decided to adopt HCD in future projects. To implement it, we are going to organize formal training for the design team on the new design process in the coming months. We have a vibrant design team whose members come from multimedia, fine arts, IT and graphic design backgrounds. The training will cover not just the process but also its benefits and potential. It will also bring the team on board with HCD.
What is it that most attracted you to Amplify instead of a more traditional funding model?
We were attracted to Amplify because it gives us a chance to make a difference while learning new methods, tools and ideas. It also gives us an opportunity to learn from others' solutions and to collaborate with them. It helped us to test, get feedback, improve and grow, and above all to test our potential. The experience of the whole challenge was engaging, rewarding and fun, which is why we are continuing to participate in it.
What challenges do your end-users face? (1) What is the biggest challenge that your end-users face on a day-to-day, individual level? (2) What is the biggest systems-level challenge that affects your end-users?
On a day-to-day level, visually disabled users have to get help from others for any kind of legal process. Whether it is a citizenship application, a bank transaction or a property transfer, they are legally required to be accompanied by a sighted person in order to complete the transaction. They have no privacy and no sense of independence.
The biggest and most critical system-level challenge will be Nepali Automatic Speech Recognition (ASR). There is no Nepali ASR available, and we are trying to customize CMU Sphinx, an open source ASR in Java, for Nepali. A Hindi language model is already available for Sphinx, and since Nepali is very close to Hindi, we are confident that we will be able to create a language model for Nepali. ASR will be the most critical part affecting the end-users.
Tell us about your vision for this project: (1) share one sentence about the impact you would like to see from this project in five years and (2) what is the biggest question you need to answer to get there?
IMPACT: By 2022, we aim to implement the Talking Form Application for the Citizenship Certificate Application in all 75 Citizenship Offices in Nepal, which will benefit more than 120,000 blind and visually impaired persons in the country.
QUESTION: How do we convince and lobby the government, political parties, civil society and human rights organizations to implement the Talking Form for the Citizenship Certificate Application, and how do we get audio inputs legally recognized as equivalent to inputs in writing?
How long have you and your colleagues been working on this idea together?
How many of your team’s paid, full-time staff are currently based in the location where the beneficiaries of your proposed idea live?
Under 5 paid, full-time staff
Is your organization registered in the country you intend to implement your idea in?
We are registered in all countries where we plan to implement.
My organization's operational budget for 2016 was:
Between $50,000 and $100,000 USD
If your team/idea/organization has a website, please share the URL below.