Linux accessibility: a new screen reader, probably

Hello there, community!
So, I want to make a new screen reader for Linux, and I want to gauge your opinions as well. Note: this should have been out in the open already, and I don’t know why I didn’t mention it, but I’m bgt lover on the audiogames.net forum.
For whoever wants to carry this on over on the audiogames.net forum for some reason, here’s the topic link:
linux accessibility: a new beginning, screen reader proposal (Page 1) — Developers room — AudioGames.net Forum
With that out of the way, I’ll replicate the post I made there, so as not to write basically the same thing in other words.

For a long time now, I have envisioned Linux and the freedom it provides as the next step in the battle over privacy and device telemetry: a kind of insurance that my data will never leave my computer without my permission, because I can see inside the machinery of the package manager, for example, and, fully aware of what I’m doing, disable the Microsoft repos in the apt sources list. However, things are far better than that; basically, Linux is full freedom.
With Windows, in order to personalize your PC a bit, you have to activate it. With Linux, it’s damn easy: just go to the settings app of your desktop of choice and change some parameters, or, even better, edit the configuration files directly.
Want to completely change your desktop, a.k.a. shell? On Windows, meh, you’re kind of locked into whatever MS thinks is best, and the only way to change it is to go into the registry and do nasty hacks. Granted, an automated tool does it, but everything in that tool is a hack around Windows limitations and restrictions. On Linux, however, most of the time you just install a couple of tools, tweak some settings, then install a package from the distribution’s repo. On boot, the login screen will present you with a choice of desktops, and you can pick any of the installed ones at any time. Plus, it’s accessible, so that’s that.
Let’s say you want to change what your login prompt looks like. Again, on Windows you’re stuck, while on Linux everything is replaceable if you know what you’re doing. Don’t like LightDM? OK, pick another display/login manager, like GDM.
To continue my previous point, remember Cortana, that bloated, CPU-hungry thing that gets installed everywhere even though it’s supported in only a few regions? Yeah, that thing. Well, you’ll probably be astonished to find out that it can’t be uninstalled, plus it fucking starts at logon, as far as I have been able to observe. It can be disabled, but again, that’s hacks around limitations, so yeah.
On Ubuntu, let’s take one of the included applications, snap. If you want to remove it, Canonical isn’t stopping you, nor could it. You just remove it from your system in any way you see fit.
Now, to bring this to a close: Linux is free of charge, of restrictions and of patents, being open source and all that, plus it can be made to run on anything, even your grandmother’s potato with 1.50001 GB of RAM and a single-core CPU or something. It’s totally possible; not the fully fledged thing, but doable. What do you think our routers run nowadays?
Now, unfortunately, I have to point out what Linux is not.
Linux is not driven by corporations with interests bigger than any supercomputer in existence, it’s not so full of bugs that a simple software update could make your computer lose sound or become unbootable, it’s not driving developers away or inventing special languages/ecosystems to keep them on the platform for as long as possible, and it’s not closed source. However, and more importantly, it’s not accessible.
Now, the Linux users on stage, anyone? Very good! They can and totally will point out: “Linux is accessible, I have used it extensively, I made an accessible spin-off of a Linux distro, it’s totally doable!”
While I agree with the sentiment, as I’ve been an Ubuntu user myself for quite some time now, Linux is not accessible, not to the level Windows is; we just have to admit it. Sure, we learned to deal with the quirks and issues because we encountered them and sort of figured them out, but as many of you would agree, it is quite an involved process sometimes, requiring at least some programming knowledge to deal with some horrible, nasty situations, though those are kind of rare. Point is, Linux is not as accessible as we might want it to be, or as we think it is.
With full knowledge that the Linux ecosystem, the kernel, the drivers, the packages and even the desktops themselves are for the most part community maintained and free, I can’t in good faith demand or argue for accessibility the way I can with companies like MS or console makers like Nintendo, Sony, etc. I know the Linux communities aren’t getting paid a penny for their hard work: updating the software to support new hardware, redesigning the GUI to meet new end-user expectations. I know all of that is hard, and I also know that, since we are a small minority of the user base, accessibility is low on their priority lists, if it’s there at all.
However, instead of complaining, I am trying to solve part of it myself; then they will probably be more willing to collaborate with us when they see we’re taking the initiative.
I think most of you agree with me here when I say Orca is a terrible screen reader! OK, probably not, since it’s what we’re using most of the time, so let me elaborate.
First, it lacks some of the features a screen reader in 2021 should have. When we navigate weird apps, even the damn Windows settings app, we need object navigation: a way to explore the parts of the accessibility tree a keyboard can’t reach, for example blocks of text with important information which, as we know, aren’t focusable by default. Well, guess what, Orca has no such thing, not like NVDA or VoiceOver. And don’t even get me started on flat review: you lose all the context in that flattened form; it’s only text, with no clear boundaries. I’ve seen bits of the accessibility API, so I know what I’m saying when I say it’s possible with what we already have.
Next, there’s no way to move the mouse with the keyboard. Remember things like Steam, where a control can only be activated by clicking it with the mouse? Well, one could just move the mouse cursor to the current navigator object and double-click; you know this from NVDA. Again, I’ve seen enough of the accessibility framework to know it’s possible, but it’s not implemented in Orca.
No OCR support. In 2020 and beyond, everything should be accessible already, world advancements and all that. Unfortunately it isn’t, and sometimes we can’t rely on any of what I said above, because accessibility is simply not present. In that case, yup, we use OCR to the best of our ability. There are cases where OCR works quite well, such as the game Hades; that way, we can still play the game.
Then, there’s no add-on support: Orca’s simple plugin system is not included, and one must install it as an external thing.
On top of that, the upper layers of Orca are full of app-specific hacks that try to patch the various accessibility gaps that have opened in the hull of Linux over time. Without competent enough developers, those hacks may or may not work, depending on the moon, the current value of Bitcoin, whether a photon zapped past your ear at that moment, or something else. Therefore, it’s also unstable and prone to freezing your computer at random, as some of you native Ubuntu users can confirm. Also, because of this, Orca has kind of become spaghetti code, so much so that I can’t learn AT-SPI by reading its code; I’m afraid the design is too bad to understand, which proves to be the case most of the time.
Finally, it’s written in Python. This shouldn’t be terrible on its own, and it isn’t; after all, machine learning stuff runs on the Python interpreter. However, Python apps are generally bulkier, more memory-consuming and more CPU-hungry than their native counterparts, so in a domain where speed and a small memory footprint matter, like screen readers, apps should be written as close to the hardware as possible, for example in C, C++, etc. Unfortunately, both NVDA and Orca suffer from this problem to some extent; I don’t think I need to elaborate much on this.
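To make object navigation concrete: the navigator is just a cursor over the accessibility tree that can move to a parent, a first child, or a sibling, independently of keyboard focus, so non-focusable text is still reachable. Below is a minimal, hypothetical Rust model of that idea. `Node` and `ObjectNavigator` are made-up names, and a real screen reader would ask AT-SPI for parents and children over D-Bus instead of keeping them in a `Vec`.

```rust
/// A simplified stand-in for one accessible object (hypothetical type;
/// a real one would be a handle to an AT-SPI object, not owned data).
#[derive(Debug)]
struct Node {
    role: &'static str,
    name: &'static str,
    parent: Option<usize>,
    children: Vec<usize>,
}

/// An arena of nodes plus an NVDA-style "navigator object" cursor.
struct ObjectNavigator {
    nodes: Vec<Node>,
    current: usize,
}

impl ObjectNavigator {
    /// Creates a tree containing only a root window at index 0.
    fn new() -> Self {
        let root = Node { role: "window", name: "root", parent: None, children: Vec::new() };
        ObjectNavigator { nodes: vec![root], current: 0 }
    }

    /// Adds a child under `parent` and returns the new node's index.
    fn add(&mut self, parent: usize, role: &'static str, name: &'static str) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { role, name, parent: Some(parent), children: Vec::new() });
        self.nodes[parent].children.push(id);
        id
    }

    /// Moves into the first child, if any; returns whether it moved.
    fn first_child(&mut self) -> bool {
        match self.nodes[self.current].children.first() {
            Some(&c) => { self.current = c; true }
            None => false,
        }
    }

    /// Moves up to the parent, if any; returns whether it moved.
    fn parent(&mut self) -> bool {
        match self.nodes[self.current].parent {
            Some(p) => { self.current = p; true }
            None => false,
        }
    }

    /// Moves to the next (`forward == true`) or previous sibling.
    fn sibling(&mut self, forward: bool) -> bool {
        let parent = match self.nodes[self.current].parent {
            Some(p) => p,
            None => return false, // the root has no siblings
        };
        let siblings = &self.nodes[parent].children;
        let pos = siblings
            .iter()
            .position(|&c| c == self.current)
            .expect("a child is always listed in its parent");
        let target = if forward { pos.checked_add(1) } else { pos.checked_sub(1) };
        match target.and_then(|t| siblings.get(t)) {
            Some(&s) => { self.current = s; true }
            None => false, // already at the first/last sibling
        }
    }

    /// What a screen reader might speak for the current object.
    fn describe(&self) -> String {
        let n = &self.nodes[self.current];
        format!("{} {}", n.name, n.role)
    }
}
```

In this model a non-focusable label is reached with `first_child` and `sibling` just like a button, which is exactly what keyboard focus alone can’t do.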
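The geometry behind “move the mouse to the navigator object and double-click” is small. AT-SPI’s Component interface reports an object’s extents as x, y, width and height, and the pointer should land on the object’s centre. The sketch below assumes that shape. `Extents`, `click_point`, `Pointer` and `activate_with_mouse` are hypothetical names, and `Pointer` stands in for whatever actually injects input (for example XTest on X11 or a uinput device), which is not shown here.

```rust
/// Screen-space bounding box of an accessible object, in the
/// x/y/width/height shape that AT-SPI extents use (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Extents {
    x: i32,
    y: i32,
    width: i32,
    height: i32,
}

/// Pointer target for clicking `obj`: its centre, clamped to the screen
/// so a partly off-screen object still yields a valid position.
/// Returns None for zero-sized objects, which can't be clicked meaningfully.
fn click_point(obj: Extents, screen_w: i32, screen_h: i32) -> Option<(i32, i32)> {
    if obj.width <= 0 || obj.height <= 0 {
        return None;
    }
    let cx = obj.x + obj.width / 2;
    let cy = obj.y + obj.height / 2;
    Some((cx.clamp(0, screen_w - 1), cy.clamp(0, screen_h - 1)))
}

/// Hypothetical input-injection backend; not a real library API.
trait Pointer {
    fn move_to(&mut self, x: i32, y: i32);
    fn double_click(&mut self);
}

/// Moves the pointer to the navigator object and double-clicks it.
/// Returns false if the object has no clickable area.
fn activate_with_mouse(p: &mut impl Pointer, obj: Extents, screen: (i32, i32)) -> bool {
    match click_point(obj, screen.0, screen.1) {
        Some((x, y)) => {
            p.move_to(x, y);
            p.double_click();
            true
        }
        None => false,
    }
}
```

Keeping the backend behind a trait means the same logic could later drive an X11 or a Wayland-friendly injector without touching the navigation code.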
I’m sure there are other problems I can’t remember off the top of my head, but those should be enough to paint the picture.
So, my solution is to create a brand-new screen reader for the Linux family of operating systems.
First, it will have all the features of Orca, even if I have to translate some code structures from there, as I don’t know where else I could learn how AT-SPI works, though even that can be difficult because of how spaghetti Orca’s code is.
Then, it’ll have object navigation and more mouse options, like having the mouse cursor follow the navigator object, zoom, and a highlight effect around the navigator object and mouse cursor, as well as OCR and probably other things NVDA has. Of course, it’ll have add-ons as well, but that’s damn far in the future.
Thing is, I don’t want to start a project knowing I could be the only one using it, especially a screen reader, one of those projects one works on for a lifetime. So, if I were to make such a thing, would anyone use it? Would blind Windows users consider switching to Linux if the screen reader proves as competent as what Windows offers, opening the way to greater Linux accessibility by virtue of example?
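Add-ons are far off, but in Rust the usual shape for them is a trait that the core calls into. Here is a minimal, hypothetical sketch of that architecture: `Event`, `Addon`, `AddonHost` and `UnlabeledButtons` are made-up names, add-ons are compiled in rather than loaded at runtime, and the first add-on that returns some text decides what gets spoken.

```rust
/// Events a hypothetical screen-reader core might hand to add-ons.
#[derive(Debug, Clone)]
enum Event {
    Focus { role: String, name: String },
    Key(String),
}

/// Hypothetical add-on interface: return Some(text) to decide what is
/// spoken for an event, or None to let other add-ons / the core decide.
trait Addon {
    fn name(&self) -> &str;
    fn on_event(&mut self, event: &Event) -> Option<String>;
}

struct AddonHost {
    addons: Vec<Box<dyn Addon>>,
}

impl AddonHost {
    fn new() -> Self {
        AddonHost { addons: Vec::new() }
    }

    fn register(&mut self, addon: Box<dyn Addon>) {
        self.addons.push(addon);
    }

    /// Names of the registered add-ons, in registration order.
    fn loaded(&self) -> Vec<&str> {
        self.addons.iter().map(|a| a.name()).collect()
    }

    /// Asks each add-on in registration order; the first Some wins,
    /// otherwise the core's default announcement is used.
    fn speech_for(&mut self, event: &Event) -> String {
        for addon in self.addons.iter_mut() {
            if let Some(text) = addon.on_event(event) {
                return text;
            }
        }
        match event {
            Event::Focus { role, name } => format!("{} {}", name, role),
            Event::Key(k) => k.clone(),
        }
    }
}

/// Example add-on: gives buttons without an accessible name a spoken label.
struct UnlabeledButtons;

impl Addon for UnlabeledButtons {
    fn name(&self) -> &str {
        "unlabeled-buttons"
    }

    fn on_event(&mut self, event: &Event) -> Option<String> {
        match event {
            Event::Focus { role, name } if role == "push button" && name.is_empty() => {
                Some("unlabeled button".to_string())
            }
            _ => None,
        }
    }
}
```

A real plugin system would more likely load scripts or dynamic libraries at runtime; this only shows the dispatch idea.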
Now, for the more technically inclined: regarding what I said above about performance, I’ve seen it happen on Windows with the ZDSR screen reader. I downloaded the community welfare edition (I don’t exactly remember where I got it from), and it gave a big performance boost. For example, tabbing through a dialog or window happened instantly, even when I tabbed about twice per second, with no delay at all, and the same goes for arrowing through lists of long items. It also used a tiny amount of memory and processing power, about a third of what NVDA uses, if not half. I don’t know whether NVDA simply isn’t optimized enough, though that might be the case; however, the one-millisecond-or-so delay that ctypes adds accumulates, especially when the computer is overloaded by a program that leaves little memory for NVDA to work in, and then it can crash the entire computer. I’ve seen it happen.
So, with that said, I’m going to make it in Rust. Since this is a new beginning, I want to break from the monotony of “only Python is the best for screen readers, and because these people used it, I will too.” We must realize that, without lots of hacks and stuff, Python performance is not great, not for this. I’d say vanilla Python is not well suited to things like screen readers, the same way Java is not suited to programming microcontrollers.
One more thing: whoever’s hopes are getting too high about this should slow down. You must know that making a screen reader is a very involved process: even the most pitiful prototypes take a month or more to be released to the public, a beta release could take a year or more, and the whole screen reader could stay in development for a damn long time; look at NVDA to get a picture. Unlike a few people on this planet, I am not an accessibility god, nor do I yearn for such a title; however, I will do everything in my power to make this dream happen somehow.
Maybe I won’t be able to do it because of insufficient Rust knowledge, the undocumented nature of AT-SPI, or any number of other factors. For the first time in my programming career, I will start on the pessimistic side of the road, as I’ve never done anything like this and it might fail for a myriad of reasons; this is frankly the most complicated programming project I’ve ever taken on. Maybe it will fail, but what I won’t do is fall over dead before I’ve fought with all I’ve got; I must at least try, if nothing else. Plus, even if it fails, the experience I gain will be worth it. As I’ve said multiple times: stop complaining, start doing what you can to improve things. In other threads, I gave examples of people in this community who are actively trying to change what they can about the world we live in to make it better, whether with 3D audio, accessibility of firmware settings in the preboot environment, or game accessibility guidelines along with frameworks that implement that specification in a game as an example. Now it’s my turn to at least try to change what I can about Linux being anything but accessible enough.
So, what do you guys say? Would you use it? Would it be possible to make? Any other comments, throw them into this vortex of a topic. I already see it spreading.

I would definitely use it! Thanks so much for even beginning to try this!

I did actually contemplate writing a screen reader for Linux in Rust. I got as far as getting some bindings written but got sidetracked. With respect to implementation, I considered using Tokio to leverage multithreading and maybe writing a basic synth dispatcher in lieu of speech-dispatcher.

For speech, I think using Speech Dispatcher is good, as it’s the best we’ve got on Linux. All voice modules are written to communicate with Speech Dispatcher, and I don’t think anyone would want to rewrite their voices to comply with my new specifications and such. Plus, Speech Dispatcher is surely not the reason Orca lags and behaves the way it does now. If I find that Speech Dispatcher is the reason, I can just replace it with something like eSpeak, problem solved.
About bindings, I think there’s the tts crate; maybe I can use the Speech Dispatcher bindings from there.
I don’t know what that crate is, but for multithreading, isn’t the standard library enough? You know, mpsc channels, mutexes, atomic reference counting (Arc), etc.
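For context on what talking to Speech Dispatcher involves: clients use SSIP, a line-based text protocol, usually over a Unix socket (on many systems under `$XDG_RUNTIME_DIR/speech-dispatcher/`; treat the exact path as an assumption and check your distribution). As I understand the protocol, after the `SPEAK` command the client sends the text as CRLF-terminated lines, doubles any leading dot, and ends the message with a line holding a single dot. The sketch below only builds those bytes; `ssip_speak_data` and `ssip_speak_commands` are hypothetical helper names, and networking and reply parsing are left out.

```rust
/// Builds the data block sent after SSIP's SPEAK command: every line ends
/// in CRLF, a line starting with '.' gets an extra leading '.', and the
/// message is terminated by a line containing a single '.'.
fn ssip_speak_data(text: &str) -> String {
    let mut out = String::with_capacity(text.len() + 8);
    for line in text.lines() {
        if line.starts_with('.') {
            // Escape, so the server doesn't mistake this for the terminator.
            out.push('.');
        }
        out.push_str(line);
        out.push_str("\r\n");
    }
    out.push_str(".\r\n");
    out
}

/// What the client writes, in order, to speak `text`. The server's replies
/// (an acknowledgement before the data, another once it is queued) must be
/// read between and after these writes; that is not modelled here.
fn ssip_speak_commands(text: &str) -> Vec<String> {
    vec!["SPEAK\r\n".to_string(), ssip_speak_data(text)]
}
```

In practice a crate wrapping libspeechd would handle this for you; the point is only that the protocol is simple enough not to be the bottleneck.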
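For the simple case, the standard library does cover it. Here is a minimal sketch, using only `std`, of a speech worker thread fed through an `mpsc` channel, with an `Arc<Mutex<Vec<String>>>` standing in for a real synthesizer. `SpeechMsg` and `spawn_speech_worker` are hypothetical names; a real worker would call Speech Dispatcher instead of recording strings.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Messages from the (hypothetical) event-handling side to the speech side.
enum SpeechMsg {
    Speak(String),
    Shutdown,
}

/// Spawns a speech worker thread. `spoken` stands in for a synthesizer:
/// the worker records each utterance it "speaks", in arrival order.
fn spawn_speech_worker(
    spoken: Arc<Mutex<Vec<String>>>,
) -> (mpsc::Sender<SpeechMsg>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<SpeechMsg>();
    let handle = thread::spawn(move || {
        // The loop also ends on its own once every Sender is dropped.
        for msg in rx {
            match msg {
                SpeechMsg::Speak(text) => spoken.lock().unwrap().push(text),
                SpeechMsg::Shutdown => break,
            }
        }
    });
    (tx, handle)
}
```

The `Sender` can be cloned and handed to several event-listening threads, which is roughly the pattern a screen reader needs; Tokio becomes interesting once many D-Bus requests are in flight at the same time.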

Honestly not sure; I stumbled on it when I was researching.

Although it would probably be best to coordinate a project if we are going to do this for real. I started one here.

So, for whatever reason, dbus-codegen-rust is panicking when I try to generate the AT-SPI bindings. This is for version 2.38 as provided by Arch, though master generated the bindings fine. Interestingly, there was a preprocessor script that needed running, but other than that I am a tad stuck.

Hello there!
So, because I completely lost the credentials for this forum, email address included, I had to make another account. The good side is that I have the same name as on the other forum, so I guess that’s fine.
Anyway, @hjozwiak, I have since learned what Tokio and async programming are. I gained another dev to help me in this endeavor, and we did one prototype release with the C libatspi bindings. We also have the same prototype talking D-Bus directly, fully async, using the Tokio executor and no longer restricted by GLib’s own, which gives us more freedom. Plus, Tokio’s advantages, such as work stealing, will be very much appreciated in the future, so we’re sticking with it.
In any case, if anyone wants to see what we’ve been up to, here’s the GitHub org:

And here is the website; not that you couldn’t find it at the top of the page, but you know, for convenience:
https://yggdrasil-sr.github.io/
If someone wants to join in as well, either open an issue, start a discussion, or start opening pull requests right away if you feel you can contribute something; they will be reviewed and merged if applicable.