This was certainly fun to write. I tried to keep it as accessible as possible for non-programmers, but there’s a section at the bottom for those interested in the exact details.
A brief refresher: One operation absolutely essential for QTTabBar to function is hit testing, which means “figuring out what file or folder the mouse is over”. This used to be super easy, but with the advent of the new ItemsView control, it’s no longer trivial. Fortunately, it’s still possible through the use of the UI Automation library. Unfortunately, the ItemsView has a nasty bug in it: every time you make a UI Automation query, something builds up inside the control that introduces a tiny bit of lag, permanently. That means every time I ask what file the mouse is over (which I need to do very often) the lag grows a tiny bit. After many queries, the lag becomes very noticeable, and after a while it becomes excruciating. The lag goes away when Explorer is restarted, so users that restart their computers often probably barely notice this problem at all. But for power users who infrequently restart, or laptop users who always hibernate, it’s a major issue.
My quest to find a workaround would have been completely impossible were it not for Microsoft’s Public Symbol Files. Microsoft supplies debugging files for all important executable files in Windows. These files don’t give you the source code, of course, but they do give you the names and locations of all the internal functions. This was enough for me to attach a debugger to Explorer and (after a lot of digging through assembly code) figure out exactly which bug was causing the problem.
Every time a UI Automation query is issued, the connection to the ItemsView’s automation object is created anew. Whenever it’s created, the UI Automation library registers for certain event notifications from the ItemsView, such as “notify me when the selection changes,” and other such events. It’s during this event registration process that the lag increases. It seems a little wasteful to create and destroy the connection every time a single hit test query is executed, and indeed, my first idea was to somehow cache it, so that it only needed to be created once. But over the course of trying to do that (which turned out to be futile anyway), I noticed something very critical: before registering the events, the ItemsView automation object is first asked if it is the type of object for which events can be registered. If the object were to reply, “no, I’m not that kind of object,” then the code introducing the lag is skipped right over! Figuring out how to intercept that request and respond with a fake reply was tricky, but once I did, the lag vanished.
Of course, this does introduce another problem: if any UI Automation application was actually counting on those event notifications, then that application wouldn’t get them once my hook was in place. QTTabBar does not use these events, since they are extremely inefficient, and there are better ways of getting the necessary notifications. And honestly, if anyone else is using these events, they shouldn’t be, specifically because of this bug! So, I’m not too worried about that. In the worst case, if I become aware that it is indeed causing a major problem with some other application, I could probably figure out a way to block the event registration only for QTTabBar’s requests, and no others.
Here’s a more technical explanation, for those who found the above explanation lacking in detail. The main function I need for hit testing is IUIAutomation.ElementFromPoint, which retrieves the element under the mouse. After a long time of stepping through the assembly, I found that the lag was occurring because Automation events were being registered and never unregistered. This registration occurs during the call to UiaReturnRawElementProvider, which oh-so-conveniently includes a pointer to the ItemsView’s IRawElementProviderSimple as a parameter. I saw in the assembly that the element provider’s QueryInterface method is called to get access to its IRawElementProviderAdviseEvents interface, and from there the events are registered. Aha! I knew immediately that if they were calling QueryInterface, then they would dutifully check the return code to make sure it succeeded. Sure enough, if QueryInterface doesn’t return S_OK, the event registration section is cleanly skipped, without the loss of any other functionality. Much as I like to rag on Microsoft, I have never been more thankful for their excellent coding practice. It saved us here!
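To make that control flow concrete, here is a rough, portable model of it. This is not Microsoft's code and it does not call the real UI Automation API: `HResult`, `InterfaceId`, `Provider`, and `ConnectToProvider` are all hypothetical stand-ins, and the "leak" is just a counter. It only illustrates that a failed QueryInterface check skips the registration step entirely.

```cpp
#include <cstdint>

// Stand-ins for the Windows definitions, so the sketch compiles anywhere.
using HResult = std::int32_t;
constexpr HResult kOk = 0;                                           // S_OK
constexpr HResult kNoInterface = static_cast<HResult>(0x80004002u);  // E_NOINTERFACE

// Hypothetical interface identifiers (real COM uses GUIDs).
enum class InterfaceId { RawElementProviderSimple, RawElementProviderAdviseEvents };

// Hypothetical model of the ItemsView's element provider.
struct Provider {
    explicit Provider(bool exposes) : exposesAdviseEvents(exposes) {}

    // Reports whether the requested interface is available.
    HResult QueryInterface(InterfaceId id) const {
        if (id == InterfaceId::RawElementProviderAdviseEvents && !exposesAdviseEvents) {
            return kNoInterface;
        }
        return kOk;
    }

    bool exposesAdviseEvents;
    int leakedRegistrations = 0;  // models the state that builds up per query
};

// Hypothetical model of the step inside UiaReturnRawElementProvider:
// ask for the advise-events interface, and register only if that succeeds.
bool ConnectToProvider(Provider& provider) {
    if (provider.QueryInterface(InterfaceId::RawElementProviderAdviseEvents) != kOk) {
        return false;  // registration is skipped, nothing accumulates
    }
    ++provider.leakedRegistrations;  // registered, and never unregistered
    return true;
}
```

In this model, a provider that answers normally picks up one more registration on every connection, while one that answers "no such interface" never accumulates anything.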
So, with the ever-amazing MinHook library, I placed an API hook on UiaReturnRawElementProvider, which I used to place another hook on the ItemsView’s IRawElementProviderSimple.QueryInterface. Then I can filter out those calls that are asking for IRawElementProviderAdviseEvents and spoof a return of E_NOINTERFACE.
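The filtering idea can be sketched in portable C++ as well. This is a simplified model under loud assumptions, not the actual QTTabBar code: a plain function pointer stands in for the QueryInterface vtable slot, a pointer swap stands in for what MinHook does by patching code, and every name here (`Provider`, `InstallHook`, `FilteredQueryInterface`, and so on) is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the Windows definitions, so the sketch compiles anywhere.
using HResult = std::int32_t;
constexpr HResult kOk = 0;                                           // S_OK
constexpr HResult kNoInterface = static_cast<HResult>(0x80004002u);  // E_NOINTERFACE

// Hypothetical interface identifiers (real COM uses GUIDs).
enum class InterfaceId { RawElementProviderSimple, RawElementProviderAdviseEvents };

struct Provider;
using QueryInterfaceFn = HResult (*)(Provider*, InterfaceId, void**);

// Hypothetical provider; the function pointer models its vtable slot.
struct Provider {
    QueryInterfaceFn queryInterface;
};

// Models the provider's own QueryInterface: every request succeeds.
HResult RealQueryInterface(Provider* self, InterfaceId, void** out) {
    *out = self;
    return kOk;
}

// Saved pointer to the original, like the trampoline a hooking library returns.
QueryInterfaceFn g_originalQueryInterface = nullptr;

// The detour: refuse advise-events requests, forward everything else.
HResult FilteredQueryInterface(Provider* self, InterfaceId id, void** out) {
    assert(g_originalQueryInterface != nullptr);
    if (id == InterfaceId::RawElementProviderAdviseEvents) {
        if (out != nullptr) {
            *out = nullptr;  // COM convention: null the out pointer on failure
        }
        return kNoInterface;
    }
    return g_originalQueryInterface(self, id, out);
}

// Models what the outer hook does when it first sees the provider.
void InstallHook(Provider& provider) {
    if (provider.queryInterface == &FilteredQueryInterface) {
        return;  // already hooked; avoid saving the detour as the "original"
    }
    g_originalQueryInterface = provider.queryInterface;
    provider.queryInterface = &FilteredQueryInterface;
}
```

The guard in `InstallHook` matters in this model because the outer hook would run on every query: without it, a second install would overwrite the saved original with the detour itself.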
So in the end, it’s pretty close to the best case scenario. The lag is completely fixed. Hit tests are free again, using a documented method that is unlikely to break in Windows 8. I can finally get rid of all that horrible spaghetti code I was using to minimize the number of hit test operations. And even more importantly, I can incorporate a hit test function into the plugin library, so that anyone who wants to write something that requires hit testing can use it.
This whole ordeal has really been a huge learning experience for me. I’m going back to some other problems I was facing with the ItemsView and saying “oh, that’s easy now that I know how to do this…” And I still feel like I’m just scratching the surface of what’s possible. Figuring out how to bend Explorer to my will is such an addictive puzzle. People sometimes ask me why I subject myself to dealing with Microsoft’s problems instead of just making my own file browser from scratch. There are, of course, some real answers to that question, but my first response is “Because this way is so much more fun!”
Thanks for reading. Beta 2 is on the way!