LINQPad pathologically slow behavior on large scripts after many edits
TL;DR
The syntax-highlighting editor used by LINQPad 6 (and presumably LINQPad 5) exhibits O(n^2) behavior after many edits
Version info
(a second bug: Ctrl-C does not copy the currently selected text in the About Box. Also, it would probably be good if there was a way
to copy the LINQPad version and CPU type..)
LINQPad 6 v6.6.8 (X64)
.NET Core version (host): 3.1.1
.NET Core version (queries): 3.1.1
Roslyn Version: 3.4.0-beta4-19568-04
FSharp.Compiler.Service version: 22.0.3.0
Background
I often use LINQPad to prototype new designs for things; between the built-in debugger and its expansive visualization capabilities, I can very rapidly test out new ideas. As such, I frequently end up with scripts on the order of 10KB, sometimes much larger.
One thing that has afflicted my LINQPad (versions 5 and 6) is that for some queries -- and it always seems to be large ones -- the editor starts slowing down noticeably: consider, for example, the incident that prompted this posting: I could start typing "the quick brown fox jumped over the lazy dogs", by the time I finish typing "dogs", the UI hadn't even started processing "over" yet.
When this happens, if I close and re-open the query, the UI becomes speedy again, at least for a little while; eventually, however, it starts slowing down again. This also occurs if I choose Clone Query: the new query window is nice and responsive, while the old one continues to be painfully slow. (This is insidious, because it means that I can't just file a bug report saying "this query is slow to edit" because it won't be, not at first.)
Troubleshooting attempt 1: ETW trace
I ran an ETW trace while the problem was happening, and I saw LINQPad6.exe sitting at a baseline of about 12.5% CPU usage with brief spikes of up to 25%. As this is a i7-6700 (4-cores/8-threads), that suggests one thread (the UI thread) steadily consuming 100% of a single processing unit, with a second thread doing brief bursts of work on a second processing unit.
Troubleshooting Attempt 2: Visual Studio trace
This is where I struck gold.
I ran a CPU sampling with VS2019, and that gave me a much more clear picture: 66% of the CPU time was spent in the StringBuilder::get_Chars
method, which is the indexer for StringBuilder (char ch = sb[idx]
).
The biggest callers of this method are:
Method | CPU time |
---|---|
EditorView::DrawTextAreaDisplayLines |
26% |
DisplayLineCollection::GetLayout |
25% |
SyntaxEditorRenderingCache::MeasureText |
10% |
DisplayLineCollection::GetLineLength |
3% |
EditorView::GetCharacterColumn |
1% |
(everything else was less than 1%)
StringBuilder internals
Some interesting facts about the StringBuilder implementation in .NET Core and .NET 4 (though not .NET 2):
- It stores the character data in a "rope" structure: a singly-linked list of chunks, stored in reverse order: that, is the head of the list is the last chunk, and each element contains a link to the previous chunk. (This makes a lot of sense, because since StringBuilder is for building a string; as such, most edits are likely to happen at the end, and having direct access to the end means appends are O(1).)
- If you add a very large amount of data to the StringBuilder at once -- such as when you initialize it from a string, or have very large Insert/Append -- it'll put it all into one oversized chunk
- If you're inserting short strings -- such as single characters -- into the middle, however, it'll break things up into 16-character chunks, to minimize the amount of data movement needed.
Therefore, if you're inserting one character at a time into a StringBuilder, you'll mostly end up with a linked-list of 16-character chunks.
Therefore, after enough edits, the average number of steps needed to retrieve the character at a specific index is ((n / 16) / 2), which means that the StringBuilder index algorithm is O(n).
However, those methods? DrawTextAreaDisplayLines
, GetLayout
, MeasureText
, GetCharacterColumn
? I suspect they're being called in a loop, for all of the characters in the file. Which is n operations. And doing an O(n) operation n times is O(n^2).
This explains why reopening the file makes things fast again: the StringBuilder is initialized once with the entire contents of the file, so it's a single chunk -- making get_Chars
O(1), and the editor's methods O(n). But as I start editing -- and most of my edits get done at the beginning of the file -- it starts slowing down.
As a final test, I went back to the tab exhibiting one-second-per-letter behavior, scrolled to the very end (so viewing and editing on the head chunk) and started typing; the UI reacted instantly. I went back to the very first line and started typing, and lo! One second per letter.
Comments
Thanks for that excellent analysis. I've written a query to test the performance characteristics of StringBuilder under those conditions and you're absolutely correct:
http://share.linqpad.net/54c67i.linq
I've just uploaded a new LINQPad 6 beta (6.7.3) with the editor modified such that it recreates the StringBuilder every 250 edits to prevent excessive fragmentation. Let me know whether this fixes the problem.
Joe, I tried the beta 6.7.3 and seems to offer a dramatic improvement in performance.
I wrote a script to send a series of "// xxxxxxx" lines (100 'x'es per line) using SendKeys/SendInput, and while there's still a problem somewhere, it's obviously much faster:
LINQPad 6.6.8:
(at this point the script crashed; I have no idea why, I suspect a bug in the Automation code I'm using to interrogate the text editor.)
On LINQPad 6.7.3 things looked much better:
It's obvious that something is slowing down eventually, but it's a lot less noticeable.
If you want to hammer it a lot on your own, here's a script that can do that:
http://share.linqpad.net/27mn5p.linq
When running it, turning on CAPS LOCK for a few seconds should cause it to stop trying to dump keyboard output, so you can switch to another application without causing massive problems.
I suspect you might need a different data structure entirely to store your text -- but since the real fault seems to lie in the editor control, I doubt you can simply swap in a new one. You've made it not-pathologically-slow, but it's still doing a https://wiki.c2.com/?ShlemielThePainter algorithm.
Thanks, that's good news. I'll add the same fix to LINQPad 5 in the next few days. In terms of further improving performance, I believe that the bottlenecks are no longer in StringBuilder (although let me know if you have evidence otherwise).
I've run another trace, and I definitely don't see StringBuilder::get_Chars() on there anymore.
Unfortunately, I can't work out exactly where the next slowdown lies, because I'm running up against what seems to be a Visual Studio bug, and I don't know how to report it to Microsoft.
In multiple places where an async method is invoked, there's a call out to EtwEventWriteTransfer (in advapi32.dll), below which lies a whole new copy of the call stack, all the way up to the __scrt_common_main_seh call (the executable entry point)! As a result, a large majority of methods have their CPU time incorrectly computed, and I can't really find the true hot spots.
Hope this is not out of place. But I just wanted to post a huge
THANK YOU!!!
I've been using LINQPad for years in much the same way as OP to prototype ideas and sometimes they get fairly complex and lengthy. And this has been a very significant pain point for me, that the editing gets incredibly slow after awhile. I am so happy that the root cause of this has been uncovered. Please accept my sincerest gratitude!
Best Regards,
~ mellamokb
Thanks for your support! And a big thank-you to @StevieO.
Joined the forums just to echo the THANK YOU sentiments! I've been working around it by copy/pasting the text, or losing my undo buffer. It makes so much more sense now!