Streaming of sequence elements
If I have a slow sequence (IEnumerable<T>) that is I/O bound, then I have to wait until the entire sequence has been enumerated for LINQPad to display it in the Results pane. Is there a reason LINQPad doesn't display elements in a streaming fashion, as they are yielded by the enumerator? It seems to do this for observables (IObservable<T>).
Comments
What sort of sequence do you have that's IEnumerable and streaming? These are quite rare. With most I/O-bound IEnumerables, the blocking happens before the first element arrives, and then everything else arrives quickly after that.
Note that LINQPad also streams IAsyncEnumerable. So you can make an IEnumerable display asynchronously by referencing a library that contains an IAsyncEnumerable implementation and calling .ToAsyncEnumerable on it (or writing an extension method to do that).
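A minimal sketch of such a conversion, assuming .NET Core 3.0 or later (C# 8 async streams). The name ToAsyncEnumerable is only illustrative. It is written as a plain function so the snippet compiles on its own. To call it as source.ToAsyncEnumerable(), move it into a static class and add `this` to the parameter. If you also reference a library that defines its own ToAsyncEnumerable (System.Linq.Async does), pick a different name to avoid ambiguity.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper (plain function so the snippet stands alone).
// It adapts a blocking IEnumerable<T> into an IAsyncEnumerable<T>: every MoveNext
// call runs on the thread pool via Task.Run, so whoever awaits the async sequence
// (e.g. a results view) is not blocked while a slow element is being produced.
static async IAsyncEnumerable<T> ToAsyncEnumerable<T>(IEnumerable<T> source)
{
    using var e = source.GetEnumerator();
    // Elements are still pulled one at a time, in order; only the waiting moves off the caller.
    while (await Task.Run(() => e.MoveNext()))
        yield return e.Current;
}
```

Using Task.Run per element keeps the adapter simple. The trade-off is that MoveNext may run on a different thread each time, which only matters for enumerators with thread affinity.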
My example uses Await from MoreLINQ, but that's irrelevant (although I've also shared a version without it). Another very simple example would be converting any observable to a sequence via ToEnumerable(). I don't understand how this matters when we are talking about just displaying the results, though.
Yep, I'm aware of that option, as well as going from sequences to observables, but I don't see why people should be forced down that route when it's not needed. Your main thread is usually synchronous and blocking. I want to be able to write an expression query and see the results stream to the Results pane as they arrive. Right now I have to work around this by turning the query into C# statements and using a foreach loop just to get the right effect.
Streaming is even more crucial when running on the command line, as in lprun -format=csv query.linq. For instance:
new { X = somesequence }.Dump();
For LPRun, however, one can't expect sub-sequences to be streamed out, especially when using the CSV formats (to take the extreme case), because the structure has to make sense as a whole. At the same time, if the root is a sequence, then its items should be streamed as they are yielded. This is important for extremely large results. Imagine a query that scans files of hourly data and summarizes them to daily figures via grouping, as a single sequence. If you have thousands of such files, you don't want to have to wait, or to re-model the query as an asynchronous sequence, just to get streaming.
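A minimal sketch of what streaming a root sequence as CSV could look like. It makes no claim about how LPRun is implemented. WriteCsvStreaming and CsvField are hypothetical names. The per-row Flush is the point of the sketch. Nested values still have to be flattened into fields by the caller, which matches the expectation that only the root is streamed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Quote a CSV field only when needed (RFC 4180 style): wrap it in double quotes
// and double any embedded quotes if it contains a comma, quote or line break.
static string CsvField(string s) =>
    s.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0
        ? "\"" + s.Replace("\"", "\"\"") + "\""
        : s;

// Hypothetical sketch: write a root sequence as CSV, one line per element as it is
// yielded, flushing after each row so downstream consumers (a pipe, a tailed file)
// see it immediately instead of after the whole sequence has been enumerated.
static void WriteCsvStreaming<T>(IEnumerable<T> rows, TextWriter writer,
                                 IEnumerable<string> header, Func<T, IEnumerable<string>> fields)
{
    writer.WriteLine(string.Join(",", header.Select(h => CsvField(h))));
    writer.Flush();
    foreach (var row in rows) // pulls one element at a time from the (possibly slow) source
    {
        writer.WriteLine(string.Join(",", fields(row).Select(f => CsvField(f))));
        writer.Flush();
    }
}
```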
Just to be clear, if you dump an object with two sequence properties, X and Y, then there's no expectation that X and Y are streamed out at the same time, even interactively in LINQPad. X would stream first, and then dumping would proceed to Y.
Now, depending on how fancy you want to get, it's up to you whether to detect a “slow sequence” and switch from buffering to streaming. For example, one could define a tolerance of 500 ms: if the sequence isn't done by then, switch to streaming. That would be a reasonable compromise if you want to keep today's behaviour of assuming sequences are generally fast, but I don't think it's necessary, and it feels like overkill.
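The tolerance idea can be sketched as follows. RenderAdaptive is a hypothetical name, and the two callbacks stand in for whatever renders a whole table versus appends a single row. For the 500 ms example you would pass TimeSpan.FromMilliseconds(500).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch of the tolerance idea: buffer elements while the sequence is
// fast and render them as one batch (today's behaviour); once `tolerance` has
// elapsed without the sequence finishing, render what was buffered and stream the
// rest one element at a time. The clock is only checked between elements, so a
// single MoveNext that blocks for a long time is not interrupted.
static void RenderAdaptive<T>(IEnumerable<T> source, TimeSpan tolerance,
                              Action<IReadOnlyList<T>> renderBatch, Action<T> renderItem)
{
    var buffer = new List<T>();
    var clock = Stopwatch.StartNew();
    using var e = source.GetEnumerator();
    while (e.MoveNext())
    {
        if (clock.Elapsed <= tolerance)
        {
            buffer.Add(e.Current);
            continue;
        }
        // Too slow: switch to streaming for this element and everything after it.
        if (buffer.Count > 0) renderBatch(buffer);
        renderItem(e.Current);
        while (e.MoveNext())
            renderItem(e.Current);
        return;
    }
    renderBatch(buffer); // finished within the tolerance: a single batch
}
```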
By the way, any query using a sequence from Directory.EnumerateFileSystemEntries that walks recursively through a tree is going to be slow, so I think slow sequences are far more common than one would like to think.
Consider the following query:
System.Globalization.CultureInfo.CurrentCulture
There are upwards of 100 sequences in that object graph (they are all arrays, but let's imagine they're IEnumerable sequences).
Right now this is rendered in a single operation, in a single round-trip. Your query calls Dump, which converts the object graph into a graph of meta-nodes, then visits them with an HTML writer (or a JSON writer if you choose Text results in LPRun) and sends the HTML to the host process to render. If LINQPad rendered each sequence lazily, this architecture would have to be completely redesigned.
Another problem is that your query process would potentially need to make hundreds of round-trips to the host, splicing in another piece of HTML via the browser DOM after each sequence. This would cause a noticeable delay and flicker. To avoid this, it would need to batch together HTML updates that occur in rapid succession, which in itself is not hard (it does this anyway with observables and IAsyncEnumerable). What's hard is converting such a batch into efficient browser DOM actions, because now you're not just adding rows to a table but performing operations that alter the DOM at multiple levels.
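The batching half of the problem, which the reply calls the easy part, can be illustrated with a minimal single-threaded sketch of time-based coalescing. CreateBatcher, Post and Flush are hypothetical names, fragments are plain strings standing in for HTML, and nothing here addresses the hard part (turning a batch into efficient DOM operations).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch: coalesce updates that arrive in rapid succession so they
// cost one round-trip (here, one call to `send`) instead of one each. A fragment
// triggers a send only if `interval` has passed since the previous send; otherwise
// it waits in `pending` until a later Post or an explicit Flush. A real
// implementation would also need a timer so a trailing update is not left waiting,
// plus locking if updates can come from several threads.
static (Action<string> Post, Action Flush) CreateBatcher(
    TimeSpan interval, Action<IReadOnlyList<string>> send)
{
    var pending = new List<string>();
    var sinceLastSend = Stopwatch.StartNew();

    void Flush()
    {
        if (pending.Count == 0) return;
        send(pending.ToArray()); // copy, so the receiver is unaffected by Clear below
        pending.Clear();
        sinceLastSend.Restart();
    }

    void Post(string fragment)
    {
        pending.Add(fragment);
        if (sinceLastSend.Elapsed >= interval) Flush();
    }

    return (new Action<string>(Post), new Action(Flush));
}
```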
I think it would be achievable for top-level dumps, but I'm not sure how useful that would be.
If I have an expression query that yields hundreds of thousands of rows produced by I/O-bound operations, then it makes things like the following useless:
One can always buffer or stream based on the actual run-time type. If it's strictly IEnumerable<T>, then stream; but, as in the case of a full CultureInfo dump, most objects will be arrays, lists or collections, and those could be rendered in a single round-trip, like today.
The HTML and JSON cases are different. Unlike CSV, which represents unbounded tabular data, they are documents with a root. One wouldn't naturally expect the document to be streamed out (unless you're supporting advanced stream-based document processing upstream; thinking XSLT here).
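The run-time type rule suggested above could be sketched like this, assuming C# 9 or later for the `is not (A or B)` pattern. ShouldStream is a hypothetical name, and this says nothing about how LINQPad classifies values today. It is only a heuristic: the interfaces a type implements don't reveal how expensive it is to enumerate.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch of the proposed rule: values that are already materialized
// collections (arrays, List<T>, HashSet<T>, ...) keep today's single-round-trip
// rendering, while anything that is only an IEnumerable<T> (an iterator, a LINQ
// query) is treated as potentially slow and streamed.
static bool ShouldStream<T>(IEnumerable<T> sequence) =>
    sequence is not (ICollection or ICollection<T> or IReadOnlyCollection<T>);
```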
BTW, not sure how all the efficiency problems you've mentioned are any different for observables and asynchronous sequences.