Query with .ToArray() returns different results compared to without
I'm using v4.55.03 premium edition, which I believe is the latest at the time of writing.
I have the following queries in the same tab in "C# Statement(s)" mode:
Each is the same but the first has an extra .ToArray().
The later returns the correct count, about 90,000.
The former returns a semi-random result every time I press F5 varying between about 7,000-9,000.
How is .ToArray() making a difference here? What am I doing wrong?
This is currently preventing me from doing more complex in-memory queries.
I have the following queries in the same tab in "C# Statement(s)" mode:
Accounts
.Select(a => new { Account = new { a.Id },
Contacts = Contact.Where(c => c.AccountId == a.Id)
.Where(c => c.Email != null)
.Select(c => new { c.Email })
.ToArray() })
.ToArray()
.Count()
.Dump();
Accounts
.Select(a => new { Account = new { a.Id },
Contacts = Contact.Where(c => c.AccountId == a.Id)
.Where(c => c.Email != null)
.Select(c => new { c.Email })
.ToArray() })
.Count()
.Dump();
Each is the same but the first has an extra .ToArray().
The later returns the correct count, about 90,000.
The former returns a semi-random result every time I press F5 varying between about 7,000-9,000.
How is .ToArray() making a difference here? What am I doing wrong?
This is currently preventing me from doing more complex in-memory queries.
Comments
Could you post the resulting query from the SQL tab and determine whether the SQL query itself changes? Also, running this SQL command directly, does the result change between runs?
If you want to omit the LINQ-to-SQL translation altogether, you can easily apply .Cache() to Accounts (Accounts.Cache()) and Contacts (Contacts.Cache()) which will cache the entire data set for the active process (and can be cleared using shift-F5).
Yes the SQL used is different between the two as you suggested:
SELECT [t0].[Id], [t1].[Email], ( SELECT COUNT(*) FROM [salesforce].[Contact] AS [t2] WHERE ([t2].[Email] IS NOT NULL) AND ([t2].[AccountId] = [t0].[Id]) ) AS [value] FROM [salesforce].[Account] AS [t0] LEFT OUTER JOIN [salesforce].[Contact] AS [t1] ON ([t1].[Email] IS NOT NULL) AND ([t1].[AccountId] = [t0].[Id]) GO
SELECT COUNT(*) AS [value] FROM [salesforce].[Account] AS [t0]
Running the first SQL statement continually in SSMS gives me the same number of rows every time.
Using .Cache() before .ToArray() gives me stable but incorrect results between runs. Pressing shift-F5 between runs again gives the same unstable results.
E.g.
The first two both return (what looks like) the correct results consistently.
The last one causes many SQL queries and took so long I had to terminate it.
This version still causes incorrect and inconsistent results:
Accounts .Select(a => Contact.Where(c => c.AccountId == a.Id) .Select(c => new { c.Email })) .ToArray() .Count() .Dump();
Not sure if I can reduce it any further.
Rather than need a remote session I thought I might be easier to create steps to reproduce anywhere:
1. Create an empty database called 'Example' in SQL Server
2. Run this SQL to create the Account table (auto-generated, I'm lazy):
CREATE TABLE [salesforce].[Account]( [Id] [NVARCHAR](18) NULL ) ON [PRIMARY]
3. Run this SQL to create the Contact table:
CREATE TABLE [salesforce].[Contact]( [AccountId] [NVARCHAR](18) NULL, [Email] [NVARCHAR](80) NULL ) ON [PRIMARY]
4. Grab a copy of SQL Data Generator
5. Save an run this project against the database (you might have to save that XML as a .sqlgen file and then open it, sorry, this forum has no attachment system)
6. Use 'Generate Data...' to create a load of test data
7. Run this against the Example database in Linqpad:
Accounts .Select(a => Contacts.Where(c => c.AccountId == a.Id) .Select(c => new { c.Email })) .ToArray() .Count() .Dump();
It's certainly not a true reproduction of my database but it does show that the results change between runs with Linqpad when the data in the database isn't changing.
The records between Account and Contact must be related. The SQL Data Generator generates random foreign keys therefore no related records exist. The count will always be 1000. Output: all 1000.
However, this is different when related records exist: Output:
Thanks again for you help!
Sorry the project in reproduction steps should have set SQL Data Generator to populate the Contact table with foreign keys rather than just random data, but I might have forgotten to do that.
With more than 1000 rows the problems becomes more pronounced, rarely giving the same results between runs.
What are we looking at here? I can't see anything wrong with what I'm doing.
Is this a bug in LinqPad? If so how do I report it?
(I got no reply from the in-app "Report Bug" feature when I raised this a couple of days ago)
You're being brilliant help but do you know of any way to contact the LINQPad maintainers to raise this issue directly? It would be good to have their input.
I'm not sure what you mean by "mis-set feature in LINQ-to-Objects" rather than it being a bug as I'm using the out-of-the-box settings/configuration (I assume you are too when you reproduced the issue) and the reproduction database is as simple as they come.
Also, while it shouldn't matter at all, are you doing output to data grids or html?
Yes, running the SQL generated by LinqPAD directly always gives a consistent result.
I'm just using the vanilla 'Results' pane, which I assume is data grids?
Hopefully you can use the reproduction steps I listed above to see the issue for yourself?
Thanks!
There's a single post from myself above posted on May 27th with 7 steps to reproduce including the database schema (which is only about 7 lines of SQL).
I'm not sure I can simplify the reproduction steps any further than that. Is there a specific step which is catching you out?
Given this is reproducible so trivially with a database with only 3 columns and with LinqPAD default settings I'm not convinced this is something the community will be able to get to the bottom of.
I've got no response from sending bug reports in the LinqPAD app. I'm starting to feel let down by my Premium purchase.
As mentioned before, it's very unlikely this is a bug in LINQPad, but rather a quirk in LINQ itself. Also, the author of LINQPad is active on this forum and will notice. As the premium version does not explicitly includes 'premium' support, I think this is the best you'll get for your money.
I'm currently limited on available time, but the first step to take is to re-create the situation in a regular C# Console Project in Visual Studio. My bet is that the same issue will arise.
Regarding the feeling you're let down by your purchase, please take a look at the user requests at http://linqpad.uservoice.com/forums/18302-linqpad-feature-suggestions ; Joe has a rather large back log including the upgrade to C# 6 and intellisense for VB.NET and still manages to find time to reply to new comments and requests, so please give him some credits and await gently
Kind regards
Joris
Do any of you think this could be related to the unexpected Count() results I was getting in this thread: http://forum.linqpad.net/discussion/799/unexpected-results-from-ienumerable-count-not-matching-the-actual-count-with-anonymous-return-type#latest ?
It seems from the details above that the same underlying issue might be responsible for both, and I'd be happy to help out any way I can.
It does seem like our issues are very similar. They could well be the same with slightly different reproduction steps.
If it helps I took the time to create reproduction steps in an above comment using a database with only 2 tables and 3 columns and populating it with sudo-random data. That way others can reproduce the problem without needing a database dump.
Sadly I don't have any additional information or a diagnosis for you. I have to admit that I've started writing queries in SQL rather than LINQPad since I'd have to cross/sanity check the results in SQL anyway if I can't trust LINQPad to return the correct results.
I haven't had an acknowledgement from a LINQPad representative that they've even received the bug report I submitted.
If you do get to the bottom of that the issue though please reply here. This forum doesn't have a means for me to follow/subscribe to your thread as far as I can see.
I'll certainly post any additional info I come across, or a resolution if I stumble upon one.
Thanks again!
I have reproduced the issue in a stand alone Visual Studio 2013 project. Still the same problem therefore not LINQPad related. Perhaps you could share the issue on Stack Exchange?
The demo project is available for download here:
https://drive.google.com/file/d/0B2Y03BpS8L2XRUkwcTFIOGlpWG8/view?usp=sharing
Thanks for taking the time to reproduce the issue.
I think maybe I was using LINQPad for the wrong reasons.
I was using LINQPad as a way to:
- avoid learning SQL (I can do the basics, but I struggle to structure larger queries)
- perform more elaborate queries more easily than in SQL
- format results in more elaborate ways than SSMS allows (e.g. hyperlinks that open in your browser, expanding/collapsing nesting stuff)
I guess you think of LINQPad as a way of try out LINQ to SQL or LINQ to Objects statements/configurations rather than as an alternative query language or a replacement for something like SSMS.I think the real answer here is I should just be improving my SQL-foo and using SSMS, otherwise I'm just going to have to learn about configuring LINQ to SQL/Objects correctly instead (and probably cross-checking my results with SQL in the mean-time anyway).
Thanks for the help though. Hopefully this forum post will be of use to others.