Home

Query with .ToArray() returns different results compared to without

edited May 2015
I'm using v4.55.03 premium edition, which I believe is the latest at the time of writing.

I have the following queries in the same tab in "C# Statement(s)" mode:

Accounts .Select(a => new { Account = new { a.Id }, Contacts = Contact.Where(c => c.AccountId == a.Id) .Where(c => c.Email != null) .Select(c => new { c.Email }) .ToArray() }) .ToArray() .Count() .Dump();

Accounts .Select(a => new { Account = new { a.Id }, Contacts = Contact.Where(c => c.AccountId == a.Id) .Where(c => c.Email != null) .Select(c => new { c.Email }) .ToArray() }) .Count() .Dump();

Each is the same but the first has an extra .ToArray().

The later returns the correct count, about 90,000.

The former returns a semi-random result every time I press F5 varying between about 7,000-9,000.

How is .ToArray() making a difference here? What am I doing wrong?

This is currently preventing me from doing more complex in-memory queries.

Comments

  • edited May 2015
    Assuming you're accessing a LINQ-to-SQL data source, the intermediate query with .ToArray() will fetch the entire data set before applying count. The other query will translate the .Count() to SQL (e.g. COUNT(*)).

    Could you post the resulting query from the SQL tab and determine whether the SQL query itself changes? Also, running this SQL command directly, does the result change between runs?

    If you want to omit the LINQ-to-SQL translation altogether, you can easily apply .Cache() to Accounts (Accounts.Cache()) and Contacts (Contacts.Cache()) which will cache the entire data set for the active process (and can be cleared using shift-F5).
  • Hey nescafe,

    Yes the SQL used is different between the two as you suggested:

    SELECT [t0].[Id], [t1].[Email], ( SELECT COUNT(*) FROM [salesforce].[Contact] AS [t2] WHERE ([t2].[Email] IS NOT NULL) AND ([t2].[AccountId] = [t0].[Id]) ) AS [value] FROM [salesforce].[Account] AS [t0] LEFT OUTER JOIN [salesforce].[Contact] AS [t1] ON ([t1].[Email] IS NOT NULL) AND ([t1].[AccountId] = [t0].[Id]) GO

    SELECT COUNT(*) AS [value] FROM [salesforce].[Account] AS [t0]

    Running the first SQL statement continually in SSMS gives me the same number of rows every time.

    Using .Cache() before .ToArray() gives me stable but incorrect results between runs. Pressing shift-F5 between runs again gives the same unstable results.
  • In fact using .Cache() without .ToArray() yields the same incorrect results and also returns different results between runs when using shift-F5.
  • Could you try simplifying your solution and expanding to see when the random effect appears?

    E.g.
    //First run:
    Account
    .Select(a => a.Id)
    .ToArray()
    .Count()
    .Dump();

    //Second run:
    Account
    .Select(a => new { Account = new { a.Id })
    .ToArray()
    .Count()
    .Dump();

    //First run w/ join:
    Account.Select(a => new { Account = new { a.Id },
    Contacts = Contact.Where(c => c.AccountId == a.Id)
    .Where(c => c.Email != null)
    .Select(c => new { c.Email })
    .Take(1)
    .ToArray())
    .ToArray()
    .Count()
    .Dump();
  • Hey nescafe,

    The first two both return (what looks like) the correct results consistently.

    The last one causes many SQL queries and took so long I had to terminate it.

    This version still causes incorrect and inconsistent results:

    Accounts .Select(a => Contact.Where(c => c.AccountId == a.Id) .Select(c => new { c.Email })) .ToArray() .Count() .Dump();

    Not sure if I can reduce it any further.
  • I've sent you a message to set up a remote session, mat.
  • Hey nescafe,

    Rather than need a remote session I thought I might be easier to create steps to reproduce anywhere:

    1. Create an empty database called 'Example' in SQL Server
    2. Run this SQL to create the Account table (auto-generated, I'm lazy):
    CREATE TABLE [salesforce].[Account]( [Id] [NVARCHAR](18) NULL ) ON [PRIMARY]
    3. Run this SQL to create the Contact table:
    CREATE TABLE [salesforce].[Contact]( [AccountId] [NVARCHAR](18) NULL, [Email] [NVARCHAR](80) NULL ) ON [PRIMARY]
    4. Grab a copy of SQL Data Generator
    5. Save an run this project against the database (you might have to save that XML as a .sqlgen file and then open it, sorry, this forum has no attachment system)
    6. Use 'Generate Data...' to create a load of test data
    7. Run this against the Example database in Linqpad:
    Accounts .Select(a => Contacts.Where(c => c.AccountId == a.Id) .Select(c => new { c.Email })) .ToArray() .Count() .Dump();

    It's certainly not a true reproduction of my database but it does show that the results change between runs with Linqpad when the data in the database isn't changing.
  • Reproduced after modification:

    The records between Account and Contact must be related. The SQL Data Generator generates random foreign keys therefore no related records exist. The count will always be 1000.
    Enumerable
    .Range(0, 100)
    .Select(i =>
    Accounts
    .Select(a => Contacts.Where(c => c.AccountId == a.Id)
    .Select(c => new { c.Email }))
    .ToArray()
    .Count())
    .Dump();
    Output: all 1000.

    However, this is different when related records exist:
    INSERT Contact ( AccountId, Email )
    SELECT TOP 50000 Account.Id, Contact.Email
    FROM Account, Contact
    ORDER BY NEWID()
    Output:
    985 
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    982 < --
    981 < --
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    983 < --
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    979 < --
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
    985
  • Hey nescafe,

    Thanks again for you help!

    Sorry the project in reproduction steps should have set SQL Data Generator to populate the Contact table with foreign keys rather than just random data, but I might have forgotten to do that.

    With more than 1000 rows the problems becomes more pronounced, rarely giving the same results between runs.

    What are we looking at here? I can't see anything wrong with what I'm doing.

    Is this a bug in LinqPad? If so how do I report it?
    (I got no reply from the in-app "Report Bug" feature when I raised this a couple of days ago)
  • I think it's unlikely this is a problem in LINQPad, but is due to a mis-set feature in LINQ-to-Objects (introduced by ToArray() or AsEnumerable()). Cannot determine exactly what it is, have tried to eliminate NULL values but that didn't succeed. Using explicit joins doesn't help either.
  • Hey nescafe ,

    You're being brilliant help but do you know of any way to contact the LINQPad maintainers to raise this issue directly? It would be good to have their input.

    I'm not sure what you mean by "mis-set feature in LINQ-to-Objects" rather than it being a bug as I'm using the out-of-the-box settings/configuration (I assume you are too when you reproduced the issue) and the reproduction database is as simple as they come.
  • apologies if you already did this and I'm just not seeing it, but if you take the generated SQL for the problematic case and run that SQL directly, is it consistent/correct in its output?

    Also, while it shouldn't matter at all, are you doing output to data grids or html?
  • Hey jamesmanning,

    Yes, running the SQL generated by LinqPAD directly always gives a consistent result.

    I'm just using the vanilla 'Results' pane, which I assume is data grids?

    Hopefully you can use the reproduction steps I listed above to see the issue for yourself?
  • I'm having a little trouble trying to parse out a canonical set of repro steps from the thread - could you maybe create a gist/pastebin/sqlfiddle/whatever that created the tables and inserted the data and then the 'these work' queries and 'these break' queries?

    Thanks!
  • Hey jamesmanning,

    There's a single post from myself above posted on May 27th with 7 steps to reproduce including the database schema (which is only about 7 lines of SQL).

    I'm not sure I can simplify the reproduction steps any further than that. Is there a specific step which is catching you out?
  • Does anyone know how to contact LinqPAD themselves?

    Given this is reproducible so trivially with a database with only 3 columns and with LinqPAD default settings I'm not convinced this is something the community will be able to get to the bottom of.

    I've got no response from sending bug reports in the LinqPAD app. I'm starting to feel let down by my Premium purchase.
  • edited June 2015
    Hi Mat,

    As mentioned before, it's very unlikely this is a bug in LINQPad, but rather a quirk in LINQ itself. Also, the author of LINQPad is active on this forum and will notice. As the premium version does not explicitly includes 'premium' support, I think this is the best you'll get for your money.

    I'm currently limited on available time, but the first step to take is to re-create the situation in a regular C# Console Project in Visual Studio. My bet is that the same issue will arise.

    Regarding the feeling you're let down by your purchase, please take a look at the user requests at http://linqpad.uservoice.com/forums/18302-linqpad-feature-suggestions ; Joe has a rather large back log including the upgrade to C# 6 and intellisense for VB.NET and still manages to find time to reply to new comments and requests, so please give him some credits and await gently :)

    Kind regards

    Joris
  • Hello,

    Do any of you think this could be related to the unexpected Count() results I was getting in this thread: http://forum.linqpad.net/discussion/799/unexpected-results-from-ienumerable-count-not-matching-the-actual-count-with-anonymous-return-type#latest ?

    It seems from the details above that the same underlying issue might be responsible for both, and I'd be happy to help out any way I can.
  • Hey Blue_Ninja,

    It does seem like our issues are very similar. They could well be the same with slightly different reproduction steps.

    If it helps I took the time to create reproduction steps in an above comment using a database with only 2 tables and 3 columns and populating it with sudo-random data. That way others can reproduce the problem without needing a database dump.

    Sadly I don't have any additional information or a diagnosis for you. I have to admit that I've started writing queries in SQL rather than LINQPad since I'd have to cross/sanity check the results in SQL anyway if I can't trust LINQPad to return the correct results.

    I haven't had an acknowledgement from a LINQPad representative that they've even received the bug report I submitted.

    If you do get to the bottom of that the issue though please reply here. This forum doesn't have a means for me to follow/subscribe to your thread as far as I can see.
  • Thanks for your reply, Matt. You can click the little star icon atop a thread topic, to have it bookmarked on the right-side menu, if that helps.

    I'll certainly post any additional info I come across, or a resolution if I stumble upon one.

    Thanks again!
  • edited June 2015
    Mat,

    I have reproduced the issue in a stand alone Visual Studio 2013 project. Still the same problem therefore not LINQPad related. Perhaps you could share the issue on Stack Exchange?

    image

    The demo project is available for download here:

    https://drive.google.com/file/d/0B2Y03BpS8L2XRUkwcTFIOGlpWG8/view?usp=sharing
  • edited June 2015
    Hey nescafe,

    Thanks for taking the time to reproduce the issue.

    I think maybe I was using LINQPad for the wrong reasons.

    I was using LINQPad as a way to:
    • avoid learning SQL (I can do the basics, but I struggle to structure larger queries)
    • perform more elaborate queries more easily than in SQL
    • format results in more elaborate ways than SSMS allows (e.g. hyperlinks that open in your browser, expanding/collapsing nesting stuff)
    I guess you think of LINQPad as a way of try out LINQ to SQL or LINQ to Objects statements/configurations rather than as an alternative query language or a replacement for something like SSMS.

    I think the real answer here is I should just be improving my SQL-foo and using SSMS, otherwise I'm just going to have to learn about configuring LINQ to SQL/Objects correctly instead (and probably cross-checking my results with SQL in the mean-time anyway).

    Thanks for the help though. Hopefully this forum post will be of use to others.
  • Given nescafe's repro project does repro for me too, has anyone put this on connect?
Sign In or Register to comment.