Home
Options

Identical group by results different between Linqpad and Visual Studio

I'm reasonably decent creating linq queries and I have a case where I'm using entity framework to query a table where I need to group the results based on 8 columns. The groupable data itself (columns) are account ID, person ID, premise ID, device ID, address, city, state, and zip. Basically, a person can have multiple accounts, multiple premises, multiple devices, and each premise can have it's own address.

The general idea here is that I need to pull in all the data from all of these 8 fields, and identify the unique combinations of these 8 fields. The table in question has many other columns that make these 8 columns have duplicates across other rows if I just select all and is the whole reason I'm doing a group by. I happen to know that the table has approximately 32k rows and when the group by is applied, it drops to 23k groups.

The linqpad query works great... when I put basically the same code into my application code in Visual Studio the number is like 1400 fewer groups. The query:

Linqpad:

var zAccountGroups = (from za in zAccounts
                      group za by new { za.acct_id, za.per_id, za.dev_id, za.prem_id, za.address, za.city, za.state, za.postal } into zaGroups
                      select zaGroups).OrderBy(zag => zag.Key.per_id).ToList();

Visual Studio:

List<z_AccountInfo> zAccounts = db.z_AccountInfo.OrderBy(z => z.per_id).ToList();

var zAccountGroups = (from z_AccountInfo za in zAccounts
                                      group za by new { za.acct_id, za.per_id, za.dev_id, za.prem_id, za.address, za.city, za.state, za.postal } into zag
                                      select zag).ToList();

zAccounts is all of the table's rows which are queried inside an Entity Framework context / using block. The zAccountGroups grouping is performed outside of the EF context/using block. From what I've read, group by requires a selecting the groups into an anonymous type and you can't instantiate a list of anonymous type outside of the EntityFramework context and then assign to it within the context - and I need the groups outside the context.

In any case, this structure works to get all rows, then group them outside the EF context... the problem is, the results of grouping are wrong. Doing a SQL group by on the table in SSMS yields the same number of groups as LinqPad - Visual Studio has fewer groups and I can't explain why... needless to say it's driving me nuts. Anyone seen something like this before and have ideas as to what I'm doing wrong?

Comments

  • Options
    edited September 2022

    Visual Studio and LINQPad are just tools for executing code. The code, in this case, is an object/relational mapper that translates LINQ into SQL. A common reason for there being a difference in results is when you use different object/relational mappers: LINQ-to-SQL in LINQPad and EF Core in Visual Studio. You can avoid this by choosing the EF Core driver in LINQPad, which should then match what you're using in Visual Studio. You can then look at the LINQPad's SQL translation tab and compare the generated SQL with when a LINQ-to-SQL connection is used.

    In your case, however, it's more likely that the difference is because your queries are different. In the LINQPad query, you're getting SQL server to perform the query. In your Visual Studio query, you're first retrieving an entire table into memory and then querying from memory. This is usually unnecessary and inefficient; moreover, it changes the querying semantics. Whereas SQL Server applies a collation specific to the database (case insensitive by default), .NET uses System.String collation which is case sensitive ordinal unless you provide a different comparer. So the different in results is most likely due to case sensitivity in the grouping fields.

  • Options

    After troubleshooting for a day, I found that the issue was not with my group by at all... the list itself was not reflecting the actual data in the table. What I observed with SQL Profiler was that SQL Server was returning the correct data to Entity Framework in Visual Studio, but my List was manipulating the results and I believe I traced this back to the fact that the table in my Entity Framework model had an Entity Key set on a column that did not have unique values. To resolve my issue, I created a new column in the table and set it as an Identity Specification and enabled auto-incrementing... I suppose a unique constraint on the column would have worked as well. After updating my Entity Framework model, the List in Visual Studio then matched the data in the underlying table.

    I'm not sure why exactly, but Entity Framework was persisting incorrect values into other columns in my accounts List. This manipulation reduced the number of groups that the group by method was able to distinguish. I'm not a pro with Entity Framework, but maybe that's common knowledge that the Entity Key column must contain unique values - if that's a requirement, I was unaware of it and it ultimately seemed to be the root of my issue.

Sign In or Register to comment.