Eagerly loading with LinqDataAdapter

Hello,

Does DA support eagerly loading? For example, if I have the following objects:

class Invoice 
{
	...
	List<InvoiceLine> InvoiceLines;
}

class InvoiceLine
{
	...
	List<SerialNumber> SerialNumbers;
}

class SerialNumber
{
	...
}

Is there any way I can load them in a single call using LINQ? Server side, making three LINQ calls to the DB server is not that problematic (though not ideal), as the latency is very low. But client side, with latencies in the 100-300ms range, making three calls to a remote server over HTTP is extremely expensive. I know I can use the RemoteDataAdapter to get all tables in a single call, but I lose the strong typing and the ability to work with objects instead of DataTables, so I'd rather not use it if at all possible.

Thanks,
Arturo.

Hello

That’s the infamous N+1 selects problem that affects all existing ORMs (in this case Data Abstract works as an ORM and is also affected by this fundamental problem).

In general, Data Abstract doesn’t support navigational properties or eager loading.
Still, it IS possible to fetch all the data in several related tables using a single query.

Take a look at this sample: GroupLoading.zip (120.1 KB)

It contains two DA LINQ queries that fetch data from several tables at once.

This sample uses the PCTrade.sqlite.db database shipped with Data Abstract, so you can run it without any changes in the connection definitions file.

The server part doesn’t contain any interesting code. Actually, the only change added to it after it was generated by the New Project Wizard is the BeforeExecutingGetDataReader event handler in the DataService implementation. It is used for logging purposes and doesn’t actually affect anything.

In the client-side code, the first thing that needs to be explained is the TableDefinitions_Extension.cs file. This file contains some additional code for the Orders class generated as part of the LINQ table definitions. Thanks to the partial modifier applied to the class, we can add code to it in a way that ensures the additional code won’t be lost when the TableDefinitions file is regenerated.

The code contains an additional property definition (note the attribute applied to it) and an Equals/GetHashCode method pair. The property will be used to store the retrieved data, while the methods are necessary to allow the Orders class to be used as a key in the LINQ to Objects GroupBy method.

Note: In general, Table Definition classes do not need these methods because GroupBy is effectively translated into a GROUP BY clause and is executed by the database server itself. LINQ to Objects operations, on the other hand, are performed over objects in local memory, so the .NET runtime needs some help to be able to compare the objects correctly.

public partial class Orders
{
	[RemObjects.DataAbstract.Linq.IgnoreProperty]
	public IList<OrderDetails> Details { get; set; }

	public override bool Equals(object obj)
	{
        ...
	}

	public override int GetHashCode()
	{
        ...
	}
}
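The elided method bodies are not shown in the snippet above; a minimal sketch of what they might look like, assuming Id is the primary key of Orders (a hypothetical implementation, the actual sample may compare more fields):

```csharp
// Hypothetical implementations; assumes Id is the Orders primary key.
public override bool Equals(object obj)
{
	// Two Orders instances are considered equal when their keys match
	var other = obj as Orders;
	return other != null && this.Id == other.Id;
}

public override int GetHashCode()
{
	// Must be consistent with Equals: equal keys produce equal hash codes
	return this.Id.GetHashCode();
}
```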

The MainForm code-behind contains two sample queries.

The first one is a simple join over two tables:

		// Aliases to make the query cleaner
		// These assignments do not cause any data request operations
		var orders = this._dataModule.DataAdapter.GetTable<Orders>();
		var orderDetails = this._dataModule.DataAdapter.GetTable<OrderDetails>();
		var products = this._dataModule.DataAdapter.GetTable<Products>();

		var query = orders
			.Where(o => o.OrderStatus != 128)
			.Join(orderDetails,
				o => o.Id,
				od => od.OrderId,
				(order, details) => new { Order = order, Details = details })
			.ToList()
			.GroupBy(g => g.Order)
			.Select(g =>
			{
				g.Key.Details = g.Select(r => r.Details).ToList();
				return g.Key;
			});

What happens here:

  1. A WHERE condition is defined. It is better to apply conditions as early as possible, as this reduces the server -> client traffic and the DB workload
  2. A JOIN between the Orders and OrderDetails tables is defined
  3. The .ToList() call is issued. At this point the query is sent to the DA server and executed. The result is a denormalized join between the Orders and OrderDetails tables. All LINQ operators after this point are performed over local in-memory objects, not over DB rows.
  4. The .GroupBy operation is performed. This is the point where the earlier defined Equals/GetHashCode methods are used.
  5. The .Select operation is performed. It takes the LINQ group object and converts it into a plain Orders object with an initialized Details property.

The second query is a bit more complex because it involves more than one Join:

		var query = orderDetails
			.Join(products,
				od => od.ProductId,
				p => p.Id,
				(od, p) => new { Detail = od, Product = p })
			.Join(orders,
				od => od.Detail.OrderId,
				o => o.Id,
				(detail, order) => new { Order = order, Detail = detail })
			.Where(o => o.Order.OrderStatus != 128)
			.ToList()
			.GroupBy(g => g.Order)
			.Select(g => new { Order = g.Key, Details = g.Select(r => r.Detail).ToList() });

Still, it uses the same approach:

  • Fetch denormalized data
  • Normalize it back client-side
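The normalization step itself is plain LINQ to Objects; a minimal standalone sketch of the same grouping pattern, using hypothetical flat rows instead of the DA table definitions:

```csharp
// Flat, denormalized rows as they might arrive from a server-side join
var flat = new[]
{
	new { OrderId = 1, Detail = "Widget" },
	new { OrderId = 1, Detail = "Gadget" },
	new { OrderId = 2, Detail = "Sprocket" }
};

// Group the flat rows back into one object per order.
// Here the key is a plain int, so no Equals/GetHashCode override is needed;
// with a class key (like Orders above) those overrides become necessary.
var normalized = flat
	.GroupBy(r => r.OrderId)
	.Select(g => new { OrderId = g.Key, Details = g.Select(r => r.Detail).ToList() })
	.ToList();
// normalized holds 2 entries; the first one has 2 details
```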

Thanks, antonk,

I see the process and it works very well. It does bring extra data in the response, but it’s much better than making several calls to the remote server. Thank you!

I see that I can also edit the data and apply changes as usual, so the approach suggested works perfectly well for what I need:

var data = query.ToList();
var Line = data.FirstOrDefault().Details.FirstOrDefault();
Line.Qty += 1;
this._dataModule.DataAdapter.UpdateRow(Line);
this._dataModule.DataAdapter.ApplyChanges();

In the case of the second query, I see the resulting SQL is the following:

SELECT "t0"."CustomerId"
	(...)
FROM (
	SELECT "t1"."CustomerId"
		(...)
	FROM (
		SELECT "t3"."Id"
			(...)
		FROM "OrderDetails" "t3"
		INNER JOIN "Products" "t4"
			ON ("t3"."ProductId" = "t4"."Id")
		) "t2"
	INNER JOIN "Orders" "t1"
		ON ("t2"."OrderId" = "t1"."Id")
	) "t0"
WHERE ("t0"."OrderStatus" <> 128)

However, the inner subquery joins all the order details in the table with the Products table, which is probably very expensive if the OrderDetails table has millions of records (as some of our tables do):

SELECT "t3"."Id"
	(...)
FROM "OrderDetails" "t3"
INNER JOIN "Products" "t4"
	ON ("t3"."ProductId" = "t4"."Id")

How can this LINQ query be rewritten so that the join with Products is done at the end, on the least amount of records, after filtering by order id?

Thanks,
Arturo.

That’s what the “denormalization” means - we save on the number of server -> client roundtrips at the cost of some additional data being sent. Anyway, the data stream is compressed, so the actual increase in data size is not that bad (unless, of course, you have blob fields in the header table).

A small performance tip: if you use the header table (Orders in the samples above) only for data read purposes, then you might consider mentioning only the primary key fields in the Equals/GetHashCode methods.

However, this is a performance micro-optimization that should be applied only when you don’t have other means to improve performance.

A “big” database server (like MS SQL) will optimize that for you.

You can test this by running the SQL query against your database from a management tool. Unfortunately, I don’t have a table with millions of rows at my disposal, so I would be really grateful if you could post your results here.
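If you still prefer to express the filter before the joins in the LINQ query itself, reordering the operators is one option; a sketch assuming the same table aliases as the earlier samples, and not verified against the SQL that Data Abstract actually generates:

```csharp
var query = orders
	.Where(o => o.OrderStatus != 128)   // filter applied before any join
	.Join(orderDetails,
		o => o.Id,
		od => od.OrderId,
		(o, od) => new { Order = o, Detail = od })
	.Join(products,
		x => x.Detail.ProductId,
		p => p.Id,
		(x, p) => new { x.Order, x.Detail, Product = p })
	.ToList()                           // query executes on the DA server here
	.GroupBy(x => x.Order)              // local; uses the Equals/GetHashCode pair
	.Select(g => new { Order = g.Key, Details = g.Select(r => r.Detail).ToList() });
```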