question

Axel Kohonen asked

Table query speed

Hi,

A few questions about Table query execution speed, which does not seem to be documented anywhere. Please point me to the documentation if it exists.

  1. How does the Table.query command work when multiple criteria are given? The speed of the query seems to depend (sometimes a lot) on the order of the criteria. E.g. putting a general category criterion first (category = "food") and a per-row criterion such as weight second (weight >= 50) does not give the same performance as giving the criteria in the other order. This is obviously not visible with small tables, but if one has 10000+ rows it can make a big difference.
  2. Do the later criteria check only the subset that the first criterion returns?
  3. Does ORDER BY always check all rows of the table or only the subset that the query returned?
  4. Is the table query faster with bundle tables?
  5. What does "optimize column for lookup" do and when should it be used?
  6. Can I optimize multiple columns for lookup?
  7. Does it even make sense to use Table.query for very large tables or should some other method be used?

Thank you!

Kind regards,

Axel

FlexSim 18.2.3
Tags: query, sql queries, speed, table

1 Answer

anthony.johnson answered

1. The SQL engine's AND expression works like most other evaluators in that it stops evaluating if the left operand is false. This means that if your left operand applies the narrower filter, the query will naturally run faster, because rows are filtered out earlier and fewer right operands need to be evaluated.

2. As explained in 1, an AND will stop evaluating if the left operand is false, so technically yes: subsequent criteria are only evaluated on the subset of rows for which the left operand is true.
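To illustrate the short-circuit behavior with a sketch (the "Inventory" table and its category and weight columns are hypothetical):

```flexscript
// Both queries return the same rows. But if only a small fraction of rows
// have category = 'food', the second form evaluates far fewer weight tests,
// because AND stops as soon as its left operand is false.
Table slow = Table.query("SELECT * FROM Inventory WHERE weight >= 50 AND category = 'food'");
Table fast = Table.query("SELECT * FROM Inventory WHERE category = 'food' AND weight >= 50");
```

Which ordering wins depends on which condition is more selective on your data; put the narrower filter first.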

3. FlexSim's SQL engine works the same as most SQL engines. The execution happens in the following order:

  1. Filter (WHERE / ON) - It will traverse the source table(s) and create a filtered subset based on the ON and/or WHERE clauses.
  2. Aggregate (GROUP BY) - If there are aggregation functions in the SELECT clause or a GROUP BY clause, it will aggregate the filtered subset into a new subset, categorized by the GROUP BY.
  3. Filter Aggregation (HAVING) - If there is a HAVING clause, it will further filter the aggregated subset.
  4. Sort (ORDER BY) - Sorting is the last operation to be done. It is only done on the final subset.
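The four stages above can be tied to a single query, as a sketch (the "Inventory" table, its columns, and the TotalWeight alias are hypothetical):

```flexscript
Table result = Table.query(
    "SELECT category, SUM(weight) AS TotalWeight FROM Inventory "
    + "WHERE weight >= 50 "        // 1. filter the source rows
    + "GROUP BY category "         // 2. aggregate the filtered subset
    + "HAVING SUM(weight) > 500 "  // 3. filter the aggregated subset
    + "ORDER BY category");        // 4. sort only the final subset
```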

4. We haven't done rigorous speed tests comparing tree table queries to bundle table queries. I would think that bundles are faster, primarily because bundle tables are more cache-friendly. But the amount of improvement is likely dependent on the situation.

5. The option "optimize column for lookup" only applies to bundle tables. It primarily allows you to use the getbundleindexedentries() command. While it can improve the performance of the query command in some cases, this only applies when using a join, and it only saves the time that the query would normally use to build a cross-index. So, at least for now, it's not a great improvement for Table.query().

6. Right now there's not really a direct way to optimize multiple columns of the same table for lookup.

7. If you are essentially trying to run Table.query() to look up a single row in a single table, lots and lots of times, then right now Table.query() is probably not your best bet. I would suggest using the getbundleindexedentries() command. That said, the groundwork has been laid for us to make Table.query() faster in your scenario. We've already implemented an indexing mechanism that makes inner joins orders of magnitude faster. We just need to also use that same (or similar) indexing mechanism in a single-table-lookup scenario. So hopefully this will improve in the not-too-distant future.


Axel Kohonen commented:

Hi @anthony.johnson

Thank you for the thorough answer! It is great to have it all in one place so that one can build better queries. Maybe put this answer in the user manual in some form?

Thank you also for the comment about finding a single row. Does this also apply to finding a few rows, e.g. rows with the same product number, or is a query best in that case?

I have tried using e.g. a partitioned list with the product number as the partition value. This works well if I only need to search by product number, but setting up and updating the list alongside the table always requires some extra work with a dynamic table, such as the inventory of items in a warehouse. Do you have better suggestions on how to do this?

Thank you!

Kind regards,
Axel

Mischa Spelt replied:

Hi Axel,

After our initial enthusiasm about the introduction of SQL in FlexSim had faded a bit, we found that SQL queries are not the solution to every problem. Indeed, if you need a lot of fast lookups, there are better ways.

Of course, lookups work best if you can jump straight to the row you want. So in my models, for example, I will save a FooBarIndex, which is the row number of a foobar (e.g. an order, SKU, or schedule table entry), rather than the FooBar ID, which may be a random number or even a string. That way I don't have to look up the row number every time I need to get a property from my foobar.

If that doesn't help and you still want to use a table, you can manually make lookups faster. For example, if you can order your table by the lookup column, you can do a binary search instead of a linear loop, which can give massive speed improvements for large tables. This works especially well for tables that are initialized once at the start, or where you do many lookups compared to inserts (in particular for treenode tables, as opposed to bundles, insertions are cheap, and finding the place to insert a row is logarithmic with binary search).
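A minimal FlexScript sketch of such a binary search (assuming a hypothetical global table "Orders", sorted ascending on column 1):

```flexscript
Table t = Table("Orders");
double target = 12345;   // the ID we are looking for
int lo = 1;
int hi = t.numRows;
int foundRow = 0;        // 0 means "not found"
while (lo <= hi) {
    int mid = (lo + hi) / 2;   // integer division; midpoint of the range
    double val = t[mid][1];
    if (val == target) { foundRow = mid; break; }
    else if (val < target) lo = mid + 1;
    else hi = mid - 1;
}
// O(log n) comparisons instead of scanning all t.numRows rows
```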

Potentially you don't even need a table: you can keep data on token labels or in arrays. You can also think of hybrid variants. E.g. if you cannot sort your table, but your IDs are guaranteed to be 1, 2, 3, ..., n, you can create an Array with n elements and store at index i the row number where item i can be found in the table, so that you can do a row-number lookup in the Array and then jump straight to the correct row. Even when the IDs are not consecutive but reasonably ranged, this can be worth it (e.g. if you have 100 orders with random three-digit order IDs, you can still create an array with 1000 elements containing mostly zeroes and some row numbers at the order IDs that do exist).
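The ID-to-row array idea, as a sketch (the "Orders" table with three-digit IDs in column 1 and a quantity in column 2 is hypothetical):

```flexscript
Table t = Table("Orders");
// Build a sparse lookup array: rowOfId[id] holds the row number, or 0 if absent.
Array rowOfId = [];
for (int i = 1; i <= 999; i++)
    rowOfId.push(0);
for (int row = 1; row <= t.numRows; row++)
    rowOfId[t[row][1]] = row;

// Later: jump straight to the row for ID 417, if it exists.
int row = rowOfId[417];
if (row > 0) {
    double qty = t[row][2];
}
```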

Sometimes it's worth keeping multiple places updated, as you mentioned in your post. It's the ever-continuing trade-off of space vs. time: if you change data rarely compared to how often you look it up, the extra effort required when you do make a change may be worth it.

Finally, hopefully this is stating the obvious, but caching really does help too. If information doesn't change, don't look it up or recalculate it all the time. Make list fields non-dynamic if possible, copy data you use a lot to a label instead of evaluating Table("...")[row][col] each time, etc.
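For example, caching a table value on a token label in Process Flow (a sketch; the table and label names are hypothetical):

```flexscript
// In an early Assign Labels / Custom Code activity, read the table once:
token.ProcessTime = Table("ProcessTimes")[token.Row][2];

// Downstream activities then use token.ProcessTime directly
// (e.g. as the time of a Delay activity) instead of re-evaluating
// Table("ProcessTimes")[token.Row][2] on every pass.
```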

Axel Kohonen replied:

Hi @Mischa Spelt

Thank you for the answer! Great to hear the experience of someone else as well.

Actually, I have used many of the options you describe: using indexes, and writing all the row numbers corresponding to a token/item onto a label when resetting the model by going through the input data rows in order. I have also done the lookups in the input data in Excel beforehand, so that I know the direct index of a row in the model's table without having to look it up.

But for tables that are dynamically populated in the FlexSim model this approach obviously does not work. Instead one needs to look up the values (hence this question) or add the row indexes to e.g. a partitioned list as new values are added to the table, and look them up from the list. The list seems superior in speed, e.g. when partitioning and looking up by product number (found on a few rows of a large table), but it adds a layer of complexity.

Axel

