Idea

Serge A suggested

Display format for enumerated values in StatisticsCollectors

It would be useful to store categorical data (e.g. object state, mode, current Type/SKU) in a StatisticsCollector's columns and display it in human-readable form. Typically the categories are mapped to positive integers.

Current options are: 1) store and display numbers, which is fast and space-efficient but produces tables that are hard for humans to read; 2) store and display strings, which is human-readable but much slower and apparently uses a lot of memory; 3) store numbers and use a CalculatedTable to convert them to human-readable descriptions, which is cumbersome.

Proposal. StatisticsCollector already lets you define a rule for how a stored value is displayed. Model.dateTime can be saved as a double and displayed as "Date / Time". StatisticsCollector.getID(treenode) can be saved as a double and displayed as "Object". Would it be possible to extend this approach to enumerated values? A possible implementation could add these Display Formats:

  • "By Table Lookup" (generic for model-specific categorical values)
  • "State Name" (for object state profiles)
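
To make the "By Table Lookup" idea concrete, here is a minimal sketch in Python (not FlexScript, and not an existing FlexSim feature). The mapping `STATE_NAMES` and the helper `display_value` are hypothetical names invented for illustration; the point is only that the column keeps integer codes and the label is resolved at display time.

```python
# Hypothetical sketch of a "By Table Lookup" display format.
# The stored column holds integer codes; labels are resolved only for display.

# Assumed example mapping of category codes to labels (not from any real model).
STATE_NAMES = {1: "Idle", 2: "Processing", 3: "Blocked"}


def display_value(code, lookup):
    """Return the label for a stored code, falling back to the raw number."""
    return lookup.get(code, str(code))


stored_column = [1, 2, 2, 3, 7]  # what would be stored: plain numbers
displayed = [display_value(code, STATE_NAMES) for code in stored_column]
print(displayed)
```

Codes without an entry in the lookup are shown as the raw number here, which is one possible fallback; a real display format might choose differently.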
statistics collector, feature request

1 Comment

Jordan Johnson commented

For objects, if the object is present on reset, you can store the object's ID and use the Object format. This will store a number unique to the object, but show the path. You can control how much of the path to show using the Object Format Max Depth variable.

For all other string categories, you should use the String type field on the stats collector. Stats collectors always use the BUNDLE_FIELD_TYPE_VARCHAR for string fields. This field type only stores one instance of each unique string. Each entry in the bundle stores a 4-byte index into the list of unique strings. This allows for any length of string, and is also very memory efficient.
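
The storage scheme described above (one copy of each unique string plus a small index per entry) can be sketched roughly as follows. This is a conceptual illustration in Python, not FlexSim's actual bundle implementation; the class `InternedStringColumn` and its methods are hypothetical.

```python
from array import array


class InternedStringColumn:
    """Conceptual sketch: store each unique string once, plus an index per row."""

    def __init__(self):
        self._strings = []           # list of unique strings, each stored once
        self._index_of = {}          # string -> position in self._strings
        self._entries = array("I")   # one unsigned int per row (typically 4 bytes)

    def append(self, value):
        index = self._index_of.get(value)
        if index is None:
            # First time this string is seen: add it to the unique list.
            index = len(self._strings)
            self._strings.append(value)
            self._index_of[value] = index
        self._entries.append(index)

    def __getitem__(self, row):
        return self._strings[self._entries[row]]

    def unique_count(self):
        return len(self._strings)

    def __len__(self):
        return len(self._entries)


column = InternedStringColumn()
for state in ["Idle", "Processing", "Idle", "Blocked", "Processing", "Idle"]:
    column.append(state)

print(len(column), column.unique_count(), column[2])
```

With a small set of category names, the per-row cost stays at one index regardless of how long the names are, which is why the memory difference versus storing numbers is small.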

As for speed, yes, technically reading/writing a number is usually faster. However, the difference is trivial. I made this test model that causes 2 million events, where a stats collector writes a single row/value for each one. Here are the times I got on my machine for this model:

  • Just the model (no stats collectors) - 13.59 seconds
  • Model + Writing 2 million number values - 17.43 seconds (+3.84 s, 520k values/s)
  • Model + Writing 2 million string values - 18.08 seconds (+4.49 s, 445k values/s)

So using a string field cost 0.65 seconds extra to write 2 million values. In this model's case, storing string data instead of number data also cost less than 1 KB of additional memory.

In a real model that records 2 million data points, the model time will usually be much, much greater than 14 seconds. Suppose it takes 2 minutes to run that model enough to gather 2 million points of data. If we make it so that people record numbers instead of strings, we would save the user 0.65/120, or 0.5% of their model time. Most people won't even notice that difference.
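
The arithmetic behind these figures can be reproduced with a few lines of Python. The timings are the ones reported above; the 120-second run time is the hypothetical 2-minute model from the previous paragraph, and the variable names are chosen for illustration only.

```python
# Timings reported for the test model, in seconds.
baseline = 13.59        # model only, no stats collectors
with_numbers = 17.43    # model + 2 million number values
with_strings = 18.08    # model + 2 million string values
values_written = 2_000_000

number_overhead = with_numbers - baseline   # time spent writing numbers
string_overhead = with_strings - baseline   # time spent writing strings
extra_for_strings = with_strings - with_numbers

# Throughput in values per second for each field type.
number_rate = values_written / number_overhead
string_rate = values_written / string_overhead

# Share of a hypothetical 2-minute (120 s) model run spent on the extra cost.
assumed_run_time = 120.0
share = extra_for_strings / assumed_run_time

print(f"numbers: +{number_overhead:.2f} s, {number_rate / 1000:.1f}k values/s")
print(f"strings: +{string_overhead:.2f} s, {string_rate / 1000:.1f}k values/s")
print(f"extra cost of strings: {extra_for_strings:.2f} s ({share:.2%} of run)")
```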

teststringfields.fsm
