Noah Z asked

Experimenter not making it through all replications

I have a model that I set to run 50 replications in the experimenter. It gets through 17-20 replications on the single scenario I am running and then the remaining replications never start.

Unfortunately, I can't upload the model to this public forum or through a private question because of its sensitive nature, but I'm wondering whether the community has any ideas about what might be going on and how to mitigate it. I realize that not having the model to look at significantly reduces the chances of this getting figured out here.

My hunch is that the replications run in the background by the experimenter aren't terminating properly: when I reset the experimenter and try to re-run the experiment, no replications begin at all. I have to restart my computer before any replications will run again, and then the same thing happens (i.e. it stops at ~17 or so replications).

Any ideas?

FlexSim 20.0.0

Jordan Johnson answered

@jason.lightfoot's answer is an excellent guide to troubleshooting the experimenter. Another possibility is that you are hitting a bug related to the "restore original state" checkbox, which we fixed in version 20.0.7. If you upgrade to version 20.0.9 (the latest bug-fix release at the time of writing), do you still experience the issue?


Noah Z commented

I just ran the model with version 20.0.9 and it worked with no issues or hang-ups. Thanks for your help.

Jason Lightfoot answered

The replications are likely crashing and therefore not terminating correctly, as you guessed. If you start with 8 cores running, then each time a replication fails you have one fewer core picking up jobs. Only 8 replications need to fail for the experiment to stop.
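The arithmetic above can be sketched with a toy model. This is not FlexSim code and says nothing about how the experimenter schedules work internally; the function name and the set of "crashing" replication numbers are made up for illustration. It hands replications out in order and assumes a worker that picks up a crashing replication never returns to the pool.

```python
from collections import deque


def run_experiment(total_replications, workers, crashing):
    """Toy model: return (completed, never_started) replication numbers.

    Assumption: a worker that picks up a crashing replication hangs and
    never takes another job, so the pool shrinks by one per crash.
    """
    queue = deque(range(1, total_replications + 1))
    alive = workers
    completed = []
    while queue and alive > 0:
        rep = queue.popleft()
        if rep in crashing:
            alive -= 1  # this worker is lost for the rest of the experiment
        else:
            completed.append(rep)
    return completed, list(queue)


# Hypothetical example: 50 replications, 8 workers, 8 replications that crash.
completed, never_started = run_experiment(
    50, 8, crashing={3, 5, 9, 12, 15, 18, 21, 25}
)
print(len(completed), "completed;", len(never_started), "never started")
```

With these made-up numbers the pool is exhausted at replication 25, so the remaining replications never start even though most of the earlier ones finished normally.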

Can you first check that the model is repeatable, so that, say, replication 1 always gets the same result? Then check that you can run that replication interactively and get the same result. If you can, we should be able to diagnose each failing replication. If not, we need to diagnose why the model isn't repeatable.
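As a rough illustration of what "repeatable" means here, the sketch below uses a stand-in model rather than FlexSim: `run_replication` is a hypothetical function that seeds its own random stream from the replication number, and `is_repeatable` checks that repeated runs of the same replication return identical results.

```python
import random


def run_replication(replication, base_seed=0, n=1000):
    """Stand-in for a model run: average of n exponential samples.

    Assumption: all randomness comes from one stream seeded by the
    replication number, so the same replication gives the same result.
    """
    rng = random.Random(base_seed + replication)
    return sum(rng.expovariate(1.0) for _ in range(n)) / n


def is_repeatable(replication, runs=3):
    """True if every run of this replication produces the same result."""
    results = {run_replication(replication) for _ in range(runs)}
    return len(results) == 1


print(is_repeatable(1))
```

A model that reads the clock, uses unseeded randomness, or depends on leftover state from a previous run would fail this kind of check, which is the situation that has to be diagnosed first.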

In case you've not seen the option to select the replication number to run interactively, it's "Repeat Streams of Replication" on the experimenter's Advanced tab.

The next step is to interactively run the replications that fail and look for problems that cause them to crash. I'd start with the replication that crashes earliest, i.e. the one with the shortest green line on the progress indicator.

I'd also try ending all running FlexSim instances using the Task Manager and see whether that lets you start a new experiment without rebooting your machine. If you add the 'commandline' field to the Task Manager's detailed process view, you may be able to see which, if any, are child processes that have got stuck.
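If doing this by hand gets tedious, the same cleanup could be scripted. The sketch below is Windows only and relies on the standard `tasklist` and `taskkill` commands; the executable name `flexsim.exe` is an assumption, so check it against the name shown in Task Manager, and note that the CSV column headers are those of an English-language Windows install. It does not show command lines, so the Task Manager column is still the way to tell child processes apart.

```python
import csv
import io
import subprocess

IMAGE_NAME = "flexsim.exe"  # assumption: confirm the name in Task Manager


def parse_tasklist_csv(text, image_name=IMAGE_NAME):
    """Return PIDs from `tasklist /FO CSV` output whose image matches."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        int(row["PID"])
        for row in reader
        if (row.get("Image Name") or "").lower() == image_name.lower()
    ]


def list_instances(image_name=IMAGE_NAME):
    """Windows only: ask tasklist for running instances of image_name."""
    out = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_tasklist_csv(out, image_name)


def kill_instances(pids):
    """Windows only: force-terminate each PID (unsaved work is lost)."""
    for pid in pids:
        subprocess.run(["taskkill", "/F", "/PID", str(pid)], check=False)
```

Calling `kill_instances(list_instances())` would end every matching process, including any interactive session you still have open, so save your work first.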

Please come back if you want more help with each step in this process and we'll see what we can do to guide you through it.



Noah Z commented

Thank you, Jason, for sharing that troubleshooting method. I'll follow this procedure today and update this thread once I have some new insights. One thing to note is that in the experimenter, all of the first 17 successful replications have a "full" green bar. They don't stop midway through their run.



Jason Lightfoot replied to Noah Z

If an interactively run replication runs to the simulation end time without errors popping up or printing to the console (and without an outright crash), then it sounds like something could be causing a problem in the PFM definitions and/or in the process that gathers results from the child instances.

Noah Z replied to Jason Lightfoot

I was previously using FS 20.0.0. Per Jordan's comments I upgraded to 20.0.9 and the experimenter ran with no issues. Thanks again for your insights.

Ignacio commented

The same thing is happening to my model experiment.

How do you go to a specific replication and run it interactively? @jason.lightfoot

Jason Lightfoot replied to Ignacio

Just go to the Advanced tab of the experimenter and change the number next to "Repeat Streams of Replication", which means "set the interactive run to use the random stream seed for this replication."

Ignacio replied to Jason Lightfoot

I am running 30 replications and after 8 it stops. I changed "Repeat Streams of Replication" to 9, 10 and 11, and they all run without issues in the interactive view. What could it be?
