This article borrows from @jordan.johnson's original guide for deploying a distributed experiment/optimization with Amazon.
Distributed Terminology
- Please be familiar with the terminology related to experiments and optimizations.
- Replication - a distinct model run. We'll use this term below for both replications (experimenter) and solutions (optimizer).
- Instance - a system, whether virtual or bare-metal, local or remote, meeting FlexSim's recommended requirements, and used for running distributed replications.
- Main PC - the system from which the user configures and initiates the distributed experiment or optimization.
- The Cloud - a shorthand term for remote servers/systems hosted in a data center, sometimes by a 3rd party such as Amazon, Microsoft, Google, or others.
Background Concepts
In discussing distributed experiments or optimization, you should start with a good understanding of both the Experimenter and the Optimizer. Please read and understand this user manual entry which includes explanations of key concepts that will be important as you continue.
Other user manual articles provide additional guidance on configuring your own experiments or optimizations. Search the online user manual and this community if you have additional questions regarding experiments or optimizations, as needed.
Using the experimenter requires a FlexSim license. The optimizer requires a license for the OptQuest add-on. Please contact your local distributor for more information or to request a test license for FlexSim and/or OptQuest.
What are distributed experiments or optimizations?
Experiments or optimizations can create dozens, hundreds, or even thousands of distinct model runs (or replications) to test various scenarios, build confidence intervals of the results, or zero-in on an optimal solution.
A distributed experiment or optimization takes those individual model runs and assigns them (distributes them) to run across a group of computers. If you needed to run 1000 replications, and you have 4 computers available, each computer could run 250 replications, getting your results 4 times faster.
When should you distribute?
Using distributed replications can significantly reduce the required time to run an experiment if:
- The single-run time for your model is high (a couple minutes or more)
- You anticipate running a high number of replications
If the time per replication is short, then the increased communication overhead may outweigh the benefit of using distributed replications. The communication overhead increases because all distributed replications still report results to a single FlexSim process on the Main PC, and that communication occurs over the network (local instances) or internet (remote instances), rather than on a single computer. If an experiment or optimization completes in an acceptable amount of time, you may not need to use distributed replications.
Financial Costs
3rd party cloud providers such as Amazon, Microsoft, Google, and others, charge for their services. Your cost is usually based on the hardware you provision for your Windows instances, and the length of time those instances are live.
Typically you only run your instances when running your experiment or optimization, and cancel or decommission your instances when your replications are complete. In this way you minimize the cost of your distributed experiment or optimization.
The costs and process of decommissioning your provisioned instances differ based on which 3rd party cloud provider you use.
You may have access to local computing resources that could be configured for use in your FlexSim experiments or optimizations. In this case you may avoid additional costs relating to 3rd party cloud providers by using your own on-premises computers.
Licensing for distributed Windows instances
Windows instances used solely for distributed replications DO NOT need an individual FlexSim license. Only the FlexSim installation on the Main PC must be licensed.
System requirements for distributed Windows instances
Graphics
A Windows instance should meet or exceed FlexSim's recommended requirements. However, because replications run as background processes without graphics, they need not meet the graphics requirements and do not need hardware accelerated graphics.
CPU
An instance can run as many concurrent replications as it has CPU cores. If, for example, an instance has 32 CPU cores, it can run 32 replications of your model simultaneously.
RAM
For this example of a 32-CPU Windows instance, if you want FlexSim to run 32 replications simultaneously - one on each core - then the instance must also have enough RAM to handle 32 concurrent model runs. On your Main PC, use Windows Task Manager to watch a single model run's RAM usage to determine its peak RAM utilization. If your model's RAM utilization peaks at 3GB throughout the course of a model run, you should make sure your 32-CPU system has at least 32*3 = 96GB of RAM for replications alone, along with a good ~10% more for additional overhead used for the Webserver and the statistics gathering and reporting from all the replications (this number could be more or less, depending on the model's stats gathering). In addition, you must account for your system's baseline RAM utilization - how much RAM it uses just to run the operating system and all its background processes. You can find this baseline by checking the Task Manager at a moment when you're not running any simulations. Add all these together to see how much RAM would be necessary to run a replication on each core of the instance.
If your instance doesn't have enough RAM to handle that many simultaneous replications, you'll need to limit the number of CPUs FlexSim will use when configuring your Cloud Computing settings from the Main PC.
Disk
Disk space is usually not an issue. However, if you are using the Store Data on Hard Drive option in the Statistics Collector, you will need to be sure that there is enough disk space to run the model to completion on the hard drive, multiplied by the number of cores. The amount of disk space on each instance may affect the total cost of using a 3rd party cloud provider.
More information
See the related sections in the article Recommended System Requirements for a more in-depth discussion of system components such as CPU and RAM, and how they relate to running your simulations and experiments.
Provisioning distributed Windows instances
The process of provisioning 3rd party cloud instances for running distributed FlexSim replications will differ from provider to provider. FlexSim has guides for the following cloud providers:
Wherever you decide to host your instances, the important part is that eventually you end up with a group of Windows instances, each with a unique IPv4 address accessible from the Main PC.
A custom Windows image
Your 3rd party cloud provider probably allows you to save a custom Windows image that can include software and settings that you need for each Windows instance. Doing this once for a custom Windows image saves you time when starting new instances - each instance will start with the software and settings preconfigured for your purpose.
On your custom image, do the following
- Download and install FlexSim. Use the same version of FlexSim as is licensed on the Main PC. Remember, you DO NOT need to activate a license.
- Run FlexSim. This creates a directory that is needed later. After FlexSim finishes its first start, close FlexSim.
- Download and install the FlexSim Webserver for your installed version of FlexSim.
- Edit the Webserver's configuration file appropriately for your situation. Default location is here: "C:\Program Files (x86)\FlexSim Web Server\flexsim webserver configuration.txt"
- Start the FlexSim Webserver from your Windows Start menu, or manually from the default location here: "C:\Program Files (x86)\FlexSim Web Server\flexsimserver.bat". The Webserver will download necessary files the first time it is run. Close the Webserver after it has completed its initial startup.
- Allow both FlexSim and node.js through the Windows Firewall. To do so, use Windows' Allow an App through the Windows Firewall tool. You will need to browse for both FlexSim and Node.js. Example locations (exact locations may vary by version):
- C:\Program Files\FlexSim 2023\program\flexsim.exe
- C:\Program Files\nodejs\node.exe
If you are not familiar with the FlexSim Webserver, please review its documentation and test it out on a local machine to understand what it does, its configuration options, etc.
If you cannot configure a custom Windows image, you will need to do the above on each instance individually.
Initialize instances
Launch your instances. On each instance, do the following:
- Log in to the instance with Remote Desktop or some other solution.
- Start the FlexSim Webserver from your Windows Start menu, or manually from the default location here: "C:\Program Files (x86)\FlexSim Web Server\flexsimserver.bat".
- Note the instance's IPv4 address.
Once all instances are running the Webserver, you are ready to configure a distributed experiment or optimization from the Main PC.
Author's Note: There is probably a way to make it so that when instances start up, they automatically run the Webserver, so that you don't have to manually connect to each one. I welcome any suggestions or steps for how to make that happen.
Configuring a distributed experiment or optimization
Once you have a list of running instances available, launch FlexSim on the Main PC.
Open the experimenter interface from FlexSim's Main Menu > Statistics > Experimenter.
Configure your experiment. As part of your configuration, under the Advanced tab, select the option to Use Distributed CPUs. Press the button to Configure Cloud Nodes. This will open the Global Preferences' Environment tab.
Enter the IP addresses of your instances into FlexSim. Enter the port numbers you configured in their FlexSim Webservers. If your instances will max out RAM before CPUs, enter a CPU count representing the maximum number of concurrent replications your instance can handle.
For more information on using remote computers for Experiment Jobs, see Running Jobs on the Cloud for more information.