Aurélien FR asked

Parallel processing on GPUs

Hello,

We often use the very useful new distributed CPU feature with Amazon Web Services. Thank you for this, it really helps.

We are considering buying new hardware to try to limit our Amazon costs. We are wondering whether it would be possible to run FlexSim on GPUs (instead of distributed CPUs) to boost parallel processing. Is this something that might become possible in the future?

If not, are there any multi-threaded processors you would recommend? We are considering the AMD Ryzen™ Threadripper™ 1950X with 32 threads.

Thank you very much for your answers.

Aurélien

FlexSim 19.0.0
parallel processing

1 Answer

Jordan Johnson answered

Whether you are running the experiment on Amazon or buying your own hardware, you should make sure that each individual core is powerful. While having more processors at once is helpful, the performance improvement depends on both the number of cores and the speed of each core. Using twice as many cores that are each half as fast will not help you.

I don't know a lot about hardware, so I can't guide your buying decisions. I would look at your task: how many replications do you need to run, and how long does each replication take? Then look at benchmarks like this one:

https://www.youtube.com/watch?v=Fr1ZlUu8v_Q

These can help answer the question "should I buy n of these cores or m of those cores?"
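To turn "how many replications, and how long does each take?" into a rough comparison, here is a minimal sketch. It assumes, as Aurélien notes further down, that the Experimenter runs at most one replication per CPU thread, that all replications take about the same time, and that the per-replication time was measured with all threads busy on the candidate machine. The function name is hypothetical, not part of FlexSim.

```python
import math


def estimate_experiment_minutes(replications: int, threads: int,
                                minutes_per_replication: float) -> float:
    """Rough wall-clock estimate for an experiment run in parallel.

    Hypothetical helper: assumes one replication per thread at a time and
    that every replication takes the same time on this machine.
    """
    if replications <= 0 or threads <= 0:
        raise ValueError("replications and threads must be positive")
    # Replications run in "waves" of up to `threads` at once.
    waves = math.ceil(replications / threads)
    return waves * minutes_per_replication


# Made-up numbers: 100 replications of 10 minutes each.
print(estimate_experiment_minutes(100, 8, 10.0))   # 13 waves -> 130.0
print(estimate_experiment_minutes(100, 32, 10.0))  # 4 waves -> 40.0
```

This also illustrates the point about per-core speed: 16 threads whose cores are half as fast (20 minutes per replication) give `estimate_experiment_minutes(100, 16, 20.0)`, which is 7 waves or 140 minutes, no better than 8 fast threads.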

As far as using the graphics card to boost FlexSim's performance, see Anthony's response to this post on the old forum:

https://archive.flexsim.com/showthread.php?t=612


Aurélien FR commented:

Thank you Jordan for your answer. I was not clear enough with my initial question. I am only talking about using the Experimenter and running as many replications and scenarios in parallel as possible.

Currently, the number of CPU threads determines the maximum number of scenarios/replications that can run in parallel (8 for a 4-core/8-thread processor).

Is it possible (or is it planned for future development) to run the Experimenter on the hundreds or thousands of threads of a GPU, as people in machine learning commonly do with CUDA, for example? See the links below.

https://en.wikipedia.org/wiki/CUDA

https://www.geforce.com/hardware/technology/cuda/technology

Thank you very much

Jason Lightfoot commented:

As a rule of thumb, compare processors by multiplying the number of virtual cores (threads) by the clock speed; the resulting unit is "threadGHz" or "vcoreGHz". Use the base clock speed for this, not the turbo, because when all cores are in use the processor will throttle back to the base clock. There will be some adjustment due to hyperthreading inefficiency and memory architecture, but it will get you in the right ballpark.
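A minimal sketch of that rule of thumb is below. The two processors are hypothetical examples chosen only to show the arithmetic, not real product specifications, and the score deliberately ignores the hyperthreading and memory-architecture adjustments mentioned above.

```python
def vcore_ghz(threads: int, base_clock_ghz: float) -> float:
    """Rule-of-thumb score: threads multiplied by the *base* clock (not turbo)."""
    return threads * base_clock_ghz


# Hypothetical processors for illustration only (not real specs).
candidates = {
    "processor A (8 threads, 3.6 GHz base)": (8, 3.6),
    "processor B (32 threads, 3.4 GHz base)": (32, 3.4),
}

for name, (threads, ghz) in candidates.items():
    # No correction for hyperthreading or NUMA/memory effects here.
    print(f"{name}: {vcore_ghz(threads, ghz):.1f} vcoreGHz")
```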

For two dissimilar configurations with a similar vcoreGHz score, the disadvantage of a high core count and lower clock speed is that you need more memory, and if your models are large that becomes pretty substantial: a 3GB model running on 64 threads will need 192GB of RAM plus system memory.
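The memory estimate above can be written as a one-line formula. This sketch assumes, as the 3GB × 64 = 192GB example implies, that each parallel replication holds its own copy of the model; the `system_gb` value is a hypothetical allowance you would set for your own operating system and other software.

```python
def experiment_ram_gb(model_gb: float, threads: int, system_gb: float = 0.0) -> float:
    """RAM estimate when each of `threads` replications loads its own model copy."""
    return model_gb * threads + system_gb


print(experiment_ram_gb(3, 64))                 # 192.0 (model copies only)
print(experiment_ram_gb(3, 64, system_gb=16))   # 208 with a hypothetical 16GB for the system
```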

Another quick point: AMD and Microsoft are currently fixing an issue with NUMA memory and core affinity in Windows that hampers massively parallel processes (like FlexSim experiments) and prevents Threadrippers from achieving their full potential. I'd check the tech news about that.

Jordan Johnson commented:

FlexSim is written to run on the CPU. CUDA doesn't allow CPU programs to just run on the GPU. You can write a CUDA C++ application to run on a GPU, but FlexSim is not a CUDA application. You can write a module using the Module SDK that executes CUDA code.

Aurélien FR commented:

Thank you very much, Jason and Jordan!

