Hi,
I am trying to run a set of experiments with an increasing ratio of deployments to nodes, and I found that the simulation gets stuck in the scenarios below. I am not sure to what extent this is by design or the result of a wrong combination of settings. The simulation was run using the default scheduler and ClusterContext, with the scheduler applying the Kubernetes settings available in Raith21. The following simple topology was used for this issue.
The first scenario is an oversaturation of the platform: running a large number of unique deployments (e.g. 100) on the small topology, all with scale_min=1, using the following FunctionSimulator. The problem occurs whenever the startup or the setup takes a non-zero amount of time. Running the simulator results in error messages because of the failing deployments, as expected, but the simulation stays active. Again, this may be by design, in which case I will have to handle it on my side.
Update: I found that the Kubernetes HPA allows a minimum scaling of 1, so the parameters used for the second scenario are invalid.
The second scenario occurs when there is one deployment with minimum scaling set to 0 and scale_zero set to either True or False. Running this makes the simulator get stuck after deploying. With logging at DEBUG level, the output is the following (after the matplotlib font lines):
INFO:sim.faassim:initializing simulation, benchmark: DataIntenseBenchmark, topology nodes: 14
INFO:sim.faassim:setting up benchmark
INFO:root:alexrashed/ml-wf-1-pre, latest, [ImageProperties(name='alexrashed/ml-wf-1-pre', size=466000000, tag='latest', arch='arm32'), ImageProperties(name='alexrashed/ml-wf-1-pre', size=540000000, tag='latest', arch='x86'), ImageProperties(name='alexrashed/ml-wf-1-pre', size=533000000, tag='latest', arch='aarch64')]
INFO:root:alexrashed/ml-wf-2-train, latest, [ImageProperties(name='alexrashed/ml-wf-2-train', size=519000000, tag='latest', arch='arm32'), ImageProperties(name='alexrashed/ml-wf-2-train', size=594000000, tag='latest', arch='x86'), ImageProperties(name='alexrashed/ml-wf-2-train', size=550000000, tag='latest', arch='aarch64')]
INFO:root:alexrashed/ml-wf-3-serve, latest, [ImageProperties(name='alexrashed/ml-wf-3-serve', size=512000000, tag='latest', arch='arm32'), ImageProperties(name='alexrashed/ml-wf-3-serve', size=591000000, tag='latest', arch='x86'), ImageProperties(name='alexrashed/ml-wf-3-serve', size=589000000, tag='latest', arch='aarch64')]
INFO:sim.faassim:starting faas system
INFO:sim.faassim:starting benchmark process
INFO:sim.faassim:executing simulation
INFO:sim.faas.system:deploying function ml-wf-1-pre_1 with scale_min=0
INFO:sim.faas.system:deploying function ml-wf-2-train_1 with scale_min=0
INFO:sim.faas.system:deploying function ml-wf-3-serve_1 with scale_min=0
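Given the update above, one way to catch the second scenario before the simulation starts is to check the scaling parameters up front. The following is only a minimal sketch: `ScalingParams` and `validate_scaling` are hypothetical names and not part of faas-sim, the `scale_max` values are made up for illustration, and the lower bound of 1 is taken from the update above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScalingParams:
    """Hypothetical container for one deployment's scaling settings (not a faas-sim class)."""
    name: str
    scale_min: int
    scale_max: int
    scale_zero: bool = False


def validate_scaling(params: ScalingParams) -> List[str]:
    """Return a list of problems; an empty list means the settings passed the checks."""
    problems = []
    # Assumption from the update above: the HPA minimum is 1.
    if params.scale_min < 1:
        problems.append(
            f"{params.name}: scale_min={params.scale_min} is below the HPA minimum of 1"
        )
    if params.scale_max < params.scale_min:
        problems.append(
            f"{params.name}: scale_max={params.scale_max} is below scale_min={params.scale_min}"
        )
    return problems


# Illustrative values only; the scale_max numbers are assumptions.
deployments = [
    ScalingParams("ml-wf-1-pre_1", scale_min=0, scale_max=5),
    ScalingParams("ml-wf-2-train_1", scale_min=1, scale_max=5),
]

# Collect every problem across all deployments before starting the simulation.
invalid = [problem for d in deployments for problem in validate_scaling(d)]
for problem in invalid:
    print(problem)
```

Running such a check before the deployments are handed to the simulator would turn the silent hang into an explicit error, although it does not explain why the simulation itself does not terminate.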