Simulation getting stuck: Scaling to zero and when minimum replica's can not be deployed. #7

WSeubring · 2021-05-12T13:40:31Z

Hi,

I am trying to run a set of experiments with an increasing ratio of deployments to nodes, and I found that the simulation gets stuck in the scenarios below. First off, I am not sure to what extent this is by design or caused by a wrong combination of settings. The simulation was run using the default scheduler and ClusterContext, with the scheduler applying the Kubernetes settings available in Raith21. The following simple topology was used for this issue.

The first scenario is an oversaturated platform: running a large number of unique deployments (e.g. 100) on the small topology, all with scale_min=1, using the following FunctionSimulator. It occurs when the startup or the setup takes more than 0. Running the simulator produces error messages for the failing deployments, as expected, but the simulation stays active. Again, this may be by design, in which case I will have to handle it on my side.

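  from collections import defaultdict
  from typing import Dict, List

  from simpy import Resource

  # Environment, FunctionDefinition, FunctionReplica, FunctionRequest, FunctionSimulator,
  # SimulatorFactory and DockerDeploySimMixin come from faas-sim (imports omitted here).


  class TestSimulatorFactory(SimulatorFactory):
      def create(self, env: Environment, fn: FunctionDefinition) -> FunctionSimulator:
          # One single-slot queue per function simulator
          queue = Resource(env=env, capacity=1)
          return TestSimulator(env, fn, queue)


  class TestSimulator(DockerDeploySimMixin):
      # Uses the default deploy based on Docker images
      def __init__(self, env, fn: FunctionDefinition, queue) -> None:
          self.running: Dict[FunctionReplica, List[FunctionRequest]] = defaultdict(list)
          self.fn = fn
          self.queue = queue
          super().__init__()

      def startup(self, env: Environment, replica: FunctionReplica):
          yield env.timeout(0)

      def setup(self, env: Environment, replica: FunctionReplica):
          # A non-zero delay here (or in startup) triggers the issue
          yield env.timeout(1)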

Update: I found that the Kubernetes HPA only allows a minimum scale of 1, so the parameters used for the second scenario are invalid.

The second scenario occurs when there is one deployment with minimum scaling set to 0 and scale_zero set to either True or False. Running this makes the simulator get stuck after deploying. The logs at DEBUG level show the following (after the matplotlib font lines):

INFO:sim.faassim:initializing simulation, benchmark: DataIntenseBenchmark, topology nodes: 14
INFO:sim.faassim:setting up benchmark
INFO:root:alexrashed/ml-wf-1-pre, latest, [ImageProperties(name='alexrashed/ml-wf-1-pre', size=466000000, tag='latest', arch='arm32'), ImageProperties(name='alexrashed/ml-wf-1-pre', size=540000000, tag='latest', arch='x86'), ImageProperties(name='alexrashed/ml-wf-1-pre', size=533000000, tag='latest', arch='aarch64')]
INFO:root:alexrashed/ml-wf-2-train, latest, [ImageProperties(name='alexrashed/ml-wf-2-train', size=519000000, tag='latest', arch='arm32'), ImageProperties(name='alexrashed/ml-wf-2-train', size=594000000, tag='latest', arch='x86'), ImageProperties(name='alexrashed/ml-wf-2-train', size=550000000, tag='latest', arch='aarch64')]
INFO:root:alexrashed/ml-wf-3-serve, latest, [ImageProperties(name='alexrashed/ml-wf-3-serve', size=512000000, tag='latest', arch='arm32'), ImageProperties(name='alexrashed/ml-wf-3-serve', size=591000000, tag='latest', arch='x86'), ImageProperties(name='alexrashed/ml-wf-3-serve', size=589000000, tag='latest', arch='aarch64')]
INFO:sim.faassim:starting faas system
INFO:sim.faassim:starting benchmark process
INFO:sim.faassim:executing simulation
INFO:sim.faas.system:deploying function ml-wf-1-pre_1 with scale_min=0
INFO:sim.faas.system:deploying function ml-wf-2-train_1 with scale_min=0
INFO:sim.faas.system:deploying function ml-wf-3-serve_1 with scale_min=0
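
Given the update above that a minimum scale below 1 is not a valid setting, a small guard before starting the simulation could catch this configuration early. This is a minimal sketch with a hypothetical helper, `invalid_scale_min`, operating on a plain dict of function name to scale_min; it does not use or reflect any faas-sim API.

```python
from typing import Dict, List


def invalid_scale_min(deployments: Dict[str, int]) -> List[str]:
    # Return the names of all deployments whose scale_min is below 1.
    return [name for name, scale_min in deployments.items() if scale_min < 1]


def check_deployments(deployments: Dict[str, int]) -> None:
    # Fail fast instead of letting the simulation start with invalid settings.
    bad = invalid_scale_min(deployments)
    if bad:
        raise ValueError(f"scale_min must be at least 1 for: {', '.join(bad)}")
```

With the three functions from the log above, all deployed with scale_min=0, `check_deployments` would raise before the simulation is executed.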
