Simulation getting stuck: scaling to zero and when minimum replicas cannot be deployed #7

Open
WSeubring opened this issue May 12, 2021 · 0 comments

WSeubring commented May 12, 2021

Hi,

I am trying to run a set of experiments with an increasing ratio of deployments to nodes. I found that the simulation gets stuck in these scenarios. First off, I am not sure to what extent this is by design or due to a wrong combination of settings. The simulation was run using the default scheduler and ClusterContext, and the scheduler applied the Kubernetes settings available in Raith21. The following simple topology was used for this issue.

[Image: topology]

The first scenario is an oversaturation of the platform: running a large number of unique deployments (e.g. 100) on the small topology, all with scale_min=1, using the following FunctionSimulator. The problem occurs whenever the startup or the setup takes more than 0 time. Running the simulator produces error messages because of the failing deployments, as expected, but the simulation stays active. Again, this may be by design, in which case I will have to handle it on my side.

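    from collections import defaultdict
    from typing import Dict, List

    from simpy import Resource  # assumed: Resource is simpy's, as used by faas-sim
    # Environment, FunctionDefinition, FunctionReplica, FunctionRequest, FunctionSimulator,
    # SimulatorFactory and DockerDeploySimMixin come from faas-sim (imports not shown here).


    class TestSimulatorFactory(SimulatorFactory):
        def create(self, env: Environment, fn: FunctionDefinition) -> FunctionSimulator:
            # Each function gets its own queue with room for one request at a time
            queue = Resource(env=env, capacity=1)
            return TestSimulator(env, fn, queue)


    class TestSimulator(DockerDeploySimMixin):
        # Uses the default deploy based on docker images
        def __init__(self, env, fn: FunctionDefinition, queue) -> None:
            self.running: Dict[FunctionReplica, List[FunctionRequest]] = defaultdict(list)
            self.fn = fn
            self.queue = queue
            super().__init__()

        def startup(self, env: Environment, replica: FunctionReplica):
            # Startup takes no simulated time
            yield env.timeout(0)

        def setup(self, env: Environment, replica: FunctionReplica):
            # Setup takes one unit of simulated time, which is enough to trigger the issue
            yield env.timeout(1)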
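
One way to handle the first scenario on the caller's side would be a pre-flight check that compares the requested minimum replicas with what the topology can host, before the simulation is started. The following is a minimal sketch in plain Python and is not part of faas-sim: `DeploymentSpec`, `check_saturation` and `max_replicas_per_node` are hypothetical names, and it assumes every node can host a fixed number of replicas, which ignores the per-resource requests that the real scheduler takes into account.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeploymentSpec:
    """Hypothetical stand-in for a deployment; not a faas-sim class."""
    name: str
    scale_min: int


def check_saturation(deployments: List[DeploymentSpec], node_count: int,
                     max_replicas_per_node: int) -> int:
    """Return how many minimum replicas cannot be placed (0 if all fit).

    Assumes each node hosts at most `max_replicas_per_node` replicas.
    """
    required = sum(d.scale_min for d in deployments)
    capacity = node_count * max_replicas_per_node
    return max(0, required - capacity)


# 100 unique deployments with scale_min=1 on a 14-node topology,
# assuming (hypothetically) that 5 replicas fit on each node.
deployments = [DeploymentSpec(f"fn-{i}", scale_min=1) for i in range(100)]
unplaced = check_saturation(deployments, node_count=14, max_replicas_per_node=5)
if unplaced > 0:
    print(f"{unplaced} minimum replicas cannot be scheduled; skipping run")
```

With these assumed numbers the check reports 30 replicas that cannot be placed, so the experiment could be skipped or flagged instead of leaving the simulation running.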

Update: I found that the Kubernetes HPA only allows a minimum scale of 1, so the parameters used for the second scenario are invalid.

The second scenario occurs when there is one deployment with minimum scaling set to 0 and scale_zero set to either True or False. Running this makes the simulator get stuck after deploying. The logs at DEBUG level show the following (after the matplotlib font lines):

INFO:sim.faassim:initializing simulation, benchmark: DataIntenseBenchmark, topology nodes: 14
INFO:sim.faassim:setting up benchmark
INFO:root:alexrashed/ml-wf-1-pre, latest, [ImageProperties(name='alexrashed/ml-wf-1-pre', size=466000000, tag='latest', arch='arm32'), ImageProperties(name='alexrashed/ml-wf-1-pre', size=540000000, tag='latest', arch='x86'), ImageProperties(name='alexrashed/ml-wf-1-pre', size=533000000, tag='latest', arch='aarch64')]
INFO:root:alexrashed/ml-wf-2-train, latest, [ImageProperties(name='alexrashed/ml-wf-2-train', size=519000000, tag='latest', arch='arm32'), ImageProperties(name='alexrashed/ml-wf-2-train', size=594000000, tag='latest', arch='x86'), ImageProperties(name='alexrashed/ml-wf-2-train', size=550000000, tag='latest', arch='aarch64')]
INFO:root:alexrashed/ml-wf-3-serve, latest, [ImageProperties(name='alexrashed/ml-wf-3-serve', size=512000000, tag='latest', arch='arm32'), ImageProperties(name='alexrashed/ml-wf-3-serve', size=591000000, tag='latest', arch='x86'), ImageProperties(name='alexrashed/ml-wf-3-serve', size=589000000, tag='latest', arch='aarch64')]
INFO:sim.faassim:starting faas system
INFO:sim.faassim:starting benchmark process
INFO:sim.faassim:executing simulation
INFO:sim.faas.system:deploying function ml-wf-1-pre_1 with scale_min=0
INFO:sim.faas.system:deploying function ml-wf-2-train_1 with scale_min=0
INFO:sim.faas.system:deploying function ml-wf-3-serve_1 with scale_min=0
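
Given the update above that the HPA expects a minimum of 1, a guard like the following could reject scale_min=0 before the simulation starts, rather than letting it hang after deployment. This is only a sketch: `validate_scale_min` is a hypothetical helper, not a faas-sim function, and the library itself may handle this case differently.

```python
def validate_scale_min(function_name: str, scale_min: int) -> None:
    """Reject minimum replica counts below 1 (hypothetical pre-run guard)."""
    if scale_min < 1:
        raise ValueError(
            f"{function_name}: scale_min={scale_min} is invalid, expected >= 1"
        )


# Function names and scale_min values taken from the log above.
configured = {"ml-wf-1-pre_1": 0, "ml-wf-2-train_1": 0, "ml-wf-3-serve_1": 0}

errors = []
for name, scale_min in configured.items():
    try:
        validate_scale_min(name, scale_min)
    except ValueError as exc:
        errors.append(str(exc))

for message in errors:
    print(message)
```

Collecting the errors instead of stopping at the first one shows every misconfigured function in a single pass.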