This repository has been archived by the owner on Dec 19, 2024. It is now read-only.

Improve cli startup speed. #44

Open
AdamPalmarUnity opened this issue Aug 17, 2020 · 0 comments
Labels
enhancement New feature or request

AdamPalmarUnity commented Aug 17, 2020

/kind feature

Why you need this feature:
Running the CLI is slow. Even printing the help text for a subcommand takes about 10 seconds:

time datasetinsights train -h
Usage: datasetinsights train [OPTIONS]

  Start model training (and optionally validation) tasks.

Options:
  -c, --config TEXT           Path to the config estimator yaml file.
                              [required]

  -t, --train-data DIRECTORY  Directory on localhost where train dataset is
                              located.  [required]

  -e, --val-data DIRECTORY    Directory on localhost where validation dataset
                              is located.

  -p, --checkpoint-file TEXT  URI to a checkpoint file. If specified, model
                              will load from this checkpoint and resume
                              training.

  -l, --tb-log-dir TEXT       Path to the directory where tensorboard events
                              should be stored. This Path can be GCS URI (e.g.
                              gs://<bucket>/runs) or full path to a local
                              directory.  [default:
                              /home/adamp/Documents/dataset-
                              insights/runs/20200817-102417]

  -p, --checkpoint-dir TEXT   Path to the directory where model checkpoint
                              files should be stored. This Path can be GCS URI
                              (e.g. gs://<bucket>/checkpoints) or full path to
                              a local directory.  [default:
                              /home/adamp/Documents/dataset-
                              insights/checkpoints/20200817-102417]

  -w, --workers INTEGER       Number of multiprocessing workers for loading
                              datasets. Set this argument to 0 will disable
                              multiprocessing which is recommended when
                              running inside a docker container.  [default: 0]

  --no-cuda                   Force to disable CUDA. If CUDA is available and
                              this flag is False, model will be trained using
                              CUDA.  [default: False]

  --no-val                    Force to disable validations.  [default: False]
  -h, --help                  Show this message and exit.  [default: False]

real	0m10.256s
user	0m5.709s
sys	0m1.134s

This is due to the large number of imports that are executed as soon as the CLI is invoked, before any command actually runs.

Import times on datasetinsights/commands/download.py

[Screenshot: import times for datasetinsights/commands/download.py]

Import times on datasetinsights/commands/train.py

[Image: import times for datasetinsights/commands/train.py]
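A rough way to reproduce this kind of measurement is to time a cold import directly. The sketch below is only illustrative: `time_import` is a hypothetical helper, and the stdlib module names are placeholders so that it runs anywhere. To profile the CLI, one would pass a name such as `datasetinsights.commands.train` instead.

```python
import importlib
import time


def time_import(module_name):
    """Return the wall-clock seconds spent importing module_name.

    A module that is already in sys.modules returns almost immediately,
    so run this in a fresh interpreter to measure a cold import.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder module names; swap in the package you want to profile.
    for name in ("json", "decimal", "email"):
        print(f"{name}: {time_import(name) * 1000:.1f} ms")
```

For a per-module breakdown, CPython 3.7+ also accepts `python -X importtime -c "import <module>"`, which prints the cumulative and self time of every nested import to stderr.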

Describe the solution you'd like:
When the CLI is called, the command should start within 0.5 seconds to improve the user experience.

Proposed solution
Use lazy loading built on importlib, so that a heavy module is imported when a CLI command first uses it rather than when the CLI module itself is imported to parse arguments. TensorFlow has an implementation of such a LazyLoader:

import importlib
import types


class LazyLoader(types.ModuleType):

    def __init__(self, local_name, parent_module_globals, name):
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals

        super(LazyLoader, self).__init__(name)

    def _load(self):
        # Import the target module and insert it into the parent's namespace
        module = importlib.import_module(self.__name__)
        self._parent_module_globals[self._local_name] = module

        # Update this object's dict so that if someone keeps a reference to the
        #   LazyLoader, lookups are efficient (__getattr__ is only called on lookups
        #   that fail).
        self.__dict__.update(module.__dict__)

        return module

    def __getattr__(self, item):
        module = self._load()
        return getattr(module, item)

    def __dir__(self):
        module = self._load()
        return dir(module)

It would be used as follows:

from datasetinsights.lazy_loader import LazyLoader

estimators = LazyLoader("estimators", globals(), "datasetinsights.estimators")

...
def cli():
    # The first attribute access on `estimators` triggers the real import.
    estimators.create_estimator()
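As a point of comparison, the standard library ships `importlib.util.LazyLoader`, which defers execution of a module until its first attribute access. The sketch below follows the recipe in the importlib documentation. It is a different technique from the TensorFlow-style class above, not something the project is known to use. `lazy_import`, `heavy_demo_module` and the `HEAVY_DEMO_LOADED` environment variable are hypothetical names introduced only to make the deferral observable.

```python
import importlib
import importlib.util
import os
import sys
import tempfile
from pathlib import Path


def lazy_import(name):
    """Return a module object whose body only executes on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # Registers the module but defers running its body.
    return module


# Hypothetical stand-in for a heavy dependency: a throwaway module on disk
# that sets an environment variable when its body actually runs.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "heavy_demo_module.py").write_text(
    "import os\n"
    "os.environ['HEAVY_DEMO_LOADED'] = '1'\n"
    "VALUE = 42\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
os.environ.pop("HEAVY_DEMO_LOADED", None)

heavy = lazy_import("heavy_demo_module")
loaded_before = "HEAVY_DEMO_LOADED" in os.environ  # Module body has not run yet.
value = heavy.VALUE                                # First access runs the module body.
loaded_after = "HEAVY_DEMO_LOADED" in os.environ
```

One trade-off with either approach is that an import error in the deferred module surfaces at first use instead of at startup, so a mistyped module name may go unnoticed until the command actually runs.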

@AdamPalmarUnity AdamPalmarUnity added the enhancement New feature or request label Aug 17, 2020
@adason adason added this to the 0.3 milestone Aug 21, 2020