Improve cli startup speed. #44

AdamPalmarUnity · 2020-08-17T08:33:42Z

/kind feature

Why you need this feature:
Running the CLI is slow: even printing the help text for a single subcommand takes about 10 seconds.

```console
$ time datasetinsights train -h
Usage: datasetinsights train [OPTIONS]

  Start model training (and optionally validation) tasks.

Options:
  -c, --config TEXT           Path to the config estimator yaml file.
                              [required]

  -t, --train-data DIRECTORY  Directory on localhost where train dataset is
                              located.  [required]

  -e, --val-data DIRECTORY    Directory on localhost where validation dataset
                              is located.

  -p, --checkpoint-file TEXT  URI to a checkpoint file. If specified, model
                              will load from this checkpoint and resume
                              training.

  -l, --tb-log-dir TEXT       Path to the directory where tensorboard events
                              should be stored. This Path can be GCS URI (e.g.
                              gs://<bucket>/runs) or full path to a local
                              directory.  [default:
                              /home/adamp/Documents/dataset-
                              insights/runs/20200817-102417]

  -p, --checkpoint-dir TEXT   Path to the directory where model checkpoint
                              files should be stored. This Path can be GCS URI
                              (e.g. gs://<bucket>/checkpoints) or full path to
                              a local directory.  [default:
                              /home/adamp/Documents/dataset-
                              insights/checkpoints/20200817-102417]

  -w, --workers INTEGER       Number of multiprocessing workers for loading
                              datasets. Set this argument to 0 will disable
                              multiprocessing which is recommended when
                              running inside a docker container.  [default: 0]

  --no-cuda                   Force to disable CUDA. If CUDA is available and
                              this flag is False, model will be trained using
                              CUDA.  [default: False]

  --no-val                    Force to disable validations.  [default: False]
  -h, --help                  Show this message and exit.  [default: False]

real	0m10.256s
user	0m5.709s
sys	0m1.134s
```

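This is due to the large number of imports that run as soon as the CLI is invoked: the command modules and everything they depend on are imported up front, even when the user only asks for `-h`.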

[Figure: import times for datasetinsights/commands/download.py]

[Figure: import times for datasetinsights/commands/train.py]
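Import profiles like these can be reproduced with CPython's `-X importtime` flag (Python 3.7+), which logs the self and cumulative import time of every module to stderr. The sketch below wraps it in a small helper; `import_times` is a hypothetical name, and pointing it at the command modules assumes `datasetinsights` is installed in the current environment.

```python
import subprocess
import sys
from typing import List, Tuple


def import_times(module: str) -> List[Tuple[int, str]]:
    """Import `module` in a fresh interpreter under `-X importtime` and return
    (cumulative_microseconds, module_name) pairs, slowest first."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in result.stderr.splitlines():
        # Data lines look like: "import time:   123 |   456 |   json.decoder"
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        cumulative = parts[1].strip() if len(parts) == 3 else ""
        if not cumulative.isdigit():
            continue  # skip the "self [us] | cumulative | ..." header row
        rows.append((int(cumulative), parts[2].strip()))
    return sorted(rows, reverse=True)
```

For example, `import_times("datasetinsights.commands.train")[:10]` would list the ten imports with the largest cumulative cost, which is usually enough to see which dependencies dominate startup.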

Describe the solution you'd like:
When the CLI is called, the command should start within 0.5 seconds, to improve the user experience.
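To keep a 0.5-second target from regressing, a timing check along these lines could run in CI. This is a sketch: `startup_seconds` and `check_startup_budget` are hypothetical helpers, and the command they are pointed at (for example `["datasetinsights", "train", "-h"]`) and the budget are assumptions to adjust.

```python
import statistics
import subprocess
import time
from typing import Sequence


def startup_seconds(cmd: Sequence[str], runs: int = 5) -> float:
    """Run `cmd` `runs` times and return the median wall-clock time in seconds.

    The median keeps one slow run (cold disk cache, busy CI machine) from
    dominating the result.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def check_startup_budget(cmd: Sequence[str], budget: float = 0.5) -> None:
    """Raise AssertionError if the median startup time of `cmd` exceeds `budget`."""
    elapsed = startup_seconds(cmd)
    if elapsed > budget:
        raise AssertionError(
            f"{' '.join(cmd)} took {elapsed:.2f}s (budget {budget}s)"
        )
```

With the roughly 10-second timing reported above, `check_startup_budget(["datasetinsights", "train", "-h"])` would fail until the heavy imports are deferred.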

Proposed solution
Use lazy loading via `importlib`, so that heavy modules are imported when a CLI command actually executes rather than when the CLI starts up and parses its arguments. TensorFlow has an implementation of such a `LazyLoader`, on which the following is based:

```python
import importlib
import types


class LazyLoader(types.ModuleType):
    """Module proxy that defers `import name` until an attribute is first used.

    On first access it imports the real module and replaces itself in the
    parent module's globals under `local_name`.
    """

    def __init__(self, local_name, parent_module_globals, name):
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals

        super().__init__(name)

    def _load(self):
        # Import the target module and insert it into the parent's namespace
        module = importlib.import_module(self.__name__)
        self._parent_module_globals[self._local_name] = module

        # Update this object's dict so that if someone keeps a reference to the
        # LazyLoader, lookups are efficient (__getattr__ is only called on
        # lookups that fail).
        self.__dict__.update(module.__dict__)

        return module

    def __getattr__(self, item):
        module = self._load()
        return getattr(module, item)

    def __dir__(self):
        module = self._load()
        return dir(module)
```

It would be used as follows:

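```python
from datasetinsights.lazy_loader import LazyLoader

# Nothing is imported here; `estimators` is a lightweight placeholder module.
estimators = LazyLoader("estimators", globals(), "datasetinsights.estimators")

# ...

def cli():
    # The first attribute access imports datasetinsights.estimators for real.
    estimators.create_estimator()
```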
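As an alternative to vendoring TensorFlow's class, the standard library ships `importlib.util.LazyLoader`, which defers executing a module until one of its attributes is first accessed. The sketch below follows the lazy-import recipe from the `importlib` documentation; `lazy_import` is a hypothetical helper name, and a module that is already imported is simply returned unchanged.

```python
import importlib.util
import sys
import types


def lazy_import(name: str) -> types.ModuleType:
    """Return module `name`, deferring its execution until first attribute access."""
    if name in sys.modules:
        # Already imported (eagerly or lazily): nothing left to defer.
        return sys.modules[name]
    # Note: for a dotted name, find_spec imports the parent packages eagerly.
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader, exec_module only arranges for the real loader to run later.
    loader.exec_module(module)
    return module
```

Unlike the TensorFlow-style class, the stdlib version keeps a single module object in `sys.modules` and switches its class on first use, so there is no placeholder to swap out of `globals()`; `estimators = lazy_import("datasetinsights.estimators")` would play the same role as the `LazyLoader(...)` line above.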

AdamPalmarUnity added the enhancement New feature or request label Aug 17, 2020

adason added this to the 0.3 milestone Aug 21, 2020
