This repository has been archived by the owner on Dec 19, 2024. It is now read-only.
Why you need this feature:
Running the CLI is slow. For example, printing the help for the train command takes about 10 seconds:
time datasetinsights train -h
Usage: datasetinsights train [OPTIONS]

  Start model training (and optionally validation) tasks.

Options:
  -c, --config TEXT           Path to the config estimator yaml file.
                              [required]
  -t, --train-data DIRECTORY  Directory on localhost where train dataset is
                              located. [required]
  -e, --val-data DIRECTORY    Directory on localhost where validation dataset
                              is located.
  -p, --checkpoint-file TEXT  URI to a checkpoint file. If specified, model
                              will load from this checkpoint and resume
                              training.
  -l, --tb-log-dir TEXT       Path to the directory where tensorboard events
                              should be stored. This Path can be GCS URI (e.g.
                              gs://<bucket>/runs) or full path to a local
                              directory. [default:
                              /home/adamp/Documents/dataset-insights/runs/20200817-102417]
  -p, --checkpoint-dir TEXT   Path to the directory where model checkpoint
                              files should be stored. This Path can be GCS URI
                              (e.g. gs://<bucket>/checkpoints) or full path to
                              a local directory. [default:
                              /home/adamp/Documents/dataset-insights/checkpoints/20200817-102417]
  -w, --workers INTEGER       Number of multiprocessing workers for loading
                              datasets. Set this argument to 0 will disable
                              multiprocessing which is recommended when
                              running inside a docker container. [default: 0]
  --no-cuda                   Force to disable CUDA. If CUDA is available and
                              this flag is False, model will be trained using
                              CUDA. [default: False]
  --no-val                    Force to disable validations. [default: False]
  -h, --help                  Show this message and exit. [default: False]

real    0m10.256s
user    0m5.709s
sys     0m1.134s
This is due to the large number of imports that happen when the CLI is called.
Import times on datasetinsights/commands/download.py
Import time on datasetinsights/commands/train.py
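One rough way to check where the time goes is to time individual imports. The sketch below is only illustrative: `time_import` is a hypothetical helper, not part of datasetinsights, and the standard-library modules `json` and `decimal` stand in for the heavy dependencies pulled in by the command modules.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the seconds taken to (re)import module_name."""
    # Drop the cached top-level module so the import runs again.
    # Submodules and C extensions may stay cached, so treat the
    # result as a lower bound on a truly cold import.
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# Standard-library modules stand in for the heavy dependencies here.
for name in ("json", "decimal"):
    print(f"{name}: {time_import(name):.4f}s")
```

For a per-module breakdown of a real run, CPython 3.7 and later also accept the `-X importtime` interpreter option.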
Describe the solution you'd like:
When the CLI is called, the command should start within 0.5 seconds to improve the user experience.
Proposed solution
Use lazy loading via importlib, so that each import happens when the CLI command is executed rather than at parse time. TensorFlow has an implementation of a LazyLoader:
import importlib
import types


class LazyLoader(types.ModuleType):

    def __init__(self, local_name, parent_module_globals, name):
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals
        super(LazyLoader, self).__init__(name)

    def _load(self):
        # Import the target module and insert it into the parent's namespace
        module = importlib.import_module(self.__name__)
        self._parent_module_globals[self._local_name] = module
        # Update this object's dict so that if someone keeps a reference to the
        # LazyLoader, lookups are efficient (__getattr__ is only called on lookups
        # that fail).
        self.__dict__.update(module.__dict__)
        return module

    def __getattr__(self, item):
        module = self._load()
        return getattr(module, item)

    def __dir__(self):
        module = self._load()
        return dir(module)
/kind feature
It is used in the following way:
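A minimal sketch of such usage, under stated assumptions: the standard-library `fractions` module stands in for a heavy dependency, and a condensed copy of the LazyLoader class is repeated only so that the snippet runs on its own. How the datasetinsights commands would actually wire this in is not specified here.

```python
import importlib
import sys
import types


class LazyLoader(types.ModuleType):
    """Condensed copy of the LazyLoader above, repeated for a standalone run."""

    def __init__(self, local_name, parent_module_globals, name):
        self._local_name = local_name
        self._parent_module_globals = parent_module_globals
        super().__init__(name)

    def _load(self):
        # Import the real module and swap it into the parent's namespace.
        module = importlib.import_module(self.__name__)
        self._parent_module_globals[self._local_name] = module
        self.__dict__.update(module.__dict__)
        return module

    def __getattr__(self, item):
        # Only reached when normal attribute lookup fails, i.e. before loading.
        return getattr(self._load(), item)


# Make sure the stand-in module is not already cached (illustration only).
sys.modules.pop("fractions", None)

# Replaces a top-level "import fractions"; nothing is imported yet.
fractions = LazyLoader("fractions", globals(), "fractions")
print("fractions" in sys.modules)  # False

# The first attribute access triggers the real import.
half = fractions.Fraction(1, 2)
print(half, "fractions" in sys.modules)  # 1/2 True
```

With this pattern, running a command with `-h` would pay for a heavy import only if the help path actually touches an attribute of that module.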