Remove unnecessary `datasets` field in methods database entries? #355

yousefmoazzam · 2024-05-31T11:52:24Z

At a glance, it appears as if the datasets field is not used anywhere in max slices calculations.

Specifically, it's not used in the task runner's max slices calculations:

Lines 291 to 341 in 4429711

    
           def determine_max_slices(self, section: Section, slicing_dim: int): 
        
               assert self.source is not None 
        
               data_shape = self.source.chunk_shape 
        
               max_slices = data_shape[slicing_dim] 
        
               if len(section) == 0: 
        
                   section.max_slices = min(httomo.globals.MAX_CPU_SLICES, max_slices) 
        
                   return 
        
               nsl_dim_l = list(data_shape) 
        
               nsl_dim_l.pop(slicing_dim) 
        
               non_slice_dims_shape = (nsl_dim_l[0], nsl_dim_l[1]) 
        
               available_memory = get_available_gpu_memory(10.0) 
        
               available_memory_in_GB = round(available_memory / (1024**3), 2) 
        
               memory_str = ( 
        
                   f"The amount of the available GPU memory is {available_memory_in_GB} GB" 
        
               ) 
        
               log_rank(memory_str, comm=self.comm) 
        
               if self._memory_limit_bytes != 0: 
        
                   available_memory = min(available_memory, self._memory_limit_bytes) 
        
                   log_once( 
        
                       f"The memory has been limited to {available_memory / (1024**3):4.2f} GB", 
        
                       level=logging.DEBUG, 
        
                   ) 
        
               max_slices_methods = [max_slices] * len(section) 
        
               # loop over all methods in section 
        
               has_gpu = False 
        
               for idx, m in enumerate(section): 
        
                   if len(m.memory_gpu) == 0: 
        
                       max_slices_methods[idx] = max_slices 
        
                       continue 
        
                   has_gpu = has_gpu or m.is_gpu 
        
                   output_dims = m.calculate_output_dims(non_slice_dims_shape) 
        
                   (slices_estimated, available_memory) = m.calculate_max_slices( 
        
                       self.source.dtype, 
        
                       non_slice_dims_shape, 
        
                       available_memory, 
        
                   ) 
        
                   max_slices_methods[idx] = min(max_slices, slices_estimated) 
        
                   non_slice_dims_shape = output_dims 
        
               if not has_gpu: 
        
                   section.max_slices = min( 
        
                       min(max_slices_methods), httomo.globals.MAX_CPU_SLICES 
        
                   ) 
        
               else: 
        
                   section.max_slices = min(max_slices_methods)

nor in the generic method wrapper's implementation of calculate_max_slices() (which all method wrappers inherit from):

httomo/httomo/method_wrappers/generic.py

Lines 375 to 430 in 4429711

    
               def calculate_max_slices( 
        
                   self, 
        
                   data_dtype: np.dtype, 
        
                   non_slice_dims_shape: Tuple[int, int], 
        
                   available_memory: int, 
        
               ) -> Tuple[int, int]: 
        
                   """If it runs on GPU, determine the maximum number of slices that can fit in the 
        
                   available memory in bytes, and return a tuple of 
        
                   (max_slices, available_memory) 
        
                   The available memory may have been adjusted for the methods that follow, in case 
        
                   something persists afterwards. 
        
                   """ 
        
                   if self.is_cpu or not gpu_enabled: 
        
                       return int(100e9), available_memory 
        
                   # if we have no information, we assume in-place operation with no extra memory 
        
                   if len(self.memory_gpu) == 0: 
        
                       return ( 
        
                           int( 
        
                               available_memory 
        
                               // (np.prod(non_slice_dims_shape) * data_dtype.itemsize) 
        
                           ), 
        
                           available_memory, 
        
                       ) 
        
                   # NOTE: This could go directly into the methodquery / method database, 
        
                   # and here we just call calculated_memory_bytes 
        
                   memory_bytes_method = 0 
        
                   for field in self.memory_gpu: 
        
                       subtract_bytes = 0 
        
                       # loop over the dataset names given in the library file and extracting 
        
                       # the corresponding dimensions from the available datasets 
        
                       if field.method == "direct": 
        
                           assert field.multiplier is not None 
        
                           # this calculation assumes a direct (simple) correspondence through multiplier 
        
                           memory_bytes_method += int( 
        
                               field.multiplier 
        
                               * np.prod(non_slice_dims_shape) 
        
                               * data_dtype.itemsize 
        
                           ) 
        
                       else: 
        
                           ( 
        
                               memory_bytes_method, 
        
                               subtract_bytes, 
        
                           ) = self._query.calculate_memory_bytes( 
        
                               non_slice_dims_shape, data_dtype, **self.config_params 
        
                           ) 
        
                   if memory_bytes_method == 0: 
        
                       return available_memory - subtract_bytes, available_memory 
        
                   return ( 
        
                       available_memory - subtract_bytes 
        
                   ) // memory_bytes_method, available_memory

I was wondering if the datasets field is an old remnant from when input datasets in the framework were named, and is thus no longer needed?

The text was updated successfully, but these errors were encountered:

yousefmoazzam · 2024-05-31T16:01:58Z

More investigation is needed into whether it's needed or not, it seems that the query object for getting method info from the methods database YAML files references the datasets field:

httomo/httomo/methods_database/query.py

Lines 102 to 124 in 4429711

    
           def get_memory_gpu_params( 
        
               self, 
        
           ) -> List[GpuMemoryRequirement]: 
        
               p = get_method_info(self.module_path, self.method_name, "memory_gpu") 
        
               if p is None or p == "None": 
        
                   return [] 
        
               if type(p) == list: 
        
                   # convert to dict first 
        
                   dd: dict = dict() 
        
                   for item in p: 
        
                       dd |= item 
        
               else: 
        
                   dd = p 
        
               # now iterate and make it into one 
        
               assert ( 
        
                   len(dd["datasets"]) == len(dd["multipliers"]) == len(dd["methods"]) 
        
               ), "Invalid data" 
        
               return [ 
        
                   GpuMemoryRequirement( 
        
                       dataset=d, multiplier=dd["multipliers"][i], method=dd["methods"][i] 
        
                   ) 
        
                   for i, d in enumerate(dd["datasets"]) 
        
               ]

yousefmoazzam added question Further information is requested minor Nice to do but not vital labels May 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove unnecessary `datasets` field in methods database entries? #355

Remove unnecessary `datasets` field in methods database entries? #355

yousefmoazzam commented May 31, 2024

yousefmoazzam commented May 31, 2024

Remove unnecessary datasets field in methods database entries? #355

Remove unnecessary datasets field in methods database entries? #355

Comments

yousefmoazzam commented May 31, 2024

yousefmoazzam commented May 31, 2024

Remove unnecessary `datasets` field in methods database entries? #355

Remove unnecessary `datasets` field in methods database entries? #355