
added support for using multiple losses and metrics in evaluator #8

Closed

Conversation

BlueCrescent (Collaborator)

No description provided.

@BlueCrescent self-assigned this Jan 15, 2024
Resolved review threads (outdated): tests/test_evaluator.py (4), src/modalities/metrics.py, src/modalities/gym.py
@le1nux (Member) left a comment

I like the structure of the code and the introduction of small functions. Helps reading the code a lot!

I was thinking about the next steps for the metrics and losses. They should be configurable by the config and assembled dynamically.
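
A minimal sketch of what such config-driven assembly could look like; the registry, the placeholder class, and the config keys below are hypothetical illustrations, not part of the current code:

```python
# Hypothetical sketch of config-driven measure assembly; the registry, the
# placeholder class, and the "eval_measures" config key are assumptions.
from typing import Any, Callable, Dict, List


class PerplexityMeasure:
    """Placeholder standing in for an aggregative perplexity measure."""

    def __init__(self, prediction_key: str = "logits", target_key: str = "target_ids"):
        self.prediction_key = prediction_key
        self.target_key = target_key


MEASURE_REGISTRY: Dict[str, Callable[..., Any]] = {
    "perplexity": PerplexityMeasure,
}


def build_eval_measures(config: Dict[str, Any]) -> List[Any]:
    # One measure instance per entry under the (assumed) "eval_measures" key.
    return [
        MEASURE_REGISTRY[entry["type"]](**entry.get("config", {}))
        for entry in config.get("eval_measures", [])
    ]


measures = build_eval_measures(
    {"eval_measures": [{"type": "perplexity", "config": {"prediction_key": "logits"}}]}
)
```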

Resolved review threads (outdated): src/modalities/evaluator.py (2), src/modalities/metrics.py, src/modalities/gym.py
@le1nux (Member) commented Jan 31, 2024

Is this PR finalised and ready for review?

BlueCrescent and others added 9 commits February 5, 2024 08:52
done:
- integrated new aggregated measures into evaluator
- updated and expanded evaluator tests
- extracted throughput measurement into class
- fixed missing keys in measure implementations
- code clean up
todo:
- integration into trainer
- parameterization of eval losses/metrics
- testing the loss implementations (in particular perplexity)
* add aggregative measure factories to constructor of Evaluator
* extract big train method into smaller ones
* adapt test_evaluator to new params of changed Evaluator class
* gym: remove loss functions and metrics for evaluation from evaluation method
WIP: The current version does not yet work.
…xity impl and test

* add throughput aggregator factory callable; needed because otherwise it was instantiated in the class itself and was not mockable
* fix perplexity computation: now tracking losses, summing over them and then computing torch.exp(loss_sum/#samples)
* Perplexity now gets computed correctly for each sequence in a batch and summed up afterwards (see the sketch after the commit list below).
…_losses_and_metrics_in_evaluator

# Conflicts:
#	config_files/config_example_hf_meditron_7B_instruction.yaml
#	config_files/config_example_mem_map_dataset.yaml
#	config_files/config_lorem_ipsum.yaml
#	src/modalities/__main__.py
#	src/modalities/config/config.py
#	src/modalities/config/lookup_types.py
#	src/modalities/evaluator.py
#	src/modalities/resolver_register.py
#	src/modalities/trainer.py
#	tests/conftest.py
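
The perplexity fix described in the commits above (summing per-sequence losses and exponentiating the mean) could look roughly like the following sketch; tensor shapes and names are assumptions, not the PR's actual evaluator code:

```python
# Sketch of the described perplexity aggregation: accumulate per-sequence loss
# sums, then compute exp(loss_sum / num_samples). Shapes/names are assumptions.
import torch


def batch_sequence_losses(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Mean cross-entropy per sequence; logits: (batch, seq_len, vocab), targets: (batch, seq_len)."""
    per_token = torch.nn.functional.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none"
    )  # (batch, seq_len)
    return per_token.mean(dim=1)  # (batch,)


loss_sum, num_samples = torch.tensor(0.0), 0
for logits, targets in []:  # stand-in for iterating an eval dataloader
    losses = batch_sequence_losses(logits, targets)
    loss_sum += losses.sum()
    num_samples += losses.numel()

perplexity = torch.exp(loss_sum / max(num_samples, 1))
```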
@BlueCrescent marked this pull request as draft March 4, 2024 09:35
@le1nux self-requested a review March 4, 2024 13:21
@le1nux added the enhancement label Mar 4, 2024
@BlueCrescent marked this pull request as ready for review March 4, 2024 14:50
@le1nux (Member) left a comment

Really good work on the evaluation/aggregation part. I left a bunch of comments (some minor and some that need discussion). Primarily, I'm not convinced by the metric and loss factories being instantiated at such a low level. I would prefer the instantiation to happen entirely within the hierarchical instantiation, with the aggregated measures (rather than the factories) being passed to the evaluator. I think that's the main point that needs discussion, and I would like to hear your thoughts on it :-)

FYI, I have not looked at the tests yet, but I wanted to add my comments now, as I think they are already helpful.

Member:
config is outdated w.r.t. component_key and variant_key

Member:
config is outdated w.r.t. component_key and variant_key

) -> torch.Tensor:
    # we clone the value so that we can always resync the value without side-effects
    cloned_value = self._key_to_value[key].clone()
    value = Reducer.reduce(tensor=cloned_value, operation=reduce_operation)
Member:
Since we have the hierarchical instantiation up and running now, we should pass in the reducer via the constructor.
We could use different reducers, e.g., a torch distributed reducer that reduces the tensors across ranks. Another reducer for single-GPU training without FSDP (which is still a todo) could just call torch.mean(). What do you think?

BlueCrescent (Collaborator, Author):
Yes, I agree that this makes sense. But I probably would postpone such a change until we actually have a second Reducer.
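
For illustration, a hedged sketch of the constructor-injected reducer idea discussed above; the interface and class names are assumptions, not existing modalities APIs:

```python
# Illustrative sketch of injectable reducers; names are assumptions, not the
# actual modalities interfaces.
from abc import ABC, abstractmethod

import torch
import torch.distributed as dist


class ReducerIF(ABC):
    @abstractmethod
    def reduce(self, tensor: torch.Tensor) -> torch.Tensor:
        ...


class DistributedSumReducer(ReducerIF):
    """Sums a tensor across ranks (requires an initialized process group)."""

    def reduce(self, tensor: torch.Tensor) -> torch.Tensor:
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
        return tensor


class LocalMeanReducer(ReducerIF):
    """Single-GPU fallback without FSDP: just reduces locally via torch.mean()."""

    def reduce(self, tensor: torch.Tensor) -> torch.Tensor:
        return torch.mean(tensor)
```

A stateful aggregator could then receive one of these via its constructor and call the injected reducer instead of the static `Reducer.reduce(...)` shown above.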

AggregativeCLMCrossEntropyLossFactory,
CLMCrossEntropyLossConfig,
),
ComponentEntity("eval_measures", "perplexity", AggregativePerplexityFactory, CLMCrossEntropyLossConfig),
Member:
The config does not match the AggregativePerplexityFactory from a naming perspective.

Resolved review thread (outdated): src/modalities/registry/components.py
dataloader_tag=data_loader.dataloader_tag,
)

return {loss: loss.compute() for loss in losses}, {metric: metric.compute() for metric in metrics}
Member:
Maybe it makes sense to unify losses and metrics as just "measures"? Maybe we can even find a better word?

Member:
Also, what about replacing compute with aggregate?

BlueCrescent (Collaborator, Author):
Yeah, that sounds better. I renamed it.
I also agree that joining the metrics and losses would be better.
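
A hedged sketch of what a joint "measure" abstraction with an aggregate() method might look like; the interface and method names are illustrative assumptions:

```python
# Illustrative sketch of a unified measure interface; names and signatures are
# assumptions, not the actual modalities code.
from abc import ABC, abstractmethod
from typing import Dict

import torch


class AggregativeMeasureIF(ABC):
    @abstractmethod
    def add(self, batch: Dict[str, torch.Tensor]) -> None:
        """Accumulate statistics from one evaluation batch."""

    @abstractmethod
    def aggregate(self) -> torch.Tensor:
        """Return the final value over all batches seen since the last reset."""

    @abstractmethod
    def reset(self) -> None:
        """Clear the internal state so the measure can be reused."""
```

With such an interface, the evaluator could hold a single collection of measures instead of separate loss and metric collections.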

global_train_sample_id: int,
local_sample_id_to_global_sample_id: Callable[[int], int],
) -> Tuple[Dict[str, torch.Tensor], Dict[str, torch.Tensor]]:
losses = [f.create(self.local_rank) for f in self._loss_factories]
Member:
Not a big fan of the factories here, as they seem a bit overengineered.
How about we instantiate the aggregated measures already during the hierarchical instantiation and pass the measures to the evaluator in main?
We could implement a reset function for all measures that clears out their internal state. In that case, we could get rid of the factories here, right?

BlueCrescent (Collaborator, Author):
I don't think it would be a good idea to have access to the stateful measures outside of this context. That seems like a potential source of errors to me. Consider, for example, evaluating multiple dataloaders in parallel using the same loss object.
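
To make the trade-off concrete, a hedged sketch contrasting the two options discussed in this thread; all names are illustrative assumptions:

```python
# Hedged contrast of the two approaches; DummyMeasure and the function names
# are illustrative, not the actual modalities classes.
from typing import Callable, List


class DummyMeasure:
    def __init__(self) -> None:
        self.total = 0.0

    def reset(self) -> None:
        self.total = 0.0


# Option A (current PR): factories create fresh, run-private measure instances,
# so no stateful object is shared across parallel dataloader evaluations.
def evaluate_with_factories(factories: List[Callable[[], DummyMeasure]]) -> List[DummyMeasure]:
    return [factory() for factory in factories]


# Option B (review suggestion): pre-instantiated measures are passed in and
# reset before each run, avoiding the factory indirection.
def evaluate_with_reset(measures: List[DummyMeasure]) -> List[DummyMeasure]:
    for measure in measures:
        measure.reset()
    return measures
```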


def _extract_num_samples(data_loader: LLMDataLoader) -> int:
    num_samples = len(data_loader.dataset)
    if data_loader.batch_size is not None and data_loader.drop_last:
Member:
In LLMDataLoader the batch_size is never None.

BlueCrescent (Collaborator, Author):
Yeah, but I think LLMDataLoader should change in that regard, since its current form does not allow IterDatasets at all. So I would leave this here, to be on the safe side.
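
For reference, a hedged completion of the snippet above showing how drop_last might be handled; the body is inferred from the visible diff lines and is not necessarily the PR's exact implementation:

```python
# Assumed completion of _extract_num_samples; the drop_last arithmetic is a
# sketch inferred from the visible lines, not the PR's exact code.
def _extract_num_samples(data_loader) -> int:
    num_samples = len(data_loader.dataset)
    if data_loader.batch_size is not None and data_loader.drop_last:
        # With drop_last, the trailing partial batch is never yielded,
        # so only full batches contribute samples.
        num_samples = (num_samples // data_loader.batch_size) * data_loader.batch_size
    return num_samples
```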

self,
num_samples: int,
local_rank: int,
throughput_aggregator_factory: Callable[[], ThroughputAggregator] = ThroughputAggregator,
Member:
Could we pass in the aggregator directly? The reset function is implemented already.

self._target_key = target_key
self._prediction_key = prediction_key

def create(self, local_rank: int) -> AggregativeMeasure[PerplexityKeys]:
Member:
As mentioned earlier, I think the factories are overkill.

@le1nux force-pushed the main branch 3 times, most recently from cb6e816 to 179052b on March 13, 2024 22:14
@mali-git closed this Jun 11, 2024
@fromm-m deleted the feat/multiple_losses_and_metrics_in_evaluator branch June 17, 2024 12:22