Proposed change to StaticPartitionMapping._check_upstream:

    @cached_method
    def _check_upstream(self, *, upstream_partitions_def: StaticPartitionsDefinition):
        """Validate that the mapping from upstream to downstream is only defined on upstream keys."""
        check.inst_param(
            upstream_partitions_def,
            "upstream_partitions_def",
            StaticPartitionsDefinition,
            "StaticPartitionMapping can only be defined between two StaticPartitionsDefinitions",
        )
        if self.allow_nonexistent_upstream_partitions:
            # If allowed to have nonexistent upstream partitions, do not consider
            # out of range partitions to be invalid
            return
        upstream_keys = upstream_partitions_def.get_partition_keys()
        extra_keys = set(self._mapping.keys()).difference(upstream_keys)
        if extra_keys:
            raise ValueError(
                f"mapping source partitions not in the upstream partitions definition: {extra_keys}"
            )
What's the use case?
The use case we've come across is a static-partitioned asset where some partitions are populated by one upstream dependency and the rest by another. Currently, if one of the upstream assets doesn't have one of the downstream partitions, the downstream asset cannot be materialised. The workarounds are to load unnecessary data through AllPartitionMapping, with potentially very high redundant network costs, or to use 'hacky' solutions that load asset values from the defs object. Am I missing an easier solution?
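To make the scenario concrete, here is a minimal plain-Python sketch. It does not use Dagster, and the partition keys and function names are hypothetical, chosen only for illustration. It models a downstream asset whose partitions are split across two upstreams and shows what a lenient lookup could return for each of them.

```python
# Hypothetical partition keys, for illustration only.
DOWNSTREAM_KEYS = ["eu", "us", "apac"]
UPSTREAM_A_KEYS = ["eu", "us"]  # upstream A only covers two of the partitions
UPSTREAM_B_KEYS = ["apac"]      # upstream B covers the remaining one


def upstream_keys_for(downstream_key, upstream_keys, allow_nonexistent=False):
    """Identity-map one downstream key onto an upstream's partition keys."""
    if downstream_key in upstream_keys:
        return [downstream_key]
    if allow_nonexistent:
        # The upstream has nothing for this partition, so load nothing from it.
        return []
    raise ValueError(f"upstream has no partition {downstream_key!r}")


def plan(downstream_key):
    """Decide which upstream partitions to load for one downstream partition."""
    return {
        "upstream_a": upstream_keys_for(
            downstream_key, UPSTREAM_A_KEYS, allow_nonexistent=True
        ),
        "upstream_b": upstream_keys_for(
            downstream_key, UPSTREAM_B_KEYS, allow_nonexistent=True
        ),
    }
```

With the lenient lookup, plan("apac") loads only from upstream B, while the strict default raises for the same key against upstream A, which mirrors the current behaviour described above.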
Ideas of implementation
TimeWindowPartitionMapping has this functionality, which is very useful. I have borrowed its implementation from here:
dagster/python_modules/dagster/dagster/_core/definitions/time_window_partition_mapping.py, line 350 in 8ece8ba
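For anyone who wants to try the behaviour outside Dagster, below is a standalone sketch of how such a flag could affect validation. It is an assumption-laden simplification, not Dagster's code: check_upstream is a hypothetical free function, the mapping is a plain dict, and unlike the method in this issue it returns the set of missing keys instead of returning early.

```python
from typing import Iterable, Mapping, Set


def check_upstream(
    mapping: Mapping[str, Iterable[str]],
    upstream_keys: Iterable[str],
    allow_nonexistent_upstream_partitions: bool = False,
) -> Set[str]:
    """Return mapping source keys that are missing upstream.

    Raises ValueError for missing keys unless the flag allows them.
    """
    extra_keys = set(mapping).difference(upstream_keys)
    if extra_keys and not allow_nonexistent_upstream_partitions:
        raise ValueError(
            f"mapping source partitions not in the upstream partitions definition: {extra_keys}"
        )
    return extra_keys


# Hypothetical mapping: "apac" is mapped but the upstream does not define it.
example_mapping = {"eu": ["eu"], "us": ["us"], "apac": ["apac"]}
missing = check_upstream(
    example_mapping,
    upstream_keys=["eu", "us"],
    allow_nonexistent_upstream_partitions=True,
)
```

Calling the same function without the flag raises ValueError for the "apac" key, which is the failure mode this issue asks to make optional.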
Additional information
Thanks all :)
I am happy to draft a PR if this suggestion works for you?
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.