-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/fdb 332 partial date #64
base: feature/rule-schema-key
Are you sure you want to change the base?
Conversation
…rtialDate added year and month aliases and extended tests
queue_.emplace(currentCatalogue_->key(), currentIndex_->key(), datumKey, field.stableLocation(), | ||
field.timestamp()); | ||
// Take into account any rule-specific behaviour in the request | ||
auto canonical = rule_->registry().canonicalise(request_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to canonicalize the request once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the aim of this PR is associating different types to the same metadata in different level of the schema.
This gives us the possibility to use only a subset of the date (the year or the month) at top level, and the full info at an inner level.
For this reason, the user request has to be re-interpreted (and re-canonicalised) at each level, with the types associated with that specific level
for (auto it = nodes_.begin(); it != nodes_.end(); it++) { | ||
const auto& type = registry.lookupType(it->keyword_); | ||
for (auto& value : it->values_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's wrong with the for each that was used before?
@@ -274,6 +277,7 @@ std::vector<Key> Rule::findMatchingKeys(const metkit::mars::MarsRequest& request | |||
auto& node = graph.push(keyword); | |||
|
|||
for (const auto& value : values) { | |||
// std::string value = type.toKey(val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove commented out code.
@@ -304,6 +309,7 @@ std::vector<Key> Rule::findMatchingKeys(const metkit::mars::MarsRequest& request | |||
auto& node = graph.push(keyword); | |||
|
|||
for (const auto& value : values) { | |||
//std::string value = type.toKey(val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove commented out code.
LOG_DEBUG_LIB(LibFdb5) << "findMatchingKeys " << request << " ==> "; | ||
std::string sep; | ||
for (const auto& k: out) { | ||
LOG_DEBUG_LIB(LibFdb5) << sep << k; | ||
sep = " | "; | ||
} | ||
LOG_DEBUG_LIB(LibFdb5) << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put this into a if(debug){}
block? Otherwise we are having to pay the cost of the debug message even if it is never displayed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implements many good fixes.
I left non PR comments around canonicalization, which we may attempt to fix in a separate PR
bool visitIndex(const Index&) override { NOTIMP; } | ||
|
||
void visitDatum(const Field& /*field*/, const Key& /*datumKey*/) override { NOTIMP; } | ||
void visitDatum(const Field&, const Key&) override { NOTIMP; } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be reverted, it slipped obviously
// If the DB is locked for listing, then it "doesn't exist" | ||
if (!catalogue.enabled(ControlIdentifier::List)) { | ||
return false; | ||
} | ||
|
||
bool ret = QueryVisitor::visitDatabase(catalogue); | ||
|
||
ASSERT(currentCatalogue_->key().partialMatch(request_)); | ||
auto dbRequest = catalogue.rule().registry().canonicalise(request_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -96,15 +108,17 @@ struct ListVisitor : public QueryVisitor<ListElement> { | |||
bool visitIndex(const Index& index) override { | |||
QueryVisitor::visitIndex(index); | |||
|
|||
if (index.partialMatch(request_)) { | |||
if (index.partialMatch(*rule_, request_)) { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would indexRequest_
be better here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danovaro, there are big changes;
in hopes to avoid repeated canonicalise() per datum
I had canonicalized the datum request at visitIndex (rather than at visitDatum)
apperantly, there were issues, which meant we now canonicalize requests:
2x in index.partialMatch()
per index and later per datum
I'd like to come back to this in another PR, and see if we can trim the fat for partial match and request topics
@@ -68,6 +69,7 @@ class DaosCatalogue : public CatalogueImpl, public DaosCommon { | |||
private: // members | |||
|
|||
Schema schema_; | |||
const RuleDatabase* rule_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rule_ {nullptr};
std::set<Key> keys; | ||
config_.schema().matchDatabase(request, keys, ""); | ||
config_.schema().matchDatabase(request, results, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunate result of canonicalization
other option would be to leave this alone and do sth like db.schema->matchingRule(key)
when needed
possibly unneeded
for (auto& [keyword, values] : nodes_) { | ||
const auto& type = registry.lookupType(keyword); | ||
for (auto& value : values) { | ||
for (auto it = nodes_.begin(); it != nodes_.end(); it++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha, my pretty structured binding range-based loop ? :)
I'll fix
|
||
if (!axes_.partialMatch(request)) { return false; } | ||
canonical = rule.registry().canonicalise(request); | ||
if (!axes_.partialMatch(canonical)) { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could simply do return axes_.partialMatch(canonical);
|
||
if (!key_.partialMatch(request)) { return false; } | ||
auto canonical = rule.parent().registry().canonicalise(request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is parent registry not same ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggle to understand what is going on here. Could we add a comment explaining why the parent registry is used first?
@@ -156,15 +156,5 @@ diff out checkv2.again | |||
|
|||
cmp 6.grib checkV2.again.grib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi, removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also going to ask: why was this removed?
@@ -102,6 +106,8 @@ void DaosCatalogue::loadSchema() { | |||
std::istringstream stream{std::string(v.begin(), v.end())}; | |||
schema_.load(stream); | |||
|
|||
rule_ = &schema_.matchingRule(dbKey_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make rule_ also a reference instead of a pointer?
This would show the non-owning properties. matchingRule(dbKey)
is returning a reference either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is set during loading... matchingRule is returning a reference to an object handled by a unique_ptr. So I guess it's fine?
@@ -62,7 +62,7 @@ bool EntryVisitor::visitDatabase(const Catalogue& catalogue) { | |||
currentCatalogue_ = &catalogue; | |||
currentStore_ = nullptr; | |||
currentIndex_ = nullptr; | |||
rule_ = nullptr; | |||
rule_ = ¤tCatalogue_->rule(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this rule_
element really needed? Wouldn't it be nice to replace all occurrences with the corresponding catalogue->rule() call? There are only 4 locations where this is called, if I'm not mistaken. Regarding the other fields which are initialized in this method, rule_ seems to have a different level of abstraction.
@@ -137,11 +137,13 @@ void IndexBase::put(const Key& key, const Field& field) { | |||
// return registry_.value().get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is some commented-out code, which should be removed.
|
||
if (!key_.partialMatch(request)) { return false; } | ||
auto canonical = rule.parent().registry().canonicalise(request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggle to understand what is going on here. Could we add a comment explaining why the parent registry is used first?
path = path.dirName(); | ||
path = path.realName(); | ||
if (p.exists()) { | ||
eckit::PathName path = p.isDir() ? p.realName() : p.dirName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still the same behaviour as before: negating the p.isDir()
part swtichted the order of the ternary operator.
@@ -38,6 +38,8 @@ class TypesRegistry : public eckit::Streamable { | |||
TypesRegistry() = default; | |||
|
|||
explicit TypesRegistry(eckit::Stream& stream); | |||
|
|||
void decode(eckit::Stream& stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd fix the ordering and keep the encoding and decoding bits together
* does it submit to any jurisdiction. | ||
*/ | ||
|
||
// #include "eckit/exception/Exceptions.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this commented-out include.
|
||
eckit::Translator<eckit::Date, std::string> t; | ||
|
||
for (std::vector<eckit::Date>::const_iterator i = dates.begin(); i != dates.end(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a for-each loop, use emplace_back on the values vector.
@@ -156,15 +156,5 @@ diff out checkv2.again | |||
|
|||
cmp 6.grib checkV2.again.grib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also going to ask: why was this removed?
enables usage of partial dates in the schema by using specific types: TypeYear/TypeMonth
partial dates are labelled with a metadata alias (year/month). This makes the fdb-list output usable as request
e.g.
with the following schema
fdb-list class=d1,expver=1,year=2002,time=12
returns:
{class=d1,dataset=climate-dt,activity=story-nudging,experiment=tplus2.0k,generation=1,model=ifs-fesom,realization=1,expver=0001,stream=clte,year=2002}{month=11,resolution=standard,type=fc,levtype=pl}{date=20021130,time=1200,levelist=1000,param=129}
{class=d1,dataset=climate-dt,activity=story-nudging,experiment=tplus2.0k,generation=1,model=ifs-fesom,realization=1,expver=0001,stream=clte,year=2002}{month=12,resolution=standard,type=fc,levtype=pl}{date=20021201,time=1200,levelist=1000,param=129}
we can then retrieve:
retrieve,class=d1,dataset=climate-dt,activity=story-nudging,experiment=tplus2.0k,generation=1,model=ifs-fesom,realization=1,expver=0001,stream=clte,resolution=standard,type=fc,levtype=pl,year=2002month=11/12,date=20021130/20021201,time=1200,levelist=1000,param=129