Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/fdb 332 partial date #64

Open
wants to merge 6 commits into
base: feature/rule-schema-key
Choose a base branch
from

Conversation

danovaro
Copy link
Member

@danovaro danovaro commented Jan 7, 2025

enables usage of partial dates in the schema by using specific types: TypeYear/TypeMonth

partial dates are labelled with a metadata alias (year/month). This makes the fdb-list output usable as request
e.g.
with the following schema

[ class=d1, dataset=climate-dt, activity, experiment, generation, model, realization, expver, stream=clte/wave, date: Year
       [ date: Month, resolution, type, levtype
               [ date: Date, time, levelist?, param, frequency?, direction? ]]
]

fdb-list class=d1,expver=1,year=2002,time=12
returns:
{class=d1,dataset=climate-dt,activity=story-nudging,experiment=tplus2.0k,generation=1,model=ifs-fesom,realization=1,expver=0001,stream=clte,year=2002}{month=11,resolution=standard,type=fc,levtype=pl}{date=20021130,time=1200,levelist=1000,param=129}
{class=d1,dataset=climate-dt,activity=story-nudging,experiment=tplus2.0k,generation=1,model=ifs-fesom,realization=1,expver=0001,stream=clte,year=2002}{month=12,resolution=standard,type=fc,levtype=pl}{date=20021201,time=1200,levelist=1000,param=129}

we can then retrieve:
retrieve,class=d1,dataset=climate-dt,activity=story-nudging,experiment=tplus2.0k,generation=1,model=ifs-fesom,realization=1,expver=0001,stream=clte,resolution=standard,type=fc,levtype=pl,year=2002month=11/12,date=20021130/20021201,time=1200,levelist=1000,param=129

@danovaro danovaro changed the base branch from remoteFDB to feature/rule-schema-key-remote January 8, 2025 11:04
@danovaro danovaro changed the base branch from feature/rule-schema-key-remote to feature/rule-schema-key January 8, 2025 11:22
src/fdb5/api/fdb_c.cc Show resolved Hide resolved
src/fdb5/api/local/ListVisitor.h Show resolved Hide resolved
src/fdb5/api/local/ListVisitor.h Show resolved Hide resolved
src/fdb5/api/local/ListVisitor.h Show resolved Hide resolved
src/fdb5/database/Index.h Show resolved Hide resolved
queue_.emplace(currentCatalogue_->key(), currentIndex_->key(), datumKey, field.stableLocation(),
field.timestamp());
// Take into account any rule-specific behaviour in the request
auto canonical = rule_->registry().canonicalise(request_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to canonicalize the request once?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the aim of this PR is associating different types to the same metadata in different level of the schema.
This gives us the possibility to use only a subset of the date (the year or the month) at top level, and the full info at an inner level.
For this reason, the user request has to be re-interpreted (and re-canonicalised) at each level, with the types associated with that specific level

Comment on lines +86 to +88
for (auto it = nodes_.begin(); it != nodes_.end(); it++) {
const auto& type = registry.lookupType(it->keyword_);
for (auto& value : it->values_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's wrong with the for each that was used before?

@@ -274,6 +277,7 @@ std::vector<Key> Rule::findMatchingKeys(const metkit::mars::MarsRequest& request
auto& node = graph.push(keyword);

for (const auto& value : values) {
// std::string value = type.toKey(val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove commented out code.

@@ -304,6 +309,7 @@ std::vector<Key> Rule::findMatchingKeys(const metkit::mars::MarsRequest& request
auto& node = graph.push(keyword);

for (const auto& value : values) {
//std::string value = type.toKey(val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove commented out code.

Comment on lines +323 to +329
LOG_DEBUG_LIB(LibFdb5) << "findMatchingKeys " << request << " ==> ";
std::string sep;
for (const auto& k: out) {
LOG_DEBUG_LIB(LibFdb5) << sep << k;
sep = " | ";
}
LOG_DEBUG_LIB(LibFdb5) << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this into a if(debug){} block? Otherwise we are having to pay the cost of the debug message even if it is never displayed

Copy link
Contributor

@mcakircali mcakircali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implements many good fixes.
I left non PR comments around canonicalization, which we may attempt to fix in a separate PR

Comment on lines +52 to 55
bool visitIndex(const Index&) override { NOTIMP; }

void visitDatum(const Field& /*field*/, const Key& /*datumKey*/) override { NOTIMP; }
void visitDatum(const Field&, const Key&) override { NOTIMP; }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be reverted, it slipped obviously

// If the DB is locked for listing, then it "doesn't exist"
if (!catalogue.enabled(ControlIdentifier::List)) {
return false;
}

bool ret = QueryVisitor::visitDatabase(catalogue);

ASSERT(currentCatalogue_->key().partialMatch(request_));
auto dbRequest = catalogue.rule().registry().canonicalise(request_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ozaq: no, not here since different cat. db (rule) would make different canonical request.
@danovaro: rule_->registry() ?

@@ -96,15 +108,17 @@ struct ListVisitor : public QueryVisitor<ListElement> {
bool visitIndex(const Index& index) override {
QueryVisitor::visitIndex(index);

if (index.partialMatch(request_)) {
if (index.partialMatch(*rule_, request_)) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would indexRequest_ be better here ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danovaro, there are big changes;
in hopes to avoid repeated canonicalise() per datum
I had canonicalized the datum request at visitIndex (rather than at visitDatum)
apperantly, there were issues, which meant we now canonicalize requests:
2x in index.partialMatch() per index and later per datum

I'd like to come back to this in another PR, and see if we can trim the fat for partial match and request topics

@@ -68,6 +69,7 @@ class DaosCatalogue : public CatalogueImpl, public DaosCommon {
private: // members

Schema schema_;
const RuleDatabase* rule_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rule_ {nullptr};

std::set<Key> keys;
config_.schema().matchDatabase(request, keys, "");
config_.schema().matchDatabase(request, results, "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunate result of canonicalization
other option would be to leave this alone and do sth like db.schema->matchingRule(key) when needed
possibly unneeded

for (auto& [keyword, values] : nodes_) {
const auto& type = registry.lookupType(keyword);
for (auto& value : values) {
for (auto it = nodes_.begin(); it != nodes_.end(); it++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha, my pretty structured binding range-based loop ? :)
I'll fix


if (!axes_.partialMatch(request)) { return false; }
canonical = rule.registry().canonicalise(request);
if (!axes_.partialMatch(canonical)) { return false; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could simply do return axes_.partialMatch(canonical);


if (!key_.partialMatch(request)) { return false; }
auto canonical = rule.parent().registry().canonicalise(request);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is parent registry not same ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggle to understand what is going on here. Could we add a comment explaining why the parent registry is used first?

@@ -156,15 +156,5 @@ diff out checkv2.again

cmp 6.grib checkV2.again.grib
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, removed

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also going to ask: why was this removed?

@@ -102,6 +106,8 @@ void DaosCatalogue::loadSchema() {
std::istringstream stream{std::string(v.begin(), v.end())};
schema_.load(stream);

rule_ = &schema_.matchingRule(dbKey_);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make rule_ also a reference instead of a pointer?

This would show the non-owning properties. matchingRule(dbKey) is returning a reference either way.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is set during loading... matchingRule is returning a reference to an object handled by a unique_ptr. So I guess it's fine?

@@ -62,7 +62,7 @@ bool EntryVisitor::visitDatabase(const Catalogue& catalogue) {
currentCatalogue_ = &catalogue;
currentStore_ = nullptr;
currentIndex_ = nullptr;
rule_ = nullptr;
rule_ = &currentCatalogue_->rule();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this rule_ element really needed? Wouldn't it be nice to replace all occurrences with the corresponding catalogue->rule() call? There are only 4 locations where this is called, if I'm not mistaken. Regarding the other fields which are initialized in this method, rule_ seems to have a different level of abstraction.

@@ -137,11 +137,13 @@ void IndexBase::put(const Key& key, const Field& field) {
// return registry_.value().get();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is some commented-out code, which should be removed.


if (!key_.partialMatch(request)) { return false; }
auto canonical = rule.parent().registry().canonicalise(request);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggle to understand what is going on here. Could we add a comment explaining why the parent registry is used first?

path = path.dirName();
path = path.realName();
if (p.exists()) {
eckit::PathName path = p.isDir() ? p.realName() : p.dirName();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still the same behaviour as before: negating the p.isDir() part swtichted the order of the ternary operator.

@@ -38,6 +38,8 @@ class TypesRegistry : public eckit::Streamable {
TypesRegistry() = default;

explicit TypesRegistry(eckit::Stream& stream);

void decode(eckit::Stream& stream);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd fix the ordering and keep the encoding and decoding bits together

* does it submit to any jurisdiction.
*/

// #include "eckit/exception/Exceptions.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this commented-out include.


eckit::Translator<eckit::Date, std::string> t;

for (std::vector<eckit::Date>::const_iterator i = dates.begin(); i != dates.end(); ++i) {
Copy link

@tbkr tbkr Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a for-each loop, use emplace_back on the values vector.

@@ -156,15 +156,5 @@ diff out checkv2.again

cmp 6.grib checkV2.again.grib
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also going to ask: why was this removed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants