Feature/fdb 332 partial date #64

danovaro · 2025-01-07T15:04:12Z

Enables partial dates in the schema through two dedicated types: TypeYear and TypeMonth.

Partial dates are labelled with a metadata alias (year/month), which makes the fdb-list output directly usable as a request.
For example, with the following schema:

[ class=d1, dataset=climate-dt, activity, experiment, generation, model, realization, expver, stream=clte/wave, date: Year
       [ date: Month, resolution, type, levtype
               [ date: Date, time, levelist?, param, frequency?, direction? ]]
]

fdb-list class=d1,expver=1,year=2002,time=12
returns:
{class=d1,dataset=climate-dt,activity=story-nudging,experiment=tplus2.0k,generation=1,model=ifs-fesom,realization=1,expver=0001,stream=clte,year=2002}{month=11,resolution=standard,type=fc,levtype=pl}{date=20021130,time=1200,levelist=1000,param=129}
{class=d1,dataset=climate-dt,activity=story-nudging,experiment=tplus2.0k,generation=1,model=ifs-fesom,realization=1,expver=0001,stream=clte,year=2002}{month=12,resolution=standard,type=fc,levtype=pl}{date=20021201,time=1200,levelist=1000,param=129}

We can then retrieve with:
retrieve,class=d1,dataset=climate-dt,activity=story-nudging,experiment=tplus2.0k,generation=1,model=ifs-fesom,realization=1,expver=0001,stream=clte,resolution=standard,type=fc,levtype=pl,year=2002,month=11/12,date=20021130/20021201,time=1200,levelist=1000,param=129
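
To illustrate how the year and month aliases in the listing relate to the full date, here is a minimal sketch. It is not the TypeYear/TypeMonth implementation from this PR: `yearOf` and `monthOf` are hypothetical helpers, and they assume the date is an 8-character YYYYMMDD string as in the output above. How the real types format the month (for instance, zero padding) is not shown in this thread, so the sketch simply returns the two month characters unchanged.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helpers for illustration only (not part of fdb5).
// Both assume an 8-character YYYYMMDD date string.

// Returns the year component, e.g. "20021130" -> "2002".
std::string yearOf(const std::string& date) {
    if (date.size() != 8) {
        throw std::invalid_argument("expected YYYYMMDD, got: " + date);
    }
    return date.substr(0, 4);
}

// Returns the two month characters, e.g. "20021201" -> "12".
std::string monthOf(const std::string& date) {
    if (date.size() != 8) {
        throw std::invalid_argument("expected YYYYMMDD, got: " + date);
    }
    return date.substr(4, 2);
}
```

With these helpers, the two listed fields (date=20021130 and date=20021201) map to year=2002 with month=11 and month=12 respectively, which matches the keys shown at the first two schema levels.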


src/fdb5/api/fdb_c.cc

src/fdb5/api/local/ListVisitor.h

src/fdb5/database/Index.h

Ozaq · 2025-01-08T15:40:24Z

src/fdb5/api/local/ListVisitor.h

-            queue_.emplace(currentCatalogue_->key(), currentIndex_->key(), datumKey, field.stableLocation(),
-                           field.timestamp());
+        // Take into account any rule-specific behaviour in the request
+        auto canonical = rule_->registry().canonicalise(request_);


Wouldn't it make sense to canonicalize the request once?

danovaro (reply, Jan 16, 2025): The aim of this PR is to associate different types with the same metadata at different levels of the schema.
This lets us use only a subset of the date (the year or the month) at the top level, and the full date at an inner level.
For this reason, the user request has to be re-interpreted (and re-canonicalised) at each level, using the types associated with that specific level.
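
A minimal sketch of why a single canonical request is not enough, assuming a request is a map from keyword to values and each schema level carries its own registry. `LevelRegistry`, `makeRegistry` and the substring-based transformation are hypothetical; they are not the fdb5 `TypesRegistry` API, and they assume YYYYMMDD date values.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Request = std::map<std::string, std::vector<std::string>>;

// Hypothetical stand-in for a per-level types registry:
// each keyword may have a transformation applied to its values.
struct LevelRegistry {
    std::map<std::string, std::function<std::string(const std::string&)>> types;

    // Applies the level-specific transformation to every value of every keyword.
    // Keywords without a registered type are copied through unchanged.
    Request canonicalise(const Request& request) const {
        Request out;
        for (const auto& [keyword, values] : request) {
            const auto type = types.find(keyword);
            for (const auto& value : values) {
                out[keyword].push_back(type == types.end() ? value : type->second(value));
            }
        }
        return out;
    }
};

// Builds a registry whose "date" type keeps only a slice of the YYYYMMDD string.
LevelRegistry makeRegistry(std::size_t offset, std::size_t length) {
    LevelRegistry registry;
    registry.types["date"] = [offset, length](const std::string& date) {
        return date.substr(offset, length);
    };
    return registry;
}
```

In this sketch `makeRegistry(0, 4)` plays the role of the database level (year), `makeRegistry(4, 2)` the index level (month) and `makeRegistry(0, 8)` the datum level (full date). The same user request yields a different canonical form at each level, so the result of one level cannot be reused for the next.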

Ozaq · 2025-01-08T15:47:08Z

src/fdb5/rules/Rule.cc

+        for (auto it = nodes_.begin(); it != nodes_.end(); it++) {
+            const auto& type = registry.lookupType(it->keyword_);
+            for (auto& value : it->values_) {


What's wrong with the for each that was used before?

Ozaq · 2025-01-08T15:47:57Z

src/fdb5/rules/Rule.cc

@@ -274,6 +277,7 @@ std::vector<Key> Rule::findMatchingKeys(const metkit::mars::MarsRequest& request
        auto& node = graph.push(keyword);

        for (const auto& value : values) {
+            // std::string value = type.toKey(val);


Please remove commented out code.

Ozaq · 2025-01-08T15:48:16Z

src/fdb5/rules/Rule.cc

@@ -304,6 +309,7 @@ std::vector<Key> Rule::findMatchingKeys(const metkit::mars::MarsRequest& request
        auto& node = graph.push(keyword);

        for (const auto& value : values) {
+            //std::string value = type.toKey(val);


Please remove commented out code.

Ozaq · 2025-01-08T15:50:05Z

src/fdb5/rules/Rule.cc

+    LOG_DEBUG_LIB(LibFdb5) << "findMatchingKeys  " << request << " ==> ";
+    std::string sep;
+    for (const auto& k: out) {
+        LOG_DEBUG_LIB(LibFdb5) << sep << k;
+        sep = " | ";
+    }
+    LOG_DEBUG_LIB(LibFdb5) << std::endl;


Can we put this into an if (debug) {} block? Otherwise we pay the cost of building the debug message even if it is never displayed.
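
A minimal sketch of the guard being asked for. `DebugLog` is a hypothetical stand-in for the library's debug switch and output channel, not the eckit or fdb5 logging API; the point is only that the loop and the string formatting are skipped when debugging is off.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a library-wide debug switch and its output channel.
struct DebugLog {
    bool enabled = false;
    std::string sink;  // collects what would be written to the debug channel
};

// Formats the matching keys only when debug output is enabled, so the common
// (non-debug) path returns before any formatting work is done.
void logMatchingKeys(DebugLog& log, const std::string& request,
                     const std::vector<std::string>& keys) {
    if (!log.enabled) {
        return;
    }
    std::ostringstream oss;
    oss << "findMatchingKeys  " << request << " ==> ";
    std::string sep;
    for (const auto& key : keys) {
        oss << sep << key;
        sep = " | ";
    }
    log.sink += oss.str() + "\n";
}
```

Building the whole message in one stream and emitting it once also keeps the output on a single line, rather than spreading it over several separate log statements.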

mcakircali

Implements many good fixes.
I left some comments around canonicalization that go beyond this PR; we may attempt to address them in a separate PR.

mcakircali · 2025-01-08T21:28:42Z

src/fdb5/api/local/DumpVisitor.h

+    bool visitIndex(const Index&) override { NOTIMP; }

-    void visitDatum(const Field& /*field*/, const Key& /*datumKey*/) override { NOTIMP; }
+    void visitDatum(const Field&, const Key&) override { NOTIMP; }



This can be reverted; it obviously slipped in.

mcakircali · 2025-01-08T22:10:19Z

src/fdb5/api/local/ListVisitor.h

        // If the DB is locked for listing, then it "doesn't exist"
        if (!catalogue.enabled(ControlIdentifier::List)) {
            return false;
        }

        bool ret = QueryVisitor::visitDatabase(catalogue);

-        ASSERT(currentCatalogue_->key().partialMatch(request_));
+        auto dbRequest = catalogue.rule().registry().canonicalise(request_);


@Ozaq: no, not here since different cat. db (rule) would make different canonical request.
@danovaro: rule_->registry() ?

mcakircali · 2025-01-08T23:52:30Z

src/fdb5/api/local/ListVisitor.h

@@ -96,15 +108,17 @@ struct ListVisitor : public QueryVisitor<ListElement> {
    bool visitIndex(const Index& index) override {
        QueryVisitor::visitIndex(index);

-        if (index.partialMatch(request_)) {
+        if (index.partialMatch(*rule_, request_)) {



would indexRequest_ be better here ?

mcakircali · 2025-01-09T08:58:40Z

src/fdb5/api/local/ListVisitor.h

@danovaro, there are big changes here.
In the hope of avoiding a repeated canonicalise() per datum,
I had canonicalized the datum request in visitIndex (rather than in visitDatum).
Apparently there were issues, which means we now canonicalize requests
twice in index.partialMatch() per index, and later again per datum.

I'd like to come back to this in another PR and see if we can trim the fat around partial matching and request handling.

mcakircali · 2025-01-09T09:03:14Z

src/fdb5/daos/DaosCatalogue.h

@@ -68,6 +69,7 @@ class DaosCatalogue : public CatalogueImpl, public DaosCommon {
 private: // members

    Schema schema_;
+    const RuleDatabase* rule_;


rule_ {nullptr};

mcakircali · 2025-01-09T12:37:49Z

src/fdb5/toc/RootManager.cc

    std::set<Key> keys;
-    config_.schema().matchDatabase(request, keys, "");
+    config_.schema().matchDatabase(request, results, "");


An unfortunate result of canonicalization.
The other option would be to leave this alone and do something like db.schema->matchingRule(key) when needed.
Possibly unneeded.

mcakircali · 2025-01-09T12:53:45Z

src/fdb5/rules/Rule.cc

-        for (auto& [keyword, values] : nodes_) {
-            const auto& type = registry.lookupType(keyword);
-            for (auto& value : values) {
+        for (auto it = nodes_.begin(); it != nodes_.end(); it++) {


ha, my pretty structured binding range-based loop ? :)
I'll fix
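
For reference, a small sketch comparing the two loop forms discussed here. `Nodes` is an illustrative container only; the real `nodes_` in Rule.cc holds elements with `keyword_` and `values_` members, for which a C++17 structured binding works in the same way.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative container; not the actual type of nodes_ in Rule.cc.
using Nodes = std::map<std::string, std::vector<std::string>>;

// Iterator form, similar to the changed lines in the diff.
std::string renderWithIterators(const Nodes& nodes) {
    std::string out;
    for (auto it = nodes.begin(); it != nodes.end(); ++it) {
        for (const auto& value : it->second) {
            out += it->first + "=" + value + ";";
        }
    }
    return out;
}

// Equivalent range-based form with a structured binding (C++17).
std::string renderWithBindings(const Nodes& nodes) {
    std::string out;
    for (const auto& [keyword, values] : nodes) {
        for (const auto& value : values) {
            out += keyword + "=" + value + ";";
        }
    }
    return out;
}
```

Both functions visit the same elements in the same order; the structured binding only gives the two parts of each element readable names.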

mcakircali · 2025-01-09T12:59:17Z

src/fdb5/database/Index.cc


-    if (!axes_.partialMatch(request)) { return false; }
+    canonical = rule.registry().canonicalise(request);
+    if (!axes_.partialMatch(canonical)) { return false; }


could simply do return axes_.partialMatch(canonical);

mcakircali · 2025-01-09T13:01:12Z

src/fdb5/database/Index.cc


-    if (!key_.partialMatch(request)) { return false; }
+    auto canonical = rule.parent().registry().canonicalise(request);


is parent registry not same ?


mcakircali · 2025-01-09T13:10:44Z

tests/regressions/FDB-238/FDB-238.sh.in

@@ -156,15 +156,5 @@ diff out checkv2.again

 cmp 6.grib checkV2.again.grib


fyi, removed


tbkr · 2025-01-09T15:54:04Z

src/fdb5/daos/DaosCatalogue.cc

@@ -102,6 +106,8 @@ void DaosCatalogue::loadSchema() {
    std::istringstream stream{std::string(v.begin(), v.end())};
    schema_.load(stream);

+    rule_ = &schema_.matchingRule(dbKey_);


Should we make rule_ also a reference instead of a pointer?

This would show the non-owning properties. matchingRule(dbKey) is returning a reference either way.

Ah, this is set during loading... matchingRule is returning a reference to an object handled by a unique_ptr. So I guess it's fine?
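
The trade-off can be sketched as follows, under the assumption that the rule only becomes known once the schema has been loaded. `Rule`, `Schema` and `Catalogue` here are simplified, hypothetical stand-ins and do not mirror the fdb5 classes.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified, hypothetical stand-ins for illustration only.
struct Rule {
    std::string name;
};

class Schema {
public:
    void load(const std::string& ruleName) {
        rules_.push_back(std::make_unique<Rule>(Rule{ruleName}));
    }

    // Returns a reference to a rule that the schema owns through a unique_ptr.
    const Rule& matchingRule() const {
        if (rules_.empty()) {
            throw std::runtime_error("no rule loaded");
        }
        return *rules_.front();
    }

private:
    std::vector<std::unique_ptr<Rule>> rules_;
};

class Catalogue {
public:
    // The rule is only available after loading, so the member cannot be a
    // reference: references must be bound at construction and cannot be reseated.
    void loadSchema(const std::string& ruleName) {
        schema_.load(ruleName);
        rule_ = &schema_.matchingRule();
    }

    bool hasRule() const { return rule_ != nullptr; }

    const Rule& rule() const {
        if (rule_ == nullptr) {
            throw std::runtime_error("schema not loaded");
        }
        return *rule_;
    }

private:
    Schema schema_;
    const Rule* rule_{nullptr};  // non-owning; null until loadSchema() runs
};
```

A non-owning pointer with a `nullptr` default keeps the "not yet loaded" state explicit, while the accessor can still hand out a reference once the rule is set.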

tbkr · 2025-01-09T16:12:29Z

src/fdb5/database/EntryVisitMechanism.cc

@@ -62,7 +62,7 @@ bool EntryVisitor::visitDatabase(const Catalogue& catalogue) {
    currentCatalogue_ = &catalogue;
    currentStore_ = nullptr;
    currentIndex_ = nullptr;
-    rule_ = nullptr;
+    rule_ = &currentCatalogue_->rule();


Is this rule_ element really needed? Wouldn't it be nice to replace all occurrences with the corresponding catalogue->rule() call? There are only 4 locations where this is called, if I'm not mistaken. Regarding the other fields which are initialized in this method, rule_ seems to have a different level of abstraction.

tbkr · 2025-01-09T16:16:07Z

src/fdb5/database/Index.cc

@@ -137,11 +137,13 @@ void IndexBase::put(const Key& key, const Field& field) {
 //     return registry_.value().get();


Here is some commented-out code, which should be removed.

tbkr · 2025-01-09T16:18:08Z

src/fdb5/database/Index.cc


-    if (!key_.partialMatch(request)) { return false; }
+    auto canonical = rule.parent().registry().canonicalise(request);


I struggle to understand what is going on here. Could we add a comment explaining why the parent registry is used first?

tbkr · 2025-01-10T08:24:44Z

src/fdb5/toc/TocEngine.cc

-                    path = path.dirName();
-                path = path.realName();
+            if (p.exists()) {
+                eckit::PathName path = p.isDir() ? p.realName() : p.dirName();


I think this is still the same behaviour as before: negating the p.isDir() part switched the order of the ternary operator.
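
The equivalence referred to here can be checked with a small sketch. `FakePath` is a hypothetical stand-in and not `eckit::PathName`; its two string members simply represent what `realName()` and `dirName()` would return.

```cpp
#include <string>

// Hypothetical stand-in for a path object; not eckit::PathName.
struct FakePath {
    bool dir;
    std::string real;    // what realName() would return
    std::string parent;  // what dirName() would return

    bool isDir() const { return dir; }
};

// Negated condition: the branches appear in the opposite order.
std::string negatedForm(const FakePath& p) {
    return !p.isDir() ? p.parent : p.real;
}

// Positive condition, as in the changed line.
std::string positiveForm(const FakePath& p) {
    return p.isDir() ? p.real : p.parent;
}
```

For any condition `c`, `!c ? a : b` and `c ? b : a` select the same operand, so the two functions return the same value for both directories and non-directories.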

tbkr · 2025-01-10T08:27:53Z

src/fdb5/types/TypesRegistry.h

@@ -38,6 +38,8 @@ class TypesRegistry : public eckit::Streamable {
    TypesRegistry() = default;

    explicit TypesRegistry(eckit::Stream& stream);
+
+    void decode(eckit::Stream& stream);


I'd fix the ordering and keep the encoding and decoding bits together

tbkr · 2025-01-10T08:28:25Z

src/fdb5/types/TypeYear.cc

+ * does it submit to any jurisdiction.
+ */
+
+// #include "eckit/exception/Exceptions.h"


Remove this commented-out include.

tbkr · 2025-01-10T08:34:14Z

src/fdb5/types/TypeYear.cc

+
+    eckit::Translator<eckit::Date, std::string> t;
+
+    for (std::vector<eckit::Date>::const_iterator i = dates.begin(); i != dates.end(); ++i) {


Make this a for-each loop, use emplace_back on the values vector.
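
A sketch of the suggested shape, using a hypothetical `SimpleDate` struct in place of `eckit::Date` and `std::to_string` in place of the translator. Only the loop structure is the point here.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a date value; not eckit::Date.
struct SimpleDate {
    int year;
    int month;
    int day;
};

// Range-based loop with emplace_back, replacing the explicit const_iterator loop.
std::vector<std::string> yearsOf(const std::vector<SimpleDate>& dates) {
    std::vector<std::string> values;
    values.reserve(dates.size());
    for (const auto& date : dates) {
        values.emplace_back(std::to_string(date.year));
    }
    return values;
}
```

The range-based form removes the iterator type from the loop header, and `emplace_back` moves the temporary string into the vector.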

tbkr · 2025-01-10T08:35:16Z

tests/regressions/FDB-238/FDB-238.sh.in

@@ -156,15 +156,5 @@ diff out checkv2.again

 cmp 6.grib checkV2.again.grib


I was also going to ask: why was this removed?

danovaro added 5 commits December 13, 2024 17:08

FDB-332 test case + request canonicalisation at DB level

ccd099d

FDB-332 fdb-list request canonicalisation at index and datum level

cd00a02

reverted rule/schema stream changes

1f7dd83

Merge branch 'feature/rule-schema-key-remote' into feature/FDB-332_pa…

ed23ee7

…rtialDate added year and month aliases and extended tests

additional tests

5fe7e98

danovaro requested review from ChrisspyB, mcakircali, Ozaq and simondsmart January 7, 2025 15:04

Merge branch 'feature/rule-schema-key-remote' into feature/FDB-332_pa…

4c3d4ec

…rtialDate

danovaro changed the base branch from remoteFDB to feature/rule-schema-key-remote January 8, 2025 11:04

danovaro changed the base branch from feature/rule-schema-key-remote to feature/rule-schema-key January 8, 2025 11:22

Ozaq reviewed Jan 8, 2025


mcakircali approved these changes Jan 9, 2025


tbkr requested changes Jan 9, 2025


tbkr reviewed Jan 10, 2025
