Type guessing in MapDecoder? #16

gtauriello · 2018-07-18T10:06:01Z

I was thinking of ways to help generic parsing of custom properties (e.g. imagine being able to visualize whatever anyone adds as custom atom or residue properties in an MMTF file in PyMOL ;-)). This is complementary to #13 and related to possible extensions of the MMTF spec, where we know what per-atom/per-residue/... properties are (see rcsb/mmtf#32).

My idea is to extend the MapDecoder class with a guessType function which returns an enum for all possible types. Currently that would be something like:

enum DecodableType {
  NOT_SUPPORTED,
  NO_KEY,
  FLOAT,
  INT32,
  CHAR,
  STRING,
  FLOAT_VECTOR,
  INT8_VECTOR,
  INT16_VECTOR,
  INT32_VECTOR,
  STRING_VECTOR,
  CHAR_VECTOR
};

The first two cover the cases where the type is not supported or the key is not found. A user could then use that output and decide to call decode with an appropriately typed object. I think it should be possible to implement something like this with rather minimal changes.

What do you guys think?

gtauriello · 2018-07-18T11:00:19Z

One more thing that came to my mind. This change would also require some extra convenience functionality in MapDecoder to iterate over the keys in the map (or just a getter function for the private string-key/msgpack-object map).
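To make the two options concrete, here is a minimal sketch of what such access could look like. `KeyedStore` is a hypothetical stand-in for MapDecoder, and it stores plain `int` values instead of msgpack objects so that the example has no external dependencies; `getKeys` is an assumed name, not an existing function.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for MapDecoder: owns a private key -> value map.
// The real class would hold msgpack objects; int keeps the sketch standalone.
class KeyedStore {
public:
    void insert(const std::string& key, int value) { data_map_[key] = value; }

    // Option A: read-only access to the whole private map.
    const std::map<std::string, int>& getDataMap() const { return data_map_; }

    // Option B: expose only the keys and keep the stored type hidden.
    std::vector<std::string> getKeys() const {
        std::vector<std::string> keys;
        keys.reserve(data_map_.size());
        for (const auto& kv : data_map_) keys.push_back(kv.first);
        return keys;  // std::map iterates in sorted key order
    }

private:
    std::map<std::string, int> data_map_;
};
```

Option A is the smaller change but leaks the internal value type into the public interface; option B keeps that detail private at the cost of copying the keys.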

speleo3 · 2018-07-18T15:29:14Z

@gtauriello - do you have some example code showing how you would use this?

gtauriello · 2018-07-19T13:32:01Z

Ok I cooked up something quickly to read and normalize any vector of values for per-residue/per-atom/... quantities (e.g. to use as color maps).

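// Forward declaration so the helper can be used before its definition below
template<typename T>
std::vector<float>
getNormalizedValues(mmtf::MapDecoder& md, msgpack::object* data);

// input obj is assumed to be msgpack-map (e.g. groupProperties field as in rcsb/mmtf#32)
// output is key/vector pairs for coloring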
std::map<std::string, std::vector<float>>
getNormalizedVectorMap(const msgpack::object& obj) {
    // each vector has values in [0,1] for coloring
    std::map<std::string, std::vector<float>> color_maps;
    // parse all keys
    mmtf::MapDecoder md(obj);
    for (auto& key_data : md.getDataMap()) {
        // map keys are const, so bind by const reference
        const std::string& key = key_data.first;
        msgpack::object* data = key_data.second;
        switch (md.guessType(key)) {
        case mmtf::MapDecoder::FLOAT_VECTOR: {
            color_maps[key] = getNormalizedValues<float>(md, data);
            break;
        }
        case mmtf::MapDecoder::INT8_VECTOR: {
            color_maps[key] = getNormalizedValues<int8_t>(md, data);
            break;
        }
        case mmtf::MapDecoder::INT16_VECTOR: {
            color_maps[key] = getNormalizedValues<int16_t>(md, data);
            break;
        }
        case mmtf::MapDecoder::INT32_VECTOR: {
            color_maps[key] = getNormalizedValues<int32_t>(md, data);
            break;
        }
        default: {
            // silently skip rest or write message...
            break;
        }
        }
    }
    return color_maps;
}

// Templatized function mapping numeric values to [0,1]
template<typename T>
std::vector<float>
getNormalizedValues(mmtf::MapDecoder& md, msgpack::object* data) {
    std::vector<float> normalized_values;
    std::vector<T> values;
    md.decode(data, values);
    // somehow normalize values into normalized_values
    return normalized_values;
}

// Proposed additions to MapDecoder (signatures only):
// guessType as described above
DecodableType MapDecoder::guessType(const std::string& key);
// Access to internal data_map_
const std::map<std::string, msgpack::object*>& MapDecoder::getDataMap();
// Convenience function to decode msgpack::object* directly
template<typename T>
void MapDecoder::decode(msgpack::object* obj, T& target);
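The "somehow normalize" step in getNormalizedValues is left open above. One plausible choice is plain min-max scaling, sketched here as a standalone helper. The name `minMaxNormalize` and the convention of mapping a constant input to 0 are assumptions made for this illustration, not part of the proposal.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Min-max scaling of numeric values into [0, 1].
// Assumption: an input with no spread (all values equal) maps to 0 everywhere;
// other conventions (e.g. 0.5) would be equally defensible.
template <typename T>
std::vector<float> minMaxNormalize(const std::vector<T>& values) {
    std::vector<float> normalized(values.size(), 0.0f);
    if (values.empty()) return normalized;

    const auto bounds = std::minmax_element(values.begin(), values.end());
    // Widen to double first so the subtraction cannot overflow for integer T.
    const double lo = static_cast<double>(*bounds.first);
    const double hi = static_cast<double>(*bounds.second);
    const double range = hi - lo;
    if (range == 0.0) return normalized;

    for (std::size_t i = 0; i < values.size(); ++i) {
        normalized[i] =
            static_cast<float>((static_cast<double>(values[i]) - lo) / range);
    }
    return normalized;
}
```

Inside getNormalizedValues this would replace the placeholder comment with something like `normalized_values = minMaxNormalize(values);`. NaN inputs are not handled in this sketch.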

danpf · 2018-07-20T16:27:00Z

The only problem in your example is that it's difficult to make a 1:1 map of atomProperties in C++ because it could be a list of strings or floats. I like the idea, I'm just not exactly sure how to go about it. Maybe return a map to the enum of the mmtf::MapDecoder::TYPE, and just let the user handle that?

gtauriello · 2018-07-20T21:52:45Z

@danpf The idea is that in something like C++ you will have to convert the input data into a shared format anyways (actually that's true for any programming language if you want a common functionality from a generic input). Of course my example above could easily be expanded to also color based on input with a DecodableType of STRING_VECTOR if you wanted to add a legend or so. The guessType approach can help with anything that we decode to anyways (incl. binary encodings). For the rest (i.e. NOT_SUPPORTED above), the user has access to the msgpack::object so you can always go for the msgpack type directly and do something custom...
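For the binary encodings, the guess could come straight from the field header. The sketch below assumes the layout I remember from the MMTF specification (a 12-byte header whose first four bytes are a big-endian int32 codec number, with codecs 1 to 4 being the pass-through float32/int8/int16/int32 arrays); please check this against the spec before relying on it. `guessBinaryType` and `readBigEndianInt32` are hypothetical helper names, and all other codecs are deliberately reported as NOT_SUPPORTED here.

```cpp
#include <cstdint>
#include <vector>

// Subset of the enum proposed earlier in this thread.
enum DecodableType {
    NOT_SUPPORTED, FLOAT_VECTOR, INT8_VECTOR, INT16_VECTOR, INT32_VECTOR
};

// Read a big-endian 32-bit signed integer from four bytes.
inline std::int32_t readBigEndianInt32(const unsigned char* p) {
    const std::uint32_t u = (static_cast<std::uint32_t>(p[0]) << 24) |
                            (static_cast<std::uint32_t>(p[1]) << 16) |
                            (static_cast<std::uint32_t>(p[2]) << 8) |
                            static_cast<std::uint32_t>(p[3]);
    return static_cast<std::int32_t>(u);
}

// Hypothetical helper: guess the decoded type from the binary header alone.
// Assumed layout: bytes 0-3 codec, 4-7 length, 8-11 parameter.
inline DecodableType guessBinaryType(const std::vector<unsigned char>& bytes) {
    if (bytes.size() < 12) return NOT_SUPPORTED;  // too short for a header
    switch (readBigEndianInt32(bytes.data())) {
        case 1: return FLOAT_VECTOR;   // pass-through float32 array
        case 2: return INT8_VECTOR;    // pass-through int8 array
        case 3: return INT16_VECTOR;   // pass-through int16 array
        case 4: return INT32_VECTOR;   // pass-through int32 array
        default: return NOT_SUPPORTED; // remaining codecs omitted in this sketch
    }
}
```

A full version would also map the compressed codecs to the type they decode to, which is what makes the guess useful without decoding the payload first.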

speleo3 · 2018-07-22T10:09:24Z

I suggest we wait to add such functionality until we have a use case. Without a use case (some application which wants to use this API) we might engineer in the wrong direction.

gtauriello added the "wait for usecase" label (Waiting for a real usecase scenario for the proposed enhancement) · Jul 23, 2018
