Added glossary and design decisions. #3

pfisterp · 2018-08-01T18:32:46Z

No description provided.

lindholc

Some feedback:

There are some formatting issues that need to be addressed.
I think the glossary and design decisions should be separate documents.
I think some of the entries in the glossary are better off in Scaladocs for the respective classes, and what we should have instead are higher-level descriptions of the concepts being represented.

I haven't gone over the wording in great detail.

We should also wait until #2 is merged and rebase on master or do something else to prevent a merge commit between these branches.

lindholc · 2018-08-02T00:55:14Z

README.md

@@ -1,5 +1,52 @@
+#Summary


As you can see if you look at this branch in the code view, there are some formatting issues with the headings.

You need a space between #s and the text.

lindholc · 2018-08-02T00:56:03Z

README.md

 The usual collection of *Adapters* and *Writers* are conceptually similar but take better advantage of the new data model implementation. The *Adapter* life cycle, in particular, is much cleaner. This new version of LaTiS will continue to support the same service interface (and more).
+
+#Glossary
+Items that are capitalized are Scala class names.


Typically "code things" are surrounded in backticks like this:

`DataType`

So they look like this: DataType

Those ticks are mapped to the html "code" tag so that seems like the right thing to do when referring to a specific class. Maybe another case where we need to differentiate concepts from actual implementation?

lindholc · 2018-08-02T00:58:25Z

README.md

+* **Tuple** - A *Tuple* is a *DataType* consisting of a list of *DataType* objects.  Unlike a function, the *Datatype* objects in a *Tuple* are associated but are not dependant on each other.  
+An example is a 2D point consisting of the x,y *Tuple*.  The scalars x and y are associated but x is not a function of y nor is y a function of x. Elements of a *Tuple* also do not have to be of the same type.  Tuple is a member of the *DataType* algebraic data type.
+
+#Design Decisions


I feel like this should live as a separate document.

lindholc · 2018-08-02T15:14:26Z

README.md

 The usual collection of *Adapters* and *Writers* are conceptually similar but take better advantage of the new data model implementation. The *Adapter* life cycle, in particular, is much cleaner. This new version of LaTiS will continue to support the same service interface (and more).
+
+#Glossary


I think this should also be a separate document.

I like the idea of the README as relatively short document giving a brief overview, some "getting started" tips (like how to pull in the library to SBT or how to download and run a barebones thing), and pointers to additional documentation, like a glossary or design decisions.

I also wonder how much of this duplicates what would be in the docs for certain classes. For instance, the entry for Metadata here seems like it could be a better fit for the Scaladocs and what could go here is a higher-level description of the concept of metadata without our implementation details.

"Glossary" does lead me to expect definitions of terms. What do we mean by "metadata" as opposed to how do we implement it. I agree with Chris that the latter belongs in the scaladoc. The question becomes whether we want to duplicate information and risk it getting out of date. I'm inclined to document the high level concepts that are unlikely to change in the glossary. Maybe add a link to the code or scaladoc?

lindholc · 2018-08-02T16:03:42Z

README.md

+* **fdml** - (Functional Datamodel Markup Language) An xml file format to describe the contents of a *Dataset*.  
+See the FDML Schema section below.
+* **Function** - A *Function* is a *DataType* consisting of exactly two *DataType* objects, a domain and a range. Function is a member of the DataType algebraic data type.
+* **functional** **data** **model** - Datsets consist of ordered sequences of samples that are decoupled from metadata.  Operations that implement the functional algebra can be performed on these samples to create new datasets.


I believe **functional data model** will work just fine here.

There's probably also more that can be said about the FDM.

sapols · 2018-08-02T17:06:21Z

I'm not sure that I agree with everything Chris said. I think I do like the idea of making a separate document for the glossary; I can see how it could clutter up the README. I like the design decisions in the README. But I don't think we need to move anything from the glossary into Scaladocs. Sure, adding stuff to Scaladocs is a good idea, but I think we can do that and still keep the entries in the glossary. I like the idea of having a "one-stop-shop" glossary for our terms better than having to go combing through documentation to find something.

sapols · 2018-08-02T17:10:46Z

README.md

@@ -1,5 +1,52 @@
+# Summary
+
 *LaTiS version 3* is a rewrite of the *Functional Data Model* and *Algebra* based on lessons learned from version 2, and with a bit more inspiration from Functional Programming. It will incubate as *latis3 version 0.x* until we are ready to finalize the API and release it as *latis version 3.x*.


A little nitpicky, but we should be consistent with how we capitalize "LaTiS". For example, we have "Latis3 version 0.x" in this paragraph and I think that should be "LaTiS3 version 0.x".

Please don't apologize, these are actual mistakes!

sapols · 2018-08-02T17:14:03Z

README.md

+Items that are capitalized are Scala class names.
+
+* **arity** - The number of un-flattened *DataTypes* in the domain of a function.  
+For example, a function with a domain of (x, y) and wavelength has an arity of 2.


Maybe "...with a domain of (x, y) and a range of wavelength..."?

No, actually wavelength is part of the domain here.

Ahh! I misread. Twice.

Should we say "domain of ((x, y), wavelength)"?

sapols · 2018-08-02T17:16:17Z

README.md

+For example, a function with a domain of (x, y) and wavelength has an arity of 2.
+* **Data** - A recursive algebraic data type consisting of *ScalarData*, *TupleData*, and *FunctionData*. Unlike the *DataType*, which describes the model, the *Data* objects contain actual values representing measurements.  Since *TupleData* and *FunctionData* contain multiple *Data* objects they can be represented as trees just like DataType objects.
+* **Dataset** - A *Dataset* instance consists of three things:
+  1. *MetaData* for the dataset itself. A list of key/value pairs.


Maybe use a colon? "MetaData for the dataset itself: A list of key/value pairs."

BTW, we don't capitalize the "D" in "Metadata".

sapols · 2018-08-02T17:18:34Z

README.md

+* **DataType** - A recursive algebraic data type consisting of *Scalars*, *Tuples*, and *Functions*.  The *DataType* is purely in the realm of the model.  Since *Tuples* and Functions themselves contain multiple *DataType* objects they can be nested with arbitrary complexity, meaning a *DataType* can be represented as a tree.  
+Also, every *DataType* must contain a *MetaData* object.
+* **dimensionality** - The number of flattened *DataTypes* in the domain of a function.  
+For example, a function with a domain of (x, y) and wavelength has a dimensionality of 3.


Maybe "...with a domain of (x, y) and a range of wavelength..."?

Again, wavelength is part of the domain. I will add a range to make things less confusing for both arity and dimensionality.

sapols · 2018-08-02T17:21:53Z

README.md

+* **range** - The dependant variables of a function.
+* **Sample** - A *Sample* is no more than the dimensionality of a dataset followed by an array of *Data* objects.  Nested tuples are not allowed in a *Sample* but *ScalarData* and *FunctionData* are.
+* **Scalar** - A *Scalar* is a *DataType* consisting of a single atomic variable.  A *DataType* of arbitrary complexity must ultimately consist of *Scalar* objects as the leaf nodes of the representation tree.  *Scalar* is a member of the *DataType* algebraic data type.
+* **Tuple** - A *Tuple* is a *DataType* consisting of a list of *DataType* objects.  Unlike a function, the *Datatype* objects in a *Tuple* are associated but are not dependant on each other.  


Should be "dependent" instead of "dependant".

sapols · 2018-08-02T17:22:26Z

README.md

+* **Function** - A *Function* is a *DataType* consisting of exactly two *DataType* objects, a domain and a range. Function is a member of the DataType algebraic data type.
+* **functional data model** - Datsets consist of ordered sequences of samples that are decoupled from metadata.  Operations that implement the functional algebra can be performed on these samples to create new datasets.
+* **Metadata** - An instance of *Metadata* is simply a list of key/value pairs to describe a *DataType* or *Dataset*.  Every *Scalar* object is required to contain a *MetaData* object, but *MetaData* objects may be empty for *Tuple* and *Function* objects.
+* **range** - The dependant variables of a function.


Should be "dependent" instead of "dependant".

Yeah, we don't celebrate Independance Day do we?

Maybe we should though... 💃

sapols · 2018-08-02T17:23:24Z

README.md

+
+# Design Decisions
+* *DataType* and *Data* objects are completely decoupled:  
+  * Con: Trying to maintain two identical tree structures is difficult.


I really like this pro/con approach you took here.

sapols · 2018-08-02T17:24:37Z

README.md

+  * Con: Seems arbitrary to require both a single *Data* item followed by a list of additional *Data* items.
+  * Pro: Ensures that at least one Data item is in the *TupleData*.
+* *Sample* objects are not implemented as a tuple of domain and range, as a *Function* is, but are instead implemented as an integer representing dimensionality followed by the domain and range combined into a single list:  
+  * Con: The dimensionality number feels  arbitrary.  simply breaking a *Sample* into domain and range would be more elegant.


Accidental double space between "feels" and "arbitrary". Need to capitalize "simply".

sapols

I added a few comments. Once they're addressed, this all looks awesome to me.

sapols · 2018-08-02T17:34:44Z

README.md

@@ -42,7 +42,7 @@ An example is a 2D point consisting of the x,y *Tuple*.  The scalars x and y are
  * Con: Seems arbitrary to require both a single *Data* item followed by a list of additional *Data* items.
  * Pro: Ensures that at least one Data item is in the *TupleData*.
 * *Sample* objects are not implemented as a tuple of domain and range, as a *Function* is, but are instead implemented as an integer representing dimensionality followed by the domain and range combined into a single list:  
-  * Con: The dimensionality number feels  arbitrary.  simply breaking a *Sample* into domain and range would be more elegant.
+  * Con: The dimensionality number feels arbitrary.  simply breaking a *Sample* into domain and range would be more elegant.


There's still a double space before the word "simply", which still isn't capitalized. :)

sapols · 2018-08-02T17:41:59Z

I went ahead and approved this PR. I'll let the discussion about the glossary get resolved under Chris's "requested changes". I gave my two cents above.

lindholc · 2018-08-03T18:07:29Z

docs/design_decisions.md

@@ -0,0 +1,16 @@
+# Design Decisions


Looking at the Geotrellis Architecture Decision Records, they appear to be focusing on larger architectural things, like whether they're using the stlib Future or scalaz Task.

We would need to have some standard by which we judge something to be worthy of including in this document.

I could imagine describing DataType vs. Data and the nature of Sample, but I think the bit about TupleData falls below the bar.

lindholc · 2018-08-03T18:32:18Z

docs/design_decisions.md

+  * Pro: Computation efficiencies require data to be unencumbered with metadata and model concepts. 
+* *TupleData* is implemented as a tuple consisting of the first *Data* item and all of the rest:  
+  * Con: Seems arbitrary to require both a single *Data* item followed by a list of additional *Data* items.
+  * Pro: Ensures that at least one Data item is in the *TupleData*.


I don't think that this is true. There's an apply method on the companion object that accepts Seq[Data], which could be empty:

scala> TupleData(List()) scala.NotImplementedError: an implementation is missing at scala.Predef$.$qmark$qmark$qmark(Predef.scala:284) at latis.data.TupleData$.apply(Data.scala:44) ... 36 elided

lindholc · 2018-08-03T18:34:50Z

docs/design_decisions.md

+  * Con: Trying to maintain two identical tree structures is difficult.
+  * Pro: Computation efficiencies require data to be unencumbered with metadata and model concepts. 
+* *TupleData* is implemented as a tuple consisting of the first *Data* item and all of the rest:  
+  * Con: Seems arbitrary to require both a single *Data* item followed by a list of additional *Data* items.


I wouldn't say it's arbitrary. As you note, it (sort of) ensures that there's at least one item, and that's because this (a single element paired with a list of elements) is how you encode non-empty lists.

lindholc · 2018-08-03T18:43:17Z

docs/glossary.md

@@ -0,0 +1,25 @@
+# Glossary


I'll reiterate that I think the glossary ought to describe concepts (what "metadata" is, what "dataset" means to us) rather than implementation (Metadata is Map[String, String]).

The latter is hopefully apparent from the code and could be described in the Scaladocs. Having implementation details here just adds another place where things are likely to get out of date if our discipline wavers.

Definitions of concepts are less likely to change with time and probably useful enough to be placed at such a high level.

lindholc · 2018-08-03T18:59:13Z

README.md

-  * Pro: The model contains all the information needed to represent nested tuples.
-
-
+see glossary [here](docs/glossary.md)


For accessibility reasons it's probably better for the links to be on "glossary" and "design decisions."

Good catch! That totally slipped passed me, but yes, "here" or "see more" links are notorious accessibility issues.

dlindhol

This information is valuable but it got me thinking about what belongs in the code docs vs here at the top level. I want to remain DRY so we don't have to worry about things getting out of sync. I'd like to see the broad architectural issues here and implementation details in the code. That boundary isn't always very clear, though.

It also struck me that some of this feels like a quick-start guide. I'm wondering if we need another document for that. There are ways to make docs with code that compile so we are more likely to keep them up to date.

I'm fine with this as a good start. We can be agile and evolve the docs as v3 incubates.

dlindhol · 2018-08-03T18:06:25Z

README.md

 The usual collection of *Adapters* and *Writers* are conceptually similar but take better advantage of the new data model implementation. The *Adapter* life cycle, in particular, is much cleaner. This new version of LaTiS will continue to support the same service interface (and more).
+
+#Glossary


"Glossary" does lead me to expect definitions of terms. What do we mean by "metadata" as opposed to how do we implement it. I agree with Chris that the latter belongs in the scaladoc. The question becomes whether we want to duplicate information and risk it getting out of date. I'm inclined to document the high level concepts that are unlikely to change in the glossary. Maybe add a link to the code or scaladoc?

dlindhol · 2018-08-03T18:17:16Z

README.md

 The usual collection of *Adapters* and *Writers* are conceptually similar but take better advantage of the new data model implementation. The *Adapter* life cycle, in particular, is much cleaner. This new version of LaTiS will continue to support the same service interface (and more).
+
+#Glossary
+Items that are capitalized are Scala class names.


Those ticks are mapped to the html "code" tag so that seems like the right thing to do when referring to a specific class. Maybe another case where we need to differentiate concepts from actual implementation?

dlindhol · 2018-08-03T18:19:11Z

README.md

+Items that are capitalized are Scala class names.
+
+* **arity** - The number of un-flattened *DataTypes* in the domain of a function.  
+For example, a function with a domain of (x, y) and wavelength has an arity of 2.


Should we say "domain of ((x, y), wavelength)"?

dlindhol · 2018-08-03T18:25:21Z

README.md

+For example, a function with a domain of (x, y) and wavelength has an arity of 2.
+* **Data** - A recursive algebraic data type consisting of *ScalarData*, *TupleData*, and *FunctionData*. Unlike the *DataType*, which describes the model, the *Data* objects contain actual values representing measurements.  Since *TupleData* and *FunctionData* contain multiple *Data* objects they can be represented as trees just like DataType objects.
+* **Dataset** - A *Dataset* instance consists of three things:
+  1. *MetaData* for the dataset itself. A list of key/value pairs.


BTW, we don't capitalize the "D" in "Metadata".

dlindhol · 2018-08-03T18:36:32Z

docs/design_decisions.md

+* *DataType* and *Data* objects are completely decoupled:  
+  * Con: Trying to maintain two identical tree structures is difficult.
+  * Pro: Computation efficiencies require data to be unencumbered with metadata and model concepts. 
+* *TupleData* is implemented as a tuple consisting of the first *Data* item and all of the rest:  


I'm wondering if an implementation detail like this should be in the code and/or scaladoc. I was imagining these design decisions to be cross-cutting architectural ones.

A decision about not having empty tuples might qualify for this high level.

dlindhol · 2018-08-03T18:38:26Z

docs/design_decisions.md

+  * Pro: Computation efficiencies require data to be unencumbered with metadata and model concepts. 
+* *TupleData* is implemented as a tuple consisting of the first *Data* item and all of the rest:  
+  * Con: Seems arbitrary to require both a single *Data* item followed by a list of additional *Data* items.
+  * Pro: Ensures that at least one Data item is in the *TupleData*.


It also disambiguates construction from a Seq[Data], which was the original reason for it.

dlindhol · 2018-08-03T18:46:11Z

docs/glossary.md

+* **dimensionality** - The number of flattened *DataTypes* in the domain of a function.  
+For example, a function with a domain of (x, y) and wavelength with a range of irradiance has a dimensionality of 3.
+* **domain** - The independent variables of a function.
+* **fdml** - (Functional Datamodel Markup Language) An xml file format to describe the contents of a *Dataset*.  


Hmm. What about "Function Data Model Language"? Is "markup" relevant in this context, other that the rich history of *mls?

dlindhol · 2018-08-03T18:48:45Z

docs/glossary.md

+* **fdml** - (Functional Datamodel Markup Language) An xml file format to describe the contents of a *Dataset*.  
+See the FDML Schema section below.
+* **Function** - A *Function* is a *DataType* consisting of exactly two *DataType* objects, a domain and a range. Function is a member of the DataType algebraic data type.
+* **functional data model** - Datsets consist of ordered sequences of samples that are decoupled from metadata.  Operations that implement the functional algebra can be performed on these samples to create new datasets.


It feels like this needs a higher level definition.

Added glossary and design decisions.

f8e66c9

sapols requested review from sapols and dlindhol August 1, 2018 20:05

lindholc self-requested a review August 2, 2018 00:52

lindholc requested changes Aug 2, 2018

View reviewed changes

Formatting issues corrected.

ec8b44e

sapols reviewed Aug 2, 2018

View reviewed changes

sapols requested changes Aug 2, 2018

View reviewed changes

Changes from reviewer comments.

65069c5

sapols reviewed Aug 2, 2018

View reviewed changes

Oops

df6bfa9

sapols approved these changes Aug 2, 2018

View reviewed changes

Restructured the presentation of the glossary and the design decisions.

5fd55c9

lindholc reviewed Aug 3, 2018

View reviewed changes

dlindhol reviewed Aug 3, 2018

View reviewed changes

Reviewer comments are implemented.

e6c27f9

		The usual collection of Adapters and Writers are conceptually similar but take better advantage of the new data model implementation. The Adapter life cycle, in particular, is much cleaner. This new version of LaTiS will continue to support the same service interface (and more).

		#Glossary

		@@ -1,5 +1,52 @@
		# Summary

		LaTiS version 3 is a rewrite of the Functional Data Model and Algebra based on lessons learned from version 2, and with a bit more inspiration from Functional Programming. It will incubate as latis3 version 0.x until we are ready to finalize the API and release it as latis version 3.x.

		* Pro: The model contains all the information needed to represent nested tuples.


		see glossary [here](docs/glossary.md)

Added glossary and design decisions. #3

Are you sure you want to change the base?

Added glossary and design decisions. #3

Conversation

pfisterp commented Aug 1, 2018

lindholc left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sapols commented Aug 2, 2018

sapols Aug 2, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sapols Aug 2, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sapols Aug 2, 2018 • edited Loading

Choose a reason for hiding this comment

sapols left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sapols commented Aug 2, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dlindhol left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sapols Aug 2, 2018 •

edited

Loading

sapols Aug 2, 2018 •

edited

Loading

sapols Aug 2, 2018 •

edited

Loading

sapols commented Aug 2, 2018 •

edited

Loading