From caa5cd3e017cad181189a172796e9e2bb70189c0 Mon Sep 17 00:00:00 2001
From: Andreas Reischuck <andreas.reischuck@hicknhack-software.com>
Date: Wed, 24 Aug 2016 17:20:24 +0200
Subject: [PATCH 1/4] operator and identifier tokenizer

---
 ...00000-operator_and_identifier_tokenizer.md | 114 ++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 text/00000-operator_and_identifier_tokenizer.md

diff --git a/text/00000-operator_and_identifier_tokenizer.md b/text/00000-operator_and_identifier_tokenizer.md
new file mode 100644
index 0000000..3c8277b
--- /dev/null
+++ b/text/00000-operator_and_identifier_tokenizer.md
@@ -0,0 +1,114 @@
+# Operator and Identifier Tokenizer
+
+- Created: 2016-08-24 by @arBmind
+
+## Summary
+[Summary]: #summary
+
+This proposal discusses the introduction and implications of custom operator identifiers.
+It aims to allow library authors to define custom operators.
+
+## Motivation
+[Motivation]: #motivation
+
+To get started with parsing the language, we need to clarify how identifiers are treated.
+
+Allowing library authors to define custom operators makes the language much more generic.
+This allows for great extensibility and innovative new operator concepts.
+
+## Detailed design
+[Detailed design]: #detailed-design
+
+The goals:
+* allow everything we can
+* allow custom operators
+
+Most existing languages are very restrictive about operators and identifiers.
+Scala takes a somewhat more generic approach: custom operators are allowed as long as they start with an operator character, and regular identifiers may also be used in infix position (e.g. `a max b`).
+
+The most generic way is to allow all characters in identifiers and operators.
+This is difficult to parse, because the tokens of an expression can only be determined once its scope is known.
+`a+b` could mean many things; `a + b` is probably what most readers expect.
+
+Even when allowing every character, we might want to apply some rules:
+* Opening and closing punctuation characters, such as parentheses, have to match.
+* Some character sequences have a fixed meaning in the language, such as comments, keywords and commas.
+* Spaces always separate identifiers.
+* Keywords and identifiers have to be clearly separated by the tokenizer.
+
+The tokenizer alone cannot split every identifier.
+This step has to be delayed until the parser has determined the scope of the block;
+otherwise some identifiers might not be known yet.
+Splitting then starts at the beginning of the token and always takes the longest known identifier.
+
+If `a`, `b` and `+` are known, `a+b` is split into the identifiers `a`, `+` and `b`.
+If an additional identifier `a+` is introduced, the split changes to `a+` and `b`, which can lead to surprising results.
+This is less of a problem for binary operators, because we usually want spaces around them anyway.
+It's more difficult for unary operators like `++`, where the code may stay valid but behave differently.
+
+In order to cope with this I propose to introduce a separation of character sets.
+* Regular identifiers are made of letters, numbers and underscores.
+* Operator identifiers are made of a sequence of operator characters and punctuations.
+
+Operators allow to embed regular identifiers only in punctuations.
+With this `a+b` is always parsed into three identifiers.
+All operators are still subject to splitting the longest known operator, but regular identifiers are not.
+
+There are no restrictions on where each kind of identifier may be used.
+Variables may be named with operator characters, but this should be avoided except for callable variables.
+For example, we want to allow a function that takes two operators as arguments.
+
+```rebuild
+&fn madd(l : $L, m : $M, a : $A, fn(:L)*(:M)->:$X, fn(:X)+(:A)->:$R) -> (r:R):
+  r = (l * m) + a
+end
+```
+
+## Drawbacks
+[Drawbacks]: #drawbacks
+
+### Compile Time
+
+Splitting operators requires at least two passes:
+one for the tokenizer and one to split the tokens.
+
+This is inherent to user-defined operators that do not require separators between them.
+Separating the character sets of regular identifiers and operators reduces the workload, because most expressions contain no operator sequences that need splitting.
+Long sequences of operators are also hard for programmers to read, so they are expected to be rare.
+
+### Surprises by introducing new operators
+
+When a new operator is defined, the splitting of existing operator sequences might change without the developer intending it.
+
+This is an inconvenience the developer has to cope with.
+In existing code bases, new operators should therefore be introduced with great care.
+
+We might mitigate this by allowing operators to be imported into a specific block or scope.
+For example, consider a matrix library with custom operators.
+You would not want to introduce these operators everywhere, but only in a restricted scope.
+
+## Alternatives
+[Alternatives]: #alternatives
+
+### Do not use custom operators
+
+Without custom operators, mathematical code becomes very difficult to read and maintain.
+
+All operators have to be built into the language itself.
+
+Many functional languages define a lot of operators for all kinds of function composition.
+I propose that we do not burden every user of the language with learning them all,
+but keep the option of using a library that defines the operators a developer is willing to use.
+
+### Mark operators
+
+We might begin and end all custom operators with special characters.
+
+This leads to very awkward-looking code and is not very productive.
+
+## Unresolved questions
+[Unresolved questions]: #unresolved-questions
+
+### How is the parsing of expressions handled?
+
+### How can the local import of operators be modeled?

From 108a2c4695447ef6a23a2d745051e988c04e30f7 Mon Sep 17 00:00:00 2001
From: Andreas Reischuck <andreas.reischuck@hicknhack-software.com>
Date: Wed, 24 Aug 2016 17:31:25 +0200
Subject: [PATCH 2/4] assigned pr #

---
 ...er_tokenizer.md => 00008-operator_and_identifier_tokenizer.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename text/{00000-operator_and_identifier_tokenizer.md => 00008-operator_and_identifier_tokenizer.md} (100%)

diff --git a/text/00000-operator_and_identifier_tokenizer.md b/text/00008-operator_and_identifier_tokenizer.md
similarity index 100%
rename from text/00000-operator_and_identifier_tokenizer.md
rename to text/00008-operator_and_identifier_tokenizer.md

From 3bc8193f9c5b348df418f92b14a2a8cd4077868b Mon Sep 17 00:00:00 2001
From: Andreas Reischuck <andreas.reischuck@hicknhack-software.com>
Date: Wed, 24 Aug 2016 18:26:57 +0200
Subject: [PATCH 3/4] improved the description of valid operators a bit

---
 text/00008-operator_and_identifier_tokenizer.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/text/00008-operator_and_identifier_tokenizer.md b/text/00008-operator_and_identifier_tokenizer.md
index 3c8277b..4adaacd 100644
--- a/text/00008-operator_and_identifier_tokenizer.md
+++ b/text/00008-operator_and_identifier_tokenizer.md
@@ -48,9 +48,19 @@ It's more difficult for unary operators like `++`, where the code may stay valid
 
 In order to cope with this I propose to introduce a separation of character sets.
 * Regular identifiers are made of letters, numbers and underscores.
-* Operator identifiers are made of a sequence of operator characters and punctuations.
+* Operator identifiers are made of symbols, emojis and punctuation characters that are not reserved by the language itself.
+
+Opening and closing punctuation always has to match. This allows regular identifier characters to be placed in between. `{{` is not a valid operator, as the curly braces are not closed.
+
+Some valid operator examples:
+* Unicode.MathSymbol: `+` `=` `<` (the ASCII `-` is Unicode.DashPunctuation)
+* Unicode.OtherSymbol: `®` `⌛` `⌚`
+* Unicode.OtherNumber: `½` `²`
+* Unicode.CurrencySymbol: `¢` `¥` (`$` is reserved for the language)
+* Unicode.OtherPunctuation: `?` `!` (`#,.` are reserved for the language)
+* Unicode.OpenPunctuation: `{dotproduct}`
+* Unicode.InitialQuotePunctuation: `«cross»`
 
-Operators allow to embed regular identifiers only in punctuations.
 With this `a+b` is always parsed into three identifiers.
 All operators are still subject to splitting the longest known operator, but regular identifiers are not.
 

From abc381b4bdef715cc2f5d5585641989446897200 Mon Sep 17 00:00:00 2001
From: Andreas Reischuck <andreas.reischuck@hicknhack-software.com>
Date: Sun, 28 May 2017 19:36:15 +0200
Subject: [PATCH 4/4] renamed file for new PR#

---
 ...er_tokenizer.md => 00019-operator_and_identifier_tokenizer.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename text/{00008-operator_and_identifier_tokenizer.md => 00019-operator_and_identifier_tokenizer.md} (100%)

diff --git a/text/00008-operator_and_identifier_tokenizer.md b/text/00019-operator_and_identifier_tokenizer.md
similarity index 100%
rename from text/00008-operator_and_identifier_tokenizer.md
rename to text/00019-operator_and_identifier_tokenizer.md