Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the parser #62

Draft
wants to merge 225 commits into
base: master
Choose a base branch
from
Draft

Conversation

PhilippImhof
Copy link
Collaborator

@PhilippImhof PhilippImhof commented Nov 30, 2022

The current parser goes back to the very first version. It is entirely based on regular expressions. However, these can only partially parse languages. This has several consequences:

  • The current code is rather complicated and difficult to maintain, let alone to extend, especially for possible future contributors.
  • Currently, we have the following limitations:
    • Strings must be delimited by double quotes only. They cannot contain line breaks and they cannot contain a double quote, because escaping is not supported.
    • Lists can only contain one single data type, e.g. all numbers or all strings.
    • Lists cannot be nested and thus we cannot have multi-dimensional arrays, e.g. for matrices.
  • There is only one "namespace". The Formulas Question plugin only defines functions (even for constants like pi) and users can only define variables. However, variable names must be different from function names. As a consequence, every new function we define could potentially break existing questions. The validation makes sure one cannot define a variable sin. But let's say a Spanish speaking user defines a variable sen (to store some sine value). This will work, because there is no such function at the moment. Now, if we add the function sen in a month, the question will stop working because of the conflicting name.
  • The current code uses eval() which is considered bad practice, even if it's done very cautiously.

The aim of this PR is to completely rewrite the existing parser. In the end, we should be able to

  • have strings delimited by single or double quotes ✓
  • have line breaks or multi-line strings ✓
  • include the delimiting quote in a string thanks to escaping ✓
  • access chars from a string like we do with array elements ✓
  • access array elements (and chars from a string) "from the end" with negative indices, as we can do for strings in PHP ✓
  • have lists with mixed data types and nested lists, thus allowing to develop matrices ✓
  • broader usage of ranges, e.g. use variables in ranges ✓ or use ranges and elements side-by-side in an array (this is already possible in sets) ✓
  • allow the user to "overwrite" identifiers, i.e. to have a variable name that is already used for a function, without losing the ability to call that function ✓
  • avoid the usage of eval()
  • use pi or π instead of pi(); the old syntax will remain valid for backwards compatibility ✓
  • use strings in a ternary expression, e.g. (a == b ? "yes" : "no") as an easier and more readable alternative to pick()
  • use + to concatenate (join) strings as a short alternative to join()
  • use == comparison for strings ✓

Also, the new parser should make it easier to locate possible syntax errors in a definition, e.g. better indicate the row and column number.

@PhilippImhof
Copy link
Collaborator Author

PhilippImhof commented Feb 26, 2023

Finally found some time to take this a step further. Tokenizer is fully functional. It is still work in progress, many things are missing and implementations are likely to change.

Next steps:

  • function calls (like sin() etc.) inside expressions
  • ternary operator
  • arrays
  • for loop

@PhilippImhof PhilippImhof force-pushed the parser branch 4 times, most recently from de0df5f to 0902aa3 Compare March 6, 2023 10:53
@PhilippImhof
Copy link
Collaborator Author

PhilippImhof commented Mar 26, 2023

Parser is now more or less finished. There's some refactoring and some fine tuning left and there will probably be a few bugs.

Next step: evaluation. Currently, there is virtually no evaluation in the code, because everything is sent to eval(). Once the rewrite is done, evaluation will be handled inside the plugin and we can finally get rid of eval().

@PhilippImhof
Copy link
Collaborator Author

Update: Evaluation of expressions is basically working, with a few things left to be done, namely all the variable stuff (replace variable by its content at the right time and assign new values to variables). Also, evaluation (not parsing!) of for loops is not done yet. These are the next steps.

@PhilippImhof
Copy link
Collaborator Author

PhilippImhof commented Apr 15, 2023

Variables are now basically working. What remains to be done:

  • evaluation of for loops
  • assignment of a value to an individual array element
  • definition and instantiation of random variables

@PhilippImhof
Copy link
Collaborator Author

PhilippImhof commented Apr 22, 2023

Almost all functions are now available and working. Here's what remains to be done (implemented/written or tested):

  • inv() function
  • map() function
  • parsing and instantiation of random variables (started)
  • parsing of answers (almost ready)
  • dealing with grading variables
  • grading of answer type number
  • grading of answer type numeric
  • grading of answer type numerical formula
  • dealing with algebraic variables
  • grading of answer type algebraic formula

@PhilippImhof
Copy link
Collaborator Author

I have started the integration of the new parser. That's going to take a while. Please note:

From now on, some tests will be failing

Some tests depend on legacy code that might have been removed or changed. These tests will have to be updated once the integration is done. Until then, they will fail.

@PhilippImhof PhilippImhof added this to the 6.0.0 milestone Aug 15, 2023
@PhilippImhof
Copy link
Collaborator Author

It is now possible for teachers to "deactivate" built-in functions for the student's response by overwriting them as variables. As the teacher can still access those overwritten functions by using the PREFIX operator, they can use them in their model answers, but students cannot.

Example: Question asking the student to give an equivalent of (sin(x))^2, the student could simply type the same formula and would get full marks unless the teacher spots this by looking at the answers. By defining the global "variable" sin = 9, the student's answer (sin(x))^2 would be read as (9*x)^2 and would thus be wrong. The student has to write 1 - (cos(x))^2.

For the docs: Teachers SHOULD NOT use the prefix operator in the model answer for an algebraic formula, because the model answer is used for the "The correct answer is …" feedback and when the student clicks "Fill in correct responses". Using the prefix in the model answer will then show an answer that the student cannot successfully submit. This is not a problem for the other answer types, because the result will be numerically evaluated.

In the example above, the teacher should thus not write (\sin(x))^2 in the model answer, but 1 - (cos(x))^2, as they expect from the student.

@PhilippImhof
Copy link
Collaborator Author

Thinking of it again, I will make sure that using the PREFIX in an algebraic formula's model answer will give an appropriate error message, I think it is not good to have the docs say that one should not do something, but allow them to do it anyway…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement refactoring refactoring of existing code
Projects
None yet
2 participants