Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: pattern mathcing and transformation on object trees #40

Open
Opiumtm opened this issue Jun 14, 2017 · 4 comments
Open

Comments

@Opiumtm
Copy link

Opiumtm commented Jun 14, 2017

I am using functional approach already in the code which analyze and transform tree-like hierarchical collections (it's transformation of source HTML DOM into Word document paragraphs or internal markup-like representation to display it later on screen).

Naturally, pattern matching is the solution for this class of problems.
There are two basic functional algorithms I use:

  1. "Subtree template" pattern matching to detect particular patterns on subtree and then do some action on it if "shape requirements" are satisfied.
  2. Recursive "tree scan" with transformation of input read-only object tree into output read-only object tree. Scanned tree is "pattern matched" (often with "subtree template" check) and appropriate action is chosen on subtree to output result transformed subtree to insert into result tree (and then passed children of root transformed subtree node again recursively pattern matched) or ignore this subtree completely if needed so.

When I have a look on Succinc<T> code it was interesting how my own implementation of this functional algorithms is very similar to Succinc<T> approach to pattern matching. So it would be good to extend Succinc<T> pattern matching for recursive object tree scan, matching and transformation.

Basic idea is the "tree scan context" which contain original subtree nodes collection from same tree level, result collection of nodes (to insert transformed tree result) and function to get children from subtree node from context.

Example code looks like

            result = html.DocumentNode.ChildNodes.TreeWalk(new ParseContext() { Nodes = result, BaseLink = baseLink })
                .GetChildren(node => node.ChildNodes) // assign default "get children func" to parse context
                .If(node => node.NodeType == typeof(IHtmlTextNode), (node, res) => AddToResult(res, new TextPostNode() { Text = WebUtility.HtmlDecode(node.InnerText) }), node => null)
                .If(IsPostLink, AddPostLink, node => null)
                .If(node => node.Name.EqualsNc("br"), (node, res) => AddToResult(res, new LineBreakPostNode()))
                .If(node => node.Name.EqualsNc("p"), (node, res) => CreateAttribute(res, PostBasicAttributes.Paragraph))
                .If(node => node.Name.EqualsNc("em"), (node, res) => CreateAttribute(res, PostBasicAttributes.Italic))
                .If(node => node.Name.EqualsNc("strong"), (node, res) => CreateAttribute(res, PostBasicAttributes.Bold))
                .If(node => GetPreformatNode(node) != null, (node, res) => CreateAttribute(res, PostBasicAttributes.Monospace), node => GetPreformatNode(node).ChildNodes)
                .If(node => CheckSpan(node, "u"), (node, res) => CreateAttribute(res, PostBasicAttributes.Underscore))
                .If(node => CheckSpan(node, "o"), (node, res) => CreateAttribute(res, PostBasicAttributes.Overscore))
                .If(node => CheckSpan(node, "spoiler"), (node, res) => CreateAttribute(res, PostBasicAttributes.Spoiler))
                .If(node => CheckSpan(node, "s"), (node, res) => CreateAttribute(res, PostBasicAttributes.Strikeout))
                .If(node => node.Name.EqualsNc("sub"), (node, res) => CreateAttribute(res, PostBasicAttributes.Sub))
                .If(node => node.Name.EqualsNc("sup"), (node, res) => CreateAttribute(res, PostBasicAttributes.Sup))
                .If(node => CheckSpan(node, "unkfunc"), (node, res) => CreateAttribute(res, PostBasicAttributes.Quote))
                .If(node => node.Name.EqualsNc("a") && !string.IsNullOrWhiteSpace(node.GetAttributeValue("href", null)), CreateLinkAttrNode)
                .Else((node, res) => res)
                .Run()
                .Nodes;

Particularly compact representation of tree matching and transformation, I think.

Tree scan context looks like this:

    /// <summary>
    /// Tree scan context.
    /// </summary>
    /// <typeparam name="T">Type of source tree node.</typeparam>
    /// <typeparam name="TApp">Type of result.</typeparam>
    public class TreeScanContext<T, TApp> : ITreeWalkContextBreak
    {
        /// <summary>
        /// Constructor.
        /// </summary>
        public TreeScanContext()
        {
            Functions = new List<TreeApplyFunc<T, TApp>>();
        }

        /// <summary>
        /// Source nodes collection.
        /// </summary>
        internal IEnumerable<T> Source { get; set; }

        /// <summary>
        /// Aggregated result for this source nodes collection.
        /// </summary>
        internal TApp Result { get; set; }

        /// <summary>
        /// Apply function.
        /// </summary>
        internal List<TreeApplyFunc<T, TApp>> Functions { get; private set; }

        /// <summary>
        /// Default apply function.
        /// </summary>
        internal Func<T, TApp, TApp> DefaultApply { get; set; }

        /// <summary>
        /// Default function to get children.
        /// </summary>
        internal Func<T, IEnumerable<T>> DefaultGetChildren { get; set; }

        /// <summary>
        /// True if tree scan should globally stop now.
        /// </summary>
        public bool IsBreak { get; set; }
    }

    /// <summary>
    /// Tree apply function description.
    /// </summary>
    /// <typeparam name="T">Type of source tree node.</typeparam>
    /// <typeparam name="TApp">Type of result.</typeparam>
    public class TreeApplyFunc<T, TApp>
    {
        /// <summary>
        /// Pattern match function.
        /// </summary>
        internal Func<T, bool> If { get; set; }

        /// <summary>
        /// Apply func. Aggregate result with passed information from original subtree node.
        /// </summary>
        internal Func<T, TApp, TApp> Apply { get; set; }

        /// <summary>
        /// Get children func. If it returns null or empty enumerable, subtree scan is stopped.
        /// </summary>
        internal Func<T, IEnumerable<T>> GetChildren { get; set; }

        /// <summary>
        /// True if "If" pattern matching function should be execute after
        /// all other matches and only if no other pattern apply.
        /// </summary>
        internal bool IsElse { get; set; }
    }

Code above is provided as the idea. I don't feel it's perfect. I'm just reinvented it to solve particular and quite common problem of tree scan and transformation using functional approach and recursive pattern matching. It would be good if Succinc<T> do support similar feature, but more systematically and consistently implemented.

@DavidArno
Copy link
Owner

I haven't documented it very well yet, but I think #36 might go at least some way to addressing this. Do you concur, or am I misreading this request?

@Opiumtm
Copy link
Author

Opiumtm commented Jun 14, 2017

@DavidArno it isn't recursive pattern matching. Am I wrong?

@Opiumtm
Copy link
Author

Opiumtm commented Jun 15, 2017

@DavidArno I may fork Succinc<T> and then move some of my "recursive pattern matching" logic to library via pull request. You can modify it to make it more pretty later. It's ok?
That code already is successfully used in production in some of my projects. Just to mention, it's used in Html-to-Microsoft Word component in custom-made human resources management software used at my office.

@DavidArno
Copy link
Owner

DavidArno commented Jun 16, 2017

@Opiumtm,

That sounds great. I can't promise which bits I'll include, but real code will help me understand how your proposal might fit in with what I'm already doing, whether yours is a better, or just a different suggestion etc. So "pull request" away 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants