Trimming of template values #19

Open, wants to merge 1 commit into base: master

FaFre (Contributor) commented Nov 28, 2020

Depending on how template parameters have been typed, their values may contain trailing whitespace and/or line breaks.

The wiki editor automatically gets rid of them.

FaFre (Contributor, Author) commented Nov 28, 2020

The same behavior could also be implemented for the keys. They are also stored as Wikitext (is that really necessary? Maybe just a string would be enough?), and they get trimmed with NormalizeTemplateArgumentName() and converted to a string during every enumeration.

However, that would be independent of this PR and may require further discussion.

CXuesong (Owner) commented Nov 29, 2020

I'd like to keep the parsed wikitext intact. That is, if you call parser.Parse(myWikitext).ToString(), ideally you should get the same value as myWikitext. I use MwParserFromScratch to help make some bot edits. I don't want something like this

{{Infobox book
| name          = So Long, and Thanks for All the Fish
| title_orig    =
| translator    =
| image         = SoLongAndThanksForAllTheFish.jpg
| caption       = First Edition (UK) with [[Lenticular printing|lenticular print]] of a [[plesiosaurus]] and [[walrus]]
| author        = [[Douglas Adams]]
| illustrator   =
...
}}

to turn into this when I submit my change:

{{Infobox book|name=So Long, and Thanks for All the Fish|title_orig=|translator=|image=SoLongAndThanksForAllTheFish.jpg| caption=First Edition (UK) with [[Lenticular printing|lenticular print]] of a [[plesiosaurus]] and [[walrus]]|author=[[Douglas Adams]]|illustrator=
...}}

When writing this library, I decided it is acceptable even to trade performance for keeping the input wikitext intact, because when you are working with online bots, most of the time is actually spent on the network.

If you really need to access the template parameters frequently (for example, when working with an offline wikitext dump), you can do the caching (with a Dictionary) yourself. I won't do that for you, since I believe you know better than I do what you need. (But we may consider adding an API if there really is a general and frequent usage pattern like this.)
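A minimal sketch of that kind of caller-side cache follows. It deliberately works on plain strings so it does not depend on any particular library API: TemplateArgumentCache and Build are hypothetical names, not part of MwParserFromScratch, and the input is assumed to be the raw name/value text of each argument as the caller has already extracted it. Letting a later duplicate name override an earlier one is a choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT part of MwParserFromScratch.
public static class TemplateArgumentCache
{
    // Builds a lookup from trimmed argument name to trimmed argument value.
    // The input pairs are assumed to be the raw strings as they appear in
    // the wikitext, including any padding spaces and line breaks.
    public static Dictionary<string, string> Build(
        IEnumerable<KeyValuePair<string, string>> rawArguments)
    {
        if (rawArguments == null) throw new ArgumentNullException(nameof(rawArguments));

        var cache = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var pair in rawArguments)
        {
            // Arguments without a name (positional ones) are skipped here.
            if (pair.Key == null) continue;

            var name = pair.Key.Trim();
            var value = (pair.Value ?? string.Empty).Trim();

            // Assumption of this sketch: the last occurrence of a name wins.
            cache[name] = value;
        }
        return cache;
    }
}
```

The cache is a read-only snapshot: the parsed tree is never modified, so writing the wikitext back still preserves the original spacing, and the cache has to be rebuilt after any edit to the template.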

That being said, I have introduced some helper functions for the convenience of AST consumers. I think this one should satisfy your need:

/// <inheritdoc cref="NormalizeTemplateArgumentName(string)"/>
/// <param name="argumentName">The argument name to be normalized. The node will be converted into its string representation.</param>
public static string NormalizeTemplateArgumentName(Node argumentName)
{
    if (argumentName == null) return null;
    return NormalizeTemplateArgumentName(argumentName.ToString());
}

/// <summary>
/// Normalizes a template argument name.
/// </summary>
/// <param name="argumentName">The argument name to be normalized.</param>
/// <returns>The normalized argument name, with leading and trailing whitespace removed,
/// or <c>null</c> if <paramref name="argumentName"/> is <c>null</c>.</returns>
public static string NormalizeTemplateArgumentName(string argumentName)
{
    if (string.IsNullOrEmpty(argumentName)) return argumentName;
    return argumentName.Trim();
}

This function is also used by TemplateArgumentCollection, so you can access template arguments with something like template.Arguments["isbn"], regardless of whether there is leading or trailing whitespace.
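As a rough illustration of reading a trimmed value without touching the tree, here is a sketch. Only parser.Parse(...).ToString() and template.Arguments["..."] are taken from this thread; the WikitextParser class, the MwParserFromScratch.Nodes namespace, EnumDescendants(), the Template type, and the Value property are written from my understanding of the library's public API and should be treated as assumptions to check against the version you have installed.

```csharp
using System;
using System.Linq;
using MwParserFromScratch;        // assumed namespace of WikitextParser
using MwParserFromScratch.Nodes;  // assumed namespace of Template

class Program
{
    static void Main()
    {
        var source =
            "{{Infobox book\n" +
            "| name          = So Long, and Thanks for All the Fish\n" +
            "| author        = [[Douglas Adams]]\n" +
            "}}";

        var parser = new WikitextParser();
        var ast = parser.Parse(source);

        // Take the first template in the parsed tree (assumed API).
        var template = ast.EnumDescendants().OfType<Template>().First();

        // The name lookup ignores the padding around "author".
        var author = template.Arguments["author"];

        // Trim at the call site; the node itself is left as parsed.
        var trimmed = author?.Value?.ToString().Trim();
        Console.WriteLine(trimmed);

        // Ideally the tree still serializes back to the input text.
        Console.WriteLine(ast.ToString() == source);
    }
}
```

Trimming at the point of use keeps the stored value, including its padding and line breaks, exactly as written, which is what lets a bot edit one argument without reformatting the rest of the template.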

You may find the implementation of TemplateArgumentCollection not very performant. I've explained that at the beginning.

> The same behavior could also be implemented for the keys. They are also stored as Wikitext (is that really necessary? Maybe just a string would be enough?)

You forgot there could be templates, comments, or arguments 😋

{{Foo | param_{{Bar}} <!-- predefined parameter index --> = {{{param_value}}} }}

FaFre (Contributor, Author) commented Dec 1, 2020

Okay, yes that makes a lot of sense to me. I should read some more about the specs before trying to make contributions :P

Maybe it makes sense to leave it in the Utils tho.
