Change of repo title? #41

Open
finanalyst opened this issue Jul 11, 2024 · 20 comments

Comments

@finanalyst
Contributor

@thoughtstream @lizmat
I've tidied the repo structure and re-written the README.
Would you please review the README.
The original is in C<.rakudoc> format and is rendered to C<.md> by RakuAST::RakuDoc::Render.

I am working on a GitHub workflow and renderer container so that any .rakudoc source in the root is automatically rendered to C<.md> if changed.

@lizmat I was wondering whether the repo should be renamed to RakuDocV2 instead of ...-GAMMA

@zag, would you like to add Podlite to the README as a renderer based on JavaScript? Even if there are some departures from RakuDoc v2 in Podlite, I think it is worth adding, so long as the principal differences are mentioned.

@zag
Member

zag commented Jul 11, 2024

@finanalyst, Podlite is not a renderer for RakuDocV2. While there is some compatibility with the proposed version, it is very limited. I think it would be better not to mention Podlite in this context at all.
Thank you.

@thoughtstream

I like the new directory structure.

I noted three issues:

  1. In README.md, there is a sentence with a link:
    A renderer is considered compliant if it can render into a chosen output format the file Rakudociem-ipsum.rakudoc
    That link doesn't actually link to compliance-files/rakudociem-ipsum.rakudoc;
    it links to the non-existent file compliance-files/rakudociem-ipsum.md.
    I found that doubly confusing.

  2. The compliance files bootiful-disclaimer.rakudoc and bootiful-disclaimer.txt are inconsistent.
    I can't see how bootiful-disclaimer.txt could possibly be rendered from bootiful-disclaimer.rakudoc.

  3. The compliance files bootiful-disclaimer.rakudoc and bootiful-disclaimer.html are not quite consistent.
    The phrase from bootiful-disclaimer.rakudoc: so don't even B<I<think>> about suing us
    is rendered in bootiful-disclaimer.html as: so don't even <B>think</B> about suing us
    I believe it should be rendered as: so don't even <B><I>think</I></B> about suing us

Otherwise, it looks good.

@finanalyst
Contributor Author

@thoughtstream These are all things I hadn't considered/thought about. Taking them in the numbering above:
Item 1: This uncovers an ambiguity in RakuDoc! Sorry for the length of this analysis.

  • The source has something like L<test-file|path/test-file>. No file format extension.
  • The implication is that if the source is being rendered into .md then the link should be to path/test-file.md, but if it is being rendered to .html then it is to path/test-file.html.
  • That is what happened here.
  • But what is required is for the link to be written L<test-file|path/test-file.rakudoc> and for the link to be unchanged. (Currently, my software is adding the default format to all output files, though this is not entirely the correct thing to do because modern HTML servers will add .html to any route that needs it, so path/test-file is in fact preferred.)
  • There seem to be three reasonable assumptions as to output:
  1. link to file without extension -> link to file with output extension, eg test-file -> test-file.md if output is .md
  2. link to file without extension -> link to file without extension, eg test-file -> test-file whatever is the output (but this will create problems with outputs that require a file format extension)
  3. link to file with extension -> link to file without a change of extension.

So, I think that option 3 should be mandated in the specification, namely if a file extension is explicit in a link, it should be left unchanged, but if no extension is explicit, the renderer is free to add or not add a file extension.

As to options 1 & 2, my renderer can handle this by having two different renderers, one that explicitly adds the file format and one that does not.

Items 2 & 3:
My intention was to provide short files with different file format extensions so that the renderer can download the texts and deal with them appropriately, not to provide the same text with different outputs.

I think that if I name each file distinctly, there should be no implication they are the same.

@coke

coke commented Jul 13, 2024 via email

@thoughtstream

  • There seem to be three reasonable assumptions as to output:

    1. link to file without extension -> link to file with output extension, eg test-file -> test-file.md if output is .md

    2. link to file without extension -> link to file without extension, eg test-file -> test-file whatever is the output (but this will create problems with outputs that require a file format extension)

    3. link to file with extension -> link to file without a change of extension.

I understand what you’re trying to achieve and it’s certainly a useful capability
(as @coke has already attested).

However, I’m concerned that the proposed approach may be a little ETOOCLEVER.
My expectation would have been that – by default – the URL given in a link or
placement link would not be adjusted in any way, regardless of its extension,
or lack thereof.

I can see that it would be useful for renderers to provide the option
to have URLs massaged into the appropriate target format, but I worry that
having them add extensions is not the safest approach.

If I specify a link to an existing file or web target that explicitly
doesn’t have (and shouldn’t have) an extension:

    This behaviour is summarized in the L<README file | file:docs/README>.
    For full details, see the L<online documentation | https://omnicorp/docs>.

...then I definitely don’t want those URLs “helpfully” converted to
file:docs/README.md or https://omnicorp/docs.md.

Given that I may sometimes, but not always, want extensionless URLs
to have extensions added, now we have to come up with a way of indicating
(or inferring/guessing) whether an extensionless URL is one that
should have an extension added...or not. That seems suboptimal,
and a recipe for broken expectations (in both directions).

It seems to me that a more predictable and less edge-cased approach
might be to specify that – by default – all URLs are rendered without
modification, but that renderers are optionally allowed to detect URLs
with a .rakudoc extension, and convert (only) those URLs to
the same extension as the target rendering format.

In other words, when authors write outgoing links with .rakudoc extensions,
and then render the document to Markdown or HTML, those outgoing links
can get converted to the corresponding .md or .html extension. And any other
link, regardless of whether or not it has an extension, is rendered unchanged.

That would still be an imperfect solution, of course. For example, you might
want to refer back to the source document itself:

    [This manpage was generated from L<a RakuDoc source | file:sources/docs/manpage.rakudoc>]

...without having its extension converted.

So maybe there isn’t a way to achieve both “least surprise” and “do what I mean”
with an automatic extension-change mechanism.

Which seems to imply that the mechanism can’t be automatic.
That whatever approach is provided must be explicitly requested from the renderer.

So I think that, if we specify anything at all, it should be
that – by default – URLs are rendered unchanged, but that
renderers are free to offer smarter options that allow collections
of documents to be rendered consistently, either by adding extensions
to extensionless URLs, or by changing .rakudoc extensions to something
consistent with the target rendering format, or by some other approach.
For example, renderers could specify a “magical” pseudo-extension,
such as .??? or .ZZZ or .🧞, that is autoconverted to
the appropriate extension for the target output format.

Items 2 & 3: My intention was to provide short files with different file format extensions so
that the renderer can download the texts and deal with them appropriately, not to provide the
same text with different outputs.

Ah. I had misunderstood the purpose of those various nearly identically named files.

I think that if I name each file distinctly, there should be no implication they are the same.

Agreed.

@lizmat

lizmat commented Jul 14, 2024 via email

@finanalyst
Contributor Author

@thoughtstream @lizmat Another wrinkle to be considered. L<local file|local-file> is a link to a file in the same location, such as the RakuDoc sources in Raku/doc/docs.
For a website, these will be .html, but for an ebook, .xhtml. I also tend to document my work with a RakuDoc source for each of the modules in a distribution, and I render them all into the root directory, so I want L<local file|local-file> to point to local-file.md.

FYI, I distinguish between internal, local and external links. An internal link points into the output of the file itself, e.g. a link to a heading; a local link is the kind I described above.

I absolutely agree that an external link, which starts with another schema, such as https://, should not change the file format. The renderer did change it, so it is wrong.

If we define .* for local files, it will mean quite a wholesale renaming for Raku/doc/docs sources. We need to be sure that there is good reason to do this.

@thoughtstream

I agree with @lizmat that .* would be a vastly better choice for
the “hey-change-this-to-whatever-you-want” pseudo-extension.

However, I also agree with @finanalyst that changing the behaviour
might be a huge backwards compatibility issue, especially for
the Raku documentation itself.

In a perfect world, I would advocate that we bite the bullet and
specify a special .* suffix as the only way to get autotranslation
of file extensions. That would offer the maximum flexibility and
self-documentation in the feature.

But our mandate to be backwards compatible is an important one.
So I suggest that we will have to give up that improvement
and specify something along the lines @finanalyst has already suggested.
Such as:

Any URL that includes a final file extension (matching / '.' <alnum>+ $ /),
or which starts with any scheme identifier except file:, must always
be rendered “as-is”, with no change to any final file extension.

When a renderer encounters a URL that does start with file:
and which does not end in a file extension, then the renderer
may optionally add a suitable final file extension to the URL.

Here “optionally” means that the renderer may either provide an explicit mechanism
to allow the user to request that suitable file extensions be automatically added,
or else the renderer may add the extensions automatically by default, but must also
provide an explicit mechanism to allow the user to prevent such file extensions
being automatically added.
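
A minimal Python sketch of the rule proposed above, for illustration only. The names `render_url`, `target_ext` and `add_extensions` are hypothetical, `\w` only approximates Raku's `<alnum>`, and URLs with no scheme at all are simply left unchanged here because the wording above does not cover them.

```python
import re

# Rough Python equivalent of the Raku pattern / '.' <alnum>+ $ /
FINAL_EXT = re.compile(r"\.\w+$")
# Leading scheme identifier such as "file:" or "https:"
SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def render_url(url: str, target_ext: str, add_extensions: bool = False) -> str:
    """Return the URL as a renderer following the proposed rule might emit it."""
    match = SCHEME.match(url)
    scheme = match.group(1).lower() if match else None

    # A URL that already has a final extension is always rendered as-is.
    if FINAL_EXT.search(url):
        return url
    # Only file: URLs are candidates for an added extension.
    if scheme != "file":
        return url
    # Adding is optional: here the user must ask for it explicitly.
    if not add_extensions:
        return url
    return f"{url}.{target_ext}"
```

With `add_extensions=True` and a Markdown target, `file:docs/README` would become `file:docs/README.md`, while `https://omnicorp/docs` and `file:sources/docs/manpage.rakudoc` would pass through untouched.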

The remaining question is whether to also specify the .* pseudo-extension
as one of the optional mechanisms to allow autotranslation of extensions.
It’s a good approach, but it would add complexity. And if it isn’t going to be
our single mandated approach, do we really want to “multiply entities without necessity”?

I suspect not, but would like to hear other opinions.

@lizmat

lizmat commented Jul 18, 2024 via email

@thoughtstream

I think for the Raku documentation itself, this could almost be handled programmatically once.

I had wondered about that. But since I wasn't going to volunteer to do it,
I was extremely reluctant to suggest it. :-)

If the backwards compatibility issue isn't actually an issue, then I would seriously reconsider
whether we should bite the bullet and mandate this change to a much cleaner and more powerful mechanism.

Specifically, whether we should specify:


Renderers must always preserve verbatim every URL associated with an outgoing link.
Every such URL must always be rendered unchanged, except if the URL
ends in the pseudo-extension .* (i.e. if the URL matches / '.' '*' $ /).

In that one case, the .* must always be replaced by the extension corresponding to
the target format into which the RakuDoc document is being rendered.

For example, if a link is specified L<like so | file:docs/README.*>, then when the
containing file is being rendered to Markdown, the link extension would be converted to:

    For example, if a link is specified [like so](file:docs/README.md)

...and when the containing file is being rendered to HTML:

    For example, if a link is specified <a href="file:docs/README.html">like so</a>

If a RakuDoc document is being rendered into a format for which there is no corresponding file extension,
then every .* pseudo-extension must instead be deleted. For example, when the containing file
is rendered to plaintext:

    For example, if a link is specified like so (file:docs/README)
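
The substitution described in this proposal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `expand_whatever_ext` is hypothetical, and a target format with no file extension (such as plaintext) is represented by passing `None`.

```python
from typing import Optional

WHATEVER = ".*"  # the proposed pseudo-extension


def expand_whatever_ext(url: str, target_ext: Optional[str]) -> str:
    """Replace a trailing '.*' with the target extension, or drop it if there is none."""
    if not url.endswith(WHATEVER):
        return url  # every other URL is preserved verbatim
    stem = url[: -len(WHATEVER)]
    return f"{stem}.{target_ext}" if target_ext else stem
```

For the link used in the examples, `file:docs/README.*` would come out as `file:docs/README.md` for Markdown, `file:docs/README.html` for HTML, and `file:docs/README` for plaintext.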


Given that @lizmat appears to have volunteered to fix any effects on the Raku documentation set 😉
what do you now think of that approach, @finanalyst?

My own view is that it is potentially a much more explicit, cleaner, more predictable, more consistent,
and more flexible approach than merely specifying that renderers may choose to provide some
mechanism to optionally detect implicit requests to have missing extensions universally added.

But I definitely want to hear other opinions before we decide.

And before we do decide, I certainly want to know that someone – and not necessarily @lizmat – has actually
volunteered to create the code needed to reprocess the entire Raku documentation set to add these new
automagic "whatever extensions" (because, despite my little joke above, I'm well aware that @lizmat hasn't
volunteered to do that!)

@coke

coke commented Jul 18, 2024

If it's raku/doc, I volunteer for any cleanup there.

@thoughtstream

Thanks for volunteering, @coke. Much appreciated!

Now we just need to decide whether we actually want to bite that particular bullet.
Obviously, I'm in favour, but I'd really like to hear what everyone else thinks...especially @finanalyst.

@finanalyst
Contributor Author

@thoughtstream @lizmat @coke Sorry for not responding quickly on this issue. I also wanted to see what @coke would say.

  • It seems to me that reducing ambiguity and at the same time giving authors flexibility is a good thing, so I am in favour of stating that .* should be used for a collection of documents which will be rendered together. But I retain doubts about the need to mandate it.
  • A previous response of mine has caused misunderstanding. I said 'files'. I should have said something like resource, as in URL (Uniform Resource Locator). So I should not have implied the file: schema.
  • I'll list the cases I think need to be covered when changing the docs.

To avoid being ambiguous again and to clarify my own thinking, allow me to restate things we all know. RakuDoc has two (three?) components that refer to other resources:

  1. L<DISPLAY | SCHEMA:URI>
    • In Raku/doc/docs and because the original Pod::To::HTML did not implement anything else, the only schemas allowed were Null, file and http[s]
    • RakuDoc allows for several other schemas, such as rakudoc, man, USDN.
    • The schema file is not particularly useful IMHO for L<> as it is for P<> because it requires the resource to be in a filesystem, and a link to a file system is ambiguous.
    • When the SCHEMA is not Null, eg https, the resource is not a part of the collection of source files being rendered. I call these external links (see below for internal and local links).
    • RakuDoc offers three forms for URI, covered below.
    • L<> markup is intended to allow for user interaction and requires some action, like a mouse click, to move the user to that resource.
  2. P<FALLBACK DISPLAY | SCHEMA:URI> & =place SCHEMA:URI :fallback<DISPLAY> ...
    • P<> markup (since =place is just a block form of P<> I'll not discuss it separately) differs from L<> markup in that no user action is required, and the resource is embedded directly into the output.
    • Since some of the discussion below concerns schemas and URI's in the context of L<> markup, and these are used in P<> markup, it may have some implication for P<>. However, I cannot think of any at this point.

RakuDoc allows for three forms of URI in L<> markup.

  1. #, eg L<DISPLAY | #Some heading> - I call this an internal link because it offers a jump to another section of the resource generated by the RakuDoc source.
  2. path/filename, eg. L<DISPLAY | type/IO/Path> which may be abbreviated to L<type/IO/Path>, in which case type/IO/Path serves as both the DISPLAY and URI.
    • this refers to a resource that is a part of the same collection of resources that is rendered together with the resource generated from the source currently being rendered
    • I call this a local link because it offers a jump to another resource, but is still associated with the current one.
    • this is the prime area of ambiguity.
  3. path/filename#heading, eg. L<get a directory listing | type/IO/Path#routine_dir>
    • this is also a local link, but it also has an internal anchor.

Rendering implementation (I'll ignore text output because links are not possible)

  • internal links are rendered according to the rules of the output format, eg.
    • html - an <a href="#anchor_id"> tag
    • md - [heading](#mangled_heading)
    • xhtml- (for epub ebooks) as html
  • external, no change to the URI
  • local links have to take into account the output form of the rendered source, and the environment of the output. For illustration, the RakuDoc source is L<Directory|type/IO/Path#routine_dir>
    • html if the environment is server-oriented, AND the server is a modern route-based server, then the link is rendered as <a href="type/IO/Path#routine_dir">Directory</a>
    • html if the environment is an older server type, or for use in browsing from a file system, then it is <a href="type/IO/Path.html#routine_dir">...
    • md only the form [Directory](type/IO/Path.md#routine_dir) is possible
    • xhtml for epub, we must have <a href="/asset-directory/type_IO_Path.xhtml#routine_dir">Directory</a> because the epub format has a restricted filesystem and it doesn't like subdirectories.
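
The four local-link cases above can be sketched as a single Python function. This is a rough illustration, not the actual renderer: the function name and the format keys (`html-routes`, `html-files`, `md`, `xhtml`) are hypothetical, the `/asset-directory/` prefix is taken from the epub example, and no escaping of the display text is attempted.

```python
def render_local_link(display: str, target: str, fmt: str) -> str:
    """Render a local link such as 'type/IO/Path#routine_dir' for one output format."""
    # Split off the optional '#anchor' so an extension can go before it.
    path, sep, anchor = target.partition("#")
    fragment = sep + anchor

    if fmt == "html-routes":  # modern route-based server: no extension
        return f'<a href="{path}{fragment}">{display}</a>'
    if fmt == "html-files":   # older server, or browsing from a file system
        return f'<a href="{path}.html{fragment}">{display}</a>'
    if fmt == "md":
        return f"[{display}]({path}.md{fragment})"
    if fmt == "xhtml":        # epub: flatten subdirectories into the file name
        flat = path.replace("/", "_")
        return f'<a href="/asset-directory/{flat}.xhtml{fragment}">{display}</a>'
    raise ValueError(f"unknown output format: {fmt}")
```

Calling it with `"Directory"` and `"type/IO/Path#routine_dir"` reproduces the four renderings listed above, which shows how much of the result depends on the renderer's environment rather than on the source.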

The natural assumption is that for a collection of documents represented by RakuDoc sources, they will all be rendered into the same output format, and so a renderer may apply the same naming rules to all L<> URIs that are determined to be local.

The ambiguity arises because for local links, there is an occasional need to refer to a local resource that has a fixed output, eg L<See distribution README | README.md>, with the assumption that README.md has not been generated from README.rakudoc.

Mandating links of the form L<directory listing | type/IO/Path.*#routine_dir> which could be abbreviated to L<type/IO/Path.*#routine_dir> instead of the current L<type/IO/Path#routine_dir> would mean extra characters in most cases.

After writing out this analysis, I am undecided about the benefits of requiring the extra .* to handle rare exceptions.

Perhaps, we might say that .* is the correct way to disambiguate from an explicit file-format, but that a resource name without a file format is the abbreviation of .*? This was the essence of my proposal in a previous email.

The changes to the Raku/docs sources would need to look at the following forms (ignoring ws around |)

  • L<path/filename> -> L<path/filename.*>
  • L<display text | path/filename> -> L<display text | path/filename.*>
  • L<path/filename#heading> -> L<path/filename.*#heading>
  • L<display text | path/filename#heading> -> L<display text | path/filename.*#heading>
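
A rough Python sketch of how such a one-off rewrite might look. It is not the tool anyone in this thread has written; the names `migrate` and `add_whatever_ext` are hypothetical. It assumes simple `L<...>` links with no nested markup inside them, and it also assumes that internal links, links with a scheme, and links that already carry an extension should be left alone.

```python
import re

LINK = re.compile(r"L<([^<>]*)>")                 # simple, un-nested links only
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")  # e.g. "https:" or "file:"
HAS_EXT = re.compile(r"\.[\w*]+$")                # ".md", ".rakudoc", or ".*"


def add_whatever_ext(target: str) -> str:
    """Insert '.*' before any '#heading' of a local, extensionless target."""
    stripped = target.strip()
    if not stripped or stripped.startswith("#") or SCHEME.match(stripped):
        return target  # empty, internal, or external: leave unchanged
    path, sep, heading = stripped.partition("#")
    if HAS_EXT.search(path):
        return target  # explicit extension (or already migrated)
    # Replace only the stripped text so whitespace around '|' is preserved.
    return target.replace(stripped, f"{path}.*{sep}{heading}", 1)


def migrate(source: str) -> str:
    """Rewrite every simple L<> link in a RakuDoc source string."""
    def fix(match: re.Match) -> str:
        # With no '|', rpartition yields ('', '', body): the body is the target.
        display, bar, target = match.group(1).rpartition("|")
        return f"L<{display}{bar}{add_whatever_ext(target)}>"
    return LINK.sub(fix, source)
```

Because targets that already end in `.*` are skipped, running the sketch twice over the same source should give the same result as running it once.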

@thoughtstream

thoughtstream commented Jul 21, 2024 via email

@thoughtstream

I very much appreciate @finanalyst’s thorough analysis,
review, and summation of the issues here.

However, I’m not entirely sure I agree with every aspect of it.
I see two main points that need further consideration.

The first is that, while the external/local/internal distinction is useful,
I’m not sure it’s as clear-cut as it first seems, when it comes to automating
the addition of file extensions.

Of course, I agree that the issue seems completely irrelevant to internal URLs,
and entirely relevant to local URLs. But I also can imagine situations
where it might be relevant to external URLs as well.

For example, suppose I am writing an e-book that I expect to render into all of
.epub and .md and .pdf formats. If that book contains external links to other
online resources (help files, real-time data or charts, post-publication updates),
it is at least conceivable that I might want to provide those resources in all
three formats as well. If for no other reason than that a user clicking on an
external link would probably want whatever reader they’re using to also be able
to render that external resource. So, despite originating from outside the local
environment, it would still need to be in the same format.

Which suggests to me that not just local URLs, but any non-internal URL
could potentially need this kind of auto-mangling of file extensions.

And I'm not even certain we can categorically exclude internal URLs either.
After several days of trying, I haven't been able to think of a plausible scenario
where an internal link might need extension-mangling, but I'm not (quite ;-)
egotistical enough to believe that no-one is clever enough to ever
come up with such a usage, just because I'm not.

The second point I think we need to consider is that I disagree with the statement:
“The ambiguity arises because for local links, there is an occasional need
to refer to a local resource that has a fixed output”.

In my view, the ambiguity arises because the auto-extension mechanism is implicit.
That is: because auto-extensions are requested by not specifying anything at all,
rather than by specifying an explicit syntax that means “auto-add an extension here”.
It’s ambiguous because occasionally (and maybe even only very occasionally)
leaving off the extension means “this URL definitely has no extension; don’t add one”.

Which is why I believe we need a general rule:

    No URL ever has an extension added to it automatically.

Plus a specific exception syntax:

    This explicit extension (.*) is always replaced with the current target extension.

Then it doesn’t matter whether we can think of realistic cases where an external link
(or even an internal link) also needs an extension magically added to it.
Because, if the .* mechanism is universal, then when someone discovers such a realistic case,
then the means to (explicitly) specify that magic extension is already in place.

In addition to future-proofing the language for the needs of people who are cleverer than us
(I know, it’s hard to imagine, but there may one day be such a person ;-), the above two
rules are extremely simple to explain, easy to implement, and unsurprising to use.

And that’s why I’m strongly in favour of specifying this approach.

The reason I’m equally strongly in favour of mandating this approach is that making it
a requirement of any renderer will greatly increase the portability of RakuDoc documents
across a wider range of target formats (both existing formats and those yet-to-be-invented).

And as @coke has generously volunteered to fix the largest source of breakage if we mandate
such a change (and will presumably make the software they create available for others to
fix any other such breakages as well), then I don’t see any significant downside to making
this approach the official one.

But, as always, I’m happy to continue this discussion or to defer the decision as long as
we all feel is necessary. Because, as always, we're aiming for “get it right”, not merely “get it done”.

@finanalyst
Contributor Author

@thoughtstream @lizmat @coke Seems reasonable to me. I'll prepare some clarification text and raise a PR.

@finanalyst
Contributor Author

finanalyst commented Jul 25, 2024

@thoughtstream trying to think how to apply this approach practically to internal links.

Currently we have L<DISPLAY | #some heading>. Even if someone might want an internal link to the same document in another format, how would this look? I think it's impossible by definition to do so. If a source has two forms, eg., .md and .html, they are different, and so local to each other. An internal link is to the same document.

finanalyst added a commit that referenced this issue Jul 25, 2024
@thoughtstream

Remember that the .* substitution is a textual one. So we might choose to specify
that a .* appearing anywhere in a link (i.e. not just at the end of a URL) is always replaced
by the extension of the renderer’s current target format.

Then you could generate a document with internal links that adapt to the actual output format.
For example, you might create a document with a structure like so:

    =head1 Installing help files

    =head2 Installing .md help files

    =head2 Installing .html help files

    =head2 Installing .pdf help files

and then elsewhere:

    Then simply L<install this file | #Installing .* help files>
    and help will be immediately available to all your users.

Note that I’m not saying that’s a particularly good idea, just that it’s possible
to imagine realistic uses for .* even within internal links.

@finanalyst
Contributor Author

So long as we are confining the .* to the meta section of a L< | META>.

@thoughtstream

I am strongly in agreement that magical .* replacement should only happen in the meta section of a link.
More specifically still: it should only happen when the .* is part of the URL
(i.e. if we later add other metadata to links, that metadata wouldn't undergo .* substitution).
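
To make the scoping concrete, here is a small Python sketch that applies the .* replacement to the URL part of a link and nowhere else. The function name `split_and_expand` is hypothetical. When the link has no `|`, the sketch keeps the display text exactly as written, which is an assumption rather than something agreed in this thread.

```python
from typing import Tuple


def split_and_expand(link_body: str, target_ext: str) -> Tuple[str, str]:
    """Split the body of L<...> into (display, url), expanding '.*' in the url only."""
    display, bar, url = link_body.rpartition("|")
    if not bar:
        # L<url> form: the same text serves as display and URL.
        display = url
    expanded = url.strip().replace(".*", f".{target_ext}")
    return display.strip(), expanded
```

For the body `install this file | #Installing .* help files` rendered to Markdown, this would give the display text unchanged and the URL `#Installing .md help files`; a `.*` that happens to appear in the display text is never touched.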

finanalyst added a commit that referenced this issue Jul 27, 2024
* psuedo fileformat, issue #41

* add in review changes

* add in review changes from patch