Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Scrape README from GitHub if the source is provided #292

Closed
wants to merge 3 commits into from

Conversation

phase
Copy link
Contributor

@phase phase commented Oct 31, 2017

If a user provides a source repository on GitHub for the project, the
contents of the Home page will be set to the contents of the README.

Closes #87

If a user provides a source repository on GitHub for the project, the
contents of the Home page will be set to the contents of the README.

Closes SpongePowered#87

Signed-off-by: Jadon Fowler <[email protected]>

private val identifier = "A-Za-z0-9-_"
private val gitHubUrlPattern = s"""http(s)?://github.com/[$identifier]+/[$identifier]+(/)?""".r.pattern
private val readmeUrl = "https://raw.githubusercontent.com/%s/%s/master/README.md"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested this with varying capitalisation?

  • readme.md
  • README.MD
  • readme.MD

Additionally, GitHub supports more than just .md as a file extension for markdown: https://github.com/github/markup/blob/master/lib/github/markup/markdown.rb#L32

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently only supports README.md. I'll update it with different types soon.

@@ -446,7 +447,18 @@ case class Project(override val id: Option[Int] = None,
* @return Project home page
*/
def homePage: Page = Defined {
val page = new Page(this.id.get, Page.HomeName, Page.Template(this.name, Page.HomeMessage), false, -1)
var body = Page.HomeMessage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should only do it if one is not manually provided, yes? If that is not the behaviour with this PR, it should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what you're saying. This scrapes the README if a source URL is provided, and if it is from GitHub.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm asking if this will overwrite an existing manually-created README.

Copy link
Contributor Author

@phase phase Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a Project is created, it creates a Home Page that has Page.HomeMessage in it. Lines 448 - 451 retrieve the Page from database, passing the page variable as a default. This changes the default to the README it scraped from GitHub. It's basically replacing the text from Page.HomeMessage with the README. So no, it will not replace the Home Page after it has been edited.

@windy1
Copy link
Contributor

windy1 commented Oct 31, 2017

Can you please make this optional with a setting please? (and during project creation)

@phase
Copy link
Contributor Author

phase commented Oct 31, 2017

Is there a use case for it being optional? If you're linking your project to GitHub, I can't think of any reason you wouldn't want your README copied. And even if they don't want it, they can easily delete it and replace it with a different home page if they so desire.

@phase phase changed the title Scrape README from GitHub if the source is provided WIP: Scrape README from GitHub if the source is provided Dec 30, 2017
@progwml6
Copy link
Member

progwml6 commented Jan 4, 2018

@phase if we want to do this we need both a setting for the project to enable/disable github README syncing, and to consider using commonmark for the markdown rendering which appears to be supported by the current library

@parlough
Copy link

parlough commented Jan 4, 2018

We should be using CommonMark anyway as I believe Discourse has switched to it in its latest few releases.

@progwml6
Copy link
Member

progwml6 commented Jan 4, 2018

if discourse is using it move everything to commonmark based rendering for markdown on ore

@ryantheleach ryantheleach added this to the v1.6.0 milestone Jan 5, 2018
@progwml6 progwml6 modified the milestones: v1.6.0, v1.5.0 Jan 5, 2018
@simon816
Copy link
Contributor

Ore is already using the flexmark library which implements CommonMark, so nothing needs to be done there.

if we want to do this we need both a setting for the project to enable/disable github README syncing

This PR is just scraping the README on project creation, github syncing can come later (IMO)

@progwml6
Copy link
Member

commonmark is one of the rendering options, and not what we are using iirc

@ryantheleach
Copy link
Contributor

ryantheleach commented Jan 16, 2018

Some issues from the old OCD list that I think are relevant to review on this PR:

  • By default syntax highlighting is enabled for code blocks, IMO it should be disabled in favor of using the syntax of Github Flavored Markdown ```Java for example.
  • Inserting images via HTML is not supported, not sure if this is something that would be supported but I thought I'd mention it.

If they arn't covered by this PR / CommonMark please recreate them or make an explicit comment declining to implement them, as I don't believe they belonged on a minor issue list.

@progwml6
Copy link
Member

as far as i'm concerned this PR is for GH readme scraping and any commonmark fixes that we deem necessary while we are at it

@progwml6
Copy link
Member

@phase this pr has conflicts

phase added 2 commits March 13, 2018 23:01
This updates a *ton*. The main conflict was some import crap but other
then that the code in this branch (ripping GitHub READMEs) doesn't
conflict with anything.

Signed-off-by: Jadon Fowler <[email protected]>
GitHub has a lovely API for retrieving a repository's README.

Signed-off-by: Jadon Fowler <[email protected]>
@phase phase self-assigned this Mar 14, 2018
@phase
Copy link
Contributor Author

phase commented Mar 14, 2018

Conflicts have been fixed & it will now get any README using GitHub's API.

@phase phase removed their assignment Mar 19, 2018
@phase phase modified the milestones: v1.6.0, v1.8.0 Jul 16, 2018
@Katrix Katrix mentioned this pull request Jan 13, 2019
@Katrix
Copy link
Member

Katrix commented Jan 13, 2019

See #729

@Katrix Katrix closed this Jan 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants