Desqui

Generate a document using remote pages.

NOTE: Docs partially completed

How does it work?

To run Desqui you must create a JSON file with the params of the document. Desqui reads this file and crawls the remote website to collect the links matched by a selector. It then downloads every linked page with its resources and extracts the relevant data (based on selectors). From each page's data it creates an item using a custom template (Lodash templates). Finally, it builds the HTML document by joining the items and the title.

Params

urls

The URLs that will be downloaded.

baseUrl

The URL that is prepended to relative links.
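As an illustration of what turning a relative link into an absolute one looks like, here is a minimal sketch. It assumes standard URL resolution through the `URL` class built into Node.js; Desqui itself may simply join strings, and `resolveLink` is a hypothetical helper name, not part of Desqui.

```javascript
// Hypothetical helper: resolve an href found on the links page against baseUrl.
// Uses the WHATWG URL class, which is available as a global in Node.js.
function resolveLink(href, baseUrl) {
  return new URL(href, baseUrl).href;
}

const baseUrl = 'https://docs.oracle.com/javase/tutorial/essential/concurrency/';

// A relative href is appended to the base URL.
console.log(resolveLink('procthread.html', baseUrl));

// An absolute href is left unchanged.
console.log(resolveLink('https://example.com/page', baseUrl));
```

Note that the trailing slash in `baseUrl` matters with this approach: without it, the last path segment is replaced instead of extended.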

urlLinks

The URL of the page from which the links will be extracted.

selectors.links

The jQuery selector for the <a> elements on the urlLinks page.

selectors.item

An object with the following structure:

The key is the name of the variable.
The value is the jQuery selector applied to each item's page.

All these variables can be used in templates.

documentTitle

The <title> of the document (also used as the default front).

headers

An object with headers that are sent during the request.

templates.documentFront

Optional. A Lodash template that compiles the front of the document. Variables it receives: documentTitle.

templates.item

A Lodash template that compiles every item (that is, every page). Variables it receives: all the variables specified in selectors.item.
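To show how the variables end up in an item, here is a minimal sketch of `${name}` interpolation. It is a dependency-free stand-in written for illustration, not Lodash and not Desqui's code; `renderTemplate` is a hypothetical name. Real Lodash templates support more syntax and raise an error for an undefined variable, whereas this stand-in substitutes an empty string.

```javascript
// Stand-in for the `${name}` interpolation used in the item templates.
// NOT Lodash itself: only plain `${identifier}` placeholders are handled.
function renderTemplate(template, variables) {
  return template.replace(/\$\{\s*(\w+)\s*\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(variables, name)
      ? String(variables[name])
      : '' // assumption: unknown variables render as empty text
  );
}

// Single quotes keep `${title}` literal instead of making it a JS template string.
const itemTemplate =
  '<section><header>${title}</header><div>${content}</div></section>';

const html = renderTemplate(itemTemplate, {
  title: 'Processes and Threads',
  content: 'Some text',
});

console.log(html);
// <section><header>Processes and Threads</header><div>Some text</div></section>
```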

templates.document

Optional. A Lodash template that compiles the document. Variables it receives: documentFront, documentTitle.

Note: The body must contain an element with the id items so that the items can be appended to it. This will change in future versions.

Usage

git clone git@github.com:tomymolina/desqui.git
cd desqui
npm install
node index /path/to/params.json

Examples

Example 1 with urls crawling

{
    "baseUrl": "https://docs.oracle.com/javase/tutorial/essential/concurrency/",
    "urlLinks": "https://docs.oracle.com/javase/tutorial/essential/concurrency/",
    "linksSelector": "#Contents a",
    "directory": "Documents/java_concurrency",
    "selectors": {
        "title": "#PageTitle",
        "content": "#PageContent"
    },
    "documentTitle": "Java Concurrency Manual",
    "documentFrontTemplate": "<h1>Java Concurrency Manual</h1>",
    "itemTemplate": "<section><header>${title}</header><div>${content}</div></section>"
}

Example 2 using urls

{
    "urls": ["http://google.com", "http://facebook.com"],
    "directory": "Documents/google_facebook",
    "selectors": {
        "title": "title",
        "content": "body"
    },
    "documentTitle": "Google and Facebook",
    "documentFrontTemplate": "<h1>Google and Facebook</h1>",
    "itemTemplate": "<section><header>${title}</header><div>${content}</div></section>"
}

Example 1 will:

Fetch https://docs.oracle.com/javase/tutorial/essential/concurrency/.
Get the href of all links matching the selector #Contents a.
Scrape each of those links.
Get the title (selector #PageTitle) and save it in the variable title.
Get the content (selector #PageContent) and save it in the variable content.
Compile the item template with those variables.
Repeat the process for each link.
Join all the items into one document with the front <h1>Java Concurrency Manual</h1>.
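The flow in the list above can be sketched roughly as follows. This is only an outline under stated assumptions: the in-memory `fakeSite` object stands in for both the HTTP requests and the selector-based extraction, the `example.test` URLs are invented, and `buildDocument` is a hypothetical function, not Desqui's implementation.

```javascript
// Hypothetical in-memory site replacing real downloads and selector lookups.
const fakeSite = {
  'https://example.test/docs/': { hrefs: ['a.html', 'b.html'] },
  'https://example.test/docs/a.html': { title: 'A', content: 'First page' },
  'https://example.test/docs/b.html': { title: 'B', content: 'Second page' },
};

function buildDocument(params, site) {
  // Read the links page and collect the hrefs found on it.
  const hrefs = site[params.urlLinks].hrefs;

  // Visit each link, take its variables, and render one item per page.
  const items = hrefs.map((href) => {
    const url = new URL(href, params.baseUrl).href;
    const { title, content } = site[url];
    return `<section><header>${title}</header><div>${content}</div></section>`;
  });

  // Join the front and all items; the items go inside the element with id "items".
  return `${params.documentFront}<div id="items">${items.join('')}</div>`;
}

const doc = buildDocument(
  {
    baseUrl: 'https://example.test/docs/',
    urlLinks: 'https://example.test/docs/',
    documentFront: '<h1>Demo</h1>',
  },
  fakeSite
);

console.log(doc);
```

Keeping the site as plain data makes the order of the steps easy to follow without network access; a real run would replace the lookups with requests and jQuery-style selection.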

LICENSE

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
