This app scrapes the Jerusalem Post website (www.jpost.com/breaking-news) for breaking news headlines.
The headline, link, reporter, and date of the report are captured, stored, and rendered to the app's home page. Here is how a headline is displayed in the Web-Scraper app.
Articles can be marked as 'saved' by clicking on the SAVE ARTICLE button.
Clicking the headline itself opens the linked article in a new browser tab, as displayed below.
The Home page navbar has links to the Home page and Saved articles.
Click on the Saved Articles link to view the list of saved articles. Each saved article has two buttons: one to remove it from the list (DELETE FROM SAVED) and one to add notes to it (ARTICLE NOTES).
Here is the Notes modal, which is built with Bootbox. Notes can be saved to or removed from the list.
The Saved Articles navbar has a link to return to the Home Page, as well as a CLEAR ARTICLES button. In this version of the app, this button removes the list of headlines from the webpage without deleting them from the database.
The dependencies for this Node.js app are:
- axios
- bootbox
- cheerio
- express
- express-handlebars
- mongoose
- morgan
- request
The app uses a MongoDB database named mongoHeadLines. It holds two collections, Headlines and Notes, each defined in its own model file. To relate notes to the headline they were entered for, the Notes model stores a reference to the Headline model in its _headlineId field.
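The relation between the two collections can be pictured without a database. The sketch below uses plain objects shaped roughly like the stored documents. Only the `_headlineId` field name comes from the app; the sample values, the `noteText` field, and the helper name `notesForHeadline` are hypothetical and used for illustration only.

```javascript
// Hypothetical sample documents; only _headlineId is a field name taken from the app.
const headlines = [
  { _id: "h1", headline: "Example headline one" },
  { _id: "h2", headline: "Example headline two" }
];

const notes = [
  { _id: "n1", _headlineId: "h1", noteText: "First note" },
  { _id: "n2", _headlineId: "h2", noteText: "Another note" },
  { _id: "n3", _headlineId: "h1", noteText: "Second note" }
];

// Return every note whose _headlineId points at the given headline.
function notesForHeadline(allNotes, headlineId) {
  return allNotes.filter(function (note) {
    return note._headlineId === headlineId;
  });
}

console.log(notesForHeadline(notes, "h1").length); // 2
```

In the app itself, a lookup like this would presumably be a Mongoose query on the Notes model filtered by `_headlineId` rather than an in-memory filter.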
Web pages are requested with Axios. Specific data elements are extracted from the returned HTML with Cheerio and stored in the MongoDB database.
Each document in the Headline collection has three fields:
- headline
- link
- reporterDate
The reporterDate field is created by slicing the text of the `<li>` tag:
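
    // Headline text and article URL come from the anchor tag.
    var headline = $(this).find("a").attr("title");
    var link = $(this).find("a").attr("href");
    // Combined reporter-and-date text from the list items.
    var rd = $(this).find("ul").children("li").text();
    var len = rd.length;
    // The last 19 characters are the date; everything before is the reporter.
    var date = rd.slice(-19);
    var reporter = rd.slice(0, len - 19);
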
The date and reporter slices are concatenated into the reporterDate field:

    if (headline && link) {
      articles.push({
        headline: headline,
        link: link,
        reporterDate: reporter + " " + date
      });
    }

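The slicing and concatenation steps above can be combined into one helper. This is a minimal sketch, not the app's own code: the function name `buildReporterDate` and the sample input are hypothetical. Like the snippet above, it assumes that the date always occupies the last 19 characters of the list-item text. Unlike the snippet, it also trims surrounding whitespace and guards against text shorter than 19 characters.

```javascript
// Length of the trailing date string, as assumed by the slice(-19) call above.
const DATE_LENGTH = 19;

// Split the combined <li> text into reporter and date, then rejoin with a space.
function buildReporterDate(rd) {
  const text = rd.trim();
  const date = text.slice(-DATE_LENGTH);
  const reporter = text.slice(0, Math.max(0, text.length - DATE_LENGTH)).trim();
  // If there is no reporter text, return the date on its own.
  return reporter ? reporter + " " + date : date;
}

// Hypothetical input: reporter name immediately followed by a 19-character date.
console.log(buildReporterDate("By JANE DOE01/15/2019 10:30:00"));
// By JANE DOE 01/15/2019 10:30:00
```

A fixed-width slice is simple, but it breaks if the site changes its date format, so a production scraper might prefer to select the reporter and date elements separately.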
The page rendering engine is Express Handlebars. The {{{body}}} placeholder in main.handlebars is filled by the home and saved Handlebars view files.
Alan Leverenz (awleverenz@gmail.com)







