Releases: spencermountain/dumpster-dive

5.0.0

31 Dec 20:50

v5

  • more consistent template json, via wtf_wikipedia@7
  • removal of empty [] results in Section.
  • fs fixes for node > 9

4.0.2

29 Oct 21:00

v3.2.0

3.3.0

  • bugfix for runtime parsing error

3.4.2

  • update deps, wtf library improvements
  • relicense as MIT
  • use latest mongo api

3.6.0

  • ⚠️ remove .infoboxes and .citations from top-level result. this is duplicate data. find them both in section[i].templates
  • improve handling of redirect pages
  • refactor encoding logic
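
Since infoboxes and citations now live only under each section's templates, code that read them from the top level has to walk the sections instead. A minimal sketch, assuming each stored page has a `sections` array and each template object carries its name in a `template` key (both the page shape and the `collectTemplates` helper are illustrative assumptions, not the library's API):

```javascript
// Hypothetical helper: gather every template with a given name from all sections.
// Assumed page shape: { sections: [{ templates: [{ template: 'infobox', ... }] }] }
function collectTemplates(page, name) {
  const found = [];
  for (const section of page.sections || []) {
    // a section may have no templates key at all, so default to an empty list
    for (const tmpl of section.templates || []) {
      if (tmpl.template === name) {
        found.push(tmpl);
      }
    }
  }
  return found;
}

// Made-up example document, for illustration only
const examplePage = {
  title: 'Example',
  sections: [
    { title: '', templates: [{ template: 'infobox', name: 'Example' }] },
    { title: 'History', templates: [{ template: 'citation', url: 'http://example.com' }] },
    { title: 'See also' }
  ]
};

const infoboxes = collectTemplates(examplePage, 'infobox');
const citations = collectTemplates(examplePage, 'citation');
```

Defaulting to `[]` keeps the loop safe for sections that carry no templates.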

v4

  • major json format changes from wtf_wikipedia v6.0.0
  • get skip_redirects actually working
  • reduce default batch_size even further
  • add verbose_skip option, to log disambig/redirect skipping

3.1.0

23 May 16:54
cca9de7

some successes with getting to the end of en-wiki!

~11hrs

  • fix connection time-outs & improve logging output
  • change default collection name to pages
  • add .custom() function support

3.0.0

28 Apr 03:32
8ac1733
  • MASSIVE SPEEDUP! full re-write by @devrim 🙏 to fix issue #59
  • rename from wikipedia-to-mongo to dumpster-dive
  • use wtf_wikipedia v3 (a big re-factor too!)
  • use line-by-line, and worker-nodes to run parsing in parallel
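
One common way to run parsing in parallel over a single large dump is to hand each worker its own byte range of the file. The sketch below only shows that range arithmetic; the `splitIntoRanges` name and the approach itself are assumptions for illustration, not a description of how the rewrite actually divides the work:

```javascript
// Hypothetical helper: split a file of `fileSize` bytes into contiguous
// [start, end) ranges, one per worker
function splitIntoRanges(fileSize, workerCount) {
  const chunk = Math.ceil(fileSize / workerCount);
  const ranges = [];
  for (let i = 0; i < workerCount; i++) {
    const start = i * chunk;
    if (start >= fileSize) {
      break; // fewer bytes than workers: stop early
    }
    ranges.push({ start, end: Math.min(start + chunk, fileSize) });
  }
  return ranges;
}

const ranges = splitIntoRanges(10, 3);
```

A real reader would also need to move each range boundary to the next line or page start, so that no page is cut in half between two workers.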

2.0

20 Sep 14:06
2.0
  • updates to use a newer wtf_wikipedia - a major result-format change
  • renames bin cmd to wiki2mongo
  • supports use from cli, or use via javascript require()
  • support --plaintext flag