Minner is an easy way to build a web scraper for data mining. It is written in C++14, depends on a single shared library, libcurl, and reports log messages through Slack and the terminal.
In its original version (parts of which still remain), this scraper was just a service for NF-eBOT; my goal now is to refactor the project so that more people can use it. Fork it and adapt it to your situation.
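To give a flavor of the fetching a scraper like this does, here is a minimal, self-contained sketch of downloading a page into a `std::string` with libcurl. The callback, URL, and overall structure are illustrative assumptions, not Minner's actual code.

```cpp
// Minimal libcurl fetch sketch (illustrative; not Minner's actual code).
#include <curl/curl.h>

#include <iostream>
#include <string>

// Append each chunk libcurl receives to the std::string passed via WRITEDATA.
static size_t write_cb(char *data, size_t size, size_t nmemb, void *userp) {
    static_cast<std::string *>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    std::string body;
    curl_easy_setopt(curl, CURLOPT_URL, "http://www.nfe.fazenda.gov.br/");
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

    CURLcode rc = curl_easy_perform(curl);
    if (rc == CURLE_OK)
        std::cout << "fetched " << body.size() << " bytes\n";
    else
        std::cerr << "fetch failed: " << curl_easy_strerror(rc) << "\n";

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return rc == CURLE_OK ? 0 : 1;
}
```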
## Requirements

- gcc >= 5 (for C++14 support)
- cmake
- libcurl (install via your OS package manager, e.g. `apt-get install libcurl4-openssl-dev` on Debian/Ubuntu)
## How to run

### Local (best choice for dev and a good choice for production)

- Create `doc/config.h` from the `doc/config.h.dist` template (a hypothetical sketch of this file follows this section).
- Build and run, where `--SCRAPER_KEY` is one of the scraper flags listed in the Scrapers section below:

```sh
cmake . && make
./minner --SCRAPER_KEY
```
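The contents of `doc/config.h.dist` are not shown in this README, so the following is only a hypothetical sketch of what `doc/config.h` might contain, assuming the Slack logging mentioned in the introduction is configured with a webhook URL. Check the `.dist` template for the real fields.

```cpp
// Hypothetical sketch of doc/config.h; copy doc/config.h.dist and fill in
// the real fields. SLACK_WEBHOOK_URL is an assumption based on the Slack
// logging mentioned in the introduction, not a confirmed field name.
#pragma once

#define SLACK_WEBHOOK_URL "https://hooks.slack.com/services/XXX/YYY/ZZZ"
```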
### Docker (best choice for Windows and dev)

- Install Docker.
- Create `doc/config.h` from the `doc/config.h.dist` template.
- Build the image and run it:

```sh
docker build -t nfebot/minner .
docker run -ti --rm nfebot/minner --SCRAPER_KEY
```
### Vagrant

- Install Vagrant.
- Bring the VM up and connect to it:

```sh
vagrant up && vagrant ssh
```

- Create `doc/config.h` from the `doc/config.h.dist` template.
- Build and run inside the VM:

```sh
cd /data && cmake . && make
./minner --SCRAPER_KEY
```
## Scrapers

Each scraper is selected with a command-line flag:

| Flag | Source |
| --- | --- |
| `--nfe-notas-tecnicas` | nfe.fazenda.gov.br / Notas Técnicas |
| `--nfe-avisos` | nfe.fazenda.gov.br / Avisos |
| `--sped` | sped.rfb.gov.br / Destaques |
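As an illustration of how a flag might select a scraper, here is a minimal, hypothetical dispatch sketch; the names and structure are assumptions, and Minner's actual argument handling in `app/main.cpp` may differ.

```cpp
// Hypothetical sketch of mapping a scraper flag to a job (illustrative only).
#include <cstdio>
#include <cstring>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: minner --SCRAPER_KEY\n");
        return 1;
    }
    const char *key = argv[1];
    if (std::strcmp(key, "--nfe-notas-tecnicas") == 0) {
        // fetch and parse nfe.fazenda.gov.br / Notas Técnicas
    } else if (std::strcmp(key, "--nfe-avisos") == 0) {
        // fetch and parse nfe.fazenda.gov.br / Avisos
    } else if (std::strcmp(key, "--sped") == 0) {
        // fetch and parse sped.rfb.gov.br / Destaques
    } else {
        std::fprintf(stderr, "unknown scraper: %s\n", key);
        return 1;
    }
    return 0;
}
```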
## Folder structure

- `app`: application source files
- `app/include`: application libs/modules source files
- `app/include/parsers`: web page parsing layer
- `app/include/services`: external web services
- `build`: where the built executable is saved (when you use `./scripts/gcc_build.sh`)
- `doc`: configuration files
- `lib`: vendor libs
- `scripts`: scripts to help build and install
- `spike`: files for testing technologies or ideas
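To make the parsing layer concrete, here is a hypothetical shape for a header under `app/include/parsers`; `ScrapedItem` and `parse_avisos` are made-up names for illustration, not the repo's real interfaces.

```cpp
// Hypothetical parser header under app/include/parsers (illustrative only).
// A parser takes the fetched page and returns the extracted entries.
#pragma once

#include <string>
#include <vector>

struct ScrapedItem {
    std::string title;
    std::string url;
};

// e.g. parse the "Avisos" listing fetched from nfe.fazenda.gov.br
std::vector<ScrapedItem> parse_avisos(const std::string &page_html);
```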
## TODO

- Make `doc/config.h` simpler
- Change all `#include`s to use `.h` files
- Make the parameters in `include/helpers.h` `const`
- Refactor this code block in `app/main.cpp` (see the sketch after this list):

```cpp
rapidxml::xml_document<> doc;
char *cstr = new char[res.size() + 1];
strcpy(cstr, res.c_str());
doc.parse<0>(cstr);
```

- And a lot more refactors...
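One possible direction for that rapidxml refactor, offered only as a sketch: keep the mutable, zero-terminated copy that rapidxml requires in a `std::vector<char>`, so the buffer is freed automatically (the current `new[]` buffer is never deleted).

```cpp
#include <string>
#include <vector>

#include "rapidxml.hpp"

// Sketch: parse `res` (the fetched page) without the leaking new[]/strcpy.
// rapidxml parses in place, so it needs a mutable, zero-terminated buffer
// that stays alive for as long as the document is used.
void parse_page(const std::string &res) {
    std::vector<char> buf(res.begin(), res.end());
    buf.push_back('\0');

    rapidxml::xml_document<> doc;
    doc.parse<0>(buf.data()); // buf must outlive any use of doc
}
```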
## Thanks

@mattgodbolt, @dascandy, @famastefano, @grisumbras, @Corristo,
and the other folks in the C++ Slack group.