[EN] Collect metadata on the reports of the Armenian NGOs #8
Labels
extraction
Tasks that require data extraction (scraping) skills
topic-finances
Tasks related to public finances, banking, currencies, etc.
topic-government
Tasks dedicated to government openness
Goal
The goal is to create a dataset containing the metadata on the reports of the Armenian NGOs.
Tasks
The metadata are published on an ASPX website (https://www.petekamutner.am/Reports_vh.aspx?rptid=1), which may require a browser-emulation library to scrape. Otherwise, the task is to collect all the data from the given paginated table and store them in a machine-readable format, such as JSON, XML, or CSV, in a flat structure. For example:
Please bear in mind that all numeric IDs, such as the taxpayer ID (ՀՎՀՀ), should always be stored as strings (characters) to preserve their precise value, including leading zeros, if there are any.
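As a minimal sketch of the flat structure and the string-ID rule, the Python snippet below writes a few records to CSV and JSON and reads the CSV back. The field names (`taxpayer_id`, `organization`, `report_year`) and the sample values are hypothetical placeholders, not the actual columns of the table on the website.

```python
import csv
import io
import json

# Hypothetical field names and values; the real table's columns may differ.
# Every value, including the taxpayer ID, is kept as a string.
records = [
    {"taxpayer_id": "00012345", "organization": "Example NGO", "report_year": "2019"},
    {"taxpayer_id": "02567891", "organization": "Sample Foundation", "report_year": "2020"},
]


def to_csv(rows):
    """Serialize flat records to CSV text, quoting every field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Read CSV text back; csv.DictReader returns every value as a string."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(records)
restored = from_csv(csv_text)

# ensure_ascii=False keeps Armenian names readable in the JSON output.
json_text = json.dumps(records, ensure_ascii=False, indent=2)
```

Quoting every field is a design choice rather than a requirement: it signals to spreadsheet tools and other readers that the IDs are text. Anyone loading the CSV with a library that infers column types would still need to request string columns explicitly, otherwise an ID such as `00012345` may be parsed as the number 12345.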
Context
The State Revenue Committee of the Republic of Armenia publishes the reports of NGOs, which may contain invaluable information on how the non-government and non-profit sector operates. The problem is that the reports themselves are published as PDF files, which are hard to process automatically. However, these files are accompanied by rather helpful metadata, including the organizations' names and IDs and the reports' years, which make it possible to quickly search for specific reports and to check whether an NGO has any such reports at all.
Requirements
A public GitHub repository should be created to store and publish the code and the data under one of the free and open licenses, such as Creative Commons or MIT.
Wishes
It would be best if your code is reusable, that is, if it can be run again by anyone who might want to update the dataset at a later point. For the same reason, we encourage you to comment your code, supplement it with at least a very brief README description, and specify the requirements and dependencies necessary to use the code.
Resources
https://www.petekamutner.am/Reports_vh.aspx?rptid=1
Prepared by
The Open Data Armenia team prepared this task