[EN] Collect metadata on the reports of the Armenian NGOs #8

Open
ansakoy opened this issue Jun 5, 2023 · 0 comments
Labels
extraction: Tasks that require data extraction (scraping) skills
topic-finances: Tasks related to public finances, banking, currencies, etc.
topic-government: Tasks dedicated to government openness

Comments

Collaborator

ansakoy commented Jun 5, 2023

Goal

The goal is to create a dataset containing the metadata on the reports of the Armenian NGOs.

Tasks

The metadata are published on an ASPX website (https://www.petekamutner.am/Reports_vh.aspx?rptid=1), so scraping it may require a browser-emulation library. Beyond that, the task is to collect all the data from the paginated table and store it in a flat, machine-readable format such as JSON, XML, or CSV. For example:

{
	"reg_num": STRING,
	"year": NUMBER,
	"taxpayer_id": STRING,
	"org_name": STRING,
	"org_type": STRING,
	"report_date": STRING,
	"report_url": STRING,  // the url to download the given report
	"date_update": STRING,
}
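
A note on the pagination mentioned above: ASPX pages built with ASP.NET WebForms commonly switch table pages through a form postback rather than through plain links. Whether this particular site works that way has not been verified here, so the following is only a minimal sketch under that assumption. It collects the hidden form fields from a page (such as `__VIEWSTATE` and `__EVENTVALIDATION`) and adds the postback target and argument. The control name `ctl00$Grid`, the argument `Page$2`, and the sample HTML are hypothetical placeholders; the real values have to be read from the page source.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_payload(page_html, event_target, event_argument):
    """Build the form data that a WebForms postback would submit."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    payload = dict(parser.fields)
    payload["__EVENTTARGET"] = event_target      # control that triggers paging
    payload["__EVENTARGUMENT"] = event_argument  # e.g. the requested page
    return payload


# Hypothetical page fragment, used only to demonstrate the parsing step.
SAMPLE_HTML = """
<form method="post" action="./Reports_vh.aspx?rptid=1">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="search" value="" />
</form>
"""

payload = build_postback_payload(SAMPLE_HTML, "ctl00$Grid", "Page$2")
print(payload)
```

The resulting dictionary would then be sent as the body of a POST request to the same URL, and the hidden fields would be re-read from each response before requesting the next page. If the site relies on JavaScript beyond this, a browser-emulation library remains the fallback.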

Please bear in mind that all numeric IDs, such as the taxpayer ID (ՀՎՀՀ), should always be stored as strings to preserve their exact value, including any leading zeros.
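
As an illustration of the flat structure and the string-ID rule, here is a minimal sketch that writes one record to both CSV and JSON. The field names follow the example above; the sample values (including the taxpayer ID and the URL) are invented placeholders, and the helper names `normalize_row` and `to_csv` are hypothetical. It assumes that scraped cell values arrive as strings.

```python
import csv
import io
import json

FIELDS = [
    "reg_num", "year", "taxpayer_id", "org_name",
    "org_type", "report_date", "report_url", "date_update",
]


def normalize_row(raw):
    """Keep every field as a stripped string, except `year`, which is a number."""
    row = {name: raw[name].strip() for name in FIELDS}
    row["year"] = int(row["year"])
    return row


def to_csv(rows):
    """Serialize rows to CSV text, quoting all strings so IDs stay textual."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        quoting=csv.QUOTE_NONNUMERIC,  # quotes strings, leaves numbers bare
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample record; real values come from the scraped table.
sample = {
    "reg_num": "0012",
    "year": "2022",
    "taxpayer_id": "00123456",
    "org_name": "Example NGO",
    "org_type": "NGO",
    "report_date": "2023-04-01",
    "report_url": "https://example.org/report.pdf",
    "date_update": "2023-04-02",
}

rows = [normalize_row(sample)]
csv_text = to_csv(rows)
# ensure_ascii=False keeps Armenian organization names readable in the file.
json_text = json.dumps(rows, ensure_ascii=False, indent=2)
print(csv_text)
print(json_text)
```

Quoting the string fields does not stop spreadsheet software from reinterpreting IDs on import, but it keeps the stored value exact, and any CSV reader that treats fields as text will get the leading zeros back.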

Context

The State Revenue Committee of the Republic of Armenia publishes the reports of NGOs, which may contain invaluable information on how the non-governmental, non-profit sector operates. The problem is that the reports themselves are published as PDF files, which are hard to process automatically. However, these files are accompanied by rather helpful metadata, including the organizations' names and IDs and the reports' years, which make it possible to quickly search for specific reports and to check whether an NGO has any such reports at all.

Requirements

A public GitHub repository should be created to store and publish the code and the data under one of the free and open licenses, such as Creative Commons or MIT.

Wishes

It would be best if your code is reusable, that is, can be launched again by anyone who might want to update the dataset at a later point. For the same reason, we encourage you to comment your code, supplement it with at least a very brief README, and specify the requirements and dependencies needed to run it.

Resources

https://www.petekamutner.am/Reports_vh.aspx?rptid=1

Prepared by

The Open Data Armenia team prepared this task

ivbeg added the extraction, topic-government, and topic-finances labels on Jun 5, 2023