[EN] Collect the images of the Armenian monuments #15

ansakoy · 2023-06-14T18:59:15Z

Goal

Collect the images of the Armenian monuments accompanied by metadata.

Tasks

The task has two components. First, collect the metadata of every image on every page and store it in a machine-readable format. By metadata, we mean:

the attributes that form the description of each individual image;
the URL of each image;
the relative path to the file in a folder.
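As an illustration of what one machine-readable record could look like, here is a minimal sketch that stores the three items above in JSON Lines (one JSON object per line). The class name `ImageRecord`, the field names, and the choice of JSON Lines are assumptions made for this example, not requirements of the task.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ImageRecord:
    """One image and its metadata (field names are hypothetical)."""
    image_url: str          # the URL of the image
    relative_path: str      # path to the downloaded file inside the folder
    description_html: str   # the whole description, unparsed
    attributes: dict = field(default_factory=dict)  # any values singled out


def to_json_lines(records):
    """Serialize records as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(
        json.dumps(asdict(record), ensure_ascii=False) for record in records
    )
```

A CSV file would work as well; JSON Lines is used here only because the set of attributes may differ from one image to another.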

The tricky part is that the descriptions lack a predictable structure. For instance, some may contain a year, a location, etc., while others only have the name of a person. One possible approach to solving this is as follows:

Analyze a number of image descriptions;
Make a parser to grab the values you managed to single out;
Store these values in separate fields in your output data;
Store the whole unparsed description text as HTML in a separate field (just in case your parser misses something important).

Or you can simply grab the description (as HTML) as a single value and the image URL as another, without any extra parsing.
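The parsing approach outlined above could be sketched as follows. This is only an illustration: the function name `parse_description`, the output keys, and the year pattern (a four-digit number between 1500 and 2029) are assumptions, and the real descriptions may call for different rules. The raw HTML is always kept next to the parsed values, so nothing is lost if the parser misses something.

```python
import re
from html.parser import HTMLParser

# Hypothetical rule: the first four-digit number from 1500 to 2029 is a year.
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20[0-2]\d)\b")


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def parse_description(description_html):
    """Return the values we managed to single out plus the untouched HTML."""
    extractor = _TextExtractor()
    extractor.feed(description_html)
    extractor.close()
    # Join the text nodes and collapse runs of whitespace.
    text = " ".join(" ".join(extractor.parts).split())
    match = YEAR_RE.search(text)
    return {
        "year": match.group(1) if match else None,
        "text": text,
        "description_html": description_html,
    }
```

For example, `parse_description("<p>Church in <b>Ani</b>, 1905</p>")` yields a `year` of `"1905"`, while a description that holds only a person's name yields `None` for the year and still keeps the full HTML.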

Second, download all the pictures and store them in a folder. This folder should be temporarily published as a zipped archive on your server or on a sharing platform, such as Google Drive. Once this is done, please let us know and we will copy the files to our own storage, so that they do not occupy your disk space.
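A minimal sketch of the download-and-zip step, using only the Python standard library, might look like this. The function names, the `images` folder name, and the rule of naming each file after the last segment of its URL are assumptions made for the example; the `fetch` parameter exists so that the download function can be replaced (for instance, by a browser-automation call or by a stub in tests).

```python
import os
import shutil
import urllib.request
from urllib.parse import unquote, urlparse


def relative_path_for(image_url, folder="images"):
    """Derive a relative file path from the last segment of the URL path."""
    name = os.path.basename(unquote(urlparse(image_url).path)) or "unnamed"
    return f"{folder}/{name}"


def fetch_bytes(url):
    """Download one URL and return its content as bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_all(image_urls, root, fetch=fetch_bytes):
    """Save every image under root and return the relative paths used.

    Files that already exist are skipped, so an interrupted run can be
    resumed. Caveat: two URLs with the same file name would collide here.
    """
    saved = []
    for url in image_urls:
        rel = relative_path_for(url)
        target = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        if not os.path.exists(target):
            with open(target, "wb") as fh:
                fh.write(fetch(url))
        saved.append(rel)
    return saved


def zip_folder(root, archive_base):
    """Pack the contents of root into archive_base + '.zip'; return its path."""
    return shutil.make_archive(archive_base, "zip", root_dir=root)
```

The relative paths returned by `download_all` are the values that would go into the metadata, so that each record can be matched to its file inside the archive.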

Preferably, both components should be completed. Alternatively, if you have a clear idea of how to collect the metadata but have no room to store the result, the first component alone would be sufficient.

Context

The website presents an impressive collection of historical images of various kinds, from pictures of monuments to old photos of people. The problem is that they are all presented as a web gallery only. There is no option to download these images and their metadata in bulk. In other words, if this website disappears, its huge collection may be lost to the public forever. It would be good to have a backup of such a project, as well as a convenient way to use these data automatically.

Requirements

A public GitHub repository should be created to store and publish the code and the data under one of the free and open licenses, such as Creative Commons or MIT.

Wishes

It would be best if your code is reusable, that is, if it can be launched again by anyone who might want to update the dataset at a later point. For the same reason, we encourage you to comment your code, supplement it with at least a very brief README description, and specify the requirements and dependencies necessary to use the code.

Resources

http://www.armenianmonumentsimages.com/

Parsing may require a library that imitates human interaction with the website.

Prepared by

The Open Data Armenia team prepared this task.

ansakoy added the parsing, extraction, and topic-culture labels on Jun 14, 2023
