-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathindex.qmd
212 lines (149 loc) · 8.72 KB
/
index.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
---
title: "imfp"
---
# imfp
[](https://github.com/Promptly-Technologies-LLC/imfp/actions/workflows/test.yml)
[](https://pypi.python.org/pypi/imfp)
[](https://github.com/psf/black)
`imfp`, created and maintained by [Promptly Technologies](https://promptlytechnologies.com), is a Python package for downloading data from the [International Monetary Fund's](http://data.imf.org/) [RESTful JSON API](http://datahelp.imf.org/knowledgebase/articles/667681-using-json-restful-web-service).
## Installation
To install the stable version of imfp from PyPi, use pip.
```bash
pip install --upgrade imfp
```
To load the library, use `import`:
``` {python}
import imfp
```
## Workflow
The `imfp` package introduces four core functions: `imf_databases`, `imf_parameters`, `imf_parameter_defs`, and `imf_dataset`. The function for downloading datasets is `imf_dataset`, but you will need the other functions to determine what arguments to supply to your `imf_dataset` function call.
### Fetching a List of Databases with `imf_databases`
For instance, all calls to `imf_dataset` require a `database_id`. This is because the IMF serves many different databases through its API, and the API needs to know which of these many databases you're requesting data from.
To fetch a list of available databases, use:
``` {python}
# Fetch list of available databases
databases = imfp.imf_databases()
```
See [Working with Databases](docs/databases.qmd) for more information.
### Fetching a List of Parameters and Input Codes with `imf_parameters`
Requests to fetch data from IMF databases are complicated by the fact that each database uses a different set of parameters when making a request. (At last count, there were 43 unique parameters used in making API requests from the various databases!) You also have to have the list of valid input codes for each parameter. See [Working with Parameters](docs/parameters.qmd) for a more detailed explanation of parameters and input codes and how they work.
To obtain the full list of parameters and valid input codes for a given database, use:
``` {python}
# Fetch list of valid parameters and input codes for commodity price database
params = imfp.imf_parameters("PCPS")
```
The `imf_parameters` function returns a dictionary of data frames. Each dictionary key name corresponds to a parameter used in making requests from the database:
``` {python}
# Get key names from the params object
params.keys()
```
Each named list item is a data frame containing the valid input codes (and their descriptions) that can be used with the named parameter.
To access the data frame containing valid values for each parameter, subset the `params` dict by the parameter name:
``` {python}
# View the data frame of valid input codes for the frequency parameter
params['freq']
```
### Supplying Parameter Arguments to `imf_dataset`
To make a request to fetch data from the IMF API, just call `imfp.imf_dataset` with the database ID and keyword arguments for each parameter, where the keyword argument name is the parameter name and the value is the list of codes you want.
For instance, on exploring the `freq` parameter of the Primary Commodity Price System database above, we found that the frequency can take one of three values: "A" for annual, "Q" for quarterly, and "M" for monthly. Thus, to request annual data, we can call `imfp.imf_dataset` with `freq = ["A"]`.
Similarly, we might search the dataframes of valid input codes for the `commodity` and `unit_measure` parameters to find the input codes for coal and index:
``` {python}
# Find the 'commodity' input code for coal
params['commodity'].loc[
params['commodity']['description'].str.contains("Coal")
]
```
``` {python}
# Find the 'unit_measure' input code for index
params['unit_measure'].loc[
params['unit_measure']['description'].str.contains("Index")
]
```
Finally, we can use the information we've gathered to make the request to fetch the data:
``` {python}
# Request data from the API
df = imfp.imf_dataset(database_id = "PCPS",
freq = ["A"], commodity = ["PCOAL", "PCOALAU", "PCOALSA"],
unit_measure = ["IX"],
start_year = 2000, end_year = 2015)
# Display the first few entries in the retrieved data frame
df.head()
```
The returned data frame has a `time_format` column that contains ISO 8601 duration codes. In this case, “P1Y” means “periods of 1 year.” The `unit_mult` column represents the power of 10 to which the value column should be raised. For instance, if value is in millions, then the unit multiplier will be 6 (meaning 10^6). If in billions, then the unit multiplier will be 9 (meaning 10^9). For more information on interpreting the returned data frame, see [Understanding the Data Frame](docs/usage.qmd#understanding-the-data-frame).
## Working with the Returned Data Frame
Note that all columns in the returned data frame are string objects, and to plot the series we will need to convert to valid numeric or date formats:
``` {python}
# Convert obs_value to numeric and time_period to integer year
df = df.astype({"time_period" : int, "obs_value" : float})
```
Then, using `seaborn` with `hue`, we can plot different indicators in different colors:
``` {python}
import seaborn as sns
# Plot prices of different commodities in different colors with seaborn
sns.lineplot(data=df, x='time_period', y='obs_value', hue='commodity');
```
## Contributing
We welcome contributions to improve `imfp`! Here's how you can help:
1. If you find a bug, please open [a Github issue](https://github.com/Promptly-Technologies-LLC/imfp/issues)
2. To fix a bug:
- Fork and clone the repository and open a terminal in the repository directory
- Install [uv](https://astral.sh/setup-uv/) with `curl -LsSf https://astral.sh/uv/install.sh | sh`
- Install the dependencies with `uv sync`
- Install a git hook to enforce conventional commits with `curl -o- https://raw.githubusercontent.com/tapsellorg/conventional-commits-git-hook/master/scripts/install.sh | sh`
- Create a fix, commit it with an ["Angular-style Conventional Commit"](https://www.conventionalcommits.org/en/v1.0.0-beta.4/) message, and push it to your fork
- Open a pull request to our `main` branch
Note that if you want to change and preview the documentation, you will need to install the [Quarto CLI tool](https://quarto.org/docs/download/).
Version incrementing, package building, testing, changelog generation, documentation rendering, publishing to PyPI, and Github release creation is handled automatically by the GitHub Actions workflow based on the commit messages.
## Working with LLMs
In line with the [llms.txt standard](https://llmstxt.org/), we have exposed the full Markdown-formatted project documentation as a [single text file](docs/static/llms.txt) to make it more usable by LLM agents.
``` {python}
#| echo: false
#| include: false
import re
from pathlib import Path
def extract_file_paths(quarto_yml_path):
"""
Extract href paths from _quarto.yml file.
Returns a list of .qmd file paths.
"""
with open(quarto_yml_path, 'r', encoding='utf-8') as f:
content = f.read()
# Find all href entries that point to .qmd files
pattern = r'href:\s*(.*?\.qmd)'
matches = re.findall(pattern, content, re.MULTILINE)
return matches
def process_qmd_content(file_path):
"""
Process a .qmd file by converting YAML frontmatter to markdown heading.
Returns the processed content as a string.
"""
with open(file_path, 'r', encoding='utf-8') as f:
content = f.read()
# Replace YAML frontmatter with markdown heading
pattern = r'^---\s*\ntitle:\s*"([^"]+)"\s*\n---'
processed_content = re.sub(pattern, r'# \1', content)
return processed_content
# Get the current working directory
base_dir = Path.cwd()
quarto_yml_path = base_dir / '_quarto.yml'
print(quarto_yml_path)
# Extract file paths from _quarto.yml
qmd_files = extract_file_paths(quarto_yml_path)
print(qmd_files)
# Process each .qmd file and collect contents
processed_contents = []
for qmd_file in qmd_files:
file_path = base_dir / qmd_file
if file_path.exists():
processed_content = process_qmd_content(file_path)
processed_contents.append(processed_content)
# Concatenate all contents with double newline separator
final_content = '\n\n'.join(processed_contents)
# Ensure the output directory exists
output_dir = base_dir / 'docs' / 'static'
output_dir.mkdir(parents=True, exist_ok=True)
# Write the concatenated content to the output file
output_path = output_dir / 'llms.txt'
with open(output_path, 'w', encoding='utf-8') as f:
f.write(final_content)
```