Commit f7a40c6

THOR300Mark and Mark authored
Feature/pdct 765 process failed documents during azure run (#73)
This Pull Request:
- Updates the action performed for the `reparse` update type.
- No longer downloads the document from source; it relies on the copy that is already in the CDN.

Annoyingly, lots of documents changed purely because of formatting and timestamps. Let me know if you want me to reduce the changes by checking those files out from main.

Co-authored-by: Mark <[email protected]>
1 parent 528c2b5 commit f7a40c6

42 files changed (+214 -59 lines changed)

@@ -0,0 +1,17 @@
+{
+"document_name": "name",
+"document_description": "description",
+"document_id": "TESTCCLW.executive.6.6",
+"document_source_url": "http://existing.com/",
+"document_cdn_object": null,
+"document_content_type": "text/html",
+"document_md5_sum": null,
+"document_metadata": {},
+"document_slug": "fake_slug",
+"languages": [
+"en"
+],
+"translated": false,
+"html_data": null,
+"pdf_data": null
+}
@@ -0,0 +1,17 @@
+{
+"document_name": "name",
+"document_description": "description",
+"document_id": "TESTCCLW.executive.6.6",
+"document_source_url": "http://existing.com/",
+"document_cdn_object": null,
+"document_content_type": "text/html",
+"document_md5_sum": null,
+"document_metadata": {},
+"document_slug": "fake_slug",
+"languages": [
+"en"
+],
+"translated": false,
+"html_data": null,
+"pdf_data": null
+}
@@ -0,0 +1,11 @@
+{
+"document_name": "name",
+"document_description": "description",
+"document_id": "TESTCCLW.executive.6.6",
+"document_source_url": "http://existing.com/",
+"document_cdn_object": null,
+"document_content_type": "text/html",
+"document_md5_sum": null,
+"document_metadata": {},
+"document_slug": "fake_slug"
+}

integration_tests/data/pipeline_in/input/2022-11-01T21.53.26.945831/new_and_updated_documents.json

+7
@@ -1022,6 +1022,13 @@
 },
 "type": "metadata"
 }
+],
+"TESTCCLW.executive.6.6": [
+{
+"s3_value": null,
+"db_value": null,
+"type": "reparse"
+}
 ]
 }
 }
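The hunk above registers a `reparse` update for `TESTCCLW.executive.6.6` with null `s3_value` and `db_value`. As a rough illustration of the behaviour the commit message describes (reparse from the copy in the CDN rather than downloading from source again), here is a minimal Python sketch. The function names `documents_to_reparse` and `reset_for_reparse`, the second document id, the example CDN object value, and the idea that clearing `html_data` and `pdf_data` is what triggers a reparse are all assumptions made for illustration, not the repository's actual code.

```python
from typing import Any, Dict, List


def documents_to_reparse(updates: Dict[str, List[Dict[str, Any]]]) -> List[str]:
    """Return the ids of documents with at least one update of type "reparse"."""
    return [
        doc_id
        for doc_id, doc_updates in updates.items()
        if any(update.get("type") == "reparse" for update in doc_updates)
    ]


def reset_for_reparse(document: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the document with its parser output cleared.

    Assumption: clearing html_data and pdf_data marks the document for
    parsing again. The CDN object and source URL are left untouched, so
    nothing is downloaded from source.
    """
    reset = dict(document)
    reset["html_data"] = None
    reset["pdf_data"] = None
    return reset


updates = {
    "TESTCCLW.executive.6.6": [
        {"s3_value": None, "db_value": None, "type": "reparse"}
    ],
    # Hypothetical second entry, included only to show the filter.
    "EXAMPLE.document.0.0": [
        {"s3_value": "old", "db_value": "new", "type": "metadata"}
    ],
}

document = {
    "document_id": "TESTCCLW.executive.6.6",
    "document_source_url": "http://existing.com/",
    "document_cdn_object": "example/cdn/object.html",  # hypothetical value
    "html_data": {"text_blocks": []},  # hypothetical stale parser output
    "pdf_data": None,
}

to_reparse = documents_to_reparse(updates)
reset = reset_for_reparse(document)
print(to_reparse)
print(reset["html_data"], reset["document_cdn_object"])
```

Copying the dictionary before clearing fields keeps the input unchanged, which makes the step easy to test against fixtures like the ones in this commit.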
@@ -29,4 +29,4 @@
 ]
 },
 "pdf_data": null
-}
+}
@@ -29,4 +29,4 @@
 ]
 },
 "pdf_data": null
-}
+}
@@ -0,0 +1,17 @@
+{
+"document_name": "name",
+"document_description": "description",
+"document_id": "TESTCCLW.executive.6.6",
+"document_source_url": "http://existing.com/",
+"document_cdn_object": null,
+"document_content_type": "text/html",
+"document_md5_sum": null,
+"document_metadata": {},
+"document_slug": "fake_slug",
+"languages": [
+"en"
+],
+"translated": false,
+"html_data": null,
+"pdf_data": null
+}
@@ -29,4 +29,4 @@
 ]
 },
 "pdf_data": null
-}
+}
@@ -29,4 +29,4 @@
 ]
 },
 "pdf_data": null
-}
+}
@@ -0,0 +1,17 @@
+{
+"document_name": "name",
+"document_description": "description",
+"document_id": "TESTCCLW.executive.6.6",
+"document_source_url": "http://existing.com/",
+"document_cdn_object": null,
+"document_content_type": "text/html",
+"document_md5_sum": null,
+"document_metadata": {},
+"document_slug": "fake_slug",
+"languages": [
+"en"
+],
+"translated": false,
+"html_data": null,
+"pdf_data": null
+}

integration_tests/data/pipeline_out/ingest_unit_test_parser_input/TESTCCLW.executive.1332.1547.json

+3 -3
@@ -11,8 +11,9 @@
 "name": "Presidential of the Republic of Indonesia Instruction Number 6 Year 2013 on Suspension of New Licenses and Improving Forest Governance of Primary Forest and Peatland",
"description": "The first iteration of this instruction was issued in 2011 in order to implement commitments under the agreements in the Letter of Intent signed with the Kingdom of Norway in May 2011. The Instruction is intended to facilitate Indonesia's participation in internationally financed REDD activities and places a moratorium on clearance of primary peatland and forests within moratorium areas.\u00a0The initial moratorium was extended by Presidential Instruction 6/2013.\u00a0In 2019, President Joko Widodo signed Presidential Instruction 5/2019, making the moratorium on the clearance of primary forest and peatlands in moratorium areas permanent.\u00a0",
 "import_id": "TESTCCLW.executive.1332.1547",
-"family_import_id": "TESTCCLW.family.1332.0",
 "slug": "indonesia_2013_presidential-instruction-6-year-2013-on-suspension-of-new-licenses-and-improving-forest-governance-of-primary-forest-and-peatland",
+"family_import_id": "TESTCCLW.family.1332.0",
+"family_slug": "slug_TESTCCLW.family.1332.0",
 "publication_ts": "2013-01-01T00:00:00",
 "date": "01/01/2013",
 "source_url": "https://www.africau.edu/images/default/sample.pdf",
@@ -43,8 +44,7 @@
 "Adaptation",
 "Mitigation"
 ]
-},
-"family_slug": "slug_TESTCCLW.family.1332.0"
+}
 },
 "pipeline_metadata": {}
 }
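The two hunks above appear only to move `family_slug` up next to `family_import_id`, which is the kind of formatting churn the commit message mentions. A minimal Python sketch, assuming it would be acceptable to compare fixtures by parsed content rather than by raw text, shows that such a reorder leaves the JSON content unchanged. The helper name `canonical` is hypothetical, and the JSON below is an abbreviated stand-in for the fixture: the `metadata` key and the `example-slug` value are placeholders.

```python
import json

# Abbreviated, hypothetical stand-ins for the fixture before and after the change.
before = """{
  "import_id": "TESTCCLW.executive.1332.1547",
  "family_import_id": "TESTCCLW.family.1332.0",
  "slug": "example-slug",
  "metadata": {"topics": ["Adaptation", "Mitigation"]},
  "family_slug": "slug_TESTCCLW.family.1332.0"
}"""

after = """{
  "import_id": "TESTCCLW.executive.1332.1547",
  "slug": "example-slug",
  "family_import_id": "TESTCCLW.family.1332.0",
  "family_slug": "slug_TESTCCLW.family.1332.0",
  "metadata": {"topics": ["Adaptation", "Mitigation"]}
}"""


def canonical(text: str) -> str:
    """Re-serialise JSON text with sorted keys and fixed indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2)


# The raw text differs, but the parsed objects and canonical forms match.
same_bytes = before == after
same_content = json.loads(before) == json.loads(after)
same_canonical = canonical(before) == canonical(after)
print(same_bytes, same_content, same_canonical)
```

Writing fixtures through a canonical serialiser like this is one possible way to keep key-order changes out of future diffs; whether it suits this repository is a judgment call for the maintainers.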

integration_tests/data/pipeline_out/ingest_unit_test_parser_input/TESTCCLW.executive.1332.1548.json

+3 -3
@@ -11,8 +11,9 @@
 "name": "Presidential of the Republic of Indonesia Instruction Number 6 Year 2013 on Suspension of New Licenses and Improving Forest Governance of Primary Forest and Peatland",
"description": "The first iteration of this instruction was issued in 2011 in order to implement commitments under the agreements in the Letter of Intent signed with the Kingdom of Norway in May 2011. The Instruction is intended to facilitate Indonesia's participation in internationally financed REDD activities and places a moratorium on clearance of primary peatland and forests within moratorium areas.\u00a0The initial moratorium was extended by Presidential Instruction 6/2013.\u00a0In 2019, President Joko Widodo signed Presidential Instruction 5/2019, making the moratorium on the clearance of primary forest and peatlands in moratorium areas permanent.\u00a0",
 "import_id": "TESTCCLW.executive.1332.1548",
-"family_import_id": "TESTCCLW.family.1332.0",
 "slug": "england_2013_presidential-of-the-republic-of-indonesia-instruction-number-6-year-2013-on-suspension-of-new-licenses-and-improving-forest-governance-of-primary-forest-and-peatland",
+"family_import_id": "TESTCCLW.family.1332.0",
+"family_slug": "slug_TESTCCLW.family.1332.0",
 "publication_ts": "2013-01-01T00:00:00",
 "date": "01/01/2013",
 "source_url": "https://www.africau.edu/images/default/sample.pdf",
@@ -43,8 +44,7 @@
 "Adaptation",
 "Mitigation"
 ]
-},
-"family_slug": "slug_TESTCCLW.family.1332.0"
+}
 },
 "pipeline_metadata": {}
 }

integration_tests/data/pipeline_out/ingest_unit_test_parser_input/TESTCCLW.executive.1332.1549.json

+3 -3
@@ -11,8 +11,9 @@
 "name": "Act on Promotion of Global Warming Countermeasures",
"description": "This Law is one of the two key climate laws in Japan along with the Energy Conservation Law. The purpose of the Law is to reduce emissions of GHGs derived from anthropogenic activities. GHGs are carbon dioxide, methane, nitrous oxide, HFC, PFC and sulphur hexafluoride. The Council of Ministers for Global Environmental Conservation is established under the Law. The Council is chaired by the Prime Minister, and vice-chairmen are the Chief Cabinet Secretary, Minister of the Environment and Minister of Economy, Trade and Industry. Other members consist of all ministers other than vice-chairmen.\u00a0\u00a0Designated emitters, whose workplaces contain more than 1,500kL of oil equivalent of energy annually, are mandated to develop the Plan for Global Warming Countermeasure. While there is no reduction obligation under this law, annual emission of GHGs are reported to the Minister in charge. Emission reporting under this framework equals that of the reporting under the Energy Conservation Law.\u00a0\u00a0This Law stipulates that the State is responsible for implementing necessary measures to introduce Emission Trading Scheme (ETS) in Japan. It adds that examination and discussion of the design and the utilization of ETS starts upon the enactment of this Law.\u00a0\u00a0This Law also provides that the national and local governments are responsible for development and implementation of plans to reduce GHG emissions.\u00a0\u00a0The National Government adopted the Plan for Global Warming Countermeasures in May 2016, with the explicit aim to achieve the mid-term target set in Japan's INDC (2015) 26% GHG emissions reduction by 2030 (baseline 2013). In addition, the Plan also sets a long term goal of 80% GHG emissions reduction by 2050.\u00a0 Prefectural and municipal governments are also required to create local plans to reduce GHG emissions. 
The plans should include:\u00a0Duration of the planGoalsMeasures and actions intended for implementationPromotion of solar PV, wind and other renewable energiesMeasures and actions taken by business professionals and citizens to reduce GHG emissionPromotion of public transport use, conservation of green space and other GHG emission reduction measuresOn June 4th, 2021, the Diet approved the amending Act 54/2021 introducing a net zero target by 2050 into the law.",
 "import_id": "TESTCCLW.executive.1332.1549",
-"family_import_id": "TESTCCLW.family.1332.0",
 "slug": "japan_1998_act-on-promotion-of-global-warming-countermeasures",
+"family_import_id": "TESTCCLW.family.1332.0",
+"family_slug": "slug_TESTCCLW.family.1332.0",
 "publication_ts": "1998-01-01T00:00:00",
 "date": "01/01/1998",
 "source_url": "https://www.africau.edu/images/default/sample.pdf",
@@ -50,8 +51,7 @@
 "topics": [
 "Mitigation"
 ]
-},
-"family_slug": "slug_TESTCCLW.family.1332.0"
+}
 },
 "pipeline_metadata": {}
 }

integration_tests/data/pipeline_out/ingest_unit_test_parser_input/TESTCCLW.executive.1332.1550.json

+3 -3
@@ -11,8 +11,9 @@
 "name": "Cabinet decisions for a law proposal to partially amend the law on the promotion of global warming countermeasures",
"description": "This Law is one of the two key climate laws in Japan along with the Energy Conservation Law. The purpose of the Law is to reduce emissions of GHGs derived from anthropogenic activities. GHGs are carbon dioxide, methane, nitrous oxide, HFC, PFC and sulphur hexafluoride. The Council of Ministers for Global Environmental Conservation is established under the Law. The Council is chaired by the Prime Minister, and vice-chairmen are the Chief Cabinet Secretary, Minister of the Environment and Minister of Economy, Trade and Industry. Other members consist of all ministers other than vice-chairmen.\u00a0\u00a0Designated emitters, whose workplaces contain more than 1,500kL of oil equivalent of energy annually, are mandated to develop the Plan for Global Warming Countermeasure. While there is no reduction obligation under this law, annual emission of GHGs are reported to the Minister in charge. Emission reporting under this framework equals that of the reporting under the Energy Conservation Law.\u00a0\u00a0This Law stipulates that the State is responsible for implementing necessary measures to introduce Emission Trading Scheme (ETS) in Japan. It adds that examination and discussion of the design and the utilization of ETS starts upon the enactment of this Law.\u00a0\u00a0This Law also provides that the national and local governments are responsible for development and implementation of plans to reduce GHG emissions.\u00a0\u00a0The National Government adopted the Plan for Global Warming Countermeasures in May 2016, with the explicit aim to achieve the mid-term target set in Japan's INDC (2015) 26% GHG emissions reduction by 2030 (baseline 2013). In addition, the Plan also sets a long term goal of 80% GHG emissions reduction by 2050.\u00a0 Prefectural and municipal governments are also required to create local plans to reduce GHG emissions. 
The plans should include:\u00a0Duration of the planGoalsMeasures and actions intended for implementationPromotion of solar PV, wind and other renewable energiesMeasures and actions taken by business professionals and citizens to reduce GHG emissionPromotion of public transport use, conservation of green space and other GHG emission reduction measuresOn June 4th, 2021, the Diet approved the amending Act 54/2021 introducing a net zero target by 2050 into the law.",
 "import_id": "TESTCCLW.executive.1332.1550",
-"family_import_id": "TESTCCLW.family.1332.0",
 "slug": "japan_2020_cabinet-decisions-for-a-law-proposal-to-partially-amend-the-law-on-the-promotion-of-global-warming-countermeasures",
+"family_import_id": "TESTCCLW.family.1332.0",
+"family_slug": "slug_TESTCCLW.family.1332.0",
 "publication_ts": "2020-01-01T00:00:00",
 "date": "01/01/2020",
 "source_url": "https://www.env.go.jp/press/109218.html",
@@ -50,8 +51,7 @@
 "topics": [
 "Mitigation"
 ]
-},
-"family_slug": "slug_TESTCCLW.family.1332.0"
+}
 },
 "pipeline_metadata": {}
 }

integration_tests/data/pipeline_out/ingest_unit_test_parser_input/TESTCCLW.executive.1332.1551.json

+3 -3
@@ -11,8 +11,9 @@
 "name": "Revised Global Warming Countermeasures Promotion Law",
"description": "This Law is one of the two key climate laws in Japan along with the Energy Conservation Law. The purpose of the Law is to reduce emissions of GHGs derived from anthropogenic activities. GHGs are carbon dioxide, methane, nitrous oxide, HFC, PFC and sulphur hexafluoride. The Council of Ministers for Global Environmental Conservation is established under the Law. The Council is chaired by the Prime Minister, and vice-chairmen are the Chief Cabinet Secretary, Minister of the Environment and Minister of Economy, Trade and Industry. Other members consist of all ministers other than vice-chairmen.\u00a0\u00a0Designated emitters, whose workplaces contain more than 1,500kL of oil equivalent of energy annually, are mandated to develop the Plan for Global Warming Countermeasure. While there is no reduction obligation under this law, annual emission of GHGs are reported to the Minister in charge. Emission reporting under this framework equals that of the reporting under the Energy Conservation Law.\u00a0\u00a0This Law stipulates that the State is responsible for implementing necessary measures to introduce Emission Trading Scheme (ETS) in Japan. It adds that examination and discussion of the design and the utilization of ETS starts upon the enactment of this Law.\u00a0\u00a0This Law also provides that the national and local governments are responsible for development and implementation of plans to reduce GHG emissions.\u00a0\u00a0The National Government adopted the Plan for Global Warming Countermeasures in May 2016, with the explicit aim to achieve the mid-term target set in Japan's INDC (2015) 26% GHG emissions reduction by 2030 (baseline 2013). In addition, the Plan also sets a long term goal of 80% GHG emissions reduction by 2050.\u00a0 Prefectural and municipal governments are also required to create local plans to reduce GHG emissions. 
The plans should include:\u00a0Duration of the planGoalsMeasures and actions intended for implementationPromotion of solar PV, wind and other renewable energiesMeasures and actions taken by business professionals and citizens to reduce GHG emissionPromotion of public transport use, conservation of green space and other GHG emission reduction measuresOn June 4th, 2021, the Diet approved the amending Act 54/2021 introducing a net zero target by 2050 into the law.",
 "import_id": "TESTCCLW.executive.1332.1551",
-"family_import_id": "TESTCCLW.family.1332.0",
 "slug": "japan_2021_revised-global-warming-countermeasures-promotion-law",
+"family_import_id": "TESTCCLW.family.1332.0",
+"family_slug": "slug_TESTCCLW.family.1332.0",
 "publication_ts": "2021-01-01T00:00:00",
 "date": "01/01/2021",
 "source_url": "https://www.env.go.jp/press/ontaihou/116348.pdf",
@@ -50,8 +51,7 @@
 "topics": [
 "Mitigation"
 ]
-},
-"family_slug": "slug_TESTCCLW.family.1332.0"
+}
 },
 "pipeline_metadata": {}
 }

integration_tests/data/pipeline_out/ingest_unit_test_parser_input/TESTCCLW.executive.1332.1552.json

+3 -3
@@ -11,8 +11,9 @@
 "name": "Basic Hydrogen Strategy",
"description": "This document sets Japan's vision on how to achieve a hydrogen-based society by 2050 and provides an action plan for its realisation. It specifically seeks to decarbonise the energy, industry and transportation sectors.The Strategic Roadmap for Hydrogen and Fuel Cells defines 1) new targets on the specification of basic technologies and the breakdown of costs, 2) necessary measures for achieving these goals; and 3) that Japan will convene a working group consisting of experts to review the status of implementation in each area stipulated by the roadmap.",
 "import_id": "TESTCCLW.executive.1332.1552",
-"family_import_id": "TESTCCLW.family.1332.0",
 "slug": "japan_2017_basic-hydrogen-strategy",
+"family_import_id": "TESTCCLW.family.1332.0",
+"family_slug": "slug_TESTCCLW.family.1332.0",
 "publication_ts": "2017-01-01T00:00:00",
 "date": "01/01/2017",
 "source_url": "https://www.meti.go.jp/english/press/2017/pdf/1226_003b.pdf",
@@ -41,8 +42,7 @@
 "topics": [
 "Mitigation"
 ]
-},
-"family_slug": "slug_TESTCCLW.family.1332.0"
+}
 },
 "pipeline_metadata": {}
 }

integration_tests/data/pipeline_out/ingest_unit_test_parser_input/TESTCCLW.executive.1332.1553.json

+3 -3
@@ -11,8 +11,9 @@
 "name": "Spanish Climate Change And Clean Energy Strategy Horizon 2007- 2012 -2020",
"description": "The Spanish Climate Change and Clean Energy Strategy (EECCEL) horizon 2007-2012-2020 is part of the Spanish Sustainable Development Strategy (EEDS). The EECCEL includes different measures that contribute to sustainable development within the scope of climate change and clean energy.\u00a0\u00a0This Strategy is based on the reference framework of the 'Spanish Strategy for the fulfilment of the objectives under the Kyoto Protocol', and it takes into account the measures and Programmes adopted by the Autonomous Communities.\u00a0\u00a0The strategy has two chapters. The first one defines actions to fight against climate change and the second one, actions to achieve cleaner energy. Each chapter includes a description of the present situation, the objectives to be reached, the suggested measures and a selection of indicators for the corresponding follow-up.\u00a0\u00a0The operational objectives are:\u00a0- To ensure the reduction of GHG emissions in Spain, giving special importance to measures related to the energy sector. 
According to the national inventory, in 2005, emissions from energy process represented about 78.87% of total national emissions.\u00a0- To contribute to sustainable development and the fulfilment of climate change commitments by strengthening the use of flexible project-based mechanisms.\u00a0- To promote additional reduction measures in sectors concerned with diffuse pollution.\u00a0- To apply the National Climate Change Adaptation Plan (NCCAP) so as to integrate adaptation measures and strategies in sectoral policies.\u00a0- To increase public awareness with respect to clean energy and climate change.\u00a0- To promote research, development and innovation in matters of climate change and clean energy.\u00a0- To guarantee energy supply security by means of cleaner energies, mainly from renewable sources, achieving other environmental benefits (for example, air quality) and limiting the growth rate of external energy dependence.\u00a0- To boost energy- and resource efficiency for companies and for end users.\u00a0\u00a0The government has adopted a Plan of Urgent Measures (PMU), which together with the 2008-2012 Energy Saving and Efficiency Action Plan aims to consolidate the trend change of GHG emissions in Spain initiated in 2006.",
 "import_id": "TESTCCLW.executive.1332.1553",
-"family_import_id": "TESTCCLW.family.1332.0",
 "slug": "spain_2007_spanish-climate-change-and-clean-energy-strategy-horizon-2007-2012-2020",
+"family_import_id": "TESTCCLW.family.1332.0",
+"family_slug": "slug_TESTCCLW.family.1332.0",
 "publication_ts": "2007-01-01T00:00:00",
 "date": "01/01/2007",
 "source_url": "https://www.lse.ac.uk/GranthamInstitute/wp-content/uploads/laws/1674%20English.pdf",
@@ -49,8 +50,7 @@
 "Adaptation",
 "Mitigation"
 ]
-},
-"family_slug": "slug_TESTCCLW.family.1332.0"
+}
 },
 "pipeline_metadata": {}
 }
