implement task [STT-1218] Feeding service for TT API #353
Conversation
Pull Request Overview
This PR implements a feeding service and parser for the TT Content API as part of task STT-1218. It enables fetching content from TT's Content API and transforming it into a Superdesk-compatible format.
- Adds STTContentAPIService for API communication with configurable URL and API key authentication
- Implements ContentAPITTItemParser to transform TT API responses into Superdesk format
- Includes comprehensive unit tests for both components with fixture data validation
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| server/stt/io/feeding_services/stt_tt_content_api.py | Main feeding service implementing HTTP-based API communication with TT Content API |
| server/stt/io/feed_parsers/stt_tt_parse_content_api.py | Parser that transforms TT API responses into Superdesk-compatible item format |
| server/tests/stt_tt_content_api_test.py | Comprehensive unit tests for the feeding service with mocked API responses |
| server/tests/stt_tt_parse_content_api_test.py | Unit tests for the parser with fixture data validation |
| server/tests/fixtures/api/stt_tt_content_api.json | Test fixture containing sample TT API response data |
| server/stt/io/feeding_services/__init__.py | Logging configuration for feeding services module |
| server/stt/io/feed_parsers/__init__.py | Logging configuration for feed parsers module |
| server/settings.py | Module registration for the new feeding service and parser |
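For context, a minimal sketch of how the settings.py registration described above commonly looks in Superdesk-based projects; the variable name and module paths are assumptions based on the file layout listed in this PR, not quoted from it.

```python
# settings.py (sketch): list the new ingest modules so Superdesk imports them
# on startup. Module paths mirror the files in this PR; the INSTALLED_APPS
# convention follows superdesk-core based projects.
INSTALLED_APPS = [
    "stt.io.feeding_services.stt_tt_content_api",
    "stt.io.feed_parsers.stt_tt_parse_content_api",
]
```

Each listed module is then expected to register its feeding service and feed parser with Superdesk at import time.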
```python
# logger.warning("Data: %s", data.get("hits"))
logger.warning("type: %s", type(data))
```
Copilot (AI), Aug 25, 2025:
These commented-out and debug logging statements should be removed before production deployment; the warning-level log of the type information is not needed in production code.
Suggested change: remove both logging lines.
It's referring to Content API in a few places (in code and in the PR name) but it's not related; it would be good to rename it to just TT API to avoid confusion. Also, some code is duplicated from #352; it would be good to merge that one first and then reuse it.
Because the API returns two different data structures, we need two separate services to parse them accordingly.
```python
elif vc.tzinfo is None:
    processed["versioncreated"] = vc.replace(tzinfo=timezone.utc)

# 4) Expiry based on provider config (hours)
```
That's handled automatically; please remove it from the parser.
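For illustration, a trimmed-down version of what the parser would keep once the expiry block is dropped: only normalize versioncreated to a timezone-aware value and let the ingest pipeline apply the provider's expiry setting, as the reviewer notes. The helper name and the UTC assumption for naive timestamps are illustrative, not the PR's actual code.

```python
from datetime import datetime, timezone


def normalize_versioncreated(value):
    """Return a timezone-aware datetime for versioncreated.

    Sketch only: naive timestamps are assumed to be UTC, ISO strings with a
    trailing "Z" are accepted, and no expiry is computed here.
    """
    if isinstance(value, str):
        value = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value
```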
```python
    h = hashlib.sha1(blob.encode("utf-8")).hexdigest()
    return f"urn:newsml:stt.fi:stt_tt_content_api:{h}"
except Exception:
    return f"urn:newsml:stt.fi:stt_tt_content_api:{uuid.uuid4()}"
```
Not sure that's a good idea in general; that would create a new item for each version.
```python
)
if isinstance(uri, (str, int)):
    s = str(uri)
    return f"urn:newsml:stt.fi:stt_tt_content_api:{hashlib.sha1(s.encode('utf-8')).hexdigest()}"
```
I wonder why you're using hashlib instead of, e.g., using 250815-usaryssmote6uv-06eb0f86 as the suffix, or converting the whole URL into the uri?
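To make the suggestion concrete, a sketch that derives the GUID from the last segment of the TT uri (e.g. 250815-usaryssmote6uv-06eb0f86) instead of hashing, so the value stays stable across versions. The helper name is hypothetical, and the URN prefix is kept as in the quoted code only as a placeholder; the namespace itself is questioned further down in the review.

```python
from urllib.parse import urlparse

# Placeholder namespace; the actual prefix is discussed later in this review.
TT_URN_PREFIX = "urn:newsml:stt.fi:stt_tt_content_api:"


def guid_from_uri(uri) -> str:
    """Use the final path segment of the item uri as a stable GUID suffix."""
    suffix = urlparse(str(uri)).path.rstrip("/").rsplit("/", 1)[-1]
    return f"{TT_URN_PREFIX}{suffix or uri}"
```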
Force-pushed from 9b9e6a7 to 04f089e.
@petrjasek please help me check this issue.
Force-pushed from 04f089e to d624d4c.
Please remove all those extra and duplicated tests; it seems like a lot of code is duplicated in those two test files. I'm not sure we need a parser-specific one if the parsing is already checked in the other file. There is no value in testing something twice, just more work to maintain it.
```python
def _ensure_guid(self, item: Dict[str, Any]) -> str:
    """Generate GUIDs with a TT-specific namespace while respecting existing URNs."""
    base_guid = super()._ensure_guid(item)
    tt_prefix = "urn:newsml:stt.fi:stt_tt_content_api:"
```
IMO the prefix should be something like urn:tt.fi:; it's not really STT content, is it?
I updated it.
```python
page_url = urlunparse(parsed._replace(query=urlencode(qs, doseq=True)))

# Use base class HTTP retry infrastructure
response = self._get_with_retry(page_url, headers=headers, timeout=300)
```
A timeout of 300 is IMO a bit too much, or is the API really that slow?
I changed the timeout to 60.
```python
logger.info("TT API: total from first page = %s", total)

batch = self._extract_tt_items_from_response(data)
logger.info(
```
That's rather something for .debug.
I updated it.
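For illustration, the demotion presumably amounts to something like the following; the message text and variable names follow the snippet quoted above, and the helper is hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def log_page_stats(total, offset, batch):
    # Per-page diagnostics stay at DEBUG level, so production logs remain
    # quiet unless the logger is explicitly turned up.
    logger.debug("TT API: total=%s offset=%s fetched=%s items", total, offset, len(batch))
```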
@petrjasek please help me review.
@petrjasek please check again.
I updated this code following the review; please help me check.
Force-pushed from 97187c0 to 4cda417.
```python
        ),
    },
    {
        "id": "use_trs",
```
I still don't understand why this is configurable.
Users can enable or disable this configuration.
But why? I don't see it as a requirement, and it complicates things.
Force-pushed from d6d3bcd to 4aad873.
@petrjasek, please check this configuration library issue.
I've merged the async version in, so the tests should be working now.
Force-pushed from ea2c88b to 4d67f9d.
Force-pushed from 4d67f9d to 68a0ba7.
@petrjasek please help me check again.
I checked it; I would still remove that.
Force-pushed from ca8f67d to b54accd.
Force-pushed from b54accd to e06a417.
@petrjasek I removed use_trs.
```python
    still synchronous, so we offload it via asyncio.to_thread.
    """
    # Offload sync fetch to a worker thread to avoid blocking the event loop
    json_items = await asyncio.to_thread(self._fetch_tt_data, provider, update)
```
Not sure that's useful here; it would probably be better to just use https://docs.aiohttp.org/en/stable/ for the fetching, and the rest can run as usual.
Updated.
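A minimal sketch of an aiohttp-based fetch along the lines the reviewer suggests; the endpoint, query parameter names, and auth header are illustrative, not the actual TT API contract.

```python
import aiohttp


async def fetch_page(url: str, api_key: str, offset: int, page_size: int) -> dict:
    """Fetch one result page directly on the event loop, without thread offloading."""
    timeout = aiohttp.ClientTimeout(total=60)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(
            url,
            params={"s": str(page_size), "fr": str(offset)},
            headers={"Authorization": api_key},  # header name is an assumption
        ) as resp:
            resp.raise_for_status()  # raises aiohttp.ClientResponseError on 4xx/5xx
            return await resp.json()
```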
```python
qs = {**base_qs, "s": str(page_size), "fr": str(offset)}
if trs_value:
    qs["trs"] = trs_value
page_url = urlunparse(parsed._replace(query=urlencode(qs, doseq=True)))
```
I wonder why you're doing this urlencode/urlunparse?
Updated.
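For comparison, the manual rebuild quoted above versus letting the HTTP client encode the query string; the base URL here is a placeholder, not the real TT endpoint.

```python
from urllib.parse import urlencode, urlparse, urlunparse

base_url = "https://example.invalid/search"  # placeholder endpoint
qs = {"s": "50", "fr": "0"}

# 1) Manual rebuild, as in the snippet above: parse, replace the query, reassemble.
parsed = urlparse(base_url)
page_url = urlunparse(parsed._replace(query=urlencode(qs, doseq=True)))

# 2) Simpler: hand the mapping to the client and let it encode the query string,
#    e.g. session.get(base_url, params=qs) with aiohttp or requests.
```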
@petrjasek Please help me check the CI/CD issue.
```python
    )
else:
    # No more results
    return [it for it in items if isinstance(it, dict)]
```
Not sure it needs another iteration if you only add dicts to items.
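A small sketch of the alternative the comment points at: filter while appending, so the final return needs no extra pass. The names follow the quoted snippet; the helper itself is hypothetical.

```python
from typing import Any, Dict, List


def extend_with_dicts(items: List[Dict[str, Any]], batch: List[Any]) -> None:
    """Append only dict entries, so a later `return items` needs no re-filtering."""
    for it in batch:
        if isinstance(it, dict):
            items.append(it)
```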
```python
        # No more results
        return [it for it in items if isinstance(it, dict)]
    break
except Exception as ex:
```
This should only rerun on that aiohttp.ClientResponseError, right?
Updated.
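For illustration, a retry wrapper narrowed to aiohttp.ClientResponseError as suggested; the attempt count, the backoff, and the fetch_page coroutine are the hypothetical pieces sketched earlier, not the PR's actual code.

```python
import asyncio

import aiohttp


async def fetch_page_with_retry(url, api_key, offset, page_size, attempts=3):
    """Retry only on HTTP-level failures; any other exception propagates."""
    for attempt in range(1, attempts + 1):
        try:
            return await fetch_page(url, api_key, offset, page_size)  # hypothetical helper
        except aiohttp.ClientResponseError:
            if attempt == attempts:
                raise
            await asyncio.sleep(2**attempt)  # simple exponential backoff
```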
```python
offset += page_size
if isinstance(total, int) and offset >= total:
    break
return [it for it in items if isinstance(it, dict)]
```
Same issue as on line 330.
Updated.
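Putting the feedback together, a sketch of a paging loop that appends only dicts and returns items directly, with no final filtering pass. The "hits" key comes from a snippet quoted earlier in this review; the "total" key, the default page size, and fetch_page_with_retry are the hypothetical pieces used in the sketches above.

```python
async def fetch_all_items(url, api_key, page_size=50):
    """Collect all pages of results; stops once offset reaches the reported total."""
    items, offset, total = [], 0, None
    while True:
        data = await fetch_page_with_retry(url, api_key, offset, page_size)
        batch = data.get("hits") or []
        items.extend(it for it in batch if isinstance(it, dict))
        if total is None:
            total = data.get("total")  # key name is an assumption
        offset += page_size
        if not batch or (isinstance(total, int) and offset >= total):
            break
    return items
```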

Implement Task Feature: STT-1218 - Feeding Service for TT Content API
- Feeding service: STTContentAPIService – handles API communication and data fetching
- Parser: ContentAPITTItemParser – transforms TT API responses into a Superdesk-compatible format
- Add unit tests for the following files: