Skip to content

Conversation

@minhpham212
Copy link
Contributor

@minhpham212 minhpham212 commented Aug 17, 2025

Implement Task Feature: STT-1218 - Feeding Service for TT Content API

Feeding Service: STTContentAPIService – Handles API communication and data fetching

Parser: ContentAPITTItemParser – Transforms TT API responses into a Superdesk-compatible format

Add unit tests for the following files:

1.	server/tests/stt_tt_content_api_test.py
2.	server/tests/stt_tt_parse_content_api_test.py

@minhpham212 minhpham212 changed the title implement task [STT-1217] Feeding service for Superdesk content API implement task [STT-1218] Feeding service for Superdesk content API Aug 21, 2025
@petrjasek petrjasek requested a review from Copilot August 25, 2025 14:06
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a feeding service and parser for the TT Content API as part of task STT-1218. It enables fetching content from TT's Content API and transforming it into a Superdesk-compatible format.

  • Adds STTContentAPIService for API communication with configurable URL and API key authentication
  • Implements ContentAPITTItemParser to transform TT API responses into Superdesk format
  • Includes comprehensive unit tests for both components with fixture data validation

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
server/stt/io/feeding_services/stt_tt_content_api.py Main feeding service implementing HTTP-based API communication with TT Content API
server/stt/io/feed_parsers/stt_tt_parse_content_api.py Parser that transforms TT API responses into Superdesk-compatible item format
server/tests/stt_tt_content_api_test.py Comprehensive unit tests for the feeding service with mocked API responses
server/tests/stt_tt_parse_content_api_test.py Unit tests for the parser with fixture data validation
server/tests/fixtures/api/stt_tt_content_api.json Test fixture containing sample TT API response data
server/stt/io/feeding_services/init.py Logging configuration for feeding services module
server/stt/io/feed_parsers/init.py Logging configuration for feed parsers module
server/settings.py Module registration for the new feeding service and parser

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines 124 to 125
# logger.warning("Data: %s", data.get("hits"))
logger.warning("type: %s", type(data))
Copy link

Copilot AI Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These commented out and debug logging statements should be removed before production deployment. The warning log for type information is not needed in production code.

Suggested change
# logger.warning("Data: %s", data.get("hits"))
logger.warning("type: %s", type(data))

Copilot uses AI. Check for mistakes.
@petrjasek
Copy link
Member

it's referring to content api in a few places (in code and PR name) but it's not related, would be good to rename it to just tt api to avoid confusion. also there is some code duplicated from #352, would be good to merge that one first and then reuse it

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@petrjasek petrjasek changed the title implement task [STT-1218] Feeding service for Superdesk content API implement task [STT-1218] Feeding service for TT API Aug 28, 2025
@minhpham212
Copy link
Contributor Author

it's referring to content api in a few places (in code and PR name) but it's not related, would be good to rename it to just tt api to avoid confusion. also there is some code duplicated from #352, would be good to merge that one first and then reuse it

Because the API returns two different data structures, we need two separate services to parse them accordingly.

elif vc.tzinfo is None:
processed["versioncreated"] = vc.replace(tzinfo=timezone.utc)

# 4) Expiry based on provider config (hours)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's handled automatically, pls remove it from parser

h = hashlib.sha1(blob.encode("utf-8")).hexdigest()
return f"urn:newsml:stt.fi:stt_tt_content_api:{h}"
except Exception:
return f"urn:newsml:stt.fi:stt_tt_content_api:{uuid.uuid4()}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure that's a good idea in general, that would create a new item for each version

)
if isinstance(uri, (str, int)):
s = str(uri)
return f"urn:newsml:stt.fi:stt_tt_content_api:{hashlib.sha1(s.encode('utf-8')).hexdigest()}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why using hashlib and not using eg. 250815-usaryssmote6uv-06eb0f86 as suffix or converthing the whole url into uri?

@minhpham212 minhpham212 force-pushed the feature/STT-1218-Feeding-service-for-TT-Content-API branch from 9b9e6a7 to 04f089e Compare September 24, 2025 23:55
@minhpham212
Copy link
Contributor Author

@petrjasek pls help me check issue
image

@minhpham212 minhpham212 force-pushed the feature/STT-1218-Feeding-service-for-TT-Content-API branch from 04f089e to d624d4c Compare September 25, 2025 06:23
Copy link
Member

@petrjasek petrjasek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove all those extra and duplicated tests, seems like lots of code is duplicated in those two test files, not sure if we need parser specific one if it checks the parsing in the other file. there is no value in testing something 2x, just more work to maintain it

def _ensure_guid(self, item: Dict[str, Any]) -> str:
"""Generate GUIDs with a TT-specific namespace while respecting existing URNs."""
base_guid = super()._ensure_guid(item)
tt_prefix = "urn:newsml:stt.fi:stt_tt_content_api:"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo the prefix should be like urn:tt.fi: it's not really STT content is it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated

page_url = urlunparse(parsed._replace(query=urlencode(qs, doseq=True)))

# Use base class HTTP retry infrastructure
response = self._get_with_retry(page_url, headers=headers, timeout=300)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout 300 is imo a bit too much, or is the api that slow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I change timeout to 60

logger.info("TT API: total from first page = %s", total)

batch = self._extract_tt_items_from_response(data)
logger.info(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's rather for .debug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated

@minhpham212
Copy link
Contributor Author

@petrjasek pls help me review

@minhpham212
Copy link
Contributor Author

@petrjasek pls check again

@minhpham212
Copy link
Contributor Author

I updated this code , follow review, pls help me check

@minhpham212 minhpham212 force-pushed the feature/STT-1218-Feeding-service-for-TT-Content-API branch 2 times, most recently from 97187c0 to 4cda417 Compare October 3, 2025 03:50
),
},
{
"id": "use_trs",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't know why is this configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can enable or disable this configuration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but why? I don't see it as a requirement and it complicates things

@minhpham212 minhpham212 force-pushed the feature/STT-1218-Feeding-service-for-TT-Content-API branch from d6d3bcd to 4aad873 Compare October 6, 2025 01:44
@minhpham212
Copy link
Contributor Author

@petrjasek, please check this configuration library issue
The conflict is caused by:
The user requested tzlocal==2.1
superdesk-core 3.2.0.dev0 depends on tzlocal>=5.

@petrjasek
Copy link
Member

I've merged the async version in so the tests should be working now

@minhpham212 minhpham212 force-pushed the feature/STT-1218-Feeding-service-for-TT-Content-API branch 2 times, most recently from ea2c88b to 4d67f9d Compare October 8, 2025 16:39
@minhpham212 minhpham212 force-pushed the feature/STT-1218-Feeding-service-for-TT-Content-API branch from 4d67f9d to 68a0ba7 Compare October 9, 2025 06:03
@minhpham212
Copy link
Contributor Author

@petrjasek pls help me check again.

@petrjasek
Copy link
Member

I checked it, would still remove that use_trs config and use it always

@minhpham212 minhpham212 force-pushed the feature/STT-1218-Feeding-service-for-TT-Content-API branch 2 times, most recently from ca8f67d to b54accd Compare October 10, 2025 17:43
@minhpham212 minhpham212 force-pushed the feature/STT-1218-Feeding-service-for-TT-Content-API branch from b54accd to e06a417 Compare October 11, 2025 14:21
@minhpham212
Copy link
Contributor Author

@petrjasek I remove use_trs

still synchronous, so we offload it via asyncio.to_thread.
"""
# Offload sync fetch to a worker thread to avoid blocking the event loop
json_items = await asyncio.to_thread(self._fetch_tt_data, provider, update)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if that's useful here, would be probably better to just use https://docs.aiohttp.org/en/stable/ for fetching, the rest can run as usual

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

qs = {**base_qs, "s": str(page_size), "fr": str(offset)}
if trs_value:
qs["trs"] = trs_value
page_url = urlunparse(parsed._replace(query=urlencode(qs, doseq=True)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why doing this urlencode urlunparse?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Contributor Author

@minhpham212 minhpham212 Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@petrjasek Please help me check the CI/CD issue.

)
else:
# No more results
return [it for it in items if isinstance(it, dict)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if it needs another iteration if you only add dicts to items

# No more results
return [it for it in items if isinstance(it, dict)]
break
except Exception as ex:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should only rerun on that aiohttp.ClientResponseError right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

offset += page_size
if isinstance(total, int) and offset >= total:
break
return [it for it in items if isinstance(it, dict)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same issue like on line 330

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@petrjasek petrjasek merged commit 2ad2b94 into superdesk:develop Oct 14, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants