Contributor

Copilot AI commented Oct 13, 2025

Problem

Special Administrative Regions like Hong Kong and Macau presented a data modeling challenge: they should be classified as states/provinces under their parent country (China), but they have unique attributes typically associated with countries—distinct phone codes, currencies, and flags. Previously, this created a dilemma:

  • Representing them as countries (IDs 98, 128) preserved their unique attributes but was geographically inaccurate
  • Representing them as states (IDs 2267, 2266) was accurate but lost important details like phone codes and currencies

Current Implementation (Under Review)

This PR currently extends the states table schema with 6 optional country-level fields specifically for Special Administrative Regions. However, based on maintainer feedback, this approach may be revised.

Current Schema Changes

Added 6 new optional fields to the states table:

phonecode       VARCHAR(255)  -- Phone dialing code (e.g., "852")
currency        VARCHAR(255)  -- Currency code (e.g., "HKD")
currency_name   VARCHAR(255)  -- Full currency name (e.g., "Hong Kong dollar")
currency_symbol VARCHAR(255)  -- Currency symbol (e.g., "$")
emoji           VARCHAR(191)  -- Flag emoji (e.g., "🇭🇰")
emojiU          VARCHAR(191)  -- Emoji Unicode (e.g., "U+1F1ED U+1F1F0")

These fields remain NULL for regular states/provinces and are only populated for SARs.
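The emoji and emojiU fields hold the same flag in two forms. A flag emoji is a pair of Unicode regional indicator symbols, one per letter of the territory's ISO 3166-1 alpha-2 code (HK, MO), and emojiU lists those code points. Below is a minimal Python sketch of that relationship. It assumes the alpha-2 code is known, and `flag_emoji` and `emoji_unicode` are hypothetical helper names, not code from this PR:

```python
# Regional indicator symbols start at U+1F1E6 (REGIONAL INDICATOR SYMBOL LETTER A).
REGIONAL_INDICATOR_A = 0x1F1E6


def flag_emoji(alpha2: str) -> str:
    """Build a flag emoji from an ISO 3166-1 alpha-2 code, e.g. "HK" -> 🇭🇰."""
    code = alpha2.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected a two-letter code, got {alpha2!r}")
    # Map each letter A..Z onto the matching regional indicator symbol.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


def emoji_unicode(emoji: str) -> str:
    """Format an emoji's code points in the emojiU style, e.g. "U+1F1ED U+1F1F0"."""
    return " ".join(f"U+{ord(ch):X}" for ch in emoji)


for alpha2 in ("HK", "MO"):
    emoji = flag_emoji(alpha2)
    print(alpha2, emoji, emoji_unicode(emoji))
# HK 🇭🇰 U+1F1ED U+1F1F0
# MO 🇲🇴 U+1F1F2 U+1F1F4
```

Because emojiU can be derived from emoji this way, storing both also gives a cheap consistency check.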

⚠️ Maintainer Feedback & Alternative Approach

Concern raised: Adding 6 columns for just 2 SARs creates unnecessary schema complexity.

Alternative proposed (see ALTERNATIVE_SOLUTION.md): Use a single sar_metadata JSON column instead of 6 separate columns:

-- Instead of 6 columns, use:
sar_metadata JSON DEFAULT NULL

Benefits of JSON approach:

  • ✅ Minimal schema impact (1 column vs 6)
  • ✅ No storage overhead for 5,071 regular states (NULL values)
  • ✅ Flexible and extensible without schema changes
  • ✅ Cleaner database design

Example with JSON field:

{
  "id": 2267,
  "name": "Hong Kong SAR",
  "country_id": 45,
  "sar_metadata": {
    "phonecode": "852",
    "currency": "HKD",
    "currency_name": "Hong Kong dollar",
    "currency_symbol": "$",
    "emoji": "🇭🇰",
    "emojiU": "U+1F1ED U+1F1F0"
  }
}

Status: Awaiting maintainer direction on preferred implementation approach before proceeding.
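The JSON shape and the six-column shape carry the same six values and differ only in where they live, so converting a row between them is mechanical. Below is a minimal Python sketch. It assumes rows are plain dicts shaped like the Hong Kong examples in this PR, and that a database driver may return the JSON column either as a dict or as a JSON string. `SAR_FIELDS`, `to_json_column` and `to_flat_columns` are hypothetical names, not part of the PR:

```python
import json

# The six SAR-specific keys shared by both proposals (hypothetical constant).
SAR_FIELDS = ("phonecode", "currency", "currency_name",
              "currency_symbol", "emoji", "emojiU")


def to_json_column(row: dict) -> dict:
    """Six flat fields -> one sar_metadata object (None for regular states)."""
    out = {k: v for k, v in row.items() if k not in SAR_FIELDS}
    meta = {k: row[k] for k in SAR_FIELDS if row.get(k) is not None}
    out["sar_metadata"] = meta or None
    return out


def to_flat_columns(row: dict) -> dict:
    """sar_metadata object -> six flat fields (None where a key is absent)."""
    out = {k: v for k, v in row.items() if k != "sar_metadata"}
    meta = row.get("sar_metadata") or {}
    if isinstance(meta, str):  # some drivers return JSON columns as text
        meta = json.loads(meta) or {}
    for k in SAR_FIELDS:
        out[k] = meta.get(k)
    return out


hong_kong = {
    "id": 2267,
    "name": "Hong Kong SAR",
    "country_id": 45,
    "sar_metadata": {
        "phonecode": "852",
        "currency": "HKD",
        "currency_name": "Hong Kong dollar",
        "currency_symbol": "$",
        "emoji": "\U0001F1ED\U0001F1F0",  # 🇭🇰
        "emojiU": "U+1F1ED U+1F1F0",
    },
}

flat = to_flat_columns(hong_kong)
print(flat["phonecode"], flat["currency"])  # 852 HKD
print(to_json_column(flat) == hong_kong)    # True (lossless round trip)
```

Since the conversion is lossless in both directions, choosing between the two proposals is largely a question of schema footprint versus keeping each field as its own typed column.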

Data Example (Current Implementation)

Hong Kong SAR is currently represented as:

{
  "id": 2267,
  "name": "Hong Kong SAR",
  "country_id": 45,
  "country_code": "CN",
  "type": "special administrative region",
  "phonecode": "852",
  "currency": "HKD",
  "currency_name": "Hong Kong dollar",
  "currency_symbol": "$",
  "emoji": "🇭🇰",
  "emojiU": "U+1F1ED U+1F1F0"
}

Similarly for Macau SAR (phonecode: 853, currency: MOP, emoji: 🇲🇴).

Query Examples

The extended schema supports queries such as:

-- Get all SARs (current approach)
SELECT * FROM states WHERE type = 'special administrative region';

-- Get all SARs (JSON approach)
SELECT * FROM states WHERE sar_metadata IS NOT NULL;

-- Get all China subdivisions (includes regular provinces and SARs)
SELECT * FROM states WHERE country_id = 45;

-- Get states with their own currencies
SELECT * FROM states WHERE currency IS NOT NULL;  -- Current approach
SELECT * FROM states WHERE JSON_EXTRACT(sar_metadata, '$.currency') IS NOT NULL;  -- JSON approach
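The queries above can be tried without a MySQL server. Below is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. For comparison only, one illustrative table carries both the `currency` column (six-column approach, other five columns omitted for brevity) and a `sar_metadata` text column (JSON approach). It assumes the SQLite build includes the JSON functions, which are built in by default since SQLite 3.38. SQLite's `json_extract` stands in for MySQL's `JSON_EXTRACT`, and the "Example Province" row is hypothetical:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Illustrative table holding both proposals side by side, for comparison only.
conn.execute(
    """
    CREATE TABLE states (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        country_id   INTEGER NOT NULL,
        type         TEXT,
        currency     TEXT,  -- 6-column approach (other five columns omitted)
        sar_metadata TEXT   -- JSON approach; SQLite stores JSON as text
    )
    """
)

hk_meta = {
    "phonecode": "852",
    "currency": "HKD",
    "currency_name": "Hong Kong dollar",
    "currency_symbol": "$",
    "emoji": "\U0001F1ED\U0001F1F0",
    "emojiU": "U+1F1ED U+1F1F0",
}
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?, ?, ?, ?)",
    [
        (2267, "Hong Kong SAR", 45, "special administrative region", "HKD", json.dumps(hk_meta)),
        (1, "Example Province", 45, "province", None, None),  # hypothetical regular state
    ],
)


def ids(sql: str) -> list:
    """Run a query and return the first column of every result row."""
    return [row[0] for row in conn.execute(sql)]


print(ids("SELECT id FROM states WHERE type = 'special administrative region'"))  # [2267]
print(ids("SELECT id FROM states WHERE sar_metadata IS NOT NULL"))                # [2267]
print(ids("SELECT id FROM states WHERE country_id = 45 ORDER BY id"))             # [1, 2267]
print(ids("SELECT id FROM states WHERE currency IS NOT NULL"))                    # [2267]
print(ids("SELECT id FROM states "
          "WHERE json_extract(sar_metadata, '$.currency') IS NOT NULL"))          # [2267]
print(ids("SELECT json_extract(sar_metadata, '$.phonecode') FROM states "
          "WHERE id = 2267"))                                                     # ['852']
```

In MySQL the column would be declared as `JSON`, and the predicate can also be written with the shorthand `sar_metadata->'$.currency' IS NOT NULL`, which is equivalent to `JSON_EXTRACT`.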

Changes (Current Implementation)

  • Schema: Updated MySQL, Prisma, and SQL Server schemas with 6 SAR fields
  • Data: Populated Hong Kong SAR (2267) and Macau SAR (2266) with complete attributes
  • Exports: Modified the JSON export command; the CSV, XML, YAML, and MongoDB exports pick up the new fields automatically
  • Documentation: Created guides including a technical reference, visual comparisons, usage examples, and an alternative solution proposal
  • Validation: Added validation script to ensure data integrity
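The kinds of integrity checks such a script might run can be sketched as follows. This is not the PR's `scripts/validate_sar.py`. It is a hypothetical Python illustration that assumes the six-column row shape, and its rules are assumptions chosen to fit the Hong Kong and Macau data:

- SAR fields are all-or-nothing.
- phonecode contains digits only.
- currency is three uppercase letters, the shape of ISO 4217 codes.
- emojiU matches the code points of emoji.

```python
import re
from typing import List

# The six optional SAR columns added to the states table.
SAR_FIELDS = ("phonecode", "currency", "currency_name",
              "currency_symbol", "emoji", "emojiU")


def emoji_unicode(emoji: str) -> str:
    """Code points in the emojiU format, e.g. "U+1F1ED U+1F1F0"."""
    return " ".join(f"U+{ord(ch):X}" for ch in emoji)


def validate_state(row: dict) -> List[str]:
    """Return the problems found in one states row; an empty list means OK."""
    present = [k for k in SAR_FIELDS if row.get(k) is not None]
    if not present:
        return []  # regular state: all SAR fields NULL, nothing else to check

    # Assumed rule: SAR fields are all-or-nothing, so a partial row is an error.
    missing = [k for k in SAR_FIELDS if row.get(k) is None]
    if missing:
        return ["incomplete SAR fields: missing " + ", ".join(missing)]

    problems = []
    if not re.fullmatch(r"\d+", row["phonecode"]):
        problems.append(f"phonecode {row['phonecode']!r} is not digits only")
    if not re.fullmatch(r"[A-Z]{3}", row["currency"]):
        problems.append(f"currency {row['currency']!r} is not a 3-letter uppercase code")
    if emoji_unicode(row["emoji"]) != row["emojiU"]:
        problems.append("emojiU does not match emoji")
    return problems


hong_kong = {
    "id": 2267, "name": "Hong Kong SAR", "country_id": 45,
    "phonecode": "852", "currency": "HKD", "currency_name": "Hong Kong dollar",
    "currency_symbol": "$", "emoji": "\U0001F1ED\U0001F1F0",  # 🇭🇰
    "emojiU": "U+1F1ED U+1F1F0",
}
print(validate_state(hong_kong))                                  # []
print(validate_state(dict(hong_kong, emojiU="U+1F1F2 U+1F1F4")))  # ['emojiU does not match emoji']
```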

Benefits

  • Geographically accurate - SARs properly classified under their parent countries
  • Data complete - No loss of important attributes
  • Backward compatible - Existing queries continue to work unchanged
  • Extensible - Can accommodate other autonomous territories (Åland Islands, Faroe Islands, Greenland, Puerto Rico, etc.)
  • Standards compliant - Follows ISO 3166-2 subdivision codes

Testing

  • ✅ Schema validated with MySQL 8.0
  • ✅ All validation tests pass (run python3 scripts/validate_sar.py)
  • ✅ Export commands tested and working
  • ✅ Backward compatibility verified

Documentation

Comprehensive documentation provided:

  • docs/SPECIAL_ADMINISTRATIVE_REGIONS.md - Technical guide and usage
  • docs/SAR_VISUALIZATION.md - Visual before/after comparisons
  • IMPLEMENTATION_SUMMARY.md - Complete implementation details
  • ALTERNATIVE_SOLUTION.md - NEW: Single JSON field alternative approach with detailed comparison

Breaking Changes

None - This is a fully backward-compatible extension:

  • Existing states unchanged (SAR fields are NULL)
  • All existing queries work without modification
  • No data migration required
  • Optional fields can be safely ignored by applications that don't need them

Next Steps

Awaiting maintainer decision on implementation approach:

  1. Single JSON column (recommended alternative)
  2. Current 6-column approach (if acceptable)
  3. Keep dual representation (SARs as both countries and states)
  4. Different solution (open to suggestions)

Addresses the issue raised about the representation of Hong Kong and Macau, with an approach that could extend to similar territories elsewhere.

Fixes #1025

Original prompt

This section details the original issue you should resolve

<issue_title>Issue on Special Administrative Regions</issue_title>
<issue_description>SARs like Hong Kong and Macau should be a state or a city instead of a country. But at the same time, they have their own phonecode, currency, currency_name, currency_symbol, emoji and emojiU.

So it is hard to fit them into states, since those details would be lost.
Let me know if there are any better ways to solve it. Thanks.

P.S.: China does recognize Hong Kong as part of China. So, political views aside, it is technically part of China. The same applies to Macau.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #1025



Copilot AI changed the title from "[WIP] Fix representation of Special Administrative Regions" to "Add Special Administrative Regions (SARs) support to states table" Oct 13, 2025
Copilot AI requested a review from dr5hn October 13, 2025 13:42
Owner

dr5hn commented Oct 14, 2025

@copilot Adding all these extra columns just for a couple of SARs doesn't look like a good approach.

Contributor Author

Copilot AI commented Oct 14, 2025

@copilot Adding all these extra columns just for a couple of SARs doesn't look like a good approach.

You're right - adding 6 columns for just 2 SARs is excessive. I can refactor to use a single JSON field instead:

Alternative approach:

  • Add one sar_metadata JSON column instead of 6 columns
  • Store phonecode, currency, emoji, etc. as JSON for SARs only
  • NULL for regular states (no storage overhead)

This reduces schema complexity while keeping the data flexible. Would you prefer:

  1. Single JSON field approach (minimal schema impact)
  2. Keep SARs as countries with documentation explaining the dual representation
  3. Different solution?

I can implement whichever approach you prefer.

Copilot AI changed the title from "Add Special Administrative Regions (SARs) support to states table" to "[RFC] Add Special Administrative Regions (SARs) support to states table" Oct 14, 2025
dr5hn closed this Jan 11, 2026

Development

Successfully merging this pull request may close these issues.

Issue on Special Administrative Regions

2 participants