Conversation
- 在 clean_kubejs_from_raw_impl 中,pending_en 寫入前新增 reverse_index dedup - 若某英文文字(value)已出現在 final/zh_tw.json(不同 key),則跳過不寫入 pending - 避免同一英文原文因不同 key 而重複翻譯
- 新增 ColorCharError dataclass,記錄非法顏色字元錯誤 - 實作 COLOR_PATTERN 正則:& 後只能接 a-v(不含 w)、0-9、空格、\、# - check_color_chars():檢查單一字串中的非法顏色字元 - check_json_file():讀取 JSON 並遞迴檢查所有字串值 - check_directory():遞迴檢查目錄下所有 .json 檔 - 遵循現有 checkers Generator yield 模式
Greptile SummaryThis PR introduces a new Key issues found:
Confidence Score: 2/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A([check_directory]) -->|os.walk .json files| B[check_json_file]
B -->|json.load| C{root is dict?}
C -- Yes --> D[_check_value per key]
C -- No --> E([silently skipped])
D -->|str| F[check_color_chars\nregex: &[^a-v0-9\\s\\\\#]]
F --> G{match found?}
G -- Yes --> H[yield ColorCharError\nfile_path='', key='']
G -- No --> I([return None])
H --> J[_check_value patches\nfile_path & key]
D -->|dict| K[recurse with child key\nparent key lost]
D -->|list| L[recurse with key[i]\nparent path preserved]
subgraph kubejs_translator_clean [kubejs_translator_clean — reverse_index dedup]
M([for each group]) --> N{pending_en\n& final_root exists?}
N -- Yes --> O[rglob zh_tw.json\n⚠️ repeated per group]
O --> P[build final_tw_lookup]
P --> Q[build reverse_index\nvalue → keys]
Q --> R[filter pending_en:\ndrop if v in reverse_index\n& k != reverse_index[v][0]\n⚠️ cross-namespace key compare]
R --> S([write en_us.json])
end
Reviews (1): Last reviewed commit: "style: apply ruff format" | Re-trigger Greptile |
| if pending_en and final_root_p.exists(): | ||
| # 從 final/zh_tw.json 建立 final_tw_lookup(key → 原文) | ||
| final_tw_lookup: dict[str, str] = {} | ||
| for tw_file in final_root_p.rglob("zh_tw.json"): | ||
| tw_data = read_json_dict_fn(tw_file) | ||
| if tw_data: | ||
| final_tw_lookup.update(tw_data) | ||
|
|
||
| if final_tw_lookup: | ||
| # 建立 reverse_index(英文文字 → 對應 key 列表) | ||
| reverse_index: dict[str, list[str]] = {} | ||
| for k, v in final_tw_lookup.items(): | ||
| if is_filled_text_impl(v): | ||
| reverse_index.setdefault(v, []).append(k) | ||
|
|
||
| # 過濾 pending_en:跳過那些「英文文字已存在於 final」的 key | ||
| pending_en = { | ||
| k: v | ||
| for k, v in pending_en.items() | ||
| if not ( | ||
| is_filled_text_impl(v) | ||
| and v in reverse_index | ||
| and k != reverse_index[v][0] | ||
| ) | ||
| } |
There was a problem hiding this comment.
final_tw_lookup rebuilt on every group iteration
The entire final_root_p.rglob("zh_tw.json") scan and reverse_index construction happen inside the for group_dir, files_map in groups.items() loop (line 185). For N groups this becomes O(N × number-of-zh_tw-files) — all zh_tw.json files are read and merged from scratch on every group, even though the final directory does not change between iterations.
Move both final_tw_lookup and reverse_index outside the loop so they are built once:
# Build once before the loop
final_tw_lookup: dict[str, str] = {}
if final_root_p.exists():
for tw_file in final_root_p.rglob("zh_tw.json"):
tw_data = read_json_dict_fn(tw_file)
if tw_data:
final_tw_lookup.update(tw_data)
reverse_index: dict[str, list[str]] = {}
for k, v in final_tw_lookup.items():
if is_filled_text_impl(v):
reverse_index.setdefault(v, []).append(k)
for group_dir, files_map in groups.items():
...
if pending_en and reverse_index:
pending_en = {
k: v
for k, v in pending_en.items()
if not (
is_filled_text_impl(v)
and v in reverse_index
and k != reverse_index[v][0]
)
}| # 過濾 pending_en:跳過那些「英文文字已存在於 final」的 key | ||
| pending_en = { | ||
| k: v | ||
| for k, v in pending_en.items() | ||
| if not ( | ||
| is_filled_text_impl(v) | ||
| and v in reverse_index | ||
| and k != reverse_index[v][0] | ||
| ) | ||
| } |
There was a problem hiding this comment.
Cross-namespace key comparison may silently drop entries
reverse_index is keyed on values from final_tw_lookup (zh_tw translations), and its lists contain keys from the final directory (e.g. "item.mymod.foo"). The dedup filter checks k != reverse_index[v][0], where k is a key from the raw/pending source file — a completely different namespace.
If pending_en has {"item.othermod.bar": "Hello World"} and reverse_index["Hello World"] == ["item.mymod.foo"], the condition becomes "item.othermod.bar" != "item.mymod.foo" → True, so the entry is filtered out. The intended behaviour — keeping the entry when the same key already exists in the final output — is therefore never triggered for keys that don't coincidentally match a final-directory key name.
In practice this means any pending entry whose English value also appears anywhere in final_tw_lookup (regardless of key) will be dropped, which is likely more aggressive deduplication than intended. You may want to check for key equality within the same key namespace, or reconsider the dedup condition entirely.
| # 核心檢查:& 後只能接 a-v(不含 w)、0-9、空格、\、# | ||
| # 合法字元:a-v, 0-9, whitespace, backslash, hash | ||
| # 違法:& 後面接了 a-v 與 0-9、空格、\、# 以外的任何字元 | ||
| COLOR_PATTERN = re.compile(r"&([^a-v0-9\s\\#])") |
There was a problem hiding this comment.
COLOR_PATTERN whitelist is too broad — misses many invalid codes
The character class [^a-v0-9\s\\#] treats every letter a–v as a legal format code. The actual valid Minecraft format characters are:
| Category | Characters |
|---|---|
| Color | 0–9, a–f |
| Formatting | k, l, m, n, o, r |
| Hex prefix (Paper/Spigot) | x |
Characters such as g, h, i, j, p, q, s, t, u, v are all inside the a-v range and will not be flagged as illegal — even though &g, &p, &v, etc. have no defined meaning in vanilla Minecraft and will render incorrectly.
If the intent is strict validation:
# Legal: 0-9, a-f (colour), k-o + r (format), optional x (hex), whitespace, \, #
COLOR_PATTERN = re.compile(r"&([^0-9a-fk-orx\s\\#])", re.IGNORECASE)If some non-standard mod codes (e.g. g–j) must be kept, the allowed set should be explicitly documented.
| def check_color_chars(value: str) -> list[ColorCharError] | None: | ||
| """檢查單一字串中的非法顏色字元。 | ||
|
|
||
| Args: | ||
| value: 要檢查的字串。 | ||
|
|
||
| Returns: | ||
| 錯誤列表(若無錯誤則回傳 None)。 | ||
| """ | ||
| errors: list[ColorCharError] = [] | ||
| for match in COLOR_PATTERN.finditer(value): | ||
| illegal_char = match.group(1) | ||
| pos = match.start() | ||
| errors.append( | ||
| ColorCharError( | ||
| file_path="", | ||
| key="", | ||
| value=value, | ||
| illegal_char=illegal_char, | ||
| position=pos, | ||
| message=f"在位置 {pos} 發現非法顏色字元 '&{illegal_char}'," | ||
| f"& 後只能接 a-v(不含 w)、0-9、空格、\\、#。", | ||
| ) | ||
| ) | ||
| return errors if errors else None |
There was a problem hiding this comment.
Inconsistent return type (
list | None) instead of list
check_color_chars returns None on success and a non-empty list on failure. Returning an empty list on success is more Pythonic and avoids callers needing a None guard — it also makes the declared return type truthful (right now the annotation says list[ColorCharError] | None but an empty list is never actually returned).
| def check_color_chars(value: str) -> list[ColorCharError] | None: | |
| """檢查單一字串中的非法顏色字元。 | |
| Args: | |
| value: 要檢查的字串。 | |
| Returns: | |
| 錯誤列表(若無錯誤則回傳 None)。 | |
| """ | |
| errors: list[ColorCharError] = [] | |
| for match in COLOR_PATTERN.finditer(value): | |
| illegal_char = match.group(1) | |
| pos = match.start() | |
| errors.append( | |
| ColorCharError( | |
| file_path="", | |
| key="", | |
| value=value, | |
| illegal_char=illegal_char, | |
| position=pos, | |
| message=f"在位置 {pos} 發現非法顏色字元 '&{illegal_char}'," | |
| f"& 後只能接 a-v(不含 w)、0-9、空格、\\、#。", | |
| ) | |
| ) | |
| return errors if errors else None | |
| def check_color_chars(value: str) -> list[ColorCharError]: | |
| """檢查單一字串中的非法顏色字元。 | |
| Args: | |
| value: 要檢查的字串。 | |
| Returns: | |
| 錯誤列表(若無錯誤則回傳空列表)。 | |
| """ | |
| errors: list[ColorCharError] = [] | |
| for match in COLOR_PATTERN.finditer(value): | |
| illegal_char = match.group(1) | |
| pos = match.start() | |
| errors.append( | |
| ColorCharError( | |
| file_path="", | |
| key="", | |
| value=value, | |
| illegal_char=illegal_char, | |
| position=pos, | |
| message=f"在位置 {pos} 發現非法顏色字元 '&{illegal_char}'," | |
| f"& 後只能接 a-v(不含 w)、0-9、空格、\\、#。", | |
| ) | |
| ) | |
| return errors |
The caller _check_value can then simply do if errors := check_color_chars(value):.
| elif isinstance(value, dict): | ||
| for k, v in value.items(): | ||
| yield from _check_value(file_path, k, v) |
There was a problem hiding this comment.
Nested-dict path context is lost
When value is a dict, the function recurses with only the child key k, discarding the parent entirely. For a nested structure like {"parent": {"child": "bad&Wvalue"}}, the resulting ColorCharError will report the leaf key alone (child) instead of the full dotted path (parent.child). By contrast, the list branch correctly preserves context with f"{key}[{i}]".
Consider building a dotted path for dict keys to keep errors self-describing:
| elif isinstance(value, dict): | |
| for k, v in value.items(): | |
| yield from _check_value(file_path, k, v) | |
| elif isinstance(value, dict): | |
| for k, v in value.items(): | |
| nested_key = f"{key}.{k}" if key else k | |
| yield from _check_value(file_path, nested_key, v) |
7806250 to
8da65a6
Compare
實作摘要
改了什麼
為什麼要改
怎麼驗證的
檔案變更