Commit a26c5c9

Command line tools for XML sync testing between languages (#222)
* Command line tool for XML sync testing between languages: tags, revtag, PI, ws
* Modernize array() usage.
* Review
1 parent 3ab4fe9 commit a26c5c9

21 files changed, +653 -108 lines changed
Diff for: scripts/translation/README.md

+101 -54
@@ -16,15 +16,19 @@ Because of the above, it's possible to silence each alert independently. These
 scripts will output `--add-ignore` commands that, if executed, will omit the
 specific alerts in future executions.
 
-## First execution
+## broken.php
 
-The first execution of these scripts may generate an inordinate amount of
-alerts. It's advised to initially run each command separately, and work the
-alerts on a case by case basis. After all interesting cases are fixed,
-it's possible to rerun the command and `grep` the output for `--add-ignore`
-lines, run these commands, and thereby mass-ignore the residual alerts.
+`doc-base/scripts/broken.php` tests whether individual XML files are
+ill-formed, that is, whether a file contains a Unicode BOM, carriage returns (CR),
+or XML contents that are not
+[well-balanced](https://www.w3.org/TR/xml-fragment/#defn-well-balanced).
+
+Unbalanced XML contents are invalid XML and will result in a broken build.
+BOM and CR marks may not result in broken builds, but *will* cause several
+tools below to misbehave, as `libxml` behaviour changes if XML text contains
+these bytes.
 
-## qaxml-attributes.php (structural)
+## qaxml-attributes.php
 
 `doc-base/scripts/translation/qaxml-attributes.php` checks if all translated
 files have the same tag-attribute-value triplets. Tag's attributes are
@@ -35,7 +39,7 @@ This script accepts an `--urgent` option, to filter alerts related to `xml:id`
 attributes. This will help translators on languages that are failing to build,
 to focus on mismatches that are probably most related to build failures.
 
-## qaxml-entities.php (structural)
+## qaxml-entities.php
 
 `doc-base/scripts/translation/qaxml-entities.php` checks if all translated
 files contain the same XML Entities References as the original files.
@@ -55,15 +59,99 @@ entities when generating alerts. This is handy in languages that use some
 `&zb;` and `&dh;` entities, and could run with `-zb -dh` to avoid generating
 alerts for these entities' differences.
 
-## Old tools (below)
+## qaxml-pi.php
+
+`doc-base/scripts/translation/qaxml-pi.php` checks if all translated files have
+the same processing instructions (PI) as the original files. Unbalanced PIs may
+cause compilation errors, as they are utilized in the manual build process.
+
+## qaxml-tags.php
+
+`doc-base/scripts/translation/qaxml-tags.php` checks if all translated files
+have the same tags as the original files. A different number of tags between
+source texts and translations indicates mismatched translated texts, and may
+cause compilation errors.
+
+This script accepts a `--detail` option, that will print lines of each
+mismatched tag, to facilitate the work on big files.
+
+This script also accepts a `--content=` option, that will check the
+*contents* of tags, to inspect tags where the contents are expected *not* to
+be translated. Example below.
+
+## qaxml-ws.php
+
+`doc-base/scripts/translation/qaxml-ws.php` inspects whitespace usage inside
+some known tags. Spurious whitespace may break manual linking or generate
+visible artifacts.
+
+## qaxml-revtag.php
+
+`doc-base/scripts/translation/qaxml-revtag.php` checks if all translated
+files have valid [revision tags](https://doc.php.net/guide/translating.md).
+Files without revision tags in expected format will fail to generate pretty
+diffs on the [Translation status](https://doc.php.net/revcheck.php) website or
+locally generated `revcheck.php` status pages.
+
+## Suggested execution
+
+The first execution of these scripts may generate an inordinate amount of
+alerts. It's advised to initially run each command separately, and work the
+alerts on a case by case basis. After all interesting cases are fixed,
+it's possible to rerun the command and `grep` the output for `--add-ignore`
+lines, run these commands, and thereby mass-ignore the residual alerts.
+
+Structural checks:
+
+```
+php doc-base/scripts/broken.php
+php doc-base/scripts/translation/qaxml-revtag.php
+
+php doc-base/scripts/translation/qaxml-attributes.php
+php doc-base/scripts/translation/qaxml-entities.php
+php doc-base/scripts/translation/qaxml-pi.php
+php doc-base/scripts/translation/qaxml-tags.php --detail
+php doc-base/scripts/translation/qaxml-ws.php
+```
 
-The tools on `doc-base/scripts/translation/` are slowly being rewritten. While
-this effort is not complete, the previous tools, document below, could be used
-to supply for features yet not completed.
+Tags where no translations are expected:
+
+```
+php doc-base/scripts/translation/qaxml-tags.php --content=acronym
+php doc-base/scripts/translation/qaxml-tags.php --content=classname
+php doc-base/scripts/translation/qaxml-tags.php --content=constant
+php doc-base/scripts/translation/qaxml-tags.php --content=envar
+php doc-base/scripts/translation/qaxml-tags.php --content=function
+php doc-base/scripts/translation/qaxml-tags.php --content=interfacename
+php doc-base/scripts/translation/qaxml-tags.php --content=parameter
+php doc-base/scripts/translation/qaxml-tags.php --content=type
+php doc-base/scripts/translation/qaxml-tags.php --content=classsynopsis
+php doc-base/scripts/translation/qaxml-tags.php --content=constructorsynopsis
+php doc-base/scripts/translation/qaxml-tags.php --content=destructorsynopsis
+php doc-base/scripts/translation/qaxml-tags.php --content=fieldsynopsis
+php doc-base/scripts/translation/qaxml-tags.php --content=funcsynopsis
+php doc-base/scripts/translation/qaxml-tags.php --content=methodsynopsis
+```
+
+Tags where few translations are expected:
+
+```
+php doc-base/scripts/translation/qaxml-tags.php --content=code
+php doc-base/scripts/translation/qaxml-tags.php --content=computeroutput
+php doc-base/scripts/translation/qaxml-tags.php --content=filename
+php doc-base/scripts/translation/qaxml-tags.php --content=literal
+php doc-base/scripts/translation/qaxml-tags.php --content=varname
+```
 
 ---
 
-Before using the old scripts, they need be configured:
+## Old tools (below)
+
+Documented below is the previous version of these tools. These tools are
+deprecated, and scheduled for removal very soon.
+
+
+These old tools need to be configured separately before use:
 ```
 php doc-base/scripts/translation/configure.php $LANG_DIR
 ```
@@ -107,44 +195,3 @@ contents, as some tag contents are expected *not* to be translated.
 
 `--detail` will also print line definitions of each mismatched tag,
 to facilitate bisecting.
-
-## Suggested execution
-
-Structural checks:
-
-```
-php doc-base/scripts/translation/configure.php $LANG_DIR
-
-php doc-base/scripts/translation/qarvt.php
-
-php doc-base/scripts/translation/qaxml.a.php
-php doc-base/scripts/translation/qaxml.e.php
-php doc-base/scripts/translation/qaxml.p.php
-php doc-base/scripts/translation/qaxml.t.php
-php doc-base/scripts/translation/qaxml.w.php
-```
-Tags where no translations are expected:
-```
-php doc-base/scripts/translation/qaxml.t.php acronym
-php doc-base/scripts/translation/qaxml.t.php classname
-php doc-base/scripts/translation/qaxml.t.php constant
-php doc-base/scripts/translation/qaxml.t.php envar
-php doc-base/scripts/translation/qaxml.t.php function
-php doc-base/scripts/translation/qaxml.t.php interfacename
-php doc-base/scripts/translation/qaxml.t.php parameter
-php doc-base/scripts/translation/qaxml.t.php type
-php doc-base/scripts/translation/qaxml.t.php classsynopsis
-php doc-base/scripts/translation/qaxml.t.php constructorsynopsis
-php doc-base/scripts/translation/qaxml.t.php destructorsynopsis
-php doc-base/scripts/translation/qaxml.t.php fieldsynopsis
-php doc-base/scripts/translation/qaxml.t.php funcsynopsis
-php doc-base/scripts/translation/qaxml.t.php methodsynopsis
-```
-Tags where few translations are expected:
-```
-php doc-base/scripts/translation/qaxml.t.php code
-php doc-base/scripts/translation/qaxml.t.php computeroutput
-php doc-base/scripts/translation/qaxml.t.php filename
-php doc-base/scripts/translation/qaxml.t.php literal
-php doc-base/scripts/translation/qaxml.t.php varname
-```
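
Note on the `broken.php` description in the README diff: the BOM and CR conditions it mentions are byte-level checks, and a minimal sketch of what such a check can look like in PHP may help reviewers. This is not the script's actual code. The function name `findByteIssues` and the returned labels are hypothetical, and only the UTF-8 form of the BOM is assumed.

```php
<?php
// Hypothetical helper, not taken from doc-base: reports BOM and CR bytes
// found in the raw contents of a file.
function findByteIssues( string $contents ) : array
{
    $issues = [];

    // A UTF-8 byte order mark is the three bytes EF BB BF at the very start.
    if ( str_starts_with( $contents , "\xEF\xBB\xBF" ) )
        $issues[] = "BOM";

    // Any carriage return, whether alone or as part of CRLF, is flagged.
    if ( str_contains( $contents , "\r" ) )
        $issues[] = "CR";

    return $issues;
}

// Assumed usage: a string with both problems, and a clean one.
$dirty = findByteIssues( "\xEF\xBB\xBF<a/>\r\n" );
$clean = findByteIssues( "<a/>\n" );
```

Checking well-balancedness needs an XML parser and is left out of this sketch.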

Diff for: scripts/translation/lib/OutputIgnoreArgv.php

+2 -2
@@ -34,7 +34,7 @@ function __construct( array & $argv )
 if ( str_starts_with( $arg , "--add-ignore=" ) )
 {
     $file = OutputIgnoreArgv::cacheFile();
-    $list = $file->load( array() );
+    $list = $file->load( [] );
     $line = substr( $arg , 13 );
     if ( ! in_array( $line , $list ) )
     {
@@ -47,7 +47,7 @@ function __construct( array & $argv )
 if ( str_starts_with( $arg , "--del-ignore=" ) )
 {
     $file = OutputIgnoreArgv::cacheFile();
-    $list = $file->load( array() );
+    $list = $file->load( [] );
     $line = substr( $arg , 13 );
     $dels = 0;
     while ( in_array( $line , $list ) )

Diff for: scripts/translation/lib/OutputIgnoreBuffer.php

+7 -7
@@ -25,8 +25,8 @@ class OutputIgnoreBuffer
 
 private string $filename = "";
 private string $header = "";
-private array $matter = array();
-private array $footer = array();
+private array $matter = [];
+private array $footer = [];
 
 private OutputIgnoreArgv $args;
 
@@ -82,14 +82,14 @@ function print( bool $useAlternatePrinting = false )
 
 $markhead = $this->filename . ':' . $this->hash( false ) . ':';
 $markfull = $markhead . $this->hash( true );
-$marks = OutputIgnoreArgv::cacheFile()->load( array() );
+$marks = OutputIgnoreArgv::cacheFile()->load( [] );
 
 if ( $this->args->showIgnore )
 {
     // --add-ignore
 
     if ( in_array( $markfull , $marks ) )
-        $this->matter = array();
+        $this->matter = [];
     else
         $this->args->pushAddIgnore( $this , $markfull );
 
@@ -132,9 +132,9 @@ function print( bool $useAlternatePrinting = false )
 
 private function printMatterAlternate() : void
 {
-    $add = array();
-    $del = array();
-    $rst = array();
+    $add = [];
+    $del = [];
+    $rst = [];
 
     foreach( $this->matter as $text )
     {

Diff for: scripts/translation/lib/RevcheckData.php

+3 -3
@@ -32,9 +32,9 @@ class RevcheckData
 public string $lang = "";
 public string $date = "";
 public string $intro = "";
-public $translators = array(); // nick => RevcheckDataTranslator
-public $fileSummary = array(); // RevcheckStatus => int
-public $fileDetail = array(); // filename => RevcheckDataFile
+public $translators = []; // nick => RevcheckDataTranslator
+public $fileSummary = []; // RevcheckStatus => int
+public $fileDetail = []; // filename => RevcheckDataFile
 
 public function __construct()
 {

Diff for: scripts/translation/lib/RevcheckFileList.php

+1 -1
@@ -21,7 +21,7 @@
 
 class RevcheckFileList
 {
-    private $list = array();
+    private $list = [];
 
     function __construct( $lang )
     {

Diff for: scripts/translation/lib/RevtagParser.php

+2 -2
@@ -70,7 +70,7 @@ public static function parseComment( DOMNode $node , RevtagInfo $ret , $filename
 // /EN-Revision:\s*(\S+)\s*Maintainer:\s*(\S+)\s*Status:\s*(\S+)/ // restrict maintainer without spaces
 // /EN-Revision:\s*(\S+)\s*Maintainer:\s(.*?)\sStatus:\s*(\S+)/ // accepts maintainer with spaces
 
-$match = array();
+$match = [];
 $regex = "/EN-Revision:\s*(\S+)\s*Maintainer:\s(.*?)\sStatus:\s*(\S+)/";
 if ( preg_match( $regex , $text , $match ) )
 {
@@ -91,7 +91,7 @@ public static function parseComment( DOMNode $node , RevtagInfo $ret , $filename
 
 if ( str_starts_with( $text , "CREDITS:" ) )
 {
-    $match = array();
+    $match = [];
     $regex = "/CREDITS:(.*)/";
     if ( preg_match( $regex , $text , $match ) )
     {
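
Note on the `EN-Revision` pattern in `RevtagParser.php`: it is the variant that accepts maintainer names containing spaces. The short sketch below shows how its three capture groups behave. The sample comment text is made up for illustration and is not taken from any translation file, and the variable names after the match are hypothetical.

```php
<?php
// Pattern copied from the RevtagParser.php hunk; the input is a hypothetical example.
$regex = "/EN-Revision:\s*(\S+)\s*Maintainer:\s(.*?)\sStatus:\s*(\S+)/";
$text  = " EN-Revision: 0123abc Maintainer: jane doe Status: ready ";

$match = [];
if ( preg_match( $regex , $text , $match ) )
{
    $revision   = $match[1]; // first run of non-whitespace after "EN-Revision:"
    $maintainer = $match[2]; // lazy capture, so it may contain spaces
    $status     = $match[3]; // first run of non-whitespace after "Status:"

    print "$revision | $maintainer | $status\n"; // prints "0123abc | jane doe | ready"
}
```

Because the maintainer group is lazy and must be followed by whitespace and `Status:`, it stops at the last space before `Status:` rather than swallowing the rest of the comment.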

Diff for: scripts/translation/lib/XmlUtil.php

+2 -2
@@ -34,7 +34,7 @@ public static function extractEntities( $filename )
 libxml_clear_errors();
 libxml_use_internal_errors( $was );
 
-$ret = array();
+$ret = [];
 foreach ($errors as $error)
 {
     if ( preg_match( "/Entity '(\S+)' not defined/" , $error->message , $matches ) )
@@ -45,7 +45,7 @@ public static function extractEntities( $filename )
 
 public static function listNodeType( DOMNode $node , int $type )
 {
-    $ret = array();
+    $ret = [];
     XmlUtil::listNodeTypeRecurse( $node , $type , $ret );
     return $ret;
 }

Diff for: scripts/translation/libqa/ArgvParser.php

+3 -1
@@ -26,7 +26,6 @@ class ArgvParser
 public function __construct( array $argv )
 {
     $this->argv = array_values( array_filter( $argv ) );
-    $this->used = [];
     $this->used = array_fill( 0 , count( $argv ) , false );
 }
 
@@ -58,6 +57,9 @@ public function consume( string $equals = null , string $prefix = null , int $po
         $this->argv[ $pos ] = null;
         $this->used[ $pos ] = true;
 
+        if ( $foundByPrefix )
+            return substr( $arg , strlen( $prefix ) );
+
         return $arg;
     }
 }
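
Note on the `$foundByPrefix` branch added to `ArgvParser::consume()`: an argument matched by prefix now yields only the text after the prefix, so an option such as `--content=acronym` could be read as `acronym`. The sketch below illustrates that behaviour in isolation. `consumeByPrefix` is a hypothetical stand-in, not the real method, whose full body is not shown in this diff.

```php
<?php
// Hypothetical, simplified stand-in for the prefix branch of ArgvParser::consume().
function consumeByPrefix( array & $args , string $prefix ) : ?string
{
    foreach ( $args as $pos => $arg )
    {
        if ( $arg !== null && str_starts_with( $arg , $prefix ) )
        {
            $args[ $pos ] = null;                      // mark the argument as consumed
            return substr( $arg , strlen( $prefix ) ); // return only the value part
        }
    }
    return null; // no argument starts with the prefix
}

// Assumed usage with an argument list shaped like the README examples.
$args  = [ "qaxml-tags.php" , "--content=acronym" ];
$value = consumeByPrefix( $args , "--content=" );
$again = consumeByPrefix( $args , "--content=" );
```

After the first call `$value` holds `acronym` and the slot in `$args` is nulled, so the second call finds nothing and returns `null`.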

Diff for: scripts/translation/libqa/OutputBuffer.php

+10 -6
@@ -28,6 +28,8 @@ class OutputBuffer
 private OutputIgnore $ignore;
 private string $options;
 
+public int $printCount = 0;
+
 public function __construct( string $header , string $filename , OutputIgnore $ignore )
 {
     $filename = str_replace( "/./" , "/" , $filename );
@@ -81,7 +83,7 @@ public function contains( string $text ) : bool
     return false;
 }
 
-public function print( bool $useAlternatePrinting = false )
+public function print( bool $alternatePrinting = false )
 {
     if ( count( $this->matter ) == 0 && count( $this->footer ) == 0 )
         return;
@@ -93,9 +95,11 @@ public function print( bool $useAlternatePrinting = false )
 if ( $this->ignore->shouldIgnore( $this , $hashFile , $hashHead , $hashFull ) )
     return;
 
+$this->printCount++;
+
 print $this->header;
 
-if ( $useAlternatePrinting )
+if ( $alternatePrinting )
     $this->printMatterAlternate();
 else
     foreach( $this->matter as $text )
@@ -115,9 +119,9 @@ public function print( bool $useAlternatePrinting = false )
 
 private function printMatterAlternate() : void
 {
-    $add = array();
-    $del = array();
-    $rst = array();
+    $add = [];
+    $del = [];
+    $rst = [];
 
     foreach( $this->matter as $text )
     {
@@ -128,8 +132,8 @@ private function printMatterAlternate() : void
 
 for ( $idx = 0 ; $idx < count( $this->matter ) ; $idx++ )
 {
-    if ( isset( $add[ $idx ] ) ) print $add[ $idx ];
     if ( isset( $del[ $idx ] ) ) print $del[ $idx ];
+    if ( isset( $add[ $idx ] ) ) print $add[ $idx ];
 }
 
 foreach( $rst as $text )
