This project consists of a handful of scripts that support using HHVM's static analysis capabilities on a PHP codebase. The scripts ultimately produce a formatted report of the errors found by the analysis, but they are also used to prepare the codebase and to compare reports directly.
- Clone this repository: git clone git@github.com:wayfair/hussar
- Edit config.example with your project's details and save it as config.
- If necessary for your project, add a file named local-patch to override the default functions provided in lib/patching.
- cd to the root of your PHP codebase.
- Run /path/to/hussar/bin/main -F -r master
- Run less /path/to/hussar/reports/latest to view the report.
Note: This project requires hhvm <= 3.3.x. The -t analyze flag is not present in more recent versions of HHVM, and there does not appear to be equivalent functionality for analyzing PHP code (see issue-3987).
hussar was developed in a GNU/Linux environment and uses many standard utilities that are not listed here (e.g., find, grep, rsync, and mkfifo).
The following additional packages are required to use the tool:
- hhvm <= 3.3.x (see note, above)
- bash >= 4.0 (for associative array support)
- git (not required for all scripts; see the Usage section below)
- php (for building the .class-index and parsing HHVM output)
- perl (for patching files in place)
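Before a first run, these requirements can be checked up front. The script below is a hypothetical sketch rather than part of hussar: the function names hhvm_version_supported and check_requirements are invented for illustration, and it assumes that the first dotted number printed by hhvm --version is the HHVM version.

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite check; hussar itself does not ship this script.

# Succeed only if a dotted version string is at most 3.3.x.
hhvm_version_supported() {
  local major minor
  [[ $1 =~ ^([0-9]+)\.([0-9]+) ]] || return 1
  major=${BASH_REMATCH[1]} minor=${BASH_REMATCH[2]}
  # 10# forces base-10 so components such as "08" are not read as octal.
  (( 10#$major < 3 || (10#$major == 3 && 10#$minor <= 3) ))
}

check_requirements() {
  local status=0 cmd version
  if (( BASH_VERSINFO[0] < 4 )); then
    echo "bash >= 4.0 is required (found $BASH_VERSION)" >&2
    status=1
  fi
  for cmd in hhvm php perl rsync; do
    if ! command -v "$cmd" > /dev/null; then
      echo "missing required command: $cmd" >&2
      status=1
    fi
  done
  # git is optional: only some scripts depend on it.
  command -v git > /dev/null || echo "note: git not found; only some scripts will work" >&2
  if command -v hhvm > /dev/null; then
    # Assumption: the first dotted number hhvm prints is its version.
    version=$(hhvm --version | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1)
    if ! hhvm_version_supported "$version"; then
      echo "hhvm ${version:-(unknown)} lacks -t analyze; 3.3.x or older is required" >&2
      status=1
    fi
  fi
  return "$status"
}
```

A wrapper could call check_requirements || exit 1 before invoking bin/main. git is reported rather than enforced because some scripts run without it.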
PHP projects that cannot run directly on HHVM can still benefit from the errors identified by its strong typing and static analysis. Unfortunately, the output HHVM provides is not very friendly, and some preparation is required before analysis can be performed at all, since the static analyzer lacks support for some common PHP functionality (e.g., autoloading). The purpose of this project is to facilitate use of HHVM's static analysis capabilities so that they can be included as part of a regular test, build, and deploy cycle.
The scripts under bin/ are designed to run independently, with the bin/main script wrapping the functionality of the others. For typical PHP projects using git, try the following commands:
# Commands expect to run within the root of the codebase
$ cd /path/to/codebase
# Analyze all files in the codebase
$ /path/to/hussar/bin/main -F
# View the latest full report
$ less /path/to/hussar/reports/latest
Projects are likely to have a number of errors in their codebase and developers shouldn't have to see the full list every time. The tool can restrict output to just those errors that might be introduced by a developer's changes:
# Analyze for errors in a patchfile (compared to origin/master)
$ /path/to/hussar/bin/main -p /path/to/file.patch
# Analyze for errors in a branch (compared to merge-base with origin/master)
$ /path/to/hussar/bin/main -r feature/branch -t 'fork-point'
Options can be combined to handle more complex cases:
# Analyze file.patch applied to feature/branch and compare to origin/master:
$ /path/to/hussar/bin/main -p /path/to/file.patch -r feature/branch
# As above, but compare to feature/base instead:
$ /path/to/hussar/bin/main -p /path/to/file.patch -r feature/branch -t feature/base
For these more complex uses, it can be helpful to enable debugging output:
# Enable debugging messages
$ /path/to/hussar/bin/main -d -p /path/to/file.patch
Projects not using git can still take advantage of the tool. Patch the codebase manually (see next section) and run the bin/make-report command:
# Make report for default branch
$ hg up default
$ /path/to/hussar/bin/make-report -p -r "$(hg id -i)"
The bin/make-diff-report script should translate easily to other source control systems but is not currently supported outside git. The other scripts are more tightly coupled to git commands, but it should be possible to translate them as well.
Because its analysis is static, HHVM requires a few codebase modifications to support some standard PHP features and to make the results of the analysis meaningful. Applying these modifications is the responsibility of the bin/patch-files script, which modifies files to correct for the following limitations:
- Autoprepend and Autoloading: The auto_prepend_file directive is not supported when running hhvm with the -t analyze flag (see issue-2990). Autoloading is also unsupported (see issue-253), and classes meant to be included via spl_autoload_register or hh\autoload_set_paths are instead reported as UnknownClass. These limitations result in a cascade of false-positive errors for any files that depend on those features. To resolve these issues, the script inserts hard require_once statements into every file, using a regex to identify what classes a file contains.
- PHP incompatibilities and third-party packages: The HHVM team has done a great job reimplementing PHP functionality, but there are still a number of incompatibilities beyond the two described above, and projects often use third-party code that can clutter reports. When a stubs/ directory is present, the script adds require_once statements for all files under it to the auto_prepend.php script so that these objects are always defined.
- Runtime-defined state: The whole point of static analysis is to discover errors without executing the program, so it's no surprise that HHVM emits some false-positive errors for state known only at runtime. Constants whose values are defined within conditional expressions produce DeclaredConstantTwice errors, even though only one branch would be taken at runtime. HHVM does not store the value of conditionally defined constants, so concatenating such constants with strings (as in require_once PROJECT_ROOT . '/path/to/file.php';) results in UnknownClass or UndefinedFunction errors. The script can be configured to patch such constants to static string values.
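The three patching steps above can be approximated in plain Bash. This is an illustrative sketch under stated assumptions, not hussar's implementation: add_requires, stub_requires, patch_constant and the CLASS_INDEX array (class name to file path) are hypothetical names, and the reference-matching regex is much cruder than real PHP name resolution. Each function prints the patched result rather than editing in place; hussar itself uses perl for in-place patching.

```bash
#!/usr/bin/env bash
# Hypothetical sketches of the patching steps; hussar's real bin/patch-files
# and lib/patching code may work differently.
# Expects a global associative array CLASS_INDEX (class name -> file path).

# Autoloading: print a copy of a PHP file with require_once lines inserted
# after its opening tag for each indexed class it references. The regex is
# naive: it only sees "new X", "extends X", "implements X" and "X::", and it
# ignores namespaces and use statements.
add_requires() {
  local file=$1 ref cls line
  local -A seen=()
  local -a requires=()
  while IFS= read -r ref; do
    cls=${ref##*[[:space:]]}   # strip a leading "new " / "extends " / ...
    cls=${cls%::}              # strip a trailing "::"
    [[ -n ${CLASS_INDEX[$cls]+set} && -z ${seen[$cls]+set} ]] || continue
    [[ ${CLASS_INDEX[$cls]} == "$file" ]] && continue   # skip self-references
    seen[$cls]=1
    requires+=("require_once '${CLASS_INDEX[$cls]}';")
  done < <(grep -oE '(new|extends|implements)[[:space:]]+[A-Za-z_][A-Za-z0-9_]*|[A-Za-z_][A-Za-z0-9_]*::' "$file")
  {
    IFS= read -r line && printf '%s\n' "$line"   # assumed to be the "<?php" line
    (( ${#requires[@]} )) && printf '%s\n' "${requires[@]}"
    cat                                          # rest of the file unchanged
  } < "$file"
}

# Stubs: print require_once lines for every PHP file under a stubs
# directory, suitable for appending to an auto_prepend.php script.
stub_requires() {
  local f
  while IFS= read -r -d '' f; do
    printf "require_once '%s';\n" "$f"
  done < <(find "$1" -type f -name '*.php' -print0 | sort -z)
}

# Runtime-defined constants: print a copy of a PHP file with every use of a
# constant replaced by a quoted string literal, leaving define() lines alone.
# Substrings are replaced too (e.g. inside PROJECT_ROOT_DIR), so pick names
# carefully.
patch_constant() {
  local file=$1 name=$2 repl="'$3'" line patched
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == *define\(*"$name"* ]]; then
      patched=$line
    else
      patched=${line//"$name"/"$repl"}   # quoted pattern and replacement are literal
    fi
    printf '%s\n' "$patched"
  done < "$file"
}
```

For example, with CLASS_INDEX mapping Base to /srv/app/Base.php, a file containing class Widget extends Base gains require_once '/srv/app/Base.php'; directly after its <?php line.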
Patching and analyzing a large codebase can take several minutes, long enough that developers may avoid running the tool rather than interrupt their workflow. While hussar can't do much to improve the run time of hhvm, it employs two caches to reduce the time required to patch files. Both caches are created by default during the first invocation of hussar.
The first cache is the .class-index, a simple mapping from a fully qualified class name to its corresponding file. It is encoded as a Bash associative array and sourced before patching files. Without this cache, class files must be "autoloaded" at patch time, which is very expensive because it requires spinning up a php process for every file to be patched.
# Make the .class-index file
$ /path/to/hussar/bin/main -c
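A .class-index of this shape could be produced as sketched below. This is an illustration under stated assumptions, not hussar's builder: hussar lists php as the tool for building the .class-index, whereas this sketch swaps in plain Bash regex matching, and the array name CLASS_INDEX and the function build_class_index are hypothetical. It also assumes each PHP file declares at most one namespace and one class, interface, or trait.

```bash
#!/usr/bin/env bash
# Hypothetical .class-index builder (pure Bash; hussar itself uses php).
# Prints a sourceable Bash associative array: fully qualified class -> file.
build_class_index() {
  local root=$1 file line ns cls key
  local ns_re='^[[:space:]]*namespace[[:space:]]+([A-Za-z0-9_\\]+)'
  local cls_re='^[[:space:]]*((abstract|final)[[:space:]]+)?(class|interface|trait)[[:space:]]+([A-Za-z0-9_]+)'
  echo 'declare -A CLASS_INDEX=('
  while IFS= read -r -d '' file; do
    ns='' cls=''
    # Record the first namespace and the first class-like declaration.
    while IFS= read -r line || [[ -n $line ]]; do
      if [[ -z $ns && $line =~ $ns_re ]]; then
        ns=${BASH_REMATCH[1]}
      elif [[ -z $cls && $line =~ $cls_re ]]; then
        cls=${BASH_REMATCH[4]}
      fi
    done < "$file"
    [[ -n $cls ]] || continue          # file declares no class
    key=$cls
    [[ -n $ns ]] && key="$ns\\$cls"    # fully qualified name, e.g. Foo\Bar
    # %q escapes backslashes and special characters so the output is safe to source.
    printf '  [%q]=%q\n' "$key" "$file"
  done < <(find "$root" -type f -name '*.php' -print0 | sort -z)
  echo ')'
}
```

Because the result is plain Bash, loading it is a single source .class-index, after which a lookup such as ${CLASS_INDEX['Foo\Bar']} needs no php process at all.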
The second cache is the patch cache, a full copy of the codebase after all files have been patched. This cache is built via rsync and cuts the time required to patch files down to a few seconds by reducing the files to be patched to just those changed or introduced in a revision or patchfile.
# Fully patch the codebase and update the patch cache to match
$ /path/to/hussar/bin/main -F -u
As described in the Usage section above, hussar tries to be flexible about what can be analyzed. The bin/prepare-workspace and bin/patch-files scripts work together to ensure that the analysis is actually performed against the specified revisions. Getting this right is tricky and requires keeping track of the hashes for origin/master, the patch cache, the revision, the comparand, and the patchfile (depending on which options are given).
The former script (bin/prepare-workspace) maintains the workspace by resetting HEAD and pulling down the patch cache, then replacing the cached versions of files with their equivalents in the specified revision or patchfile. Patchfiles are then committed so that bin/make-report can save the report under the new revision. If the patch cache is ahead of the specified revision, files not present in the revision are removed; if it is behind, files added in the revision are checked out to the index as new files. Upon completion, every file not modified by the revision or patchfile has been replaced in the working tree by its patch cache equivalent; that is, the workspace is identical to the revision (or the patchfile applied to the revision) but contains patched versions of nearly all files.
The latter script (bin/patch-files) patches any files that need patching (see the limitations of HHVM static analysis described above). Passing the -F option selects all files in the codebase. When a revision is specified, only the files modified between HEAD and the revision are patched. This is precisely where bin/prepare-workspace leaves off, so once both scripts have run, all files in the codebase have been patched while otherwise reflecting their contents at the specified revision (plus patchfile).
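The file-selection rule for bin/patch-files can be illustrated with ordinary git commands. This is a hypothetical sketch, not the script's actual code: files_to_patch is an illustrative name, and restricting the selection to *.php files and skipping files deleted in the revision are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of choosing which files to patch; run inside the repo.
#   files_to_patch -F     -> every tracked PHP file
#   files_to_patch <rev>  -> PHP files that differ between HEAD and <rev>
files_to_patch() {
  if [[ $1 == -F ]]; then
    git ls-files -- '*.php'
  else
    # --diff-filter=d excludes files deleted in <rev>; there is nothing to patch.
    git diff --name-only --diff-filter=d HEAD "$1" -- '*.php'
  fi
}
```

Each listed path could then be fed to a per-file patching step, while every other file keeps its already-patched copy from the patch cache.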
hussar is distributed under the ISC License. See LICENSE for details.