Commit 93615ec

Update ignore/config documentation
Signed-off-by: Ayan Sinha Mahapatra <asmahapatra@aboutcode.org>
1 parent 1c04c47 commit 93615ec

8 files changed

Lines changed: 221 additions & 191 deletions

azure-pipelines.yml

Lines changed: 1 addition & 1 deletion
@@ -195,7 +195,7 @@ jobs:
 python_architecture: x64
 test_suites:
 all:
-venv/bin/pip uninstall commoncode && venv/bin/pip install commoncode
+venv/bin/pip uninstall -y commoncode && venv/bin/pip install commoncode
 venv/bin/pytest -n 2 -vvs tests/scancode/test_cli.py --reruns 2



docs/source/reference/scancode-cli/cli-core-options.rst

Lines changed: 95 additions & 0 deletions
@@ -145,3 +145,98 @@ Comparing progress message options

 This would scan the file ``samples/levelone/leveltwo/file`` but ignore
 ``samples/levelone/leveltwo/levelthree/file``
+
+----
+
+.. _cli-ignore-option:
+
+``--ignore <pattern>``
+----------------------
+
+In a scan, all files inside the directory specified as an input argument are scanned. If there
+are some files that you do not want to scan, use the ``--ignore`` option to exclude
+them.
+
+**Example**
+
+.. code-block:: shell
+
+    scancode --ignore "*.java" samples samples.json
+
+Here, ScanCode ignores files ending with ``.java`` and continues with other files as usual.
+
+See :ref:`glob-pattern-matching` for more information.
+
+----
+
+.. _cli-config-option:
+
+``--config-file <path>``
+------------------------
+
+Path patterns to ignore in the scan can also be provided
+through a configuration file.
+
+**Example**
+
+.. code-block:: shell
+
+    scancode --config-file scancode-config.yaml samples samples.json
+
+.. code-block:: yaml
+
+    ignored_patterns:
+      - '*.java'
+      - '*/licenses/*'
+
+Here, ScanCode ignores files ending with ``.java`` and the contents of ``licenses`` directories,
+and continues with other files as usual.
+
+This is also compatible with the `scancode.io configuration file <https://scancodeio.readthedocs.io/en/latest/project-configuration.html#ignored-patterns>`_.
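
As a rough illustration of what the ``ignored_patterns`` list in the YAML example expresses, the following Python sketch filters a list of paths with the standard library ``fnmatch`` module. It assumes fnmatch-style matching against the whole path; the sample paths and the ``is_ignored`` helper are hypothetical, and ScanCode's own matching may differ in details.

```python
from fnmatch import fnmatchcase

# Patterns copied from the YAML example.
ignored_patterns = ["*.java", "*/licenses/*"]

# Hypothetical paths, used only for illustration.
paths = [
    "samples/Main.java",
    "samples/licenses/mit.txt",
    "samples/README",
    "samples/src/util.py",
]


def is_ignored(path, patterns):
    """Return True if the path matches any of the ignore patterns."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Keep only the paths that no pattern matches.
kept = [path for path in paths if not is_ignored(path, ignored_patterns)]
print(kept)  # ['samples/README', 'samples/src/util.py']
```

The ``.java`` file and the file under ``licenses`` are dropped; the other two paths remain.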
+
+----
+
+.. _glob-pattern-matching:
+
+Glob Pattern Matching
+---------------------
+
+All the pre-scan options use pattern matching, so the basics of glob pattern matching are
+discussed briefly below.
+
+Glob pattern matching selects a group of files by using patterns in their names. Files that
+match these patterns can then be grouped and treated differently as required.
+
+Here are some rules from the `Linux Manual <http://man7.org/linux/man-pages/man7/glob.7.html>`_
+on glob patterns. Refer to it for more detailed information.
+
+A string is a wildcard pattern if it contains one of the characters '?', '*' or '['. Globbing
+is the operation that expands a wildcard pattern into the list of pathnames matching the
+pattern. Matching is defined by:
+
+- A '?' (not between brackets) matches any single character.
+
+- A '*' (not between brackets) matches any string, including the empty string.
+
+- An expression "[...]" where the first character after the leading '[' is not an '!' matches a
+  single character, namely any of the characters enclosed by the brackets.
+
+- There is one special convention: two characters separated by '-' denote a range.
+
+- An expression "[!...]" matches a single character, namely any character that is not matched
+  by the expression obtained by removing the first '!' from it.
+
+- A '/' in a pathname cannot be matched by a '?' or '*' wildcard, or by a range like "[.-0]".
+
+Note that wildcard patterns are not regular expressions, although they are a bit similar.
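
The matching rules listed here can be tried out with Python's standard ``fnmatch`` module. This is only a sketch with made-up file names. One caveat: ``fnmatch`` lets ``*`` match a ``/``, unlike the shell globbing described in the manual, as the last case shows.

```python
from fnmatch import fnmatchcase

# (file name, pattern) pairs; the names are made up for illustration.
cases = [
    ("a.c", "?.c"),           # '?' matches exactly one character
    ("ab.c", "?.c"),          # two characters before ".c": no match
    (".java", "*.java"),      # '*' also matches the empty string
    ("file1", "file[0-9]"),   # '-' inside brackets denotes a range
    ("file1", "file[!0-9]"),  # '!' negates the bracket expression
    ("filex", "file[!0-9]"),  # 'x' is not a digit, so it matches
    ("src/main.c", "*.c"),    # fnmatch only: '*' also matches '/'
]

results = [fnmatchcase(name, pattern) for name, pattern in cases]

for (name, pattern), matched in zip(cases, results):
    print(f"{pattern!r:14} {name!r:14} {matched}")
```

Only the second and fifth cases fail to match.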
+
+For more information on glob pattern matching, refer to these resources:
+
+- `Linux Manual <http://man7.org/linux/man-pages/man7/glob.7.html>`_
+- `Wildcard Match Documentation <https://facelessuser.github.io/wcmatch/glob/>`_
+
+You can also use these Python libraries to practice UNIX-style pattern matching:
+
+- `fnmatch <https://docs.python.org/3/library/fnmatch.html>`_ for file name matching
+- `glob <https://docs.python.org/3/library/glob.html#module-glob>`_ for file path matching
+

docs/source/reference/scancode-cli/cli-help-text-options.rst

Lines changed: 47 additions & 21 deletions
@@ -125,8 +125,6 @@ The following help text is displayed for ScanCode version 32.0.0:
 such that all paths have a common root directory.

 pre-scan:
---ignore <pattern> Ignore files matching <pattern>.
---include <pattern> Include files matching <pattern>.
 --classify Classify files with flags indicating whether the file is a
 legal, readme, test or similar file.
 --facet <facet>=<pattern> Add the <facet> to files with a path matching
@@ -169,11 +167,13 @@ The following help text is displayed for ScanCode version 32.0.0:
 at the file and directory level.

 core:
+--ignore <pattern> Ignore files matching <pattern>.
 --timeout <seconds> Stop an unfinished file scan after a timeout in
 seconds. [default: 120 seconds]
 -n, --processes INT Set the number of parallel processes to use. Disable
 parallel processing if 0. Also disable threading if
 -1. [default: (number of CPUs)-1]
+-c, --config-file FILENAME Path to the configuration file.
 -q, --quiet Do not print summary or progress.
 -v, --verbose Print progress as file-by-file path instead of a
 progress bar. Print verbose scan counters.
@@ -512,7 +512,7 @@ for ScanCode Version 32.0.0.
 --------------------------------------------
 Plugin: scancode_post_scan:classify class: summarycode.classify_plugin:FileClassifier
 codebase_attributes:
-resource_attributes: is_legal, is_manifest, is_readme, is_top_level, is_key_file
+resource_attributes: is_legal, is_manifest, is_readme, is_top_level, is_key_file, is_community
 sort_order: 4
 required_plugins:
 options:
@@ -690,6 +690,19 @@ for ScanCode Version 32.0.0.
 - packages


+--------------------------------------------
+Plugin: scancode_post_scan:todo class: summarycode.todo:AmbiguousDetectionsToDoPlugin
+codebase_attributes: todo
+resource_attributes: for_todo
+sort_order: 3
+required_plugins:
+options:
+help_group: post-scan, name: todo: --todo
+help: Summarize scans by providing all ambiguous detections which are todo items and needs manual review.
+doc:
+Summarize a scan by compiling review items of ambiguous detections.
+
+
 --------------------------------------------
 Plugin: scancode_pre_scan:facet class: summarycode.facet:AddFacet
 codebase_attributes:
@@ -705,21 +718,6 @@ for ScanCode Version 32.0.0.
 test vs. data, etc.


---------------------------------------------
-Plugin: scancode_pre_scan:ignore class: scancode.plugin_ignore:ProcessIgnore
-codebase_attributes:
-resource_attributes:
-sort_order: 100
-required_plugins:
-options:
-help_group: pre-scan, name: ignore: --ignore
-help: Ignore files matching <pattern>.
-help_group: pre-scan, name: include: --include
-help: Include files matching <pattern>.
-doc:
-Include or ignore files matching patterns.
-
-
 --------------------------------------------
 Plugin: scancode_scan:copyrights class: cluecode.plugin_copyright:CopyrightScanner
 codebase_attributes:
@@ -761,10 +759,23 @@ for ScanCode Version 32.0.0.
 Tag a file as generated.


+--------------------------------------------
+Plugin: scancode_scan:go_symbol class: go_inspector.plugin:GoSymbolScannerPlugin
+codebase_attributes:
+resource_attributes: go_symbols
+sort_order: 100
+required_plugins:
+options:
+help_group: primary scans, name: go_symbol: --go-symbol
+help: Collect Go symbols.
+doc:
+Scan a Go binary for symbols using GoReSym.
+
+
 --------------------------------------------
 Plugin: scancode_scan:info class: scancode.plugin_info:InfoScanner
 codebase_attributes:
-resource_attributes: date, sha1, md5, sha256, mime_type, file_type, programming_language, is_binary, is_text, is_archive, is_media, is_source, is_script
+resource_attributes: date, sha1, md5, sha256, sha1_git, mime_type, file_type, programming_language, is_binary, is_text, is_archive, is_media, is_source, is_script
 sort_order: 0
 required_plugins:
 options:
@@ -779,7 +790,7 @@ for ScanCode Version 32.0.0.
 Plugin: scancode_scan:licenses class: licensedcode.plugin_license:LicenseScanner
 codebase_attributes: license_detections
 resource_attributes: detected_license_expression, detected_license_expression_spdx, license_detections, license_clues, percentage_of_license_text
-sort_order: 4
+sort_order: 5
 required_plugins:
 options:
 help_group: primary scans, name: license: -l, --license
@@ -804,13 +815,15 @@ for ScanCode Version 32.0.0.
 Plugin: scancode_scan:packages class: packagedcode.plugin_package:PackageScanner
 codebase_attributes: packages, dependencies
 resource_attributes: package_data, for_packages
-sort_order: 3
+sort_order: 4
 required_plugins: scan:licenses
 options:
 help_group: primary scans, name: package: -p, --package
 help: Scan <input> for application package and dependency manifests, lockfiles and related data.
 help_group: primary scans, name: system_package: --system-package
 help: Scan <input> for installed system package databases.
+help_group: primary scans, name: package_in_compiled: --package-in-compiled
+help: Scan <input> for package and dependency related data in compiled binaries. Currently supported compiled binaries: Go, Rust.
 help_group: primary scans, name: package_only: --package-only
 help: Scan for system and application package data and skip license/copyright detection and top-level package creation.
 help_group: documentation, name: list_packages: --list-packages
@@ -821,6 +834,19 @@ for ScanCode Version 32.0.0.
 level.


+--------------------------------------------
+Plugin: scancode_scan:rust_symbol class: rust_inspector.plugin:RustSymbolScannerPlugin
+codebase_attributes:
+resource_attributes: rust_symbols
+sort_order: 100
+required_plugins:
+options:
+help_group: primary scans, name: rust_symbol: --rust-symbol
+help: Collect Rust symbols from rust binaries.
+doc:
+Scan a Rust binary for symbols using blint, lief and symbolic.
+
+
 --------------------------------------------
 Plugin: scancode_scan:urls class: cluecode.plugin_url:UrlScanner
 codebase_attributes:

docs/source/reference/scancode-cli/cli-post-scan-options.rst

Lines changed: 50 additions & 0 deletions
@@ -17,6 +17,56 @@ To see all plugins available via command line help, use ``--plugins``.

 ----

+.. _cli-classify-option:
+
+``--classify``
+--------------
+
+.. admonition:: Sub-option
+
+    The options ``--license-clarity-score`` and ``--tallies-key-files`` are sub-options of
+    ``--classify``. Both ``--license-clarity-score`` and ``--tallies-key-files`` are
+    Post-Scan Options.
+
+**Example**
+
+.. code-block:: shell
+
+    scancode -clpieu --json-pp sample_facet.json samples --classify
+
+This option makes ScanCode further classify scanned files and directories, to determine
+whether they fall into the following categories:
+
+- legal
+- readme
+- top-level
+- manifest
+
+  A manifest file in computing is a file containing metadata for a group of accompanying
+  files that are part of a set or coherent unit.
+
+- key-file
+
+  A key file serves as a keystone element, containing essential
+  information about a software package, such as its dependencies,
+  versioning, licensing, and more. It often contains the
+  ``primary-license`` or the overall license of the package, among
+  other package metadata that is general or ecosystem-specific.
+
+That is, these extra attributes are added to the JSON object of each scanned file.
+
+.. code-block:: json
+
+    {
+      "is_legal": false,
+      "is_manifest": false,
+      "is_readme": true,
+      "is_top_level": true,
+      "is_key_file": true
+    }
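
To make use of these flags after a scan, a small Python sketch such as the following could list the key files. It assumes the usual ScanCode JSON layout with a top-level ``files`` list whose entries carry a ``path``; the inline data is a made-up excerpt, not real scan output.

```python
import json

# Made-up excerpt shaped like ScanCode JSON output produced with --classify.
scan_text = """
{
  "files": [
    {"path": "samples/README", "is_readme": true,
     "is_top_level": true, "is_key_file": true},
    {"path": "samples/zlib/adler32.c", "is_readme": false,
     "is_top_level": false, "is_key_file": false}
  ]
}
"""

scan = json.loads(scan_text)

# Collect the paths of resources flagged as key files.
key_files = [
    resource["path"]
    for resource in scan["files"]
    if resource.get("is_key_file")
]
print(key_files)  # ['samples/README']
```

Using ``.get()`` keeps the sketch tolerant of entries that lack the flag, for example when a scan was run without ``--classify``.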
+
+----
+
 .. _cli-mark-source-option:

 ``--mark-source``
