- Getting Started
  - Simple Search
  - Searching Multiple Paths
  - Multiple Patterns
- Search Options
  - Byte Offset (`--byte-offset`)
  - Column Number (`--column`)
  - Count Matching Lines (`-c/--count`)
  - Count Matches (`--count-matches`)
  - Fixed Strings (`--fixed-strings`)
  - Ignore Case (`-i/--ignore-case`)
  - Limit Output Line Length (`--max-columns`)
  - Print Only Matching Parts (`-o/--only-matching`)
  - Trim Whitespace (`--trim`)
  - Word Boundary (`-w/--word-regexp`)
- Unicode
- Which Files?
- Git Repositories
- Usage
- Options
The program can be used in a few different ways:
- Single pattern, searching a single path (if no path is provided, the current directory is searched).
- Single pattern, searching multiple paths.
- Multiple patterns provided via the `-e` option, searching multiple paths.
- Multiple patterns provided via a pattern file, searching multiple paths.
- No patterns, just interested in which files will be searched (using `--files`).
A list of supported regex constructs can be found here.
The simplest of these is searching the current working directory for a single pattern. The following example searches the current directory for the literal pattern `mmap`.
When `hypergrep` output is piped to another program, e.g., `wc` or `cat`, the output changes to a different format, with one output line per matching line.
To search multiple paths for a pattern match, simply provide the paths one after another, e.g.,
Multiple independent patterns can be provided in two ways:
- Using `-e/--regexp` and providing each pattern on the command line.
- Using `-f/--file` and providing a pattern file, which contains multiple patterns, one per line.
Use `-e` to provide multiple patterns, one after another, in the same command.
Consider the pattern file `list_of_patterns.txt` with two lines:
hs_scan
fmt::print\("{}"
This file can be used to search multiple patterns at once using the `-f/--file` option:
In addition to line numbers, the byte offset or the column number can be printed for each matching line.
Use `-b/--byte-offset` to get the 0-based byte offset of the matching line in the file.
Use `--column` to get the 1-based column number of the first match in each matching line.
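To make the difference between the two numbers concrete, here is a minimal Python sketch (not `hypergrep`'s implementation) that computes both for each matching line of an in-memory file. It assumes lines end in `\n`, and it measures the column in bytes, which is also an assumption. The helper name `offsets_and_columns` is hypothetical.

```python
import re


def offsets_and_columns(data: bytes, pattern: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, column) for each line of `data` matching `pattern`.

    byte_offset: 0-based offset of the first byte of the matching line.
    column:      1-based position of the first match in that line
                 (counted in bytes here, an assumption of this sketch).
    """
    results = []
    offset = 0  # byte offset of the current line's first byte
    for line in data.split(b"\n"):
        match = re.search(pattern, line)
        if match:
            results.append((offset, match.start() + 1))
        offset += len(line) + 1  # +1 for the newline removed by split()
    return results


print(offsets_and_columns(b"foo\nbar mmap\nmmap\n", rb"mmap"))  # [(4, 5), (13, 1)]
```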
Use `-c/--count` to count the number of matching lines in each file. Note that a line with multiple matches still counts as one matching line.
Use `--count-matches` to count the number of matches in each file. If a line contains multiple matches, each one is counted individually.
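The two counting modes differ only in what they count per line. Below is a small Python sketch of that difference for a simple literal pattern. It uses non-overlapping matches from `re.finditer`, which is an assumption: `hypergrep`'s engine may report matches differently for more complex patterns.

```python
import re


def count_matching_lines(lines: list[str], pattern: str) -> int:
    """Like -c/--count: a line with several matches still counts once."""
    return sum(1 for line in lines if re.search(pattern, line))


def count_matches(lines: list[str], pattern: str) -> int:
    """Like --count-matches: every (non-overlapping) match is counted."""
    return sum(1 for line in lines for _ in re.finditer(pattern, line))


lines = ["mmap(mmap)", "no match here", "mmap"]
print(count_matching_lines(lines, "mmap"))  # 2
print(count_matches(lines, "mmap"))         # 3
```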
A pure literal is a special case of a regular expression. A character sequence is regarded as a pure literal if and only if each character is read and interpreted independently, i.e., no syntactic association exists between any adjacent characters.
For example, consider the expression written as `/bc?/`. Read as a regular expression, it means the character `b` followed either by nothing or by a single character `c`. Read as a pure literal, it is a 3-byte character sequence containing the characters `b`, `c` and `?`. In the regular case, the question mark `?` has a particular syntactic role, the 0-1 quantifier, which binds to the character before it. Other characters with special roles in regex grammar include `[`, `]`, `(`, `)`, `{`, `}`, `-`, `*`, `+`, `\`, `|`, `/`, `:`, `^`, `.` and `$`. In the pure literal case, all of these meta characters lose their special meaning and are treated as ordinary ASCII characters.
Use `-F/--fixed-strings` to specify that the pattern is a pure literal. Note in the following example that the special characters in the pattern are not escaped; they are taken as is.
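Python's standard library can illustrate the regex vs. pure literal distinction using the `/bc?/` example above. `re.escape()` turns every meta character into an ordinary one, which is roughly the effect `-F` has on a pattern. This is a sketch of the concept only, not how `hypergrep` implements the flag.

```python
import re

text = "b bc bc?"

# Interpreted as a regex: `?` makes the preceding `c` optional.
print(re.findall("bc?", text))             # ['b', 'bc', 'bc']

# Interpreted as a pure literal: re.escape() neutralises the meta character,
# so the pattern only matches the exact 3-character sequence "bc?".
print(re.findall(re.escape("bc?"), text))  # ['bc?']
```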
A `hypergrep` search can be performed case-insensitively using the `-i/--ignore-case` option.
Here's an example case-insensitive search for the literal `test`:
Here's an example search for both the upper-case (`Δ`) and lower-case (`δ`) versions of the Greek letter delta.
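Case-insensitive matching of non-ASCII letters depends on Unicode case folding. The Python sketch below shows the same idea with `re.IGNORECASE` on the delta example. It also shows a scoped inline flag, which is similar in spirit to the PCRE `(?i)`/`(?-i)` toggles mentioned under `-i` in the options table. Python's syntax is used here, not `hgrep`'s engine.

```python
import re

text = "Δ (upper-case delta) and δ (lower-case delta)"

# Case-sensitive: only the exact code point U+03B4 matches.
print(re.findall("δ", text))                 # ['δ']

# Case-insensitive: both U+0394 (Δ) and U+03B4 (δ) match.
print(re.findall("δ", text, re.IGNORECASE))  # ['Δ', 'δ']

# Scoped inline flag: only the δ is case-insensitive, "elta" is not.
print(re.findall("(?i:δ)elta", "Δelta δelta ΔELTA"))  # ['Δelta', 'δelta']
```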
If some of the matching lines are too long for you, you can hide them with `--max-columns`, which sets the maximum length (in bytes) of a printed matching line. Lines longer than this limit are not printed. Instead, an "Omitted line" message is printed along with the number of matches on each such line.
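Because the limit is measured in bytes, a line of non-ASCII text can exceed it with fewer characters than the limit suggests. Here is a minimal Python sketch of the rule. The function `render_line` and its notice text are hypothetical, and `hypergrep`'s actual wording may differ.

```python
import re


def render_line(line: str, pattern: str, max_columns: int) -> str:
    """Return the line itself, or an omission notice if it is too long.

    The limit is compared against the UTF-8 byte length, not the number
    of characters. The notice format is hypothetical.
    """
    if len(line.encode("utf-8")) > max_columns:
        num_matches = sum(1 for _ in re.finditer(pattern, line))
        return f"[Omitted line: {num_matches} match(es)]"
    return line


print(render_line("short line with mmap", "mmap", 80))  # short line with mmap
# 60 characters, but 120 bytes in UTF-8, so it exceeds a limit of 80 bytes.
print(render_line("δ" * 60, "δ", 80))  # [Omitted line: 60 match(es)]
```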
Sometimes a user does not care about the entire line, only the matching parts. Here's an example using `-o/--only-matching` to print only the matching parts of each line instead of the entire line.
This example searches for any `cout` statement that ends in a `std::endl`.
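The behaviour of `-o` can be approximated in Python with `re.finditer`, emitting each matched span as its own entry. The pattern `cout.*?std::endl` and the sample source lines are our own guesses for illustration, not the original example. Note that a greedy `.*` would merge two statements on the same line into a single match, which is why the sketch uses the lazy `.*?`.

```python
import re


def only_matching(lines: list[str], pattern: str) -> list[str]:
    """Return each matched part of each line, one entry per match (like -o)."""
    parts = []
    for line in lines:
        parts.extend(m.group(0) for m in re.finditer(pattern, line))
    return parts


source = [
    '    std::cout << "hi" << std::endl;',
    "int x = 0;",
    "cout << a << std::endl; cout << b << std::endl;",
]

# Lazy `.*?` stops at the first std::endl, so each statement is reported separately.
for part in only_matching(source, r"cout.*?std::endl"):
    print(part)
```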
Use `--trim` to trim whitespace (`' '`, `\t`) that prefixes any matching line.
In a regular expression, adding `\b` lets you perform a "whole words only" search using a pattern of the form `\bword\b`. A "word character" is a character that can be used to form words; with the default ASCII interpretation, these are letters, digits and the underscore.
Use `-w/--word-regexp` as a shorthand for this purpose.
There are three different positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
NOTE `\B` is the negated version of `\b`. `\B` matches at every position where `\b` does not. Effectively, `\B` matches at any position between two word characters as well as at any position between two non-word characters.
In the following example, any occurrence of `test` that isn't surrounded by word characters will be matched. Note that in the final matching line, there are two occurrences of `test` but only one matches.
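The word-boundary rules above can be tried out in Python, whose `\b` follows the same three positions. The helper `word_regexp` is a hypothetical sketch of what "putting `\b` before and after the pattern" means. It wraps the pattern in a non-capturing group so that alternations keep their meaning. Whether `hgrep -w` does exactly this internally is an assumption.

```python
import re


def word_regexp(pattern: str) -> str:
    r"""Wrap a pattern in word boundaries, roughly what -w/--word-regexp does.

    The non-capturing group keeps alternations intact: without it, `a|b`
    would become `\ba|b\b`, which only anchors the outer ends.
    """
    return rf"\b(?:{pattern})\b"


text = "test testing attest test_case (test)"
# `testing`, `attest` and `test_case` are rejected: each `test` touches a word character.
print(re.findall(word_regexp("test"), text))  # ['test', 'test']

print(re.findall(word_regexp("cat|dog"), "cats dog concat"))  # ['dog']
print(re.findall(r"\bcat|dog\b", "cats dog concat"))          # ['cat', 'dog']
```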
The `hypergrep` regex engine is compiled with UTF-8 support, i.e., patterns are treated as sequences of UTF-8 characters.
Unicode character properties, such as `\p{L}`, `\P{Sc}`, `\p{Greek}`, etc., are supported.
Here's an example search for a range of emojis:
NOTE: You can specify the `--ucp` flag to use Unicode properties, rather than the default ASCII interpretations, for character mnemonics like `\w` and `\s`, as well as for the POSIX character classes.
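Python's `re` module is not the engine `hypergrep` uses, and it supports neither `\p{...}` properties nor POSIX classes. It is still handy for a quick sketch of two ideas from this section:

- Patterns operate on characters rather than bytes, so a character class can span emoji code points.
- `\w` changes meaning depending on whether an ASCII or a Unicode interpretation is used.

The emoji range (the Unicode Emoticons block, U+1F600 to U+1F64F) is our own choice for illustration. Python defaults to the Unicode interpretation and needs `re.ASCII` for the ASCII one, which is the reverse of `hgrep`'s default.

```python
import re

# Patterns work on code points, so a class can cover a range of emoji.
# U+1F600..U+1F64F is the Unicode "Emoticons" block.
emoji = re.compile("[\U0001F600-\U0001F64F]")
print(emoji.findall("hi \U0001F600 there \U0001F643 \U0001F680"))  # 🚀 (U+1F680) is outside the block

# Each of these emoji is one code point but 4 bytes in UTF-8.
print(len("\U0001F600".encode("utf-8")))  # 4

text = "na\u00efve caf\u00e9"  # "naïve café"
# ASCII interpretation of \w (hgrep's default): only [A-Za-z0-9_] are word characters.
print(re.findall(r"\w+", text, re.ASCII))  # ['na', 've', 'caf']
# Unicode interpretation of \w (what --ucp enables in hgrep; Python's default).
print(re.findall(r"\w+", text))            # ['naïve', 'café']
```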
Sometimes it is necessary to check which files `hypergrep` chooses to search in a directory. Use `--files` to print a list of all files that `hypergrep` will consider.
Note in the above example that hidden files and directories are ignored by default.
If you only care about which files have matches, and not necessarily what the matches are, use `-l/--files-with-matches` to get a list of all the files with matches.
Use `--filter` to filter the files being searched. Only files that positively match the filter pattern will be searched.
NOTE that this is not a glob pattern but a PCRE pattern.
The following pattern, `googletest/(include|src)/.*\.(cpp|hpp|c|h)$`, matches any C/C++ source or header file under any `googletest/include` or `googletest/src` directory.
Running in the `/usr` directory and searching for shared libraries gives the following performance:
Command | Number of Files | Time |
---|---|---|
`find . -name "*.so" \| wc -l` | 1851 | 0.293 |
`rg -g "*.so" --files \| wc -l` | 1621 | 0.082 |
`hgrep --filter '\.so$' --files \| wc -l` | 1621 | 0.043 |
This filtering can be negated by prefixing the filter with the `!` character. For example, the pattern `!\.(cpp|hpp)$` will match any file that is NOT a C++ source file.
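Here is a minimal Python sketch of the filter rule, including the `!` negation. It assumes the regex is searched anywhere in the path rather than anchored, which fits the unanchored `googletest/...` example but is not confirmed. The function `passes_filter` and the sample paths are hypothetical.

```python
import re


def passes_filter(path: str, filter_pattern: str) -> bool:
    """Return True if `path` would be searched under --filter semantics.

    A leading '!' negates the filter. The regex is searched anywhere in
    the path (unanchored), which is an assumption of this sketch.
    """
    if filter_pattern.startswith("!"):
        return re.search(filter_pattern[1:], path) is None
    return re.search(filter_pattern, path) is not None


paths = [  # hypothetical paths
    "googletest/include/gtest/gtest.h",
    "googletest/src/foo.cpp",
    "googletest/docs/index.md",
    "samples/main.cpp",
]
cpp_in_gtest = r"googletest/(include|src)/.*\.(cpp|hpp|c|h)$"
print([p for p in paths if passes_filter(p, cpp_in_gtest)])
print([p for p in paths if passes_filter(p, r"!\.(cpp|hpp)$")])
```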
Hidden Files
By default, hidden files and directories are skipped. A file or directory is considered hidden if its base name starts with a dot character (`'.'`).
You can include hidden files and directories in the search using the `--hidden` option.
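Because a hidden directory is skipped as a whole, everything beneath it is skipped too. The Python sketch below expresses this as "any path component starts with a dot". It assumes paths are relative to the search root and contain no `..` components. The helper `is_hidden` is hypothetical.

```python
from pathlib import PurePosixPath


def is_hidden(relative_path: str) -> bool:
    """True if the file, or any directory above it, has a base name starting with '.'.

    Assumes `relative_path` is relative to the search root and has no '..'
    components; a file inside a hidden directory counts as hidden because
    the directory itself would be skipped during traversal.
    """
    return any(part.startswith(".") for part in PurePosixPath(relative_path).parts)


for p in ["src/main.cpp", ".env", "src/.cache/data.txt"]:
    print(p, is_hidden(p))
```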
If you want to filter out files over a certain size, use `--max-filesize` to provide a file size specification. The input accepts the suffixes `K`, `M` or `G`.
If no suffix is provided, the input is treated as bytes, e.g., the following search filters out any files over 30 bytes in size.
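A size specification like `30` or `50K` can be parsed as sketched below in Python. The multipliers are an assumption: this sketch treats `K`, `M` and `G` as binary (1024-based) units, and `hypergrep` may use different ones. Files whose size equals the limit are kept, since only files *over* the limit are filtered out. The function names are hypothetical.

```python
import re

# Assumption: K, M and G are binary multiples (1024-based); hgrep may differ.
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_filesize(spec: str) -> int:
    """Convert a size spec like '30', '50K', '2M' or '1G' into bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", spec.strip())
    if match is None:
        raise ValueError(f"invalid file size specification: {spec!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]


def within_limit(file_size: int, spec: str) -> bool:
    """True if a file of `file_size` bytes would still be searched."""
    return file_size <= parse_filesize(spec)


print(parse_filesize("30"), parse_filesize("50K"))  # 30 51200
print(within_limit(31, "30"))                       # False
```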
`hypergrep` treats git repositories, i.e., directories with a `.git/` subdirectory, differently from ordinary directories. When `hypergrep` encounters a git repository, instead of traversing the directory tree, it reads the repository's git index file (at `.git/index`) and iterates over the index entries using libgit2.
NOTE in the following example:
- The `ls` command shows all the files and directories in the current path.
- Note the `build/` folder.
- `git ls-files` shows all the files in the git index and the working tree.
- The `hgrep --files` output is very similar to `git ls-files`, except that hidden files are ignored.
`hypergrep` prefers iterating the git index over loading the `.gitignore` file and checking every single file and subdirectory against a potentially long list of ignore rules.
NOTE By default, `hypergrep` will recursively search any git submodules that are found. They can be excluded using `--ignore-submodules`.
NOTE If you don't want `hypergrep` to treat git repositories differently, and would rather it search them as ordinary directories, use `--ignore-gitindex` to override this behavior.
hgrep [OPTIONS] PATTERN [PATH ...]
hgrep [OPTIONS] -e PATTERN ... [PATH ...]
hgrep [OPTIONS] -f PATTERNFILE ... [PATH ...]
hgrep [OPTIONS] --files [PATH ...]
hgrep [OPTIONS] --help
hgrep [OPTIONS] --version
Name | Description |
---|---|
`-b, --byte-offset` | Print the 0-based byte offset within the input file before each line of output. If `-o` (`--only-matching`) is used, print the offset of the matching part itself. |
`--column` | Show column numbers (1-based). This only shows the column number for the first match on each line. |
`-c, --count` | This flag suppresses normal output and shows the number of lines that match the given pattern for each file searched. |
`--count-matches` | This flag suppresses normal output and shows the number of individual matches of the given pattern for each file searched. |
`-e, --regexp <PATTERN>...` | A pattern to search for. This option can be provided multiple times, in which case all given patterns are searched. Lines matching at least one of the provided patterns are printed, e.g., `hgrep -e 'myFunctionCall' -e 'myErrorCallback'` will search for any occurrence of either pattern. |
`-f, --file <PATTERNFILE>...` | Search for patterns from the given file, with one pattern per line. When this flag is used multiple times or in combination with the `-e/--regexp` flag, all provided patterns are searched. |
`--files` | Print each file that would be searched without actually performing the search. |
`--filter <FILTERPATTERN>` | Filter paths based on a regex pattern, e.g., `hgrep --filter '(include\|src)/.*\.(c\|cpp\|h\|hpp)$'` will search C/C++ files in any `*/include/*` and `*/src/*` paths. A filter can be negated by prefixing the pattern with `!`, e.g., `hgrep --filter '!\.html$'` will search any files that are not HTML files. |
`-F, --fixed-strings` | Treat the pattern as a literal string instead of a regex. Special regex meta characters such as `.(){}*+` do not need to be escaped. |
`-h, --help` | Display the help message. |
`--hidden` | Search hidden files and directories. By default, hidden files and directories are skipped. A file or directory is considered hidden if its base name starts with a dot character (`'.'`). |
`-i, --ignore-case` | When this flag is provided, the given patterns are searched case-insensitively. Patterns may still use PCRE tokens (notably `(?i)` and `(?-i)`) to toggle case-insensitive matching. |
`--ignore-gitindex` | By default, `hypergrep` checks for the presence of a `.git/` directory in any path being searched. If a `.git/` directory is found, `hypergrep` attempts to find and load the git index file. Once loaded, the git index entries are iterated and searched. Using `--ignore-gitindex` disables this behavior; instead, `hypergrep` searches the path as if it were a normal directory. |
`--ignore-submodules` | For any detected git repository, this option causes `hypergrep` to exclude any submodules found. |
`--include-zero` | When used with `--count` or `--count-matches`, print the count for each file even if there were zero matches. This is disabled by default. |
`-I, --no-filename` | Never print the file path with the matched lines. This is the default when searching one file or stdin. |
`-l, --files-with-matches` | Print the paths with at least one match and suppress match contents. |
`-M, --max-columns <NUM>` | Don't print lines longer than this limit in bytes. Longer lines are omitted, and only the number of matches in that line is printed. |
`--max-filesize <NUM+SUFFIX?>` | Ignore files above a certain size. The input accepts the suffixes `K`, `M` or `G`. If no suffix is provided, the input is treated as bytes, e.g., `hgrep --max-filesize 50K` will search any files under 50KB in size. |
`-n, --line-number` | Show line numbers (1-based). This is enabled by default when searching in a terminal. |
`-N, --no-line-number` | Suppress line numbers. This is enabled by default when not searching in a terminal. |
`-o, --only-matching` | Print only the matched parts of a matching line, with each such part on a separate output line. |
`--ucp` | Use Unicode properties, rather than the default ASCII interpretations, for character mnemonics like `\w` and `\s` as well as the POSIX character classes. |
`-v, --version` | Display the version information. |
`-w, --word-regexp` | Only show matches surrounded by word boundaries. This is equivalent to putting `\b` before and after the search pattern. |