TGREP(1) General Commands Manual TGREP(1)
NAME
tgrep - print text that matches patterns of lexical tokens
SYNOPSIS
tgrep [OPTION...]* -e PATTERN ... [FILE...]
DESCRIPTION
Tgrep searches for patterns in a set of one or more files, matching on
lexical tokens rather than characters (as in grep) or words (as in grep
-w). Patterns are insensitive to white-space, so matches can span mul-
tiple lines. If searching source code, comments are removed before pat-
terns are matched.
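The difference from character-based matching can be illustrated with a
small Python sketch. This is a toy model only, not how Cobra tokenizes
its input: the regular expressions and the helper names tokenize and
find_tokens are assumptions made for this illustration.

```python
import re

# Toy lexer (an assumption for illustration, far simpler than a real C lexer):
# comments are dropped, white space is skipped, everything else is a token.
COMMENT = re.compile(r'/\*.*?\*/|//[^\n]*', re.DOTALL)
TOKEN = re.compile(r'[A-Za-z_]\w*|\d+|"(?:\\.|[^"\\])*"|\S')

def tokenize(source):
    """Return the token texts in source, ignoring comments and white space."""
    return TOKEN.findall(COMMENT.sub(" ", source))

def find_tokens(pattern, source):
    """Return the token indices where the pattern's token sequence starts."""
    want = pattern.split()      # pattern tokens are separated by white space
    have = tokenize(source)
    return [i for i in range(len(have) - len(want) + 1)
            if have[i:i + len(want)] == want]

code = """
x = fopen /* open the log */ (
        name, "w");
"""
# The match spans two lines and skips the comment.
print(find_tokens("fopen ( name", code))   # prints [2]
```

A character-based search for the text "fopen ( name" would find nothing
in this input, because of the comment and the line break.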
All tokens in the pattern must be separated from each other by white
space.
Traditional meta-characters from regular expression theory that match
common symbols and operators in C-like languages (such as (, ), |, and
+) are interpreted as regular characters unless preceded by a
backslash, as in \+. The Kleene star * is interpreted as a regular
character if preceded in the pattern by white space. If it is used as a
suffix of another token, as in token*, it is interpreted as a
meta-character that specifies zero or more repetitions of the
preceding token.
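One reading of the rule for the star can be sketched in Python as
follows. The function name classify and the labels it returns are
hypothetical, chosen for this illustration; Cobra's own pattern parser
may differ in detail.

```python
def classify(token):
    """Label one white-space separated pattern token (toy model)."""
    if token == "*":
        # A star standing alone is preceded by white space: a regular character.
        return ("literal", "*")
    if token == ".*":
        # Don't-care sequence of tokens.
        return ("any-sequence", None)
    if len(token) > 1 and token.endswith("*"):
        # A star used as a suffix: zero or more repetitions of the token before it.
        return ("zero-or-more", token[:-1])
    return ("literal", token)

for tok in "x:@type ** y:@ident ( .* )".split():
    print(tok, classify(tok))
```

Under this reading, the token ** denotes zero or more literal * symbols.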
The patterns can use variable binding and constraints (see examples be-
low).
OPTIONS
-c Suppress normal output and only print a count of the number of
times that the pattern was matched. Note that there can be mul-
tiple matches on a single line of input.
-d Display long matches in terse format, showing only the first and
last two lines of each match.
-e pattern
Define the pattern expression to be matched. Tokens must be sep-
arated by spaces. A token category can be specified with an @
prefix. The list of token types is:
@chr - quoted character
@const - short for one of:
@const_flt
@const_hex
@const_int
@const_oct
@cpp - preprocessor directive
@ident - identifier
@key - keyword
@modifier - long, short, signed, unsigned
@oper - operator
@qualifier - const, volatile
@storage - static, extern, register, auto
@str - string
@type - int, char, float, double, void
The character pair .* can be used to specify a don't-care se-
quence of tokens. Variable binding can be used (definition, for
instance: x:@ident, with references later in the expression
written as :x). Location constraints can also be added (see the
pe manual page of the Cobra tool).
There can be multiple -e options.
-i Case insensitive search. Text fields in the input are mapped to
lower-case. The letters in the pattern itself are not changed,
so use only lower-case in the pattern itself to match text
fields.
-j Display matches in json format.
-l List only the names of files that contain matches of the pattern
(see also -c).
-Nn Use n cores (n>=1) to read the input files. By default n is
equal to the number of files, with a maximum of 8. (also ac-
cepted: -N n)
-n Display filenames and line number ranges for each matched pat-
tern (not compatible with -j and -t).
-r 'pattern'
Recursively search the current directory and its sub-directories
for filenames matching regular expression 'pattern' (e.g.
'*.[ch]'), and use those files as input. Use single quotes around
the pattern to prevent the shell from expanding it.
-t Display the matches, highlighting which tokens were matched on
each line.
-x Search comments instead of code.
languages:
-Ada -C++ -html -Java -Python -text
which can be abbreviated to: -A -C++ -H -J -P -T
The default input language assumed is ISO standard C.
NOTES
Tgrep is a small shell-script that runs the Cobra tool to perform its
function, in a style that is comparable to typical uses of the standard
grep tool.
Tgrep assumes Cobra Version 5.3 (January 2026) or later.
A full description of the notation for defining pattern expressions,
including the optional use of bound variables and constraints, can be
found in the online Cobra documentation. See: pe.html
EXAMPLES
A simple way to list include directives that import locally defined
files, in a set of source files, can be specified with token category
specifications, as follows.
$ tgrep -e '@cpp @str' *.[chyl]
The notation @cpp matches a preprocessor directive, and the notation @str
matches any string. If instead we want to list system include files,
the pattern becomes a little longer, as follows.
$ tgrep -e '@cpp < @ident . h >' *.[chyl]
If we want all system include files, but not those matching either
stdio.h or stdlib.h, we could write:
$ tgrep -e '@cpp < ^[stdio stdlib] . h >' *.[chyl]
To illustrate the use of variable binding and don't care sequences
using the Kleene star, the following pattern matches a sequence of code
where the return value of fopen is used in a subsequent fprintf
statement without checking the return value for errors.
$ tgrep -e 'x:@ident = fopen ( .* ) ^:x* fprintf ( :x' *.c
Note the use of spaces to separate individual tokens and symbols.
The first token matches any identifier, with the name bound to variable
x, using the prefix notation x:.
The next five token specifiers match token texts exactly, with .* used
as a short-hand for a don't care sequence of zero or more tokens.
After the closing round brace of calls to fopen (the braces are
guaranteed to match at the right level of nesting) the pattern requires
the absence of uses of the bound variable (using the negation prefix ^
and the bound variable reference shorthand :x, followed by the suffix *
to indicate a repetition of zero or more).
The next token must match fprintf followed by an open round brace and
then a repeat of the bound variable x, again referred to as :x.
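The steps of this pattern can be mimicked by hand in a short Python
sketch. It is an analogue written for illustration, not the way tgrep
or Cobra evaluate patterns; the toy lexer and the function name
unchecked_fopen are assumptions.

```python
import re

# Toy lexer (assumption): identifiers, numbers, strings, single symbols.
TOKEN = re.compile(r'[A-Za-z_]\w*|\d+|"(?:\\.|[^"\\])*"|\S')
IDENT = re.compile(r'[A-Za-z_]\w*$')

def unchecked_fopen(source):
    """Return names assigned from fopen whose next use is in fprintf( name."""
    toks = TOKEN.findall(source)
    hits = []
    for i in range(len(toks)):
        # x:@ident = fopen (   binds the identifier to name
        if not (IDENT.match(toks[i]) and toks[i + 1:i + 4] == ["=", "fopen", "("]):
            continue
        name = toks[i]
        # .* )   skip to the closing brace at the matching nesting level
        depth, j = 1, i + 4
        while j < len(toks) and depth > 0:
            depth += {"(": 1, ")": -1}.get(toks[j], 0)
            j += 1
        # ^:x* fprintf ( :x   no use of name before fprintf ( name
        while j < len(toks) and toks[j] != name:
            if toks[j:j + 3] == ["fprintf", "(", name]:
                hits.append(name)
                break
            j += 1
    return hits

bad = 'fp = fopen(path, "w"); fprintf(fp, "hi");'
good = 'fp = fopen(path, "w"); if (fp) fprintf(fp, "hi");'
print(unchecked_fopen(bad))    # prints ['fp']
print(unchecked_fopen(good))   # prints []
```

In the second input the test if (fp) uses the bound name before the
call to fprintf, so the sequence is not reported.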
The next example shows a pattern that looks for a C function prototype
definition that is immediately followed by the function definition (a
form of redundancy). It uses two bound variables, named x and y here.
$ tgrep -d -e 'x:@type y:@ident ( .* ) \; :x :y ( .* ) {' *.c
Because of the -d option, long matches are displayed in terse form. We
have to use a backslash escape to protect the semi-colon from being
interpreted as a command separator. To also account for functions that
return pointers, we can extend the pattern by including optional
matches of zero or more * symbols.
$ tgrep -e 'x:@type ** y:@ident ( .* ) \; :x ** :y ( .* ) {' *.c
The following example shows the use of a positional constraint:
$ tgrep -e ' x:\; .* :x <1> @1 (:x.lnr == .lnr)' *.c
This pattern matches two semi-colons, with arbitrary text in between
them. The positional constraint placed at the repeat of the semi-colon
(with the first semi-colon bound to variable x) requires that the
definition and reference appear on the same line. This can be used
to find uses of multiple statements appearing on the same line of
source text, which violates some coding standards. The pattern as
given will also match the control portion of for statements, but can be
refined to exclude those matches by extending the constraint to remove
matches inside round brace pairs:
$ tgrep -e ' x:\; .* :x <1> @1 (:x.lnr==.lnr && .round==0)' *.c
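The effect of the refined constraint can be approximated in Python as
follows. This sketch works on characters rather than tokens and ignores
strings and comments, so it is only a rough analogue; the function name
multi_statement_lines is an assumption.

```python
def multi_statement_lines(source):
    """Return 1-based numbers of lines where a semi-colon outside round
    braces follows an earlier semi-colon on the same line."""
    hits = []
    depth = 0                          # nesting level of round braces
    for lnr, line in enumerate(source.splitlines(), start=1):
        seen = False                   # a semi-colon was seen on this line
        for ch in line:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == ";":
                if seen and depth == 0 and lnr not in hits:
                    hits.append(lnr)   # same line, and not inside ( ... )
                seen = True
    return hits

src = """for (i = 0; i < n; i++)
    total += i;
a = 1; b = 2;
"""
print(multi_statement_lines(src))   # prints [3]
```

The for statement on the first line is not reported, because its second
semi-colon is nested inside a pair of round braces.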
Search the comments in Java code for matches of the regular expression
FIXME, insensitive to capitalization.
$ tgrep -Java -x -n -e '/[Ff][Ii][Xx][Mm][Ee]' *.java
AUTHOR
Gerard Holzmann, gholzmann@acm.org
SEE ALSO
cobra(1), grep(1), awk(1)
https://codescrub.com