TGREP(1)                    General Commands Manual                   TGREP(1)

NAME
       tgrep - print text that matches patterns of lexical tokens

SYNOPSIS
       tgrep [OPTION...]* -e PATTERN ... [FILE...]

DESCRIPTION
       Tgrep searches for patterns in a set of one or more files, matching on
       lexical tokens rather than characters (as in grep) or words (as in
       grep -w). Patterns are insensitive to white space, so matches can span
       multiple lines. If searching source code, comments are removed before
       patterns are matched.

       All tokens in the pattern must be separated from each other by white
       space.

       Traditional meta-characters from regular expression theory that match
       common symbols and operators in C-like languages (such as (, ), |,
       and +) are interpreted as regular characters unless preceded by a
       backslash, as in \+. The Kleene star * is interpreted as a regular
       character if preceded in the pattern by white space. If it is used as
       a suffix of another token it is interpreted as a meta-character, as in
       token*, specifying zero or more repetitions of the preceding token.

       The patterns can use variable binding and constraints (see examples
       below).

OPTIONS
       -c     Suppress normal output and only print a count of the number of
              times that the pattern was matched. Note that there can be
              multiple matches on a single line of input.

       -d     Display long matches in terse format, showing only the first
              and last two lines of each match.

       -e pattern
              Define the pattern expression to be matched. Tokens must be
              separated by spaces. A token category can be specified with an
              @ prefix.
              The list of token types is:

                  @chr       - quoted character
                  @const     - short for one of: @const_flt @const_hex
                               @const_int @const_oct
                  @cpp       - preprocessor directive
                  @ident     - identifier
                  @key       - keyword
                  @modifier  - long, short, signed, unsigned
                  @oper      - operator
                  @qualifier - const, volatile
                  @storage   - static, extern, register, auto
                  @str       - string
                  @type      - int, char, float, double, void

              The character pair .* can be used to specify a don't-care
              sequence of tokens. Variable binding can be used (a definition
              is written as, for instance, x:@ident, with references later in
              the expression written as :x). Location constraints can also be
              added (see the pe manual page of the Cobra tool). There can be
              multiple -e options.

       -i     Case-insensitive search. Text fields in the input are mapped to
              lower case. The letters in the pattern itself are not changed,
              so use only lower case in the pattern to match text fields.

       -j     Display matches in JSON format.

       -l     List only the names of files that contain matches of the
              pattern (see also -c).

       -Nn    Use n cores (n>=1) to read the input files. By default n is
              equal to the number of files, with a maximum of 8. (Also
              accepted: -N n.)

       -n     Display filenames and line number ranges for each matched
              pattern (not compatible with -j and -t).

       -r 'pattern'
              Recursively find filenames matching regular expression
              'pattern' (e.g. '*.[ch]') in the current directory or its
              sub-directories, to use as input. Use single quotes around the
              pattern to prevent the shell from expanding it.

       -t     Display the matches, highlighting which tokens were matched on
              each line.

       -x     Search comments instead of code.

       languages:
              -Ada -C++ -html -Java -Python -text
       which can be abbreviated to:
              -A -C++ -H -J -P -T
       The default input language assumed is ISO standard C.

NOTES
       Tgrep is a small shell script that runs the Cobra tool to perform its
       function, in a style that is comparable to typical uses of the
       standard grep tool. Tgrep assumes Cobra Version 5.3 (January 2026) or
       later.

       A full description of the notation for defining pattern expressions,
       including the optional use of bound variables and constraints, can be
       found in the online Cobra documentation. See: pe.html

EXAMPLES
       A simple way to list include directives that import locally defined
       files, in a set of source files, can be specified with token category
       specifications, as follows.

           $ tgrep -e '@cpp @str' *.[chyl]

       The notation @cpp matches a preprocessor directive, and the notation
       @str matches any string. If instead we want to list system include
       files, the pattern becomes a little longer, as follows.

           $ tgrep -e '@cpp < @ident . h >' *.[chyl]

       If we want all system include files, but not those matching either
       stdio.h or stdlib.h, we could write:

           $ tgrep -e '@cpp < ^[stdio stdlib] . h >' *.[chyl]

       To illustrate the use of variable binding and don't-care sequences
       using the Kleene star, the following pattern matches a sequence of
       code where the return value of fopen is used in a subsequent fprintf
       statement without checking the return value for errors.

           $ tgrep -e 'x:@ident = fopen ( .* ) ^:x* fprintf ( :x' *.c

       Note the use of spaces to separate individual tokens and symbols. The
       first token matches any identifier, with the name bound to variable x,
       using the prefix notation x:. The next five token specifiers match
       token texts exactly, with .* used as a short-hand for a don't-care
       sequence of zero or more tokens.

       After the closing round brace of calls to fopen (the braces are
       guaranteed to match at the right level of nesting) the pattern
       requires the absence of uses of the bound variable (using the negation
       prefix ^ and the bound variable reference shorthand :x, followed by
       the suffix * to indicate a repetition of zero or more). The next token
       must match fprintf followed by an open round brace and then a repeat
       of the bound variable x, again referred to as :x.
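
The matching rule described above can be illustrated with a small Python sketch. This is not how tgrep or Cobra implements pattern matching. The tokenizer regular expressions, the function names tokenize, close_paren and unchecked_fopen, and the sample C fragment are all assumptions made for illustration. The sketch handles only this one pattern: an identifier assigned from fopen(...), no later mention of that identifier, and then fprintf( with that identifier as its first argument.

```python
import re

# Hypothetical, simplified tokenizer: string literals, identifiers, numbers,
# and single punctuation characters. Real C lexing (and Cobra's) is richer.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S')
COMMENT = re.compile(r'/\*.*?\*/|//[^\n]*', re.S)
IDENT = re.compile(r'[A-Za-z_]\w*')


def tokenize(src):
    """Strip comments, then split into tokens, ignoring white space."""
    return TOKEN.findall(COMMENT.sub(' ', src))


def close_paren(toks, i):
    """Given the index of a '(', return the index of its matching ')' or -1."""
    depth = 0
    for j in range(i, len(toks)):
        if toks[j] == '(':
            depth += 1
        elif toks[j] == ')':
            depth -= 1
            if depth == 0:
                return j
    return -1


def unchecked_fopen(src):
    """Return names x that fit: x = fopen ( .* ) ^:x* fprintf ( :x"""
    toks = tokenize(src)
    hits = []
    for i in range(len(toks) - 3):
        x = toks[i]                       # candidate for the binding x:@ident
        if not IDENT.fullmatch(x):
            continue
        if toks[i + 1:i + 4] != ['=', 'fopen', '(']:
            continue
        j = close_paren(toks, i + 3)      # ( .* ) with balanced nesting
        if j < 0:
            continue
        k = j + 1
        # ^:x* : skip tokens for as long as they are not the bound name x
        while k < len(toks) and toks[k] != x:
            if toks[k:k + 3] == ['fprintf', '(', x]:
                hits.append(x)            # fprintf ( :x reached, x never seen
                break
            k += 1
    return hits


if __name__ == '__main__':
    sample = 'fp = fopen("out.txt", "w");\nfprintf(fp, "hello");'
    print(unchecked_fopen(sample))  # prints ['fp']
```

Inserting a test such as if (fp == NULL) between the two calls mentions fp before fprintf is reached, so the skip loop stops and nothing is reported, which mirrors the role of ^:x* in the pattern.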

       The next example shows a pattern that looks for a C function prototype
       declaration that is immediately followed by the function definition (a
       form of redundancy). It uses two bound variables, named x and y here.

           $ tgrep -d -e 'x:@type y:@ident ( .* ) \; :x :y ( .* ) {' *.c

       Results are displayed in the default abbreviated form. We have to use
       a backslash escape to protect the semi-colon from being interpreted as
       a command separator. To also account for pointer return types, we can
       extend the pattern by including optional matches of zero or more *
       symbols.

           $ tgrep -e 'x:@type ** y:@ident ( .* ) \; :x ** :y ( .* ) {' *.c

       The following example shows the use of a positional constraint:

           $ tgrep -e ' x:\; .* :x <1> @1 (:x.lnr == .lnr)' *.c

       This pattern matches two semi-colons with arbitrary text in between
       them. The positional constraint placed at the repeat of the semi-colon
       (with the first semi-colon bound to variable x) requires that the
       definition and reference appear on the same line. This can be used to
       find multiple statements appearing on the same line of source text,
       which violates some coding standards. The pattern as given will also
       match the control portion of for statements, but can be refined to
       exclude those matches by extending the constraint to remove matches
       inside round brace pairs:

           $ tgrep -e ' x:\; .* :x <1> @1 (:x.lnr==.lnr && .round==0)' *.c

       Search comments in Java code for matches of the regular expression
       FIXME, insensitive to capitalization.

           $ tgrep -Java -x -n -e '/[Ff][Ii][Xx][Mm][Ee]' *.java

AUTHOR
       Gerard Holzmann, gholzmann@acm.org

SEE ALSO
       cobra(1), grep(1), awk(1)
       https://codescrub.com

TGREP(1)                    General Commands Manual                   TGREP(1)