Greple

greple: grep type command with multiple keywords

This project is maintained by kaz-utashiro

NAME

greple - extensible grep with lexical expression and region control

VERSION

Version 8.57

SYNOPSIS

greple [-Mmodule] [ -options ] pattern [ file… ]

PATTERN
  pattern              'and +must -not ?alternative &function'
  -x, --le   pattern   lexical expression (same as bare pattern)
  -e, --and  pattern   pattern match across line boundary
  -r, --must pattern   pattern cannot be compromised
  -v, --not  pattern   pattern not to be matched
      --or   pattern   alternative pattern group
      --re   pattern   regular expression
      --fe   pattern   fixed expression
  -f, --file file      file contains search pattern
  --select index       select indexed pattern from -f file
MATCH
  -i                   ignore case
  --need=[+-]n         required positive match count
  --allow=[+-]n        acceptable negative match count
  --matchcount=n[,m]   required match count for each block
STYLE
  -l                   list filename only
  -c                   print count of matched block only
  -n                   print line number
  -H, -h               do or do not display filenames
  -o                   print only the matching part
  --all                print entire data
  -m, --max=n[,m]      max count of blocks to be shown
  -A,-B,-C [n]         after/before/both match context
  --join               delete newline in the matched part
  --joinby=string      replace newline in the matched text by string
  --nonewline          do not add newline character at block end
  --filestyle=style    how filename printed (once, separate, line)
  --linestyle=style    how line number printed (separate, line)
  --separate           set filestyle and linestyle both "separate"
  --format LABEL=...   define line number and file name format
FILE
  --glob=glob          glob target files
  --chdir=dir          change directory before search
  --readlist           get filenames from stdin
COLOR
  --color=when         use terminal color (auto, always, never)
  --nocolor            same as --color=never
  --colormap=color     R, G, B, C, M, Y etc.
  --colorful           use default multiple colors
  --colorindex=flags   color index method: Ascend/Descend/Block/Random
  --ansicolor=s        ANSI color 16, 256 or 24bit
  --[no]256            same as --ansicolor 256 or 16
  --regioncolor        use different color for inside/outside regions
  --uniqcolor          use different color for unique string
  --uniqsub=func       preprocess function before check uniqueness
  --random             use random color each time
  --face               set/unset visual effects
BLOCK
  -p, --paragraph      paragraph mode
  --border=pattern     border pattern
  --block=pattern      block of records
  --blockend=s         block end mark (Default: "--")
  --join-blocks        join back-to-back consecutive blocks
REGION
  --inside=pattern     select matches inside of pattern
  --outside=pattern    select matches outside of pattern
  --include=pattern    reduce matches to the area
  --exclude=pattern    reduce matches to outside of the area
  --strict             strict mode for --inside/outside --block
CHARACTER CODE
  --icode=name         file encoding
  --ocode=name         output encoding
FILTER
  --if,--of=filter     input/output filter command
  --pf=filter          post process filter command
  --noif               disable default input filter
RUNTIME FUNCTION
  --print=func         print function
  --continue           continue after print function
  --callback=func      callback function for matched string
  --begin=func         call function before search
  --end=func           call function after search
  --prologue=func      call function before command execution
  --epilogue=func      call function after command execution
OTHER
  --usage[=expand]     show this message
  --exit=n             command exit status
  --norc               skip reading startup file
  --man                display command or module manual page
  --show               display module file
  --require=file       include perl program
  --persist            same as --error=retry
  --error=action       action after read error
  --warn=type          run time error control
  --alert [name=#]     set alert parameter (size/time)
  -d flags             display info (f:file d:dir c:color m:misc s:stat)

INSTALL

CPANMINUS

$ cpanm App::Greple
or
$ curl -sL http://cpanmin.us | perl - App::Greple

DESCRIPTION

MULTIPLE KEYWORDS

greple has almost the same function as the Unix command egrep(1), but the search is done in a manner similar to an Internet search engine. For example, the next command prints lines that contain all of `foo’, `bar’ and `baz’.

greple 'foo bar baz' ...

Each word can appear in any order and at any place in the string, so this command finds all of the following lines.

foo bar baz
baz bar foo
the foo, bar and baz

To use OR syntax, prepend a question mark (`?’) to each token, or use a regular expression.

greple 'foo bar baz ?yabba ?dabba ?doo'
greple 'foo bar baz yabba|dabba|doo'

These commands print lines that contain all of `foo’, `bar’ and `baz’ and one or more of `yabba’, `dabba’ or `doo’.

The NOT operator is specified by prefixing the token with a minus sign (`-’). The next example shows lines that contain both `foo’ and `bar’ but none of `yabba’, `dabba’ or `doo’.

greple 'foo bar -yabba -dabba -doo'

The same search can be written with the -e and -v options:

greple -e foo -e bar -v yabba -v dabba -v doo
greple -e foo -e bar -v 'yabba|dabba|doo'
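To make the token rules concrete, here is a minimal Perl sketch of how plain, `?’ and `-’ tokens could decide whether a single line matches. It is a simplified model written for illustration, not greple's implementation: the function name line_matches is hypothetical, it ignores the `+’ prefix, --need/--allow and highlighting, and it treats each token as a Perl regular expression.

```perl
use strict;
use warnings;

# Simplified model (not greple's implementation) of how bare-pattern tokens
# decide whether one line matches:
#   word   must match                           (AND)
#   ?word  at least one of the ?words must match (OR group)
#   -word  must not match                        (NOT)
# Each token is used as a Perl regular expression.
sub line_matches {
    my ($pattern, $line) = @_;
    my (@and, @or, @not);
    for my $tok (split ' ', $pattern) {
        if    ($tok =~ s/^\?//) { push @or,  $tok }
        elsif ($tok =~ s/^-//)  { push @not, $tok }
        else                    { push @and, $tok }
    }
    return 0 if grep { $line !~ /$_/ } @and;         # an AND token is missing
    return 0 if @or && !grep { $line =~ /$_/ } @or;  # no OR token is present
    return 0 if grep { $line =~ /$_/ } @not;         # a NOT token is present
    return 1;
}

# Prints 1 for the first line and 0 for the second.
for my $line ('the foo, bar and baz', 'foo bar doo') {
    printf "%-22s %d\n", $line, line_matches('foo bar -yabba -dabba -doo', $line);
}
```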

If `+’ is prepended to a positive pattern, that pattern is marked as required, and the required match count is automatically set to the number of required patterns. So

greple '+foo bar baz'

implicitly sets the option --need 1, and consequently prints all lines that include `foo’. In other words, it makes the other patterns optional, but they are still highlighted if they appear. To search for lines that include `foo’ and either or both of `bar’ and `baz’, use one of these:

greple '+foo bar baz' --need 2
greple '+foo bar baz' --need +1
greple 'foo bar|baz'
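The required-count rule can be sketched the same way. In the hypothetical need_matches() function below (an illustrative model, not greple's code), every `+’ pattern must match and the number of matching positive patterns must reach the need value. Treating a signed value such as "+1" as relative to the default is an assumption drawn from the equivalence of --need 2 and --need +1 above.

```perl
use strict;
use warnings;

# Simplified model (not greple's code) of required patterns and --need.
# Default need: the number of +patterns if there are any, otherwise all
# positive patterns. A signed value ("+1", "-1") is taken as relative to
# that default, which is an assumption for this sketch.
sub need_matches {
    my ($pattern, $line, $need) = @_;
    my (@required, @optional);
    for my $tok (split ' ', $pattern) {
        if ($tok =~ s/^\+//) { push @required, $tok }
        else                 { push @optional, $tok }
    }
    my $base = @required ? @required : @optional;   # counts in scalar context
    if (!defined $need) {
        $need = $base;
    } elsif ($need =~ /^([+-])(\d+)$/) {
        $need = $1 eq '+' ? $base + $2 : $base - $2;
    }
    return 0 if grep { $line !~ /$_/ } @required;    # every +pattern must match
    my $count = grep { $line =~ /$_/ } @required, @optional;
    return $count >= $need ? 1 : 0;
}

printf "%d\n", need_matches('+foo bar baz', 'just foo here');     # 1
printf "%d\n", need_matches('+foo bar baz', 'just foo here', 2);  # 0
```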

FLEXIBLE BLOCKS

By default, the data block that greple searches and prints is a line. With the --paragraph (or -p for short) option, a run of text separated by empty lines is taken as a record block. So the next command prints every whole paragraph that contains the words `foo’, `bar’ and `baz’.

greple -p 'foo bar baz'

A block can also be defined by a pattern. The next command treats the data as a series of 10-line units.

greple -n --border='(.*\n){1,10}'

You can also define arbitrarily complex blocks by writing a script.

greple --block '&your_original_function' ...
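As a sketch of what such a script might look like, the hypothetical function below produces the same 10-line units with code instead of a pattern. It assumes the conventions described later for user-defined region functions: the file content arrives in $_, and the function returns a list of [start, end] offset pairs. Defined after a __PERL__ line in ~/.greplerc, it would be used as --block '&ten_line_blocks'.

```perl
use strict;
use warnings;

# Hypothetical block function (the name is illustrative): split the text in
# $_ into units of at most 10 lines and return [start, end] offset pairs.
sub ten_line_blocks {
    my @blocks;
    while (/(?:.*\n){1,10}/g) {
        push @blocks, [ $-[0], $+[0] ];   # offsets of the whole match
    }
    return @blocks;
}

# Demo only: 25 numbered lines give units of 10, 10 and 5 lines.
$_ = join '', map { "line $_\n" } 1 .. 25;
my @units = ten_line_blocks();
printf "%d-%d\n", @$_ for @units;   # prints 0-71, 71-151 and 151-191
```

Only the subroutine belongs in .greplerc; the demo lines at the bottom just run it on generated text.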

MATCH AREA CONTROL

Using the --inside and --outside options, you can specify the text area to be matched. The next commands search only in the mail header and the mail body, respectively. In these cases the data block is not changed, so greple prints lines that contain the pattern in the specified area.

greple --inside '\A(.+\n)+' pattern

greple --outside '\A(.+\n)+' pattern

The --inside/--outside options can be used repeatedly to extend the area to be matched. The similar options --include/--exclude are used instead to trim the area down.

These four options also accept a user-defined function, so any complex region can be used.
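The effect of --inside and --outside on the mail example can be modeled in plain Perl: find the region matched by '\A(.+\n)+', then keep only the pattern matches that fall inside it or outside it. This is a minimal sketch under that simplified model; regions() and matches_in() are hypothetical helpers, not part of greple, and repeated options and --include/--exclude are not modeled.

```perl
use strict;
use warnings;

# Collect [start, end] pairs for every match of a region pattern.
sub regions {
    my ($text, $re) = @_;
    my @r;
    while ($text =~ /$re/g) { push @r, [ $-[0], $+[0] ] }
    return @r;
}

# Return start offsets of $pat matches lying entirely inside the regions
# ($want_inside true) or not inside any of them ($want_inside false).
sub matches_in {
    my ($text, $pat, $want_inside, @regions) = @_;
    my @hits;
    while ($text =~ /$pat/g) {
        my ($s, $e) = ($-[0], $+[0]);
        my $inside = grep { $s >= $_->[0] && $e <= $_->[1] } @regions;
        push @hits, $s unless ($inside xor $want_inside);
    }
    return @hits;
}

my $mail = "From: alice\nSubject: hello\n\nhello bob\n";
my @header       = regions($mail, qr/\A(.+\n)+/);
my @inside_hits  = matches_in($mail, qr/hello/, 1, @header);  # Subject: line
my @outside_hits = matches_in($mail, qr/hello/, 0, @header);  # body
print "inside: @inside_hits\noutside: @outside_hits\n";   # inside: 21, outside: 28
```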

LINE ACROSS MATCH

greple searches for a given pattern across line boundaries. This is especially useful for handling Asian multi-byte text, more specifically Japanese, which can be broken by a newline at almost any place in the text, so the search pattern may spread over multiple lines.

For an ASCII word list, a space character in the pattern matches any kind of whitespace, including newline. The next example searches for the word sequence `foo’, `bar’ and `baz’, even if the words spread over multiple lines.

greple -e 'foo bar baz'

The -e option is necessary because a space is taken as a token separator in a bare or --le pattern.
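A minimal sketch of the idea, assuming each run of spaces in the pattern is simply rewritten to \s+ (greple's actual conversion may be more involved, for example for multi-byte text). The helper across_lines_re() is hypothetical.

```perl
use strict;
use warnings;

# Assumed translation: every run of spaces becomes \s+, so the words may be
# separated by any whitespace, including a newline.
sub across_lines_re {
    my ($pattern) = @_;
    (my $re = $pattern) =~ s/ +/\\s+/g;   # "foo bar baz" -> "foo\s+bar\s+baz"
    return qr/$re/;
}

my $text = "a line ending in foo\nbar baz starts the next one\n";
if ($text =~ across_lines_re('foo bar baz')) {
    print "matched across lines at offset $-[0]\n";
}
```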

MODULE AND CUSTOMIZATION

Users can define default and custom options in ~/.greplerc. The next example always enables colored output, and defines a new option using macro processing.

option default --color=always

define :re1 complex-regex-1
define :re2 complex-regex-2
define :re3 complex-regex-3
option --newopt --inside :re1 --exclude :re2 --re :re3

A specific set of functions and option interfaces can be implemented as a module. Modules are invoked by the -M option immediately after the command name.

For example, greple does not have a recursive search option, but one can be implemented with the --readlist option, which accepts a list of target files from standard input. Using the find module, it can be written like this:

greple -Mfind . -type f -- pattern

The dig module implements a more complex search. It can be used as simply as this:

greple -Mdig pattern --dig .

but this command is finally translated into the following option list.

greple -Mfind . ( -name .git -o -name .svn -o -name RCS ) -prune -o
    -type f ! -name .* ! -name *,v ! -name *~
    ! -iname *.jpg ! -iname *.jpeg ! -iname *.gif ! -iname *.png
    ! -iname *.tar ! -iname *.tbz  ! -iname *.tgz ! -iname *.pdf
    -print -- pattern

INCLUDED MODULES

This release includes some sample modules. Read the documentation of each module for details; you can display it with the --man option.

greple -Mdig --man

If that does not work, use perldoc App::Greple::dig.

Other modules are available on CPAN or from the git repository https://github.com/kaz-utashiro/.

OPTIONS

PATTERNS

If no specific option is given, greple takes the first argument as a search pattern specified by the --le option. All of these patterns can be specified multiple times.

The command itself is written in Perl, and any kind of Perl-style regular expression can be used in patterns. See perlre(1) for details.

Note that the multi-line modifier (m) is set at execution time, so put (?-m) at the beginning of the regex if you want to explicitly disable it.

The order of capture groups in the pattern is not guaranteed. Avoid direct group indexes and use relative or named capture groups instead. For example, to search for repeated characters, use (\w)\g{-1} or (?<c>\w)\g{c} rather than (\w)\1.
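The following Perl snippet checks both notes above: the m modifier lets ^ match after a newline until (?-m) turns it off, and relative (\g{-1}) and named (\g{c}) back-references keep pointing at the right group even when another group precedes them. The '(x)?' prefix is only a hypothetical stand-in for whatever group might end up before your pattern.

```perl
use strict;
use warnings;

my $text = "xbar\nfoo\n";

# With the m modifier (which greple sets), ^ also matches right after a newline.
my $with_m    = $text =~ qr/^foo/m        ? 1 : 0;
# A leading (?-m) switches it off again, so ^ matches only at the start.
my $without_m = $text =~ qr/(?-m)^foo/m   ? 1 : 0;

# $prefix is a hypothetical group placed in front of the user pattern. It
# shifts absolute group numbers (\1 would now mean (x)?), while relative and
# named back-references still refer to (\w).
my $prefix   = '(x)?';
my $relative = qr/$prefix(\w)\g{-1}/;
my $named    = qr/$prefix(?<c>\w)\g{c}/;

print "m: $with_m, (?-m): $without_m\n";       # m: 1, (?-m): 0
print "relative: $2\n" if "book" =~ $relative; # relative: o
print "named: $+{c}\n" if "book" =~ $named;    # named: o
```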

STYLES

FILES

COLORS

BLOCKS

REGIONS

CHARACTER CODE

FILTER

RUNTIME FUNCTIONS

For these run-time functions, an optional argument list can be given in the form of key or key=value, separated by commas. These arguments are passed to the function as a key => value list, where a sole key gets the value one. The name of the file being processed is also passed with the key of the FILELABEL constant. As a result, an option in the following form:

--begin function(key1,key2=val2)
--begin function=key1,key2=val2

will be transformed into the following function call:

function(&FILELABEL => "filename", key1 => 1, key2 => "val2")

As described earlier, the FILELABEL parameter is not given to a function specified with the module option. So

-Mmodule::function(key1,key2=val2)
-Mmodule::function=key1,key2=val2

simply becomes:

function(key1 => 1, key2 => "val2")

The function can be defined in .greplerc or in modules. Assign the arguments to a hash, and you can access the argument list as members of the hash. It is safe to delete the FILELABEL key if you expect arbitrary parameters to be given. The content of the target file can be accessed through $_. The ampersand (&) is required to prevent the hash key from being interpreted as a bare word.

sub function {
    my %arg = @_;
    my $filename = delete $arg{&FILELABEL};
    $arg{key1};             # 1
    $arg{key2};             # "val2"
    $_;                     # contents
}
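To illustrate the argument convention end to end, the sketch below parses both option forms into a key => value list and passes it to a function shaped like the one above. parse_func_spec() is hypothetical, and because the real FILELABEL constant comes from greple, the script defines a local stand-in with a made-up value so it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical parser (not greple's own code) for "name(key1,key2=val2)"
# and "name=key1,key2=val2". Returns the name followed by a key => value
# list; a sole key gets the value 1.
sub parse_func_spec {
    my ($spec) = @_;
    my ($name, $paren, $eq) = $spec =~ /^([\w:]+)(?:\((.*)\)|=(.*))?$/
        or die "cannot parse: $spec\n";
    my $args = $paren // $eq // '';
    my @list;
    for my $item (grep { length } split /,/, $args) {
        my ($key, $value) = split /=/, $item, 2;
        push @list, $key => defined $value ? $value : 1;
    }
    return ($name, @list);
}

# Stand-in for the FILELABEL constant greple provides; the value is made up.
use constant FILELABEL => '__FILELABEL__';

sub function {
    my %arg = @_;
    my $filename = delete $arg{&FILELABEL};   # & keeps it from being a bare word
    return join ',', map { "$_=$arg{$_}" } sort keys %arg;
}

my ($name, @args) = parse_func_spec('function(key1,key2=val2)');
print "$name: ", function(&FILELABEL => 'target.txt', @args), "\n";
# prints "function: key1=1,key2=val2"
```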

OTHERS

ENVIRONMENT and STARTUP FILE

Before starting execution, greple reads the file named .greplerc in the user's home directory. The following directives can be used.

Environment variable substitution is done for strings specified by the `option’ and `define’ directives. Use the Perl syntax $ENV{NAME} for this purpose. You can use this to make a portable module.

When greple finds a __PERL__ line in the .greplerc file, the rest of the file is evaluated as a Perl program. You can define your own subroutines there, which can be used by the --inside/outside, --include/exclude and --block options.

For those subroutines, the file content is provided in the global variable $_. The subroutine is expected to return a list of array references, each made up of a start and end offset pair.

For example, suppose that the following function is defined in your .greplerc file. The start and end offsets of each pattern match can be taken from the array elements $-[0] and $+[0].

__PERL__
sub odd_line {
    my @list;
    my $i;
    while (/.*\n/g) {
        push(@list, [ $-[0], $+[0] ]) if ++$i % 2;
    }
    @list;
}

You can use the next command to search for a pattern in odd-numbered lines.

% greple --inside '&odd_line' pattern files...

MODULE

You can extend the greple command using modules. Module files are placed in the App/Greple/ directory of the Perl library, and therefore have the App::Greple::module package name.

On the command line, a module has to be specified before any other options, in the form of -Mmodule. However, it can also be specified at the beginning of an option expansion.

If the package name is declared properly, the __DATA__ section in the module file is interpreted the same way as .greplerc file content, so you can declare module-specific options there. Functions declared in the module can be used from those options, which makes highly extensible option/programming interaction possible.

Using -M without a module argument prints the list of available modules. The --man option displays the module documentation when used with the -M option. Use the --show option to see the module itself. The --path option prints the path of the module file.

See this sample module code. It defines options to search in the pod, comment and other segments of a Perl script. This capability can be implemented either as functions or as macros.

package App::Greple::perl;

use Exporter 'import';
our @EXPORT      = qw(pod comment podcomment);
our %EXPORT_TAGS = ( );
our @EXPORT_OK   = qw();

use App::Greple::Common;
use App::Greple::Regions;

my $pod_re = qr{^=\w+(?s:.*?)(?:\Z|^=cut\s*\n)}m;
my $comment_re = qr{^(?:[ \t]*#.*\n)+}m;

sub pod {
    match_regions(pattern => $pod_re);
}
sub comment {
    match_regions(pattern => $comment_re);
}
sub podcomment {
    match_regions(pattern => qr/$pod_re|$comment_re/);
}

1;

__DATA__

define :comment: ^(\s*#.*\n)+
define :pod: ^=(?s:.*?)(?:\Z|^=cut\s*\n)

#option --pod --inside :pod:
#option --comment --inside :comment:
#option --code --outside :pod:|:comment:

option --pod --inside '&pod'
option --comment --inside '&comment'
option --code --outside '&podcomment'

You can use the module like this:

greple -Mperl --pod default greple

greple -Mperl --colorful --code --comment --pod default greple
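To see which regions the two regular expressions in the sample module select, they can be exercised outside greple. In this sketch regions_of() is a hypothetical stand-in for match_regions() from App::Greple::Regions (whose internals are not shown here); it simply collects [start, end] pairs for every match.

```perl
use strict;
use warnings;

# The two regexes from the sample module.
my $pod_re     = qr{^=\w+(?s:.*?)(?:\Z|^=cut\s*\n)}m;
my $comment_re = qr{^(?:[ \t]*#.*\n)+}m;

# Hypothetical stand-in for match_regions(): [start, end] of every match.
sub regions_of {
    my ($text, $re) = @_;
    my @regions;
    while ($text =~ /$re/g) { push @regions, [ $-[0], $+[0] ] }
    return @regions;
}

# A small Perl script, assembled line by line.
my $script = join "\n",
    '# leading comment',
    'my $x = 1;',
    '=head1 NAME',
    '',
    'demo',
    '',
    '=cut',
    'print $x;  # code precedes this #, so it is not a comment block',
    '';

for my $pair (['comment', $comment_re], ['pod', $pod_re]) {
    my ($label, $re) = @$pair;
    for my $r (regions_of($script, $re)) {
        printf "%s: %d-%d\n", $label, @$r;
    }
}
```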

If the special subroutines initialize() and finalize() are defined in the module, they are called with a Getopt::EX::Module object as the first argument. The second argument is a reference to @ARGV, which you can use to modify the actual @ARGV. See the App::Greple::find module as an example.

The calling sequence is as follows; see Getopt::EX::Module for details.

1) Call initialize()
2) Call function given in -Mmod::func() style
3) Call finalize()
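A skeleton of initialize() and finalize() might look like the following. The module name App::Greple::myopts and the idea of prepending -i are purely hypothetical; the code relies only on what is stated above, namely that the first argument is the Getopt::EX::Module object and the second is a reference to @ARGV. A real module file would also end with 1; and could carry a __DATA__ section like the sample.

```perl
use strict;
use warnings;

# Hypothetical module; only the argument convention comes from the text above.
package App::Greple::myopts;

sub initialize {
    my ($mod, $argv) = @_;   # Getopt::EX::Module object, reference to @ARGV
    # Illustrative policy: search case-insensitively unless -i is already given.
    unshift @$argv, '-i' unless grep { $_ eq '-i' } @$argv;
}

sub finalize {
    my ($mod, $argv) = @_;
    # Nothing to do in this sketch.
}

package main;

# Stand-alone check without greple: undef stands in for the module object.
my @args = ('pattern', 'file.txt');
App::Greple::myopts::initialize(undef, \@args);
print "@args\n";   # -i pattern file.txt
```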

HISTORY

Most of greple's capabilities are derived from the mg command, which has been developed by the same author since the early 1990s. Because modern standard grep-family commands have come to offer similar capabilities, it was time to clean up the entire functionality, totally remodel the option interfaces, and change the command name. (2013.11)

SEE ALSO

grep(1), perl(1)

App::Greple, https://github.com/kaz-utashiro/greple

Getopt::EX, https://github.com/kaz-utashiro/Getopt-EX

AUTHOR

Kazumasa Utashiro

LICENSE

Copyright 1991-2022 Kazumasa Utashiro

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.