#!/usr/bin/env perl
package ygrok;
our $VERSION = '0.038';
# ABSTRACT: Build YAML by parsing lines of plain text

use ETL::Yertl;
use Pod::Usage::Return qw( pod2usage );
use Getopt::Long qw( GetOptionsFromArray :config pass_through );
use ETL::Yertl::Command::ygrok;

$|++; # no buffering

sub main {
    my ( $class, @argv ) = @_;
    my %opt;
    GetOptionsFromArray( \@argv, \%opt,
        'loose|l',
        'help|h',
        'version',
    );
    return pod2usage(0) if $opt{help};
    if ( $opt{version} ) {
        print "ygrok version $ygrok::VERSION (Perl $^V)\n";
        return 0;
    }

    eval {
        ETL::Yertl::Command::ygrok->main( @argv, \%opt );
    };
    if ( $@ ) {
        return pod2usage( "ERROR: $@" );
    }
    return 0;
}

exit __PACKAGE__->main( @ARGV ) unless caller(0);

__END__

=head1 SYNOPSIS

    ygrok [-l|--loose] <pattern> [<file>...]
    ygrok --pattern [<pattern_name> [<pattern>]]
    ygrok -h|--help|--version

=head1 DESCRIPTION

This program reads lines of plain text, matches each line against a pattern,
and converts the matching lines into documents (YAML by default).

=head1 ARGUMENTS

=head2 pattern

The pattern to match each line against. Lines that do not match the pattern
are ignored.

See the full documentation for pattern syntax.

=head2 file

A file to read. The special file "-" refers to STDIN. If no files are
specified, read STDIN.

=head1 OPTIONS

=head2 -l|--loose

Allow the pattern to match anywhere in the line. Normally, the pattern must
match the full line. Even with this option, the pattern still matches only
once per line.
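
The difference between the default and C<--loose> can be pictured as anchored
versus unanchored regular expression matching. The sketch below assumes that
the default behaves like wrapping the pattern in C<^...$>; that is an
assumption made for illustration, not a description of ygrok's internals. The
sample line and the simplified IP address pattern are also invented for the
example.

```perl
    use strict;
    use warnings;

    # Simplified, hypothetical pattern and input line.
    my $pattern = qr/(?<ip>\d+\.\d+\.\d+\.\d+)/;
    my $line    = 'Connection from 192.0.2.10 closed';

    # Default behaviour (assumed): the pattern must cover the whole line.
    my $strict = $line =~ /^$pattern$/ ? $+{ip} : undef;

    # --loose behaviour (assumed): the first match anywhere in the line.
    my $loose = $line =~ /$pattern/ ? $+{ip} : undef;

    printf "strict: %s\n", $strict // 'no match';
    printf "loose:  %s\n", $loose  // 'no match';
```

Here the anchored match fails because the line has text around the address,
while the unanchored match finds the address once.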

=head2 --pattern

View, add, and edit patterns. With no arguments, shows all the patterns. With
C<pattern_name>, shows the specific pattern or pattern category. With C<pattern>,
adds a custom pattern that can then be used in future patterns.

    # Show all patterns
    ygrok --pattern

    # Show all "NET" patterns
    ygrok --pattern NET

    # Show the "NET.HOSTNAME" pattern
    ygrok --pattern NET.HOSTNAME

    # Add a new pattern
    ygrok --pattern HOSTS_LINE '%{NET.HOSTNAME:host} %{NET.IPV4:ip}'

    # Use the new pattern
    ygrok '%{HOSTS_LINE}' < /etc/hosts

=head2 -h | --help

Show this help document.

=head2 --version

Print the current ygrok and Perl versions.

=head1 PATTERNS

A pattern matches the entire line (unless C<--loose> is given) and splits the
line into fields.

A named ygrok match has the format: C<%{PATTERN_NAME:field_name}>. The
C<PATTERN_NAME> is one of the available patterns, listed below. The
C<field_name> is the name of the field that receives the matched data.

Additionally, the pattern is a Perl regular expression, so any regular
expression syntax will work. Any named captures
(C<(?E<lt>field_nameE<gt>PATTERN)>) will be part of the document.
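
As a rough illustration of how a named match relates to a named capture, the
sketch below rewrites C<%{NAME:field}> into C<(?E<lt>fieldE<gt>...)> using a
small pattern table. The C<%PATTERNS> hash, its two regular expressions, and
the C<expand> function are hypothetical and exist only for this example; they
are not ygrok's actual implementation or its built-in definitions.

```perl
    use strict;
    use warnings;

    # Hypothetical pattern table, for illustration only. These are not
    # the definitions that ygrok ships with.
    my %PATTERNS = (
        WORD => '\b\w+\b',
        INT  => '[-+]?\d+',
    );

    # Replace each %{NAME:field} with a named capture, (?<field>...).
    sub expand {
        my ( $pattern ) = @_;
        $pattern =~ s/\%\{([\w.]+):(\w+)\}/(?<$2>$PATTERNS{$1})/g;
        return $pattern;
    }

    # A named match and a plain Perl named capture mixed in one pattern.
    my $regex = expand( '%{WORD:user} logged in (?<count>\d+) times' );
    if ( 'alice logged in 3 times' =~ /^$regex$/ ) {
        my %doc = %+;    # every named capture becomes a field
        print "$_: $doc{$_}\n" for sort keys %doc;
    }
```

Both the C<user> field from the named match and the C<count> field from the
plain named capture end up in the same hash, which mirrors how both kinds of
capture become part of the document.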

=head2 BUILT-IN PATTERNS

The built-in patterns are common patterns that are always available.

=over 4

=item Simple Patterns

=over 4

=item WORD

A single word, C<\b\w+\b>.

=item DATA

A non-greedy section of data, C<.*?>.

=item INT

An integer, positive or negative.

=item NUM

A floating-point number, positive or negative, with optional exponent.

=back

=item Date/Time Patterns

=over 4

=item DATE.MONTH

A full or abbreviated month name for the "C" locale (January (Jan), February
(Feb), etc.).

=item DATE.ISO8601

An ISO 8601 date/time.

=item DATE.HTTP

An RFC822 date/time, used by HTTP.

=item DATE.SYSLOG

A syslog date, like "Jan 01 01:23:45".

=back

=item Operating System Patterns

=over 4

=item OS.USER

A username.

=item OS.PROCNAME

A process name.

=back

=item Networking Patterns

=over 4

=item NET.IPV4

An IPv4 address.

=item NET.IPV6

An IPv6 address.

=item NET.HOSTNAME

A network host name. Either an RFC1101 domain or an IPv4 or IPv6 address.

=back

=item URL Patterns

=over 4

=item URL

A full URL, including the scheme.

=item URL.PATH

The path part of a URL.

=back

=item Log File Patterns

=over 4

=item LOG.HTTP_COMMON

The Apache Common Log Format.

=item LOG.HTTP_COMBINED

The Apache Combined Log Format.

=item LOG.SYSLOG

The syslog format (RFC 3164).

=back

=item POSIX Command Output Patterns

=over 4

=item POSIX.LS

Parse the output of C<ls -l>.

=item POSIX.PS

Parse the output of C<ps>.

=item POSIX.PSX

Parse the output of C<ps x> and C<ps w>.

=item POSIX.PSU

Parse the output of C<ps u>.

=back

=back

=head1 ENVIRONMENT VARIABLES

=over 4

=item YERTL_FORMAT

Specify the default format Yertl uses between commands. Defaults to C<yaml>. Can be
set to C<json> for interoperability with other programs.

=back