ygrok - Parse plain text into data structures


As a data warehouse, a significant part of my job involves log analysis. Besides the standard root cause analysis, I need to verify database writes, diagnose user access issues, and look for under-used (and over-used) data sets. Additionally, my boss needs quarterly and yearly reports for client billing, and some of our clients need usage reports to identify data they might be paying for but not using (which we can then shut off to reduce costs). This has recently become a popular space for new solutions.

On the other side, as a sysadmin, I need to get other reports like how all the machine's resources (CPU, memory, disk, network) are being used, what processes are running on the machine and how those processes used resources over time. This is basic monitoring, and there are lots of solutions here, too. In the true Unix philosophy, there are command-line programs to query every one of these, which write out text that I can then parse.

In my previous post about ysql, I showed how to use the ysql utility to read/write YAML documents to SQL databases. Now, Yertl has a ygrok utility to parse plain text into YAML documents.

Continue reading ygrok - Parse plain text into data structures...

List::Slice - Slice operations for lists


How many times have you needed to do this?

my @found_names = grep { /^[A-D]/ } @all_names;
my @topfive = @found_names[0..4];

Or worse, this.

my @topfive = ( grep { /^[A-D]/ } @all_names )[0..4];

There's got to be a better way

Or this.

my @bottomfive = @names < 5 ? @names : @names[$#names-5..$#names];

Or this.

my @names
        = map { $_->[0] }
        sort { $a->[1] <=> $b->[1] }
        grep { $_->[1] > $now }
        map { [ $_->{name}, parse_date( $_->{birthday} ) ] }
my @topfive = @names[0..4];

There's got to be a better way!

There's got to be a better way

Now there is! Introducing: List::Slice!

Continue reading List::Slice - Slice operations for lists...

Announcing Statocles


Static site generators are popular these days. For small sites, the ability to quickly author content using simple tools is key. The ability to use lower-cost (even free) hosting, often without any dynamic capabilities, is good for trying to maintain a budget. For larger sites, the ability to serve content quickly and cheaply is beneficial, and since most pages are read far more often than they are written, generating a full web page to store on the filesystem can improve performance (and lower costs).

For me, I like the convenience of using Github Pages to host project-oriented websites. The project itself is already on Github, so why not keep the website closely tied to it so it doesn't get out-of-date? For an organization like the Chicago Perl Mongers, Github can even host custom domains, allowing easy collaboration on websites.

It's through the Chicago.PM website that I was introduced to Octopress, a blogging engine built on Jekyll. It's through using Octopress that I decided to write my own static site generator, Statocles.

Continue reading Announcing Statocles...

Managing SQL Data with Yertl


Originally posted on blogs.perl.org -- Managing SQL Data with Yertl

Every week, I work with about a dozen SQL databases. Some are Sybase, some MySQL, some SQLite. Some have different versions in dev, staging, and production. All of them need data extracted, transformed, and loaded.

DBI is the clear choice for dealing with SQL databases in Perl, but there are a dozen lines of Perl code in between me and the operation that I want. Sure, I've got modules and web applications and ad-hoc commands and scripts that perform certain individual tasks on my databases, but sometimes those things don't quite do what I need right now, and I just want something that will let me execute whatever SQL I can come up with.

Yertl (ETL::Yertl) is a shell-based ETL framework. It's under development (as is all software), but included already is a small utility called ysql to make dealing with SQL databases easy.

Continue reading Managing SQL Data with Yertl...