I use ETL::Yertl a lot. Despite its
present unpolished state, it contains some important, easy-to-use tools that I
need to get my work done. For example, this week I got an e-mail from Slaven (a
CPAN tester and a tireless reporter of CPAN issues found by testing) saying
that some records were missing from one of the APIs on CPAN
Testers: The
fast-matrix had 3300 records for the
"forks" distribution version 0.36, but the
matrix had only 300 records. The utilities in
ETL::Yertl made it easy to find and
manipulate the data I needed to diagnose this problem.
Continue reading Everyday ETL With Yertl...
A time series database is a massively useful tool for system reporting
and monitoring. By storing series of simple values attached to
timestamps, an ops team can see how fast their application is
processing data, how much traffic they're serving, and how many
resources they're consuming. From this data they can determine how well
their application is working, track down issues in the system, and plan
for future resource needs.
There have been a lot of new databases and tools developed to create,
store, and consume time series data, and existing databases are being
enhanced to better support time series data.
With the new release of ETL::Yertl, we can
easily translate SQL database queries into metrics for monitoring and
reporting. I've been using these new features to monitor the CPAN
Testers application.
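As a sketch of the general idea (not Yertl's actual interface): a query result becomes a metric point — a name, a value, and a timestamp — here in the style of Graphite's plaintext protocol. The count stands in for a query result that `ysql` would produce, and the metric name is made up:

```shell
# Sketch: one time-series metric point in Graphite's "name value timestamp"
# plaintext style. The count stands in for a query result; the metric
# name is illustrative, not a real CPAN Testers metric.
count=42            # pretend result of: SELECT COUNT(*) FROM reports
ts=1500000000       # pretend epoch timestamp
metric="cpantesters.reports.count $count $ts"
printf '%s\n' "$metric"
```

A stream of lines in that shape is all a time series database needs to ingest; the interesting work is in the query that produces the value.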
Continue reading Application Metrics with Yertl...
Working on a data warehouse, I spend a significant part of my job on log analysis.
Besides the standard root cause analysis, I need to verify database
writes, diagnose user access issues, and look for under-used (and
over-used) data sets. Additionally, my boss needs quarterly and yearly
reports for client billing, and some of our clients need usage reports
to identify data they might be paying for but not using (which we can
then shut off to reduce costs). This has recently become a popular space
for new solutions.
On the other side, as a sysadmin, I need other kinds of reports: how
the machine's resources (CPU, memory, disk, network) are being used,
what processes are running on the machine, and how those processes use
resources over time. This is basic monitoring, and there are lots of
solutions here, too. In the true Unix philosophy, there are command-line
programs to query every one of these, which write out text that I can
then parse.
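As a small illustration of that parsing, here is the manual version of the kind of work ygrok automates: pulling named fields out of one fixed `ps aux`-style line with awk. The sample line and the column positions are assumptions for illustration:

```shell
# Manual field extraction from a ps-aux-style line with awk.
# Sample line and column positions are illustrative assumptions.
line="root 1 0.0 0.1 16816 3040 ? Ss 10:00 0:01 /sbin/init"
user=$(printf '%s\n' "$line" | awk '{ print $1 }')   # owning user
cpu=$(printf '%s\n' "$line" | awk '{ print $3 }')    # %CPU column
cmd=$(printf '%s\n' "$line" | awk '{ print $11 }')   # command
printf 'user=%s cpu=%s cmd=%s\n' "$user" "$cpu" "$cmd"
```

This works, but every new text format means writing a new awk incantation by hand — which is exactly the tedium a pattern-based parser can remove.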
In my previous post about
ysql, I showed how
to use the ysql
utility to read and write YAML documents from and to SQL databases.
Now, Yertl has a ygrok
utility to parse plain text into YAML documents.
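To show the rough shape of the result — one YAML document per input line — here is a sketch with awk over a fixed `ls -l`-style line. The sample line, the chosen fields, and the key names are mine for illustration, not ygrok's actual output:

```shell
# Sketch: turn one ls-l-style line into a YAML-document-shaped record.
# Sample line, field choices, and key names are illustrative assumptions.
line="-rw-r--r-- 1 doug staff 124 Jan 10 10:00 notes.txt"
doc=$(printf '%s\n' "$line" |
    awk '{ printf "---\nmode: %s\nowner: %s\nsize: %s\nname: %s\n", $1, $3, $5, $9 }')
printf '%s\n' "$doc"
```

Once the text is a stream of documents like this, the rest of the Yertl tools can filter, transform, and load it.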
Continue reading ygrok - Parse plain text into data structures...
Originally posted on blogs.perl.org -- Managing SQL Data with
Yertl
Every week, I work with about a dozen SQL databases. Some are Sybase, some
MySQL, some SQLite. Some have different versions in dev, staging, and
production. All of them need data extracted, transformed, and loaded.
DBI is the clear choice for dealing with SQL databases in Perl, but there are a
dozen lines of Perl code in between me and the operation that I want. Sure,
I've got modules and web applications and ad-hoc commands and scripts that
perform certain individual tasks on my databases, but sometimes those things
don't quite do what I need right now, and I just want something that will let
me execute whatever SQL I can come up with.
Yertl (ETL::Yertl) is a shell-based ETL
framework. It's under development (as is all software), but it already includes
a small utility called ysql that makes dealing
with SQL databases easy.
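A hedged sketch of what such a workflow can look like — the option names and config layout below are assumptions for illustration, not ysql's documented interface:

```shell
# Hypothetical workflow (commands commented out because the option
# names are illustrative, not ysql's documented flags):
#   ysql --save dev --dsn dbi:SQLite:dev.db   # name a connection once
#   ysql dev 'SELECT * FROM users'            # then just write SQL
# A saved config for that named connection might look roughly like:
config=$(cat <<'EOF'
dev:
  dsn: dbi:SQLite:dev.db
EOF
)
printf '%s\n' "$config"
```

The appeal is that the DBI boilerplate — connect, prepare, execute, fetch, print — collapses into one command line per query.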
Continue reading Managing SQL Data with Yertl...