Everyday ETL With Yertl

Tags:

I use ETL::Yertl a lot. Despite its present unpolished state, it contains some important, easy-to-use tools that I need to get my work done. For example, this week I got an e-mail from Slaven (a CPAN tester and a tireless reporter of CPAN issues found by testing) saying that some records were missing from one the APIs on CPAN Testers: The fast-matrix had 3300 records for the "forks" distribution version 0.36, but the matrix had only 300 records. The utilities in ETL::Yertl made it easy to find and manipulate the data I needed to diagnose this problem.

Continue reading Everyday ETL With Yertl...

Application Metrics with Yertl

Tags:

A time series database is a massively useful tool for system reporting and monitoring. By storing series of simple values attached to timestamps, an ops team can see how fast their application is processing data, how much traffic they're serving, and how many resources they're consuming. From this data they can determine how well their application is working, track down issues in the system, and plan for future resource needs.

There have been a lot of new databases and tools developed to create, store, and consume time series data, and existing databases are being enhanced to better support time series data.

With the new release of ETL::Yertl, we can easily translate SQL database queries into metrics for monitoring and reporting. I've been using these new features to monitor the CPAN Testers application.

Continue reading Application Metrics with Yertl...

ygrok - Parse plain text into data structures

Tags:

As a data warehouse, a significant part of my job involves log analysis. Besides the standard root cause analysis, I need to verify database writes, diagnose user access issues, and look for under-used (and over-used) data sets. Additionally, my boss needs quarterly and yearly reports for client billing, and some of our clients need usage reports to identify data they might be paying for but not using (which we can then shut off to reduce costs). This has recently become a popular space for new solutions.

On the other side, as a sysadmin, I need to get other reports like how all the machine's resources (CPU, memory, disk, network) are being used, what processes are running on the machine and how those processes used resources over time. This is basic monitoring, and there are lots of solutions here, too. In the true Unix philosophy, there are command-line programs to query every one of these, which write out text that I can then parse.

In my previous post about ysql, I showed how to use the ysql utility to read/write YAML documents to SQL databases. Now, Yertl has a ygrok utility to parse plain text into YAML documents.

Continue reading ygrok - Parse plain text into data structures...

Managing SQL Data with Yertl

Tags:

Originally posted on blogs.perl.org -- Managing SQL Data with Yertl

Every week, I work with about a dozen SQL databases. Some are Sybase, some MySQL, some SQLite. Some have different versions in dev, staging, and production. All of them need data extracted, transformed, and loaded.

DBI is the clear choice for dealing with SQL databases in Perl, but there are a dozen lines of Perl code in between me and the operation that I want. Sure, I've got modules and web applications and ad-hoc commands and scripts that perform certain individual tasks on my databases, but sometimes those things don't quite do what I need right now, and I just want something that will let me execute whatever SQL I can come up with.

Yertl (ETL::Yertl) is a shell-based ETL framework. It's under development (as is all software), but included already is a small utility called ysql to make dealing with SQL databases easy.

Continue reading Managing SQL Data with Yertl...