Managing SQL Data with Yertl

Originally posted as: Managing SQL Data with Yertl on blogs.perl.org.

Every week, I work with about a dozen SQL databases. Some are Sybase, some MySQL, some SQLite. Some have different versions in dev, staging, and production. All of them need data extracted, transformed, and loaded.

DBI is the clear choice for dealing with SQL databases in Perl, but there are a dozen lines of Perl code in between me and the operation that I want. Sure, I've got modules and web applications and ad-hoc commands and scripts that perform certain individual tasks on my databases, but sometimes those things don't quite do what I need right now, and I just want something that will let me execute whatever SQL I can come up with.

Yertl (ETL::Yertl) is a shell-based ETL framework. It's under development (as is all software), but included already is a small utility called ysql to make dealing with SQL databases easy.

Manage Boilerplate with Import::Base

Originally posted as: Manage Boilerplate with Import::Base on blogs.perl.org.

Boilerplate is everything I hate about programming:

  • Doing the same thing more than once
  • Leaving clutter in every file
  • Making it harder to change things in the future
  • Eventually blindly copying without understanding (cargo-cult programming)

In an effort to reduce some of my boilerplate, I wrote Import::Base, a module to collect and import useful bundles of modules, removing the need for long lists of use ... lines everywhere.

Conflict Resolution: local::lib and git's Perl

Originally posted as: Conflict Resolution: local::lib and git's Perl on blogs.perl.org.

I ran into a frustrating problem the other day:

$ git add -i
/usr/bin/perl: symbol lookup error: ~/perl5/lib/perl5/x86_64-linux-thread-multi/auto/List/Util/Util.so:
undefined symbol: Perl_xs_apiversion_bootcheck
fatal: 'add--interactive' appears to be a git command, but we were not
able to execute it. Maybe git-add--interactive is broken?

Adventures in Debugging C/XS 2: Debugging Boogaloo

... or "Ask Not To Whom The Pointer Points, It Points To Thee."

TL;DR: A pointer is not a reference. A pointer knows nothing about the data being pointed to. Returning multiple values requires actual work.

Everything went wrong when I wanted a string with a NUL character inside it. C strings are not Perl scalars: they don't know how long they are. To mark the end of a string, C uses the NUL character, \0. The strcpy function copies from your source to your destination up to and including the first \0. When you want a string with a \0 inside it that does not mark the end of the string, you need to know exactly how long the string is. This is not difficult to do, but you also have to return that length from the function that creates your string.
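To make the difference concrete, here is a minimal sketch. The helper names copy_as_c_string and copy_as_bytes are made up for illustration and are not part of the code I was debugging. The first helper relies on the terminator, and the second is told the length explicitly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy with strcpy and report how many bytes
 * arrived before the terminator. dst must be large enough for the
 * part of src up to and including its first \0. */
size_t copy_as_c_string(char *dst, const char *src) {
    strcpy(dst, src);      /* stops after the first \0 in src */
    return strlen(dst);    /* counts bytes before that \0 */
}

/* Hypothetical helper: copy exactly len bytes, embedded \0 included.
 * dst must have room for len bytes. */
void copy_as_bytes(char *dst, const char *src, size_t len) {
    memcpy(dst, src, len); /* never looks at the contents */
}
```

Given the five data bytes a, b, \0, c, d, the first helper only ever sees "ab", while the second copies all five as long as the caller supplies the length.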

C functions do not have more than one return value.

(char* buffer, int bufferSize) = get_string_with_nuls();
// You thought it could be that easy?

So for a function to produce more than one value, you have to pass in pointers to variables that the function then fills in with the actual values.

char* buffer;
int bufferSize = get_string_with_nuls( buffer );
// C programmers will already know what I did wrong here

Thinking like a Perl programmer, I thought I could just pass in the pointer to the function and the function could fill it with data. Two problems:

  1. I passed in the pointer itself (by value), not the address of the pointer: &buffer
  2. I did not initialize the pointer to anything.
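The first problem can be shown in isolation. This is only a sketch with made-up function names (set_by_value and set_by_address). Both try to point the caller's pointer at some data, but only one of them can.

```c
#include <stddef.h>

/* Hypothetical: receives a COPY of the caller's pointer. Assigning to
 * p changes only the local copy, so the caller never sees it. */
void set_by_value(char *p) {
    static char c = 'x';
    p = &c;
    (void)p; /* silence "set but not used" warnings */
}

/* Hypothetical: receives the ADDRESS of the caller's pointer, so it
 * can change what that pointer points to. */
void set_by_address(char **p) {
    static char c = 'x';
    *p = &c;
}
```

After calling set_by_value( buffer ), buffer is exactly what it was before the call. After calling set_by_address( &buffer ), buffer points at the function's data.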

A more correct way would be:

char* buffer = malloc( 128 * sizeof( char ) );
int bufferSize = get_string_with_nuls( &buffer );

But this suffers from another problem: I have to know how big my string is going to be and allocate that much memory before I call the function.

The way I ended up succeeding was:

int bufferSize;
char* buffer = get_string_with_nuls( &bufferSize );

This way, get_string_with_nuls can handle the malloc with exactly the correct size and give it to me. I don't have to guess at a size beforehand.
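For illustration, the function behind that call could look something like the sketch below. This is not my real implementation: the hard-coded data and the error handling are assumptions, and only the name and the calling convention come from the snippet above.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch only: returns a freshly malloc'd buffer and writes its length
 * through bufferSize. The caller is responsible for calling free(). */
char *get_string_with_nuls(int *bufferSize) {
    /* Stand-in data with a \0 in the middle (assumed for the example) */
    static const char data[] = { 'a', 'b', '\0', 'c', 'd' };

    char *buffer = malloc(sizeof data); /* exactly the size needed */
    if (buffer == NULL) {
        *bufferSize = 0;
        return NULL;
    }
    memcpy(buffer, data, sizeof data);  /* memcpy, not strcpy: keep the \0 */
    *bufferSize = (int) sizeof data;
    return buffer;
}
```

The length travels back through the pointer argument, and the buffer itself is the return value, so the caller gets both without guessing a size.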

Of course, a struct could do this better, or since I'm actually in C++, an object. I'll be planning a new API as soon as I confirm this one actually works and has proper tests (written in Perl, of course).
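As a rough idea of the struct approach in plain C (the type and function names here are hypothetical, not the API I ended up with):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: the data and its length travel together */
typedef struct {
    char  *data;
    size_t length;
} byte_string;

/* Copy length bytes from src into a new byte_string.
 * On allocation failure, data is NULL and length is 0. */
byte_string make_byte_string(const char *src, size_t length) {
    byte_string s = { NULL, 0 };
    s.data = malloc(length ? length : 1); /* avoid a zero-size malloc */
    if (s.data != NULL) {
        memcpy(s.data, src, length);
        s.length = length;
    }
    return s;
}

/* Release the buffer and reset the struct so it cannot be reused by accident */
void free_byte_string(byte_string *s) {
    free(s->data);
    s->data = NULL;
    s->length = 0;
}
```

Returning the struct by value gives the caller both pieces in one return value, which is the thing C would not let me do with two separate variables.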