Two weeks ago, I was invited to meta::hack v2, the second annual MetaCPAN hackathon. As the primary maintainer of CPAN Testers, I went to continue improving the integration of CPAN Testers data with MetaCPAN and generally improve the performance of CPAN Testers to the benefit of the entire Perl ecosystem.
Would you like to help CPAN Testers during meta::hack v2? Join us on IRC in #cpantesters-discuss on irc.perl.org, join our mailing list on lists.perl.org, or e-mail me directly at firstname.lastname@example.org.
With meta::hack v2 only two weeks away, I’ve written down my todo list for the hackathon. With another brand-new machine graciously provided by ByteMark, who have been hosting CPAN Testers for years, this year’s hackathon will involve more devops tasks to improve reliability and stability of the various parts of the project.
The new server will be the host for CPAN Testers backend processes, the processes that turn the raw incoming data into the various reports used by the websites and downstream systems. It will also be the new home for the CPAN and BackPAN mirrors that CPAN Testers uses for data, and provides to external users as part of CPAN’s mirrors list.
This year I had one goal for CPAN Testers: Replace the current Metabase API with a new API that did not write to Amazon SimpleDB. The current high-availability database that raw incoming test reports are written is Amazon SimpleDB behind an API called Metabase. Metabase is a highly-flexible data storage API designed to work with massive, unstructured data sets and still allow for sane organization and storage of data. Unfortunately, Amazon SimpleDB is as it says on the tin: Simple. Worse, it's expensive: Like most Amazon services, it charges for usage, so there's a huge incentive for CPAN Testers to use it as little as possible (which made some of the code quite obtuse).
So, I made a plan to excise the Metabase. Since we already cached every
raw test report locally in the CPAN Testers MySQL database, I planned to
write a new Metabase API that wrote directly to the cache, and then
adjust the backend processing to read from the cache. I spent the better
part of a month working through all the Metabase APIs, how the data was
stored in the database, and how to translate between a simple JSON
format and the serialized Metabase objects. However, some proper schema
design prevented me from finishing this project: A single
column could not be changed to allow nulls very easily, it being a 600GB
table. The one time where a well-designed schema was a bad thing!
But then Garu, author of cpanm-reporter and CPAN::Testers::Common::Client came up with an idea to make a new test report format. These new reports would have to be stored in a new place, and I discovered that MySQL had recently started building some rich JSON tooling. Making a new JSON test report format and storing it in our new high-availability MySQL cluster seemed like a perfect solution for storing our raw test reports.
After a few weeks of discussion, I finally realized that it would be an easier task to make a backwards-compatible Metabase API write to the new test report MySQL table, even though it increased the amount of work that needed to be done:
- Complete the new test report format schema (Garu)
- Write the new backwards-compatibility Metabase API (Me)
- Write a new test report processor that writes to the old Metabase cache tables (Joel Berger)
- Write a migration script from the old Metabase cache tables to the new test report JSON object (?)
With that plan, I headed for Lyon.
At tonight's Chicago Perl Mongers Office Hours, Ray came up with an interesting problem. While testing all of CPAN for CPAN Testers, how do you detect when a test is hanging and kill it before it takes down the entire machine? How do you simply kill a test that is taking too long? And how do you do it without having a wholly separate watchdog program?
to execute testing jobs in parallel across multiple Perl installs. There
are a few ways we could implement timeouts, including
timeout function, or
built-in, but these must all be implemented in the child process. It'd
be nicer if we could use the parent process to watch its own children.
As part of the MetaCPAN hackathon, meta::hack, I was invited to work on the CPAN Testers integration. CPAN Testers is a community of CPAN users who send in test reports for CPAN modules as they are uploaded. MetaCPAN adds a summary of those test reports to every CPAN distribution to help you determine which module you'd most like to use. For quite a few months, this integration was broken, and the nature of the current integration (a SQLite database) means it is not as generally useful as it could be.
So, I decided that the best way to improve the CPAN Testers / MetaCPAN integration was to build a new CPAN Testers API. This API uses the CPAN Testers schema to expose CPAN Testers data using a JSON API. This API is built using the Mojolicious web framework, and an OpenAPI specification (using Mojolicious::Plugin::OpenAPI.