Application Metrics with Yertl

A time series database is a massively useful tool for system reporting and monitoring. By storing series of simple values attached to timestamps, an ops team can see how fast their application is processing data, how much traffic they're serving, and how many resources they're consuming. From this data they can determine how well their application is working, track down issues in the system, and plan for future resource needs.

There have been a lot of new databases and tools developed to create, store, and consume time series data, and existing databases are being enhanced to better support it.

With the new release of ETL::Yertl, we can easily translate SQL database queries into metrics for monitoring and reporting. I've been using these new features to monitor the CPAN Testers application.

Continue reading Application Metrics with Yertl...

Demo a Live Web App on Bad Internet

Originally posted on Opensource.com

Live demos are the bane of professional speakers everywhere. Even the most well-prepared live demo can go wrong for unforeseeable reasons. This is a bad thing to happen while you're up on-stage in front of 300 people. Live demos of remote web apps are so fraught with peril that most people find other ways of presenting them. Screenshots can never fail, and local sandboxes won't fail on overloaded conference Internet connections. But what if we can't set up a local sandbox in time for our talk? What if our database is huge and complex? What if our app has animation and interactions that we can't show with screenshots?

What if we could record our use of a web application and then replay the stored responses at the right time? Lucky for us, it's easy to proxy HTTP, the protocol that web browsers and web servers use to communicate with each other. This means we can put an intermediary between our browser and the server to do whatever we want. Proxies are often used for content filtering (corporate filters, parental filters), or for caching data on a server closer to the user to speed up a website.

We're going to use a web proxy in a similar way: We'll cache our content and serve that cached data to our web browser. However, we're going to run our proxy on the same machine as our web browser. And, we're going to set it up to cache only the things that we want. This way we can run a live demo on an unstable connection.

Install and Configure Squid HTTP Proxy

First, we need to install and configure our proxy. I'm on a Mac, so I was able to install the Squid HTTP Proxy via Homebrew, a free package manager for MacOS.
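If you're also on a Mac with Homebrew, the install is a single command (the package is simply called squid; other package managers usually use the same name, but check yours):

brew install squid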

For our live demo, we want to cache the application we are trying to demo and any other content the application needs. Anything else is unnecessary. To do this, Squid has Access Control Lists (ACLs). We configure an ACL with a list of domains that we should cache, and deny everything else. For maximum coverage, we should add both the host names and the IP addresses to the ACL. Most of the time the browser hands the proxy a host name and lets the proxy do the DNS lookup, but sometimes a browser already knows the IP and will just tell the proxy to fetch that directly.

So, here's our list of domains and IPs:

acl cacheDomain dstdomain beta.cpantesters.org
acl cacheDomain dstdomain api.cpantesters.org
acl cacheDomain dstdomain www.cpantesters.org
acl cacheDomain dstdomain 212.110.173.51
acl cacheDomain dstdomain cdnjs.cloudflare.com

The first three domains are the applications that I'm running. The fourth is the IP address of the application server: all three domains live on the same machine. The last is CDNJS, the CDN that serves my JavaScript. In order for my application to work, I will need to cache all the JavaScript and CSS that I depend on from CDNJS.
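If you don't know the server's IP address off-hand, a quick DNS lookup tells you what to add to the ACL (dig is one option; host or nslookup work just as well):

dig +short beta.cpantesters.org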

Once we've listed what we want to cache, we can forbid any other domains from being cached:

cache deny !cacheDomain

Next, we should tell Squid where to put our cache and how much disk space to use. Homebrew's Squid configuration has a cache_dir line, commented out. We need to enable it and increase the disk space available to ensure that our data stays cached. When the disk space is used up, Squid starts deleting old cached data, which we can't have during our demo.

# Uncomment and adjust the following to add a disk cache directory.
cache_dir ufs /usr/local/var/cache/squid 1024 16 256

The first number at the end of the line is the cache size in MB, which I adjusted to 1024 (1 GB).
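If Squid has never run with a disk cache before, the cache directories may need to be created once before they can be used. This is a standard Squid step; Homebrew's service usually takes care of it on first start, so only run it if Squid complains about a missing cache directory:

squid -z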

Finally, we should make sure that we can use Squid's management API, and that it's only open to the local machine. This should be the default, so look for these http_access lines, and add them if they don't exist.

# Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager

After allowing cache manager access from localhost, we should disable the cache manager password:

cachemgr_passwd none all

Now we're done with the configuration file. Our full configuration file is located here.

Now that we've configured our proxy, we can start it up. Homebrew says to do brew services start squid, but your platform may need something different. This gets the proxy started and waiting for requests. Next we need to configure our browser to use the proxy.
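If the proxy refuses to start, Squid can check the configuration file for errors, which usually points straight at the problem (this assumes the squid binary is on your PATH, which Homebrew sets up):

squid -k parse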

Configure your web browser

Configuring your web browser for an HTTP proxy depends on what browser you use and what OS you use. If you're using Chrome or Safari on MacOS, you can go to System Preferences to configure a proxy. However, if you're using Firefox, you can configure the browser to use a proxy, and leave the rest of the system alone. Other operating systems have other ways to configure proxies, and you should check your OS's documentation.

There are some good browser plugins for managing HTTP proxies, but unfortunately not for Safari or IE. If you're using Chrome, try Proxy SwitchyOmega. If you're using Firefox, use FoxyProxy Standard. These plugins make it easier to manage HTTP proxies.
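Whichever browser you use, it's worth a quick check from the command line that the proxy is actually answering. The request below assumes Squid is listening on its default port, 3128, on the local machine; adjust it if you changed http_port:

curl -I -x http://localhost:3128 http://www.cpantesters.org/

Getting a response at all means the proxy is reachable, and the X-Cache header in that response tells you whether Squid answered from its cache (HIT) or had to fetch from the server (MISS).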

Run through the demo to cache your content

Once your browser is using the proxy, you can run through your demo to test it. Do this on a good Internet connection. As you run through your demo, your browser asks the proxy to fetch all the demo's data, and the proxy caches it on disk as it does so. Since your computer is online, Squid will follow the caching rules that the web server specifies. This means caching for a specific length of time, and possibly revalidating the data to see if it changed.

As we run through our demo, we should make sure that our cache is being used. The easiest way to do that is to read Squid's log. For my configuration, it was located at /usr/local/var/logs/access.log. Inside are lines that look like this:

1498020228.970    203 ::1 TCP_MISS/200 3653 GET http://beta.cpantesters.org/chart.html? - HIER_DIRECT/212.110.173.51 text/html
1498020229.523    314 ::1 TCP_REFRESH_MODIFIED/200 8130 GET http://api.cpantesters.org/v3/release/dist/Statocles - HIER_DIRECT/212.110.173.51 application/json
1498020236.187   6945 ::1 TCP_MISS/200 148284 GET http://api.cpantesters.org/v3/summary/Statocles/0.077 - HIER_DIRECT/212.110.173.51 application/json
1498020240.783    186 ::1 TCP_MISS/200 6597 GET http://people.w3.org/~dom/archives/2006/09/offline-web-cache-with-squid/ - HIER_DIRECT/128.30.54.11 text/html

The important parts of these lines are the URL and the status. TCP_MISS/200 means "this request was not in our cache, and the remote server returned a 200 OK HTTP response". TCP_REFRESH_MODIFIED/200 means "this request was in our cache, but we refreshed it from the remote server, which returned a 200 OK HTTP response". This is our cache building and refreshing itself because we're on a stable connection. Once we have some data in our cache, we'll start seeing things like this:

1498063273.261      0 ::1 TCP_INM_HIT/304 299 GET http://beta.cpantesters.org/chart.html - HIER_NONE/- text/html
1498063281.831      0 ::1 TCP_MEM_HIT/200 8187 GET http://api.cpantesters.org/v3/release/dist/Statocles - HIER_NONE/- application/json
1498063298.103      0 ::1 TCP_MEM_HIT/200 8187 GET http://api.cpantesters.org/v3/release/dist/Statocles - HIER_NONE/- application/json
1498063300.473      8 ::1 TCP_MEM_HIT/200 154917 GET http://api.cpantesters.org/v3/summary/Statocles/0.083 - HIER_NONE/- application/json

TCP_INM_HIT/304 means "The cache responded to this request with a 304 Not Modified response". The TCP_MEM_HIT/200 means "The cache responded to this request with a 200 OK HTTP response". These are what we want: The cache is serving responses, not the remote server.
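To keep an eye on this while clicking through the demo, it helps to tail the log in another terminal (the path is the one from the Homebrew configuration above):

tail -f /usr/local/var/logs/access.log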

Run Your Demo

Now that our cache is operating well on a stable connection, we can give our demo on an unstable one. First, we want to make sure that our cache does not try to access the remote server (Squid's "offline" mode). To do this, Squid has a management client called squidclient which we can use to toggle offline mode.

$ squidclient mgr:offline_toggle
HTTP/1.1 200 OK
Server: squid/3.5.26
Mime-Version: 1.0
Date: Tue, 04 Jul 2017 21:16:36 GMT
Content-Type: text/plain;charset=utf-8
Expires: Tue, 04 Jul 2017 21:16:36 GMT
Last-Modified: Tue, 04 Jul 2017 21:16:36 GMT
X-Cache: MISS from gwen.local
Via: 1.1 gwen.local (squid/3.5.26)
Connection: close

offline_mode is now ON

Squid's offline mode minimizes attempts to get remote content. Since we cached all our content running through our demo, this means Squid will be serving our demo!
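If you'd rather not rely on remembering to toggle this right before going on stage, the same setting can be made in the configuration file; offline_mode is a standard squid.conf directive, just remember to turn it back off after the talk:

offline_mode on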

So now we can run our demo worry-free! All the remote content is served by the local machine, so it doesn't matter how good the conference wi-fi is. As long as we stick to things we've already cached, our web application runs perfectly.

CPAN Testers Has an API

[Watch this lightning talk on The Perl Conference YouTube channel]

I've been working on the CPAN Testers project since 2015. In all that time, I've been focused on maintenance (which has involved more operations/administration tasks than any actual code changes) and modernization. It's that modernization effort that has led to a new CPAN Testers API.

This new API uses the Mojolicious web framework, along with an OpenAPI schema to expose all of the most useful CPAN Testers data. OpenAPI is a specification for web APIs, and there are tools like Swagger that generate a useful documentation website from your spec, such as the CPAN Testers API documentation website.

Continue reading CPAN Testers Has an API...

2017 Perl Toolchain Summit

This year I had one goal for CPAN Testers: Replace the current Metabase API with a new API that did not write to Amazon SimpleDB. The current high-availability database that raw incoming test reports are written to is Amazon SimpleDB, behind an API called Metabase. Metabase is a highly-flexible data storage API designed to work with massive, unstructured data sets and still allow for sane organization and storage of data. Unfortunately, Amazon SimpleDB is as it says on the tin: Simple. Worse, it's expensive: Like most Amazon services, it charges for usage, so there's a huge incentive for CPAN Testers to use it as little as possible (which made some of the code quite obtuse).

So, I made a plan to excise the Metabase. Since we already cached every raw test report locally in the CPAN Testers MySQL database, I planned to write a new Metabase API that wrote directly to the cache, and then adjust the backend processing to read from the cache. I spent the better part of a month working through all the Metabase APIs, how the data was stored in the database, and how to translate between a simple JSON format and the serialized Metabase objects. However, some proper schema design prevented me from finishing this project: a single NOT NULL column could not easily be changed to allow NULLs in a 600GB table. The one time where a well-designed schema was a bad thing!

But then Garu, author of cpanm-reporter and CPAN::Testers::Common::Client, came up with an idea to make a new test report format. These new reports would have to be stored in a new place, and I discovered that MySQL had recently started building some rich JSON tooling. Making a new JSON test report format and storing it in our new high-availability MySQL cluster seemed like a perfect solution for storing our raw test reports.

After a few weeks of discussion, I finally realized that it would be an easier task to make a backwards-compatible Metabase API write to the new test report MySQL table, even though it increased the amount of work that needed to be done:

  • Complete the new test report format schema (Garu)
  • Write the new backwards-compatibility Metabase API (Me)
  • Write a new test report processor that writes to the old Metabase cache tables (Joel Berger)
  • Write a migration script from the old Metabase cache tables to the new test report JSON object (?)

With that plan, I headed for Lyon.

Continue reading 2017 Perl Toolchain Summit...

Nerds Rejecting Nerds

https://medium.com/@maradydd/when-nerds-collide-31895b01e68c

I was linked to this article after a discussion that was triggered by a Tweet: https://twitter.com/shadowcat_mst/status/852265380156510214

In this article, the author describes a group called "weird nerds", later renamed "hackers", and goes through some of the reasons why this group is rejecting new members of their community (namely "brogrammers" and "geek feminists", a false equivalence if ever there was one).

As someone who fits the author's idea of a hacker (the classical definition of hacker, not someone who breaks into computers), and yet has never felt like part of the hacker community, I find a lot of things in here that are bad, but I'll comment for now on a couple of quotes:

Continue reading Nerds Rejecting Nerds...