We’ve been using Redshift, Amazon’s Data Warehouse Service, in production at Monetate for almost a year. During that time we’ve learned a bit about what slows things down and have collected some useful queries that grant insight into the key contributors to query performance.

What follows is a quick run-through of those queries.

Disk Usage

You can’t load more data into a full cluster. Monitor your per-table usage with:

Each block is 1 MB (so 2717080 blocks for page_view below use 2.7 TB):

Unsorted Space

A large, unsorted region can kill performance even if your queries are otherwise well optimized. Monitor your unsorted rows with:

This will identify tables that could use a vacuum:

Of the 64 billion rows in the offer table; 52.2 billion are sorted and 11.8 billion are not.

In-flight query details

The AWS Redshift console does a pretty good job of showing queries running but doesn’t give a lot of detail into status. List queries currently executing with:

This shows the different stages of execution per query:

In Conclusion

We have these queries as views within our clusters so they can be easily accessed by monitoring scripts and through ad hoc querying. Tack

onto the front of any of the SELECT statements above to do the same within your cluster.