Second day with PostgreSQL, this time to discuss advanced queries, stored procedures, and rewriting rules.
The relational model was designed to make it easy to extract meaningful information from the database (although here the operative word is meaningful rather than easy). Aggregations pretty much require a schema to return anything worth the effort (and if you don’t believe it, what exactly is the average value of 1, blue and 3?).
Stored procedures can help to move some business intelligence from the main application to the database. Whether it is a good idea is an open question (and I think the knee-jerk negative answer is too often guided by ignorance rather than experience), but the idea of having enough rules in the database that multiple applications can connect to it safely (or safely enough) is worth considering: it enables other applications, some of which might be incompatible with the main business application, to use the business data (many reporting and ETL solutions might fall into this category). The choice is between control and openness (and which one is correct depends on the situation).
Finally, rewriting rules is a less common feature of SQL databases, but essentially it allows the database designer to create updatable views, implement versioning on specific tables, and so on.
I have a small peeve with the book so far: all too often the authors rely on implicit column ordering when manipulating data. I happen to have a different order for the columns of events
(which was created as an exercise yesterday), so many INSERT
examples no longer work.
But in general, it is good to specify the columns in an INSERT
, as in:
1 2 3 4 |
|
And to create ‘My Place’:
1 2 3 |
|
Finally, to add the new events:
1 2 3 4 5 6 |
|
This way, there is never any ambiguity as to what is inserted.
Window functions
The book covers window functions, but in this first beta version of the book at least, the explanation is not very illuminating.
Basically, window functions are a generalization of aggregate functions. Aggregate functions operates on a range of rows selected by a GROUP BY
clause. For each group, there will be only one row, where columns are either grouped by columns, or aggregates.
Window functions also operate on a range of rows, but there is one range for each row in the filtered table. The range can be created in a way similar to GROUP BY
(using the PARTITION OVER
clause), but can also be created by taking all the rows up to the current one, or 2 rows before, 2 rows after, and the current one, … Such a range is called a window. There are many ways to define them.
For instance, say we want to know, for each events, how many events have happened (including the current one), we can try:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
The OVER
does not specify a PARTITION
, but an ORDER
, which means that the COUNT(*)
function operates on all the rows from first one to current one (ordered by the starts
column). Actually, the COUNT(*)
function is the same as the (proper) window function RANK
:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
What about computing the order of each events, but by year? Nothing easier:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Ok, this is not very fancy. But with numeric data, window functions are more powerful. Lets say we collect daily measures into a new table:
1 2 3 4 5 6 |
|
We could use the usual aggregate functions, for instance AVG
and SUM
:
1 2 3 4 5 |
|
We have pretty much all the details. If there’s any trend, we would not see it. But with window functions, it is possible to compute running averages (and actually, two different ones):
1 2 3 |
|
(results omitted because they are too long). Here, the AVG(measure)
is applied to a window of either 5 rows for short_avg
or 11 rows for long_avg
, and there’s a running sum in running_sum
.
Pretty cool, I’d say.
Rules
Rules are very useful as well. Here’s how I’d implement the INSERT
on holidays
:
1 2 3 |
|
Note the syntax to manipulate dates and timestamp. It is fairly readable and compact enough.
Exercises
Aggregate Functions
The aggregate functions are documented here, while the window functions are here.
GUI
Honestly, I don’t really use any. psql
is really powerful. For those of the GUI persuasion, there are a few options.
DELETE Rule
1 2 |
|
With this rule in place, deleting from venues
now set the active
flag to false:
1 2 3 4 5 6 7 8 9 10 |
|
generate_series in crosstab
The documentation for crosstab
has already an example for using generate_series
:
1 2 3 4 5 6 7 8 9 |
|
The output remains the same:
1 2 3 4 5 |
|
Weekly pivot
The hardest, perhaps, is to find how to convince PostgreSQL to apply div
to the passed arguments, but a bit of type declaration using ::
does the trick. Also, crosstab
is not overly smart, so the month
and week
columns must be in order, otherwise the counts for the two February (one in 2012 and one in 2013) are different:
1 2 3 4 5 6 7 8 9 |
|
The query produces:
1 2 3 4 5 6 7 |
|
And this completes Day 2.