I use PostgreSql a lot in my working time. So recently, I spent some time refreshing and deepening my knowledge of PostgreSQL to improve my skills and experience in using it (writing and optimizing queries, creating new objects, etc.). And I found many awesome features and syntax sugar constructions that can tremendously ease your daily routine and eliminate problems that may appear while resolving sophisticated tasks. In this article, I will try to review 6 PostgreSql traits that seem to me the most important and easy-usable in a clear and brief way.
Features
Identity
Serial is the “old” implementation of auto-generated unique values that has been part of Postgres for ages. However that is not part of the SQL standard.
To be more compliant with the SQL standard, Postgres 10 introduced the syntax using generated as identity.
The underlying implementation is still based on a sequence, the definition now complies with the SQL standard. One thing that this new syntax allows is to prevent an accidental override of the value.
Consider the following tables:
Now when you run:
The underlying sequence and the values in the table are not in sync any more. If you run another
You will get an error because the sequence was not advanced by the first insert, and now tries to insert the value 1 again.
With the second table however,
Results in:
ERROR: cannot insert into column "id" Detail: Column "id" is an identity column defined as GENERATED ALWAYS.
So you can’t accidentally “forget” the sequence usage. You can still force this, using the override system value option:
which still leaves you with a sequence that is out-of-sync with the values in the table, but at least you were made aware of that.
identity columns also have another advantage: they also minimize the grants you need to give to a role in order to allow inserts.
While a table using a serial column requires the INSERT privilege on the table and the USAGE privilege on the underlying sequence this is not needed for tables using an identity columns. Granting the INSERT privilege is enough.
It is recommended to use the new identity syntax rather than serial.
For more information, you can refer to the official documentation
COALESCE + NULLIF
As everybody knows the COALESCE function accepts an unlimited number of arguments and returns the first argument that is not null. If all arguments are null, the COALESCE function will return null. Here is how it interprets multiple null values:
But what if you want similar to COALESCE function for zero instead of null. In that case you can combine NULLIF with COALESCE: The NULLIF function returns a null value if argument_1 equals to argument_2, otherwise it returns argument_1.
See the following example:
So in the example proposed above you to make COALESCE return first non-zero value, you can simply replace all COALESCE arguments with such call: NULLIF(arg, 0)
Grouping set, rollup, cube
grouping set, rollup, cube are irreplaceable tools for reporting and performance purposes.
Grouping set
GROUP BY will turn every distinct entry in a column into a group. Sometimes you might want to do more grouping at once. Why is that necessary? Suppose you are processing a 10 TB table. Clearly, reading this data is usually the limiting factor in terms of performance. So reading the data once and producing more results at once is appealing. That’s exactly what you can do with GROUP BY GROUP SETS. Suppose we want to produce two results at once:
- GROUP BY country
- GROUP BY product_name
Here’s how it works:
Rollup
When creating reports, you will often need the “bottom line” which sums up what has been shown in the table. The way to do that in SQL is to use “GROUP BY ROLLUP”:
PostgreSQL will inject a couple of rows into the result. As you can see, “Argentina” returns 3 and not just 2 rows. The “product_name = NULL” entry was added by ROLLUP. It contains the sum of all argentinian sales (116 + 27 = 137). Additional rows are injected for both other countries. Finally, a row is added for the overall sales worldwide.
CUBE
In this case, we’ve got all the combinations. Technically, it’s the same as: GROUP BY country + GROUP BY product_name + GROUP BY country_product_name + GROUP BY (). We could do that using more than just one statement, but doing it at once is easier – and a lot more efficient.
Again, NULL values have been added to indicate various aggregation levels.
Common Table Expression
A common table expression is a temporary result set which you can reference within another SQL statement including SELECT, INSERT, UPDATE or DELETE.
Common Table Expressions are temporary in the sense that they only exist during the execution of the query.
The following shows the syntax of creating a CTE:
Common Table Expressions or CTEs are typically used to simplify complex joins and subqueries in PostgreSQL.
The following statement illustrates how to join a CTE with a table:
In this example:
First, the CTE returns a result set that includes staff id and the number of rentals. Then, join the staff table with the CTE using the staff_id column.
Domains
In my opinion, this is a very useful feature for creating a custom type if you have many columns that need to be restricted by several of the same constraints, have same data type and/or same default value. Let’s create, for example, a VARCHAR domain that should have a not null constraint and a default value of ‘N/A’.
USING keyword
The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.
Conclusion
This post covers simple and helpful features of Postgresql. For more info, you can always refer to official docs.
Marat Badykov
Full-stack web developer. I enjoy writing Php, Java, and Js.