v.1

SQL for Beginners: Nested Queries and Temporary Tables

EN
Here is our next article on how to use SQL in practice. This time, let's look at some examples of using temporary tables and nested queries.
SQL for Beginners: Nested Queries and Temporary Tables
Published
17.02.2022
|
devtodev

Welcome to the next article from the SQL series! In the previous article, we calculated Transactions and Gross for the app on two platforms and got a separate result for each of them. 

SQL nested queries and temporary tables
Transactions and Gross for the «3 in a row»‎ app. Screenshot from devtodev demo

But what if we only want one summary value for each metric? Then we need to sum data using the union operation and use the result of the union as a table in the from operator. Thereafter in the select, we will calculate the sum of the Transactions and Gross columns from the table above.

Log in to your devtodev account and find the SQL Report in the demo project.

select 'Metrics for all projects' as "App"
sum(transactions) as "Transactions"
sum(gross) as "Gross"
from (
select count() as transactions
sum(priceusd) as gross
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date

union all
select
 count
() as "Transactions"
sum(priceusd) as "Gross"
from p104704.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
) as metrics_by_platform

SQL nested queries and temporary tables
Query result. Screenshot from devtodev demo

The from operator now contains an entire query that accesses two tables at once. The same can be done with the inner query, adding another select query to each of the from, if necessary. Such queries must be enclosed in brackets and have a name – the name of the resulting table.

) as metrics_by_platform

This construction can be used in all table-related statements, e.g. in join.

Inner join (select ... from ... where ...) as join_table
on join_table.paramt.param

Read more: SQL for Beginners: How to Track First In-App Events

Metrics for Each App and in Total

Let's calculate the total metrics for all apps in one request, and also give a transcript below. 

select 'Metrics for all projects' as "App"
sum(transactions) as "Transactions"
sum(gross) as "Gross"
from (
select '3 in a row. iOS' as "App"
count() as transactions
sum(priceusd) as gross
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date

union all
select
 
'3 in a row. Android' as "App"
count() as transactions
sum(priceusd) as gross
from p104704.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
) metrics_by_platform

union all
select '3 in a row. iOS' as "App"
count() as transactions
sum(priceusd) as gross
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date

union
select
 
'3 in a row. Android' as "App"
count() as "Transactions"
sum(priceusd) as "Gross"
from p104704.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
order by 
3 desc

SQL temporary table
Query result. Screenshot from devtodev demo

It turns out to be a rather large query in which we access each of the payments tables twice. Besides, we wrote the same code twice (when changing the query, we will have to make changes in two places).

devtodev pricing

To avoid this, we can create a view (Common Table Expression or CTE – is a temporary named result set), and then refer to it several times during the query. A construct can contain complex queries and refer to other views.

with temp_table_name as 
(select ... from ...)

We can say that we are creating a temporary table that is calculated once when the query is executed, even if you access it from different places. Using CTE views also makes the query much easier to read and edit.

Here is what the above query would look like using the CTE views:

with metrics_by_platform as (
select '3 in a row. iOS' as app
count() as transactions
sum(priceusd) as gross
from p102968.payments
where eventtime > current_date - interval '9 day' and eventtime < current_date

union all
select
 
'3 in a row. Android' as app
count() as transactions
sum(priceusd) as gross
from p104704.payments
where eventtime > current_date - interval '9 day' and eventtime < current_date
) 
select 'Metrics for all projects' as "App"
sum(transactions) as "Transactions"
sum(gross) as "Gross"
from metrics_by_platform

union all
select app
transactions
gross
from metrics_by_platform
order by 3 desc

Looks much easier, doesn't it? If we add a new app and want to analyze its metrics as well, we simply add it to the metrics_by_platform view. The calculation of the metrics and the final output of the results does not depend on the number of applications.

Read more: SQL for Beginners: How to Generate Dates and Numbers

What is a Nested Query?

Let’s go through an example together. We need to find out what is the maximum number of payments made by one user in 7 days and how many such users. Complex queries are always better to write in parts, so we will start with the maximum number of payments.

We have a table with all user payments from p102968.payment. From it, we will count the number of payments made for each of the users, grouping them by devtodevid. Then we will find the maximum number of such payments using max().

select max(user_payments) as "max_payments"
from (
select devtodevid
, count() 
user_payments
from p102968.payments
where eventtime > current_date - interval '9 day' and eventtime < current_date
group by devtodevid
) as payments_count

SQL nested query
Query result. Screenshot from devtodev demo

It remains to find out how many users made 12 payments during the same time.

To do this, we will put the query we just executed in the where user_payments=(query) filter, which will leave us with only users with a matching maximum number of payments. The query will return the number of such users select count() as "Users" and the maximum number of payments max(user_payments) as "Max payments count" from the table from (...) as payments_count.

Read more: SQL for Beginners: Query Basics
select count(devtodevid) as "Users"
max(user_payments) as "Max payments count"
from (
select devtodevid
, count() 
user_payments
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
group by devtodevid
) as payments_count

where user_payments = (select max(user_payments)
from
(
select devtodevid
, count() 
user_payments
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
group by
 devtodevid) as payments_count
)

SQL subquery
Query result. Screenshot from devtodev demo

For each row from the outer query will be made a comparison of the maximum user payments number where user_payments = (...). In our code, we used the same query twice, so let's optimize it with the CTE view.

with payments_count as (
select devtodevid
, count() 
user_payments
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
group by
 devtodevid
)

select count() as "Users"
max(user_payments) as "Payments count"
from payments_count
where user_payments = (select max(user_payments)
from payments_count

And what is the share of users with that many payments among all paying users? Maybe they were the only ones paying?

To find out, we have to add a nested query directly to the select of our code which will count all paying users. Then we will divide all users with the maximum payments by this number.

Read more: SQL for Beginners: Joins and Funnels

with payments_count as (
select devtodevid
, count() 
user_payments
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
group by
 devtodevid
)

select count() as "Users with max payments count"
max(user_payments) as "Payments count"
round(count()*100::numeric / (select count(distinct devtodevid)
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
2) ||'%' as "% of all payers"
from payments_count
where user_payments = (select max(user_payments)
from payments_count)

SQL nested queries
Query result. Screenshot from devtodev demo

Nested queries inside a select is a fairly common practice and is often used to calculate a share of something or display information from another table without using a join.

P.S.

In this article, we looked at several examples of using temporary tables and nested queries. Next time you will learn how to fill in empty dates on charts and form a distribution histogram. 

Read more: SQL for Beginners: How to Calculate Average Check, Transactions, ARPPU

Read more