Welcome to the next article from the SQL series! In the previous article, we calculated Transactions and Gross for the app on two platforms and got a separate result for each of them.
But what if we only want one summary value for each metric? Then we need to sum data using the union operation and use the result of the union as a table in the from operator. Thereafter in the select, we will calculate the sum of the Transactions and Gross columns from the table above.
Log in to your devtodev account and find the SQL Report in the demo project.
select 'Metrics for all projects' as "App"
, sum(transactions) as "Transactions"
, sum(gross) as "Gross"
from (
select count() as transactions
, sum(priceusd) as gross
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_dateunion all
select count() as "Transactions"
, sum(priceusd) as "Gross"
from p104704.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
) as metrics_by_platform
The from operator now contains an entire query that accesses two tables at once. The same can be done with the inner query, adding another select query to each of the from, if necessary. Such queries must be enclosed in brackets and have a name – the name of the resulting table.
) as metrics_by_platform
This construction can be used in all table-related statements, e.g. in join.
Inner join (select ... from ... where ...) as join_table
on join_table.param = t.param
Read more: SQL for Beginners: How to Track First In-App Events
Metrics for Each App and in Total
Let's calculate the total metrics for all apps in one request, and also give a transcript below.
select 'Metrics for all projects' as "App"
, sum(transactions) as "Transactions"
, sum(gross) as "Gross"
from (
select '3 in a row. iOS' as "App"
, count() as transactions
, sum(priceusd) as gross
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_dateunion all
select '3 in a row. Android' as "App"
, count() as transactions
, sum(priceusd) as gross
from p104704.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
) metrics_by_platformunion all
select '3 in a row. iOS' as "App"
, count() as transactions
, sum(priceusd) as gross
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_dateunion
select '3 in a row. Android' as "App"
, count() as "Transactions"
, sum(priceusd) as "Gross"
from p104704.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
order by 3 desc
It turns out to be a rather large query in which we access each of the payments tables twice. Besides, we wrote the same code twice (when changing the query, we will have to make changes in two places).
To avoid this, we can create a view (Common Table Expression or CTE – is a temporary named result set), and then refer to it several times during the query. A construct can contain complex queries and refer to other views.
with temp_table_name as
(select ... from ...)
We can say that we are creating a temporary table that is calculated once when the query is executed, even if you access it from different places. Using CTE views also makes the query much easier to read and edit.
with metrics_by_platform as (
select '3 in a row. iOS' as app
, count() as transactions
, sum(priceusd) as gross
from p102968.payments
where eventtime > current_date - interval '9 day' and eventtime < current_dateunion all
select '3 in a row. Android' as app
, count() as transactions
, sum(priceusd) as gross
from p104704.payments
where eventtime > current_date - interval '9 day' and eventtime < current_date
)
select 'Metrics for all projects' as "App"
, sum(transactions) as "Transactions"
, sum(gross) as "Gross"
from metrics_by_platformunion all
select app
, transactions
, gross
from metrics_by_platform
order by 3 desc
Looks much easier, doesn't it? If we add a new app and want to analyze its metrics as well, we simply add it to the metrics_by_platform view. The calculation of the metrics and the final output of the results does not depend on the number of applications.
Read more: SQL for Beginners: How to Generate Dates and Numbers
What is a Nested Query?
Let’s go through an example together. We need to find out what is the maximum number of payments made by one user in 7 days and how many such users. Complex queries are always better to write in parts, so we will start with the maximum number of payments.
We have a table with all user payments from p102968.payment. From it, we will count the number of payments made for each of the users, grouping them by devtodevid. Then we will find the maximum number of such payments using max().
select max(user_payments) as "max_payments"
from (
select devtodevid
, count() user_payments
from p102968.payments
where eventtime > current_date - interval '9 day' and eventtime < current_date
group by devtodevid
) as payments_count
It remains to find out how many users made 12 payments during the same time.
To do this, we will put the query we just executed in the where user_payments=(query) filter, which will leave us with only users with a matching maximum number of payments. The query will return the number of such users select count() as "Users" and the maximum number of payments max(user_payments) as "Max payments count" from the table from (...) as payments_count.
Read more: SQL for Beginners: Query Basics
select count(devtodevid) as "Users"
, max(user_payments) as "Max payments count"
from (
select devtodevid
, count() user_payments
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
group by devtodevid
) as payments_count
where user_payments = (select max(user_payments)
from
(select devtodevid
, count() user_payments
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
group by devtodevid) as payments_count
)
For each row from the outer query will be made a comparison of the maximum user payments number where user_payments = (...). In our code, we used the same query twice, so let's optimize it with the CTE view.
with payments_count as (
select devtodevid
, count() user_payments
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
group by devtodevid
)select count() as "Users"
, max(user_payments) as "Payments count"
from payments_count
where user_payments = (select max(user_payments)
from payments_count
And what is the share of users with that many payments among all paying users? Maybe they were the only ones paying?
To find out, we have to add a nested query directly to the select of our code which will count all paying users. Then we will divide all users with the maximum payments by this number.
Read more: SQL for Beginners: Joins and Funnels
with payments_count as (
select devtodevid
, count() user_payments
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
group by devtodevid
)select count() as "Users with max payments count"
, max(user_payments) as "Payments count"
, round(count()*100::numeric / (select count(distinct devtodevid)
from p102968.payments
where eventtime > current_date - interval '7 day' and eventtime < current_date
, 2) ||'%' as "% of all payers"
from payments_count
where user_payments = (select max(user_payments)
from payments_count)
Nested queries inside a select is a fairly common practice and is often used to calculate a share of something or display information from another table without using a join.
P.S.
In this article, we looked at several examples of using temporary tables and nested queries. Next time you will learn how to fill in empty dates on charts and form a distribution histogram.
Read more: SQL for Beginners: How to Calculate Average Check, Transactions, ARPPU