SQL for Beginners: Basic Metrics of Several Apps

EN
Meet the next article in the SQL series! You will learn how to conduct end-to-end analytics across several apps using the Union operator.
SQL for Beginners: Basic Metrics of Several Apps
Published
17.12.2021
|
devtodev

In previous articles from the SQL series, we talked about how to start learning SQL, how to use it to find out the number of active users, calculate the main monetization metrics, determine the structure of user purchases, and join operator.

Now it's time to talk about the union operation. In devtodev, apps on different platforms should be located in separate projects and combined into one space. For example, there are two 3 in a row apps in our demo: Android and iOS. Each of them will have its tables with users, events (or the same), and parameters. In case you need  to do end-to-end analytics for several apps at once, we need a vertical union operation.

Log in to your devtodev account and find the SQL Report in the demo project.

Union operator

The union operator combines data from several tables into one while leaving unique rows. If you join two tables with the same row the duplicate will not be included in the result (distinct by default). But if you need all data, including duplicates (for example, you count the number of events committed), then it is better to use union all – this operator will return all rows from the joined tables.

select column_names (1...N)
from table_name
union

select column_names (1...N)
from table_name

There are several conditions for the union operator:

  • The number of columns in all tables must be the same, otherwise, an error will occur.
  • The data types of these cells must also be the same in every query.
  • Order by should be applied to the result of the merge and placed only at the end of the query.

Number of transactions and gross by platforms

In one of the previous articles, we have already described how to calculate Gross and Transactions metrics. Now let's take the number of payments events and the sum of the priceusd parameter of these events from the p102968.payments table (for iOS) and from the p104704.payments table (for Android), and then combine the results using union. We also limited the payment period to last week.

select count() as "Transactions"
, sum(priceusd) as "Gross"
from p102968.payments
where eventtime > current_date - interval '7 day'and eventtime < current_date
union all
select
 count
() as "Transactions"
sum(priceusd) as "Gross"
from p104704.payments
where eventtime > current_date - interval '7 day'and eventtime < current_date

union operator
Query result. Screenshot from devtodev demo

The result of the request will be two lines: the first – metrics for the iOS app, and the second – for the Android app. But if you add sorting to the query or use it in some other way, then you will not be able to identify the metrics and the application.

Therefore, in select, we specify the name of the application as “App”.

select '3 in a row. iOS' as "App"
count() as "Transactions"
sum(priceusd) as "Gross"
from p102968.payments
where eventtime > current_date - interval '7 day'and eventtime < current_date
union
select
 
'3 in a row. Android' as "App"
count() as "Transactions"
sum(priceusd) as "Gross"
from p104704.payments
where eventtime > current_date - interval '7 day'and eventtime < current_date
order by 
3 desc
union operator
Query result. Screenshot from devtodev demo

As a result, we can sort this data or use it in further calculations.

How to use variables

From the example above, you may have noticed that we were filtering the results from multiple tables based on the same conditions:

eventtime > current_date - interval '7 day'and eventtime < current_date

If we need to change the time period (30 days instead of 7), then we will have to do it several times. In the case of joining a dozen tables, you can easily skip changing the condition in one of them and get an incorrect result. For such cases, as well as to improve the readability of the query, you can predefine variables. Variables are declared as follows:

DECLARE @LOCAL_VARIABLE data_type [ = value ] }

In the next query, we count various monetization metrics for the cohort of users who installed the app in the last seven days. To do this, we defined an @install_interval variable of type interval and set the initial value to 7 days, and then in each separate request we filtered users where created> current_date - @install_interval.

The rest of the variables were used both for filtering in where, and for designating the filter itself, for example, select 'Users paid' || @payments_count || ' and more times'.

declare 
@date_from date = '2021-08-01',
@install_interval interval = '7 day',
@payments_count int = 2,
@itemlike text = 'offer%',
@item_list text[] '{"offer1", "offer2", "special offer"}';
 
select 'New users' as "Metric"
count(distinct devtodevid)
from p104704.users
where created > current_date - @install_interval and created < current_date
 
union all
select 
'Paying users'
count(distinct devtodevid)
from p102968.payments
where created > current_date - @install_interval and created < current_date
 
union all
select 'Users paid' || @payments_count || 'and more times'
count(distinct devtodevid)
from p102968.payments
where created > current_date - @install_interval and created < current_date and paymentcount >=@payments_count
 
union all
select 'Users bought items like' || @itemlike
count(distinct devtodevid)
from p102968.payments
where created > current_date - @install_interval and created < current_date and product like =@itemlike
 
union all
select 'Users bought items from the list' || @itemlike
count(distinct devtodevid)
from p102968.payments
where created > current_date - @install_interval and created < current_date and product = any (@item_list)
order by 2 desc
union operator
Query result. Screenshot from devtodev demo

Such a report is very convenient – any of the users can easily edit the values ​​of these variables. In this case, you do not need to understand how the query is built, change the values ​​of the condition in each line.

If we want to preserve the order of the rows in the report, for example, in the order in which the tables are joined, we can add another field to the select of each query and sort by it:

select as "№"

union operator
Query result. Screenshot from devtodev demo

P.S.

In the next article we will look at even more examples of using union and join constructs, as well as explain how to use the results of one query inside another (nested queries).

Read more