本节我们将从三个维度测试集群的性能:
我们将把这三个测试的数据记录到以下表格:
Benchmark | 优化前 | 优化后 |
---|---|---|
Load time (lineorder) | 15m 23s | |
Storage Use | ||
LINEORDER | ||
PART | ||
CUSTOMER | ||
DWDATE | ||
SUPPLIER | ||
Total storage | ||
Query execution time | ||
Query 1 | ||
Query 2 | ||
Query 3 | ||
Total execution time |
统计每张表占用了多少MB:
select stv_tbl_perm.name as table, count(*) as mb
from stv_blocklist, stv_tbl_perm
where stv_blocklist.tbl = stv_tbl_perm.id
and stv_blocklist.slice = stv_tbl_perm.slice
and stv_tbl_perm.name in ('lineorder','part','customer','dwdate','supplier')
group by stv_tbl_perm.name
order by 1 asc;
结果:
将结果记录到表格:
Benchmark | 优化前 | 优化后 |
---|---|---|
Load time (lineorder) | 15m 23s | |
Storage Use(MB) | ||
LINEORDER | 25136 | |
PART | 88 | |
CUSTOMER | 192 | |
DWDATE | 40 | |
SUPPLIER | 76 | |
Total storage | ||
Query execution time | ||
Query 1 | ||
Query 2 | ||
Query 3 | ||
Total execution time |
在执行第一次SQL查询时,Redshift会进行编译,然后将编译好的命令发送到计算节点。所以,在记录SQL的执行时间时,不应该使用第一次的执行时间,要记录第二次的执行时间。
为了减少查询的执行时间,redshift会将结果缓存到leader node
,第二次执行相同的查询会命中缓存(前面介绍过这个特性)
所以要将先缓存禁掉:
set enable_result_cache_for_session to off;
执行下面三条命令:
-- Query 1
-- Restrictions on only one dimension.
select sum(lo_extendedprice*lo_discount) as revenue
from lineorder, dwdate
where lo_orderdate = d_datekey
and d_year = 1997
and lo_discount between 1 and 3
and lo_quantity < 24;
-- Query 2
-- Restrictions on two dimensions
select sum(lo_revenue), d_year, p_brand1
from lineorder, dwdate, part, supplier
where lo_orderdate = d_datekey
and lo_partkey = p_partkey
and lo_suppkey = s_suppkey
and p_category = 'MFGR#12'
and s_region = 'AMERICA'
group by d_year, p_brand1
order by d_year, p_brand1;
-- Query 3
-- Drill down in time to just one month
select c_city, s_city, d_year, sum(lo_revenue) as revenue
from customer, lineorder, supplier, dwdate
where lo_custkey = c_custkey
and lo_suppkey = s_suppkey
and lo_orderdate = d_datekey
and (c_city='UNITED KI1' or
c_city='UNITED KI5')
and (s_city='UNITED KI1' or
s_city='UNITED KI5')
and d_yearmonth = 'Dec1997'
group by c_city, s_city, d_year
order by d_year asc, revenue desc;
分别记录它们的查询时间:
Benchmark | 优化前 | 优化后 |
---|---|---|
Load time (lineorder) | 15m 23s | |
Storage Use(MB) | ||
LINEORDER | 25136 | |
PART | 88 | |
CUSTOMER | 192 | |
DWDATE | 40 | |
SUPPLIER | 76 | |
Total storage | 25532 | |
Query execution time(ms) | ||
Query 1 | 1850 | |
Query 2 | 4642 | |
Query 3 | 2891 | |
Total execution time | 9283 |