本节我们将从三个维度测试集群的性能:
我们将把这三个测试的数据记录到以下表格:
| Benchmark | 优化前 | 优化后 | 
|---|---|---|
| Load time (lineorder) | 15m 23s | |
| Storage Use | ||
| LINEORDER | ||
| PART | ||
| CUSTOMER | ||
| DWDATE | ||
| SUPPLIER | ||
| Total storage | ||
| Query execution time | ||
| Query 1 | ||
| Query 2 | ||
| Query 3 | ||
| Total execution time | 
统计每张表占用了多少MB:
select stv_tbl_perm.name as table, count(*) as mb
from stv_blocklist, stv_tbl_perm
where stv_blocklist.tbl = stv_tbl_perm.id
and stv_blocklist.slice = stv_tbl_perm.slice
and stv_tbl_perm.name in ('lineorder','part','customer','dwdate','supplier')
group by stv_tbl_perm.name
order by 1 asc;
结果:

将结果记录到表格:
| Benchmark | 优化前 | 优化后 | 
|---|---|---|
| Load time (lineorder) | 15m 23s | |
| Storage Use(MB) | ||
| LINEORDER | 25136 | |
| PART | 88 | |
| CUSTOMER | 192 | |
| DWDATE | 40 | |
| SUPPLIER | 76 | |
| Total storage | ||
| Query execution time | ||
| Query 1 | ||
| Query 2 | ||
| Query 3 | ||
| Total execution time | 
在执行第一次SQL查询时,Redshift会进行编译,然后将编译好的命令发送到计算节点。所以,在记录SQL的执行时间时,不应该使用第一次的执行时间,要记录第二次的执行时间。
为了减少查询的执行时间,redshift会将结果缓存到leader node,第二次执行相同的查询会命中缓存(前面介绍过这个特性)
所以要将先缓存禁掉:
set enable_result_cache_for_session to off;
执行下面三条命令:
-- Query 1
-- Restrictions on only one dimension. 
select sum(lo_extendedprice*lo_discount) as revenue
from lineorder, dwdate
where lo_orderdate = d_datekey
and d_year = 1997 
and lo_discount between 1 and 3 
and lo_quantity < 24;
-- Query 2
-- Restrictions on two dimensions 
select sum(lo_revenue), d_year, p_brand1
from lineorder, dwdate, part, supplier
where lo_orderdate = d_datekey
and lo_partkey = p_partkey
and lo_suppkey = s_suppkey
and p_category = 'MFGR#12'
and s_region = 'AMERICA'
group by d_year, p_brand1
order by d_year, p_brand1;
-- Query 3
-- Drill down in time to just one month 
select c_city, s_city, d_year, sum(lo_revenue) as revenue 
from customer, lineorder, supplier, dwdate
where lo_custkey = c_custkey
and lo_suppkey = s_suppkey
and lo_orderdate = d_datekey
and (c_city='UNITED KI1' or
c_city='UNITED KI5')
and (s_city='UNITED KI1' or
s_city='UNITED KI5')
and d_yearmonth = 'Dec1997'
group by c_city, s_city, d_year
order by d_year asc, revenue desc;
分别记录它们的查询时间:
| Benchmark | 优化前 | 优化后 | 
|---|---|---|
| Load time (lineorder) | 15m 23s | |
| Storage Use(MB) | ||
| LINEORDER | 25136 | |
| PART | 88 | |
| CUSTOMER | 192 | |
| DWDATE | 40 | |
| SUPPLIER | 76 | |
| Total storage | 25532 | |
| Query execution time(ms) | ||
| Query 1 | 1850 | |
| Query 2 | 4642 | |
| Query 3 | 2891 | |
| Total execution time | 9283 |