测试优化前性能

本节我们将从三个维度测试集群的性能:

  1. 从S3导入redshift加载时间,上一节已完成
  2. 几张表的磁盘占用空间
  3. 查询语句性能

我们将把这三个测试的数据记录到以下表格:

Benchmark 优化前 优化后
Load time (lineorder) 15m 23s
Storage Use
LINEORDER
PART
CUSTOMER
DWDATE
SUPPLIER
Total storage
Query execution time
Query 1
Query 2
Query 3
Total execution time

磁盘占用空间的统计

统计每张表占用了多少MB:

select stv_tbl_perm.name as table, count(*) as mb
from stv_blocklist, stv_tbl_perm
where stv_blocklist.tbl = stv_tbl_perm.id
and stv_blocklist.slice = stv_tbl_perm.slice
and stv_tbl_perm.name in ('lineorder','part','customer','dwdate','supplier')
group by stv_tbl_perm.name
order by 1 asc;

结果:

image-20221104202033063

将结果记录到表格:

Benchmark 优化前 优化后
Load time (lineorder) 15m 23s
Storage Use(MB)
LINEORDER 25136
PART 88
CUSTOMER 192
DWDATE 40
SUPPLIER 76
Total storage
Query execution time
Query 1
Query 2
Query 3
Total execution time

测试查询性能

在执行第一次SQL查询时,Redshift会进行编译,然后将编译好的命令发送到计算节点。所以,在记录SQL的执行时间时,不应该使用第一次的执行时间,要记录第二次的执行时间。

为了减少查询的执行时间,redshift会将结果缓存到leader node,第二次执行相同的查询会命中缓存(前面介绍过这个特性)

所以要将先缓存禁掉:

set enable_result_cache_for_session to off;

执行下面三条命令:

-- Query 1
-- Restrictions on only one dimension. 
select sum(lo_extendedprice*lo_discount) as revenue
from lineorder, dwdate
where lo_orderdate = d_datekey
and d_year = 1997 
and lo_discount between 1 and 3 
and lo_quantity < 24;

-- Query 2
-- Restrictions on two dimensions 

select sum(lo_revenue), d_year, p_brand1
from lineorder, dwdate, part, supplier
where lo_orderdate = d_datekey
and lo_partkey = p_partkey
and lo_suppkey = s_suppkey
and p_category = 'MFGR#12'
and s_region = 'AMERICA'
group by d_year, p_brand1
order by d_year, p_brand1;

-- Query 3
-- Drill down in time to just one month 

select c_city, s_city, d_year, sum(lo_revenue) as revenue 
from customer, lineorder, supplier, dwdate
where lo_custkey = c_custkey
and lo_suppkey = s_suppkey
and lo_orderdate = d_datekey
and (c_city='UNITED KI1' or
c_city='UNITED KI5')
and (s_city='UNITED KI1' or
s_city='UNITED KI5')
and d_yearmonth = 'Dec1997'
group by c_city, s_city, d_year
order by d_year asc, revenue desc;

分别记录它们的查询时间:

Benchmark 优化前 优化后
Load time (lineorder) 15m 23s
Storage Use(MB)
LINEORDER 25136
PART 88
CUSTOMER 192
DWDATE 40
SUPPLIER 76
Total storage 25532
Query execution time(ms)
Query 1 1850
Query 2 4642
Query 3 2891
Total execution time 9283