SSB(Star Schema Benchmark
下载地址: https://github.com/Kyligence/ssb-kylin
)是针对OLAP星型模型的开源压测工具。
测试的过程会创建5张表,表的数据量可以根据参数调整。SSB 的表结构如下图所示:
其中 lineorder 为事实表(fact table
),其他四张为维度表(dimension table
),每张维度表通过 primary key
与事实表关联,这是标准的星型结构。数据仓库系统通常使用星形结构
创建schema:
create schema ssb;
set search_path = ssb, public; # 这样下面语句创建的物化视图会在ssb schema下面
创建没有指定sort keys, distribution styles, compression encodings
的表:
CREATE TABLE part
(
p_partkey INTEGER NOT NULL,
p_name VARCHAR(22) NOT NULL,
p_mfgr VARCHAR(6) NOT NULL,
p_category VARCHAR(7) NOT NULL,
p_brand1 VARCHAR(9) NOT NULL,
p_color VARCHAR(11) NOT NULL,
p_type VARCHAR(25) NOT NULL,
p_size INTEGER NOT NULL,
p_container VARCHAR(10) NOT NULL
);
CREATE TABLE supplier
(
s_suppkey INTEGER NOT NULL,
s_name VARCHAR(25) NOT NULL,
s_address VARCHAR(25) NOT NULL,
s_city VARCHAR(10) NOT NULL,
s_nation VARCHAR(15) NOT NULL,
s_region VARCHAR(12) NOT NULL,
s_phone VARCHAR(15) NOT NULL
);
CREATE TABLE customer
(
c_custkey INTEGER NOT NULL,
c_name VARCHAR(25) NOT NULL,
c_address VARCHAR(25) NOT NULL,
c_city VARCHAR(10) NOT NULL,
c_nation VARCHAR(15) NOT NULL,
c_region VARCHAR(12) NOT NULL,
c_phone VARCHAR(15) NOT NULL,
c_mktsegment VARCHAR(10) NOT NULL
);
CREATE TABLE dwdate
(
d_datekey INTEGER NOT NULL,
d_date VARCHAR(19) NOT NULL,
d_dayofweek VARCHAR(10) NOT NULL,
d_month VARCHAR(10) NOT NULL,
d_year INTEGER NOT NULL,
d_yearmonthnum INTEGER NOT NULL,
d_yearmonth VARCHAR(8) NOT NULL,
d_daynuminweek INTEGER NOT NULL,
d_daynuminmonth INTEGER NOT NULL,
d_daynuminyear INTEGER NOT NULL,
d_monthnuminyear INTEGER NOT NULL,
d_weeknuminyear INTEGER NOT NULL,
d_sellingseason VARCHAR(13) NOT NULL,
d_lastdayinweekfl VARCHAR(1) NOT NULL,
d_lastdayinmonthfl VARCHAR(1) NOT NULL,
d_holidayfl VARCHAR(1) NOT NULL,
d_weekdayfl VARCHAR(1) NOT NULL
);
CREATE TABLE lineorder
(
lo_orderkey INTEGER NOT NULL,
lo_linenumber INTEGER NOT NULL,
lo_custkey INTEGER NOT NULL,
lo_partkey INTEGER NOT NULL,
lo_suppkey INTEGER NOT NULL,
lo_orderdate INTEGER NOT NULL,
lo_orderpriority VARCHAR(15) NOT NULL,
lo_shippriority VARCHAR(1) NOT NULL,
lo_quantity INTEGER NOT NULL,
lo_extendedprice INTEGER NOT NULL,
lo_ordertotalprice INTEGER NOT NULL,
lo_discount INTEGER NOT NULL,
lo_revenue INTEGER NOT NULL,
lo_supplycost INTEGER NOT NULL,
lo_tax INTEGER NOT NULL,
lo_commitdate INTEGER NOT NULL,
lo_shipmode VARCHAR(10) NOT NULL
);
加载SSB测试数据:
copy customer from 's3://awssampledbuswest2/ssbgz/customer'
iam_role 'arn:aws:iam::145197526627:role/redshift-access-s3'
gzip compupdate off region 'us-west-2';
copy dwdate from 's3://awssampledbuswest2/ssbgz/dwdate'
iam_role 'arn:aws:iam::145197526627:role/redshift-access-s3'
gzip compupdate off region 'us-west-2';
copy lineorder from 's3://awssampledbuswest2/ssbgz/lineorder'
iam_role 'arn:aws:iam::145197526627:role/redshift-access-s3'
gzip compupdate off region 'us-west-2';
copy part from 's3://awssampledbuswest2/ssbgz/part'
iam_role 'arn:aws:iam::145197526627:role/redshift-access-s3'
gzip compupdate off region 'us-west-2';
copy supplier from 's3://awssampledbuswest2/ssbgz/supplier'
iam_role 'arn:aws:iam::145197526627:role/redshift-access-s3'
gzip compupdate off region 'us-west-2';
1.根据实际情况,将iam_role
和region
做替换
2.上面的命令中,COMPUPDATE=OFF
, 这样在导入数据时不会开启自动压缩
参考: https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-load.html#copy-compupdate
小表导入过程大概要加载几秒,最大的表lineorder
耗时最长,花了15多分钟:
查询每张表的条数:
select count(*) from LINEORDER union
select count(*) from PART union
select count(*) from CUSTOMER union
select count(*) from SUPPLIER union
select count(*) from DWDATE;
结果: