重新导入数据

删除原来的表:

drop table part cascade;
drop table supplier cascade;
drop table customer cascade;
drop table dwdate cascade;
drop table lineorder cascade;

指定 sort keysdistribution, 重新创建表:

CREATE TABLE part (
  p_partkey integer not null	sortkey distkey,
  p_name varchar(22) not null,
  p_mfgr varchar(6) not null,
  p_category varchar(7) not null,
  p_brand1 varchar(9) not null,
  p_color varchar(11) not null,
  p_type varchar(25) not null,
  p_size integer not null,
  p_container varchar(10) not null
);

CREATE TABLE supplier (
  s_suppkey integer not null sortkey,
  s_name varchar(25) not null,
  s_address varchar(25) not null,
  s_city varchar(10) not null,
  s_nation varchar(15) not null,
  s_region varchar(12) not null,
  s_phone varchar(15) not null)
diststyle all;

CREATE TABLE customer (
  c_custkey integer not null sortkey,
  c_name varchar(25) not null,
  c_address varchar(25) not null,
  c_city varchar(10) not null,
  c_nation varchar(15) not null,
  c_region varchar(12) not null,
  c_phone varchar(15) not null,
  c_mktsegment varchar(10) not null)
diststyle all;

CREATE TABLE dwdate (
  d_datekey integer not null sortkey,
  d_date varchar(19) not null,
  d_dayofweek	varchar(10) not null,
  d_month varchar(10) not null,
  d_year integer not null,
  d_yearmonthnum integer not null,
  d_yearmonth varchar(8)	not null,
  d_daynuminweek integer not null,
  d_daynuminmonth integer not null,
  d_daynuminyear integer not null,
  d_monthnuminyear integer not null,
  d_weeknuminyear integer not null,
  d_sellingseason varchar(13) not null,
  d_lastdayinweekfl varchar(1) not null,
  d_lastdayinmonthfl varchar(1) not null,
  d_holidayfl varchar(1) not null,
  d_weekdayfl varchar(1) not null)
diststyle all;

CREATE TABLE lineorder (
  lo_orderkey integer not null,
  lo_linenumber integer not null,
  lo_custkey integer not null,
  lo_partkey integer not null distkey,
  lo_suppkey integer not null,
  lo_orderdate integer not null sortkey,
  lo_orderpriority varchar(15) not null,
  lo_shippriority varchar(1) not null,
  lo_quantity integer not null,
  lo_extendedprice integer not null,
  lo_ordertotalprice integer not null,
  lo_discount integer not null,
  lo_revenue integer not null,
  lo_supplycost integer not null,
  lo_tax integer not null,
  lo_commitdate integer not null,
  lo_shipmode varchar(10) not null
);

在导入数据时,将原来的 compupdate off 删掉. 此时执行copy命令会启用自动压缩:

copy customer from 's3://awssampledbuswest2/ssbgz/customer' 
iam_role 'arn:aws:iam::145197526627:role/redshift-access-s3' 
gzip region 'us-west-2';

copy dwdate from 's3://awssampledbuswest2/ssbgz/dwdate' 
iam_role 'arn:aws:iam::145197526627:role/redshift-access-s3' 
gzip region 'us-west-2';

copy lineorder from 's3://awssampledbuswest2/ssbgz/lineorder' 
iam_role 'arn:aws:iam::145197526627:role/redshift-access-s3'
gzip region 'us-west-2';

copy part from 's3://awssampledbuswest2/ssbgz/part' 
iam_role 'arn:aws:iam::145197526627:role/redshift-access-s3'
gzip region 'us-west-2';

copy supplier from 's3://awssampledbuswest2/ssbgz/supplier' 
iam_role 'arn:aws:iam::145197526627:role/redshift-access-s3'
gzip region 'us-west-2';

记录表的加载时间:

Benchmark 优化前 优化后
Load time (lineorder) 15m 23s 19m 27s
Storage Use(MB)
LINEORDER 25136
PART 88
CUSTOMER 192
DWDATE 40
SUPPLIER 76
Total storage 25532
Query execution time(ms)
Query 1 1850
Query 2 4642
Query 3 2891
Total execution time 9283

image-20221104205713514