目录
一、什么是索引
二、原理
三、详细内容
四、完整流程示例
1.创建索引语法
2.生成索引
3.查看索引
4.更新索引属性
5.删除索引
二、原理
Hive可以在指定列上建立索引,会产生一张索引表(Hive的一张物理表),里面的字段包括,索引列的值、该值对应的HDFS文件路径、该值在文件中的偏移量;在执行索引字段查询时候,首先额外生成一个MapReduce job,根据对索引列的过滤条件,从索引表中过滤出索引列的值对应的hdfs文件路径及偏移量,输出到hdfs上的一个文件中,然后根据这些文件中的 hdfs路径和偏移量,筛选原始input文件,生成新的split,作为整个job的split,达到不用全表扫描的目的。
不过从Hive3.0开始,Hive已经移除了索引。
移除索引后的替代方案:
1.可以自动重写的物化视图
2.使用列式文件格式(Parquet、ORC)ps:可以进行选择性扫描;甚至可以跳过整个文件/块。
三、详细内容
创建、查看与删除索引
Create/build, show, and drop index:
CREATE INDEX table01_index ON TABLE table01 (column2) AS 'COMPACT';
SHOW INDEX ON table01;
DROP INDEX table01_index ON table01;
先创建在构建、查看与删除索引
Create then build, show formatted (with column names), and drop index:
CREATE INDEX table02_index ON TABLE table02 (column3) AS 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX table02_index ON table2 REBUILD;
SHOW FORMATTED INDEX ON table02;
DROP INDEX table02_index ON table02;
创建位图(bitmap)索引、查看与删除
Create bitmap index, build, show, and drop:
CREATE INDEX table03_index ON TABLE table03 (column4) AS 'BITMAP' WITH DEFERRED REBUILD;
ALTER INDEX table03_index ON table03 REBUILD;
SHOW FORMATTED INDEX ON table03;
DROP INDEX table03_index ON table03;
在一张新表上创建索引
Create index in a new table:
CREATE INDEX table04_index ON TABLE table04 (column5) AS 'COMPACT' WITH DEFERRED REBUILD IN TABLE table04_index_table;
创建RCFile格式的索引
Create index stored as RCFile:
CREATE INDEX table05_index ON TABLE table05 (column6) AS 'COMPACT' STORED AS RCFILE;
创建TEXTFILE格式的索引
Create index stored as text file:
CREATE INDEX table06_index ON TABLE table06 (column7) AS 'COMPACT' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
创建带有索引属性的索引
Create index with index properties:
CREATE INDEX table07_index ON TABLE table07 (column8) AS 'COMPACT' IDXPROPERTIES ("prop1"="value1", "prop2"="value2");
创建带有表属性的索引
Create index with table properties:
CREATE INDEX table08_index ON TABLE table08 (column9) AS 'COMPACT' TBLPROPERTIES ("prop3"="value3", "prop4"="value4");
删除索引
Drop index if exists:
DROP INDEX IF EXISTS table09_index ON table09;
在分区上重建索引
Rebuild index on a partition:
ALTER INDEX table10_index ON table10 PARTITION (columnX='valueQ', columnY='valueR') REBUILD;
四、完整流程示例
1.创建索引语法
-- 创建创建索引
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"]
;
--详情说明
指定索引处理器 AS ‘index.handler.class.name’
一般使用org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler处理器
表明创建一个空索引 [WITH DEFERRED REBUILD]
索引的参数 [IDXPROPERTIES (property_name=property_value, …)]
索引存储在哪个表 [IN TABLE index_table_name]
指定分区创建索引,不指定默认所有分区都创建索引
[PARTITIONED BY (col_name, …)]
[
[ ROW FORMAT …] STORED AS …
| STORED BY …
]
2.生成索引
alter index 索引名称 on 表名 rebuild;
3.查看索引
SHOW FORMATTED INDEX ON table_name;
4.更新索引属性
ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD;
5.删除索引
DROP INDEX [IF EXISTS] index_name ON table_name;