# Hive的語法有哪些
## 目錄
1. [Hive簡介](#1-hive簡介)
2. [Hive數據定義語言(DDL)](#2-hive數據定義語言ddl)
- [數據庫操作](#21-數據庫操作)
- [表操作](#22-表操作)
3. [Hive數據操作語言(DML)](#3-hive數據操作語言dml)
- [數據加載](#31-數據加載)
- [數據查詢](#32-數據查詢)
- [數據導出](#33-數據導出)
4. [Hive查詢語法](#4-hive查詢語法)
- [基礎查詢](#41-基礎查詢)
- [聚合函數](#42-聚合函數)
- [JOIN操作](#43-join操作)
- [子查詢](#44-子查詢)
5. [Hive函數](#5-hive函數)
- [內置函數](#51-內置函數)
- [窗口函數](#52-窗口函數)
- [自定義函數](#53-自定義函數)
6. [Hive分區與分桶](#6-hive分區與分桶)
- [分區表](#61-分區表)
- [分桶表](#62-分桶表)
7. [Hive高級特性](#7-hive高級特性)
- [視圖](#71-視圖)
- [索引](#72-索引)
- [事務](#73-事務)
8. [總結](#8-總結)
## 1. Hive簡介
Apache Hive是建立在Hadoop上的數據倉庫基礎架構,它提供了數據查詢和分析的功能。Hive使用類似SQL的查詢語言——HiveQL(HQL),允許熟悉SQL的用戶查詢存儲在Hadoop分布式文件系統(HDFS)中的數據。
## 2. Hive數據定義語言(DDL)
### 2.1 數據庫操作
```sql
-- 創建數據庫
CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
-- 示例
CREATE DATABASE IF NOT EXISTS sales_db
COMMENT 'Sales database'
LOCATION '/user/hive/warehouse/sales.db';
-- 查看數據庫
SHOW DATABASES;
SHOW DATABASES LIKE 'sales*';
-- 使用數據庫
USE database_name;
-- 刪除數據庫
DROP DATABASE [IF EXISTS] database_name [RESTRICT|CASCADE];
-- 創建表語法
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)];
-- 示例:創建內部表
CREATE TABLE employees (
id INT,
name STRING,
salary FLOAT,
department STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
-- 示例:創建外部表
CREATE EXTERNAL TABLE external_employees (
id INT,
name STRING,
salary FLOAT
)
LOCATION '/user/hive/external/employees';
-- 查看表
SHOW TABLES;
SHOW TABLES IN database_name;
DESCRIBE [EXTENDED|FORMATTED] table_name;
-- 修改表
ALTER TABLE table_name RENAME TO new_table_name;
ALTER TABLE table_name ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
ALTER TABLE table_name DROP [COLUMN] column_name;
-- 刪除表
DROP TABLE [IF EXISTS] table_name;
-- 從本地文件加載
LOAD DATA LOCAL INPATH '/path/to/local/file'
[OVERWRITE] INTO TABLE table_name
[PARTITION (part_col1=val1, part_col2=val2 ...)];
-- 從HDFS加載
LOAD DATA INPATH '/hdfs/path/to/file'
[OVERWRITE] INTO TABLE table_name;
-- 示例
LOAD DATA LOCAL INPATH '/home/user/employee_data.csv'
OVERWRITE INTO TABLE employees;
-- 插入數據
INSERT INTO TABLE table_name
VALUES (value1, value2, ...);
-- 從查詢結果插入
INSERT [OVERWRITE|INTO] TABLE table_name
SELECT select_statement FROM from_statement;
-- 基本查詢
SELECT [ALL|DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list [ASC|DESC]]
[LIMIT number];
-- 示例
SELECT name, salary FROM employees WHERE salary > 5000;
-- 導出到本地
INSERT OVERWRITE LOCAL DIRECTORY '/path/to/local/dir'
SELECT * FROM table_name;
-- 導出到HDFS
INSERT OVERWRITE DIRECTORY '/hdfs/path/to/dir'
SELECT * FROM table_name;
-- 使用多行插入
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)]
SELECT select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ...] SELECT select_statement2] ...;
-- 選擇特定列
SELECT column1, column2 FROM table_name;
-- 使用別名
SELECT column1 AS alias1, column2 alias2 FROM table_name;
-- 使用表達式
SELECT salary * 1.1 AS new_salary FROM employees;
-- 條件查詢
SELECT * FROM employees WHERE department = 'IT' AND salary > 6000;
-- 模糊查詢
SELECT * FROM employees WHERE name LIKE 'J%';
-- 范圍查詢
SELECT * FROM employees WHERE salary BETWEEN 5000 AND 8000;
-- IN查詢
SELECT * FROM employees WHERE department IN ('IT', 'HR');
-- 常用聚合函數
SELECT
COUNT(*) AS total_count,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary,
MIN(salary) AS min_salary,
SUM(salary) AS total_salary
FROM employees;
-- GROUP BY分組
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
-- HAVING過濾
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 6000;
-- 內連接
SELECT e.name, d.department_name
FROM employees e JOIN departments d
ON e.department_id = d.id;
-- 左外連接
SELECT e.name, d.department_name
FROM employees e LEFT OUTER JOIN departments d
ON e.department_id = d.id;
-- 右外連接
SELECT e.name, d.department_name
FROM employees e RIGHT OUTER JOIN departments d
ON e.department_id = d.id;
-- 全外連接
SELECT e.name, d.department_name
FROM employees e FULL OUTER JOIN departments d
ON e.department_id = d.id;
-- 交叉連接
SELECT e.name, d.department_name
FROM employees e CROSS JOIN departments d;
-- WHERE子句中的子查詢
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- FROM子句中的子查詢
SELECT dept.avg_salary
FROM (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) dept
WHERE dept.avg_salary > 7000;
-- IN子查詢
SELECT name
FROM employees
WHERE department_id IN (
SELECT id FROM departments WHERE location = 'New York'
);
-- 數學函數
SELECT ABS(-10), ROUND(3.1415, 2), CEIL(3.14), FLOOR(3.14);
-- 字符串函數
SELECT
CONCAT('Hello', ' ', 'World'),
SUBSTR('Hello World', 1, 5),
TRIM(' Hello '),
UPPER('hello'),
LOWER('WORLD');
-- 日期函數
SELECT
CURRENT_DATE(),
YEAR('2023-10-15'),
MONTH('2023-10-15'),
DATEDIFF('2023-12-31', '2023-01-01');
-- 條件函數
SELECT
IF(salary > 5000, 'High', 'Low') AS salary_level,
CASE
WHEN salary < 3000 THEN 'Low'
WHEN salary BETWEEN 3000 AND 7000 THEN 'Medium'
ELSE 'High'
END AS salary_category
FROM employees;
-- ROW_NUMBER()
SELECT
name,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_in_dept
FROM employees;
-- RANK()和DENSE_RANK()
SELECT
name,
salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
-- 聚合窗口函數
SELECT
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS avg_dept_salary,
salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;
-- 創建臨時函數
CREATE TEMPORARY FUNCTION function_name AS 'class_name';
-- 示例:創建UDF
-- 1. 編寫Java類繼承UDF
-- 2. 打包為JAR
-- 3. 在Hive中注冊
ADD JAR /path/to/udf.jar;
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.MyUpperUDF';
-- 使用UDF
SELECT my_upper(name) FROM employees;
-- 創建永久函數
CREATE FUNCTION database_name.function_name AS 'class_name'
USING JAR 'hdfs:///path/to/jar';
-- 創建分區表
CREATE TABLE employee_partitioned (
id INT,
name STRING,
salary FLOAT
)
PARTITIONED BY (department STRING, year INT)
STORED AS ORC;
-- 加載數據到特定分區
LOAD DATA LOCAL INPATH '/path/to/data'
INTO TABLE employee_partitioned
PARTITION (department='IT', year=2023);
-- 動態分區
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE employee_partitioned
PARTITION (department, year)
SELECT id, name, salary, department, 2023 AS year
FROM employees;
-- 查看分區
SHOW PARTITIONS employee_partitioned;
-- 添加分區
ALTER TABLE employee_partitioned ADD PARTITION (department='HR', year=2023);
-- 刪除分區
ALTER TABLE employee_partitioned DROP PARTITION (department='HR', year=2023);
-- 創建分桶表
CREATE TABLE employee_bucketed (
id INT,
name STRING,
salary FLOAT,
department STRING
)
CLUSTERED BY (department) INTO 4 BUCKETS
STORED AS ORC;
-- 啟用分桶
SET hive.enforce.bucketing = true;
-- 插入數據到分桶表
INSERT INTO TABLE employee_bucketed
SELECT * FROM employees;
-- 分桶抽樣查詢
SELECT * FROM employee_bucketed TABLESAMPLE(BUCKET 1 OUT OF 4 ON department);
-- 創建視圖
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...)]
[COMMENT view_comment]
[TBLPROPERTIES (property_name=property_value, ...)]
AS SELECT ...;
-- 示例
CREATE VIEW high_paid_employees AS
SELECT name, salary FROM employees WHERE salary > 8000;
-- 查詢視圖
SELECT * FROM high_paid_employees;
-- 刪除視圖
DROP VIEW [IF EXISTS] view_name;
-- 創建索引
CREATE INDEX index_name
ON TABLE table_name (column_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (column_name, ...)]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"];
-- 重建索引
ALTER INDEX index_name ON table_name REBUILD;
-- 刪除索引
DROP INDEX [IF EXISTS] index_name ON table_name;
-- 啟用事務支持
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
-- 創建支持ACID的表
CREATE TABLE transactional_table (
id INT,
name STRING
)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
-- 事務操作
START TRANSACTION;
INSERT INTO transactional_table VALUES (1, 'Alice');
UPDATE transactional_table SET name = 'Bob' WHERE id = 1;
COMMIT;
本文詳細介紹了Hive的各種語法,包括: - 數據定義語言(DDL):數據庫和表的創建、修改、刪除 - 數據操作語言(DML):數據的加載、查詢和導出 - 豐富的查詢語法:基礎查詢、聚合函數、JOIN操作和子查詢 - 內置函數和自定義函數 - 分區與分桶技術 - 高級特性如視圖、索引和事務
Hive的語法與SQL非常相似,使得熟悉SQL的用戶能夠快速上手。同時,Hive也提供了許多特有的功能來優化大數據處理,如分區、分桶和MapReduce集成等。掌握這些語法對于高效使用Hive進行大數據分析至關重要。
隨著Hive的不斷發展,新的功能和語法也在不斷加入。建議讀者定期查閱官方文檔以獲取最新的語法特性。 “`
免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。