溫馨提示×

溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊×
其他方式登錄
點擊 登錄注冊 即表示同意《億速云用戶服務條款》

Hive的語法有哪些

發布時間:2021-12-10 09:57:03 來源:億速云 閱讀:223 作者:小新 欄目:云計算
# Hive的語法有哪些

## 目錄
1. [Hive簡介](#1-hive簡介)
2. [Hive數據定義語言(DDL)](#2-hive數據定義語言ddl)
   - [數據庫操作](#21-數據庫操作)
   - [表操作](#22-表操作)
3. [Hive數據操作語言(DML)](#3-hive數據操作語言dml)
   - [數據加載](#31-數據加載)
   - [數據查詢](#32-數據查詢)
   - [數據導出](#33-數據導出)
4. [Hive查詢語法](#4-hive查詢語法)
   - [基礎查詢](#41-基礎查詢)
   - [聚合函數](#42-聚合函數)
   - [JOIN操作](#43-join操作)
   - [子查詢](#44-子查詢)
5. [Hive函數](#5-hive函數)
   - [內置函數](#51-內置函數)
   - [窗口函數](#52-窗口函數)
   - [自定義函數](#53-自定義函數)
6. [Hive分區與分桶](#6-hive分區與分桶)
   - [分區表](#61-分區表)
   - [分桶表](#62-分桶表)
7. [Hive高級特性](#7-hive高級特性)
   - [視圖](#71-視圖)
   - [索引](#72-索引)
   - [事務](#73-事務)
8. [總結](#8-總結)

## 1. Hive簡介

Apache Hive是建立在Hadoop上的數據倉庫基礎架構,它提供了數據查詢和分析的功能。Hive使用類似SQL的查詢語言——HiveQL(HQL),允許熟悉SQL的用戶查詢存儲在Hadoop分布式文件系統(HDFS)中的數據。

## 2. Hive數據定義語言(DDL)

### 2.1 數據庫操作

```sql
-- 創建數據庫
CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];

-- 示例
CREATE DATABASE IF NOT EXISTS sales_db
COMMENT 'Sales database'
LOCATION '/user/hive/warehouse/sales.db';

-- 查看數據庫
SHOW DATABASES;
SHOW DATABASES LIKE 'sales*';

-- 使用數據庫
USE database_name;

-- 刪除數據庫
DROP DATABASE [IF EXISTS] database_name [RESTRICT|CASCADE];

2.2 表操作

-- 創建表語法
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) 
  [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)];

-- 示例:創建內部表
CREATE TABLE employees (
  id INT,
  name STRING,
  salary FLOAT,
  department STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- 示例:創建外部表
CREATE EXTERNAL TABLE external_employees (
  id INT,
  name STRING,
  salary FLOAT
)
LOCATION '/user/hive/external/employees';

-- 查看表
SHOW TABLES;
SHOW TABLES IN database_name;
DESCRIBE [EXTENDED|FORMATTED] table_name;

-- 修改表
ALTER TABLE table_name RENAME TO new_table_name;
ALTER TABLE table_name ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
ALTER TABLE table_name DROP [COLUMN] column_name;

-- 刪除表
DROP TABLE [IF EXISTS] table_name;

3. Hive數據操作語言(DML)

3.1 數據加載

-- 從本地文件加載
LOAD DATA LOCAL INPATH '/path/to/local/file' 
[OVERWRITE] INTO TABLE table_name 
[PARTITION (part_col1=val1, part_col2=val2 ...)];

-- 從HDFS加載
LOAD DATA INPATH '/hdfs/path/to/file' 
[OVERWRITE] INTO TABLE table_name;

-- 示例
LOAD DATA LOCAL INPATH '/home/user/employee_data.csv' 
OVERWRITE INTO TABLE employees;

-- 插入數據
INSERT INTO TABLE table_name 
VALUES (value1, value2, ...);

-- 從查詢結果插入
INSERT [OVERWRITE|INTO] TABLE table_name 
SELECT select_statement FROM from_statement;

3.2 數據查詢

-- 基本查詢
SELECT [ALL|DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list [ASC|DESC]]
[LIMIT number];

-- 示例
SELECT name, salary FROM employees WHERE salary > 5000;

3.3 數據導出

-- 導出到本地
INSERT OVERWRITE LOCAL DIRECTORY '/path/to/local/dir'
SELECT * FROM table_name;

-- 導出到HDFS
INSERT OVERWRITE DIRECTORY '/hdfs/path/to/dir'
SELECT * FROM table_name;

-- 使用多行插入
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)]
SELECT select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ...] SELECT select_statement2] ...;

4. Hive查詢語法

4.1 基礎查詢

-- 選擇特定列
SELECT column1, column2 FROM table_name;

-- 使用別名
SELECT column1 AS alias1, column2 alias2 FROM table_name;

-- 使用表達式
SELECT salary * 1.1 AS new_salary FROM employees;

-- 條件查詢
SELECT * FROM employees WHERE department = 'IT' AND salary > 6000;

-- 模糊查詢
SELECT * FROM employees WHERE name LIKE 'J%';

-- 范圍查詢
SELECT * FROM employees WHERE salary BETWEEN 5000 AND 8000;

-- IN查詢
SELECT * FROM employees WHERE department IN ('IT', 'HR');

4.2 聚合函數

-- 常用聚合函數
SELECT 
  COUNT(*) AS total_count,
  AVG(salary) AS avg_salary,
  MAX(salary) AS max_salary,
  MIN(salary) AS min_salary,
  SUM(salary) AS total_salary
FROM employees;

-- GROUP BY分組
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

-- HAVING過濾
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 6000;

4.3 JOIN操作

-- 內連接
SELECT e.name, d.department_name
FROM employees e JOIN departments d
ON e.department_id = d.id;

-- 左外連接
SELECT e.name, d.department_name
FROM employees e LEFT OUTER JOIN departments d
ON e.department_id = d.id;

-- 右外連接
SELECT e.name, d.department_name
FROM employees e RIGHT OUTER JOIN departments d
ON e.department_id = d.id;

-- 全外連接
SELECT e.name, d.department_name
FROM employees e FULL OUTER JOIN departments d
ON e.department_id = d.id;

-- 交叉連接
SELECT e.name, d.department_name
FROM employees e CROSS JOIN departments d;

4.4 子查詢

-- WHERE子句中的子查詢
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- FROM子句中的子查詢
SELECT dept.avg_salary
FROM (
  SELECT department, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department
) dept
WHERE dept.avg_salary > 7000;

-- IN子查詢
SELECT name
FROM employees
WHERE department_id IN (
  SELECT id FROM departments WHERE location = 'New York'
);

5. Hive函數

5.1 內置函數

-- 數學函數
SELECT ABS(-10), ROUND(3.1415, 2), CEIL(3.14), FLOOR(3.14);

-- 字符串函數
SELECT 
  CONCAT('Hello', ' ', 'World'),
  SUBSTR('Hello World', 1, 5),
  TRIM('   Hello   '),
  UPPER('hello'),
  LOWER('WORLD');

-- 日期函數
SELECT 
  CURRENT_DATE(),
  YEAR('2023-10-15'),
  MONTH('2023-10-15'),
  DATEDIFF('2023-12-31', '2023-01-01');

-- 條件函數
SELECT 
  IF(salary > 5000, 'High', 'Low') AS salary_level,
  CASE 
    WHEN salary < 3000 THEN 'Low'
    WHEN salary BETWEEN 3000 AND 7000 THEN 'Medium'
    ELSE 'High'
  END AS salary_category
FROM employees;

5.2 窗口函數

-- ROW_NUMBER()
SELECT 
  name, 
  salary,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_in_dept
FROM employees;

-- RANK()和DENSE_RANK()
SELECT 
  name,
  salary,
  RANK() OVER (ORDER BY salary DESC) AS rank,
  DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

-- 聚合窗口函數
SELECT 
  name,
  department,
  salary,
  AVG(salary) OVER (PARTITION BY department) AS avg_dept_salary,
  salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;

5.3 自定義函數

-- 創建臨時函數
CREATE TEMPORARY FUNCTION function_name AS 'class_name';

-- 示例:創建UDF
-- 1. 編寫Java類繼承UDF
-- 2. 打包為JAR
-- 3. 在Hive中注冊
ADD JAR /path/to/udf.jar;
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.MyUpperUDF';

-- 使用UDF
SELECT my_upper(name) FROM employees;

-- 創建永久函數
CREATE FUNCTION database_name.function_name AS 'class_name'
USING JAR 'hdfs:///path/to/jar';

6. Hive分區與分桶

6.1 分區表

-- 創建分區表
CREATE TABLE employee_partitioned (
  id INT,
  name STRING,
  salary FLOAT
)
PARTITIONED BY (department STRING, year INT)
STORED AS ORC;

-- 加載數據到特定分區
LOAD DATA LOCAL INPATH '/path/to/data' 
INTO TABLE employee_partitioned
PARTITION (department='IT', year=2023);

-- 動態分區
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE employee_partitioned
PARTITION (department, year)
SELECT id, name, salary, department, 2023 AS year
FROM employees;

-- 查看分區
SHOW PARTITIONS employee_partitioned;

-- 添加分區
ALTER TABLE employee_partitioned ADD PARTITION (department='HR', year=2023);

-- 刪除分區
ALTER TABLE employee_partitioned DROP PARTITION (department='HR', year=2023);

6.2 分桶表

-- 創建分桶表
CREATE TABLE employee_bucketed (
  id INT,
  name STRING,
  salary FLOAT,
  department STRING
)
CLUSTERED BY (department) INTO 4 BUCKETS
STORED AS ORC;

-- 啟用分桶
SET hive.enforce.bucketing = true;

-- 插入數據到分桶表
INSERT INTO TABLE employee_bucketed
SELECT * FROM employees;

-- 分桶抽樣查詢
SELECT * FROM employee_bucketed TABLESAMPLE(BUCKET 1 OUT OF 4 ON department);

7. Hive高級特性

7.1 視圖

-- 創建視圖
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...)]
[COMMENT view_comment]
[TBLPROPERTIES (property_name=property_value, ...)]
AS SELECT ...;

-- 示例
CREATE VIEW high_paid_employees AS
SELECT name, salary FROM employees WHERE salary > 8000;

-- 查詢視圖
SELECT * FROM high_paid_employees;

-- 刪除視圖
DROP VIEW [IF EXISTS] view_name;

7.2 索引

-- 創建索引
CREATE INDEX index_name
ON TABLE table_name (column_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (column_name, ...)]
[
   [ ROW FORMAT ...] STORED AS ...
   | STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"];

-- 重建索引
ALTER INDEX index_name ON table_name REBUILD;

-- 刪除索引
DROP INDEX [IF EXISTS] index_name ON table_name;

7.3 事務

-- 啟用事務支持
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;

-- 創建支持ACID的表
CREATE TABLE transactional_table (
  id INT,
  name STRING
)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- 事務操作
START TRANSACTION;
INSERT INTO transactional_table VALUES (1, 'Alice');
UPDATE transactional_table SET name = 'Bob' WHERE id = 1;
COMMIT;

8. 總結

本文詳細介紹了Hive的各種語法,包括: - 數據定義語言(DDL):數據庫和表的創建、修改、刪除 - 數據操作語言(DML):數據的加載、查詢和導出 - 豐富的查詢語法:基礎查詢、聚合函數、JOIN操作和子查詢 - 內置函數和自定義函數 - 分區與分桶技術 - 高級特性如視圖、索引和事務

Hive的語法與SQL非常相似,使得熟悉SQL的用戶能夠快速上手。同時,Hive也提供了許多特有的功能來優化大數據處理,如分區、分桶和MapReduce集成等。掌握這些語法對于高效使用Hive進行大數據分析至關重要。

隨著Hive的不斷發展,新的功能和語法也在不斷加入。建議讀者定期查閱官方文檔以獲取最新的語法特性。 “`

向AI問一下細節

免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。

AI

亚洲午夜精品一区二区_中文无码日韩欧免_久久香蕉精品视频_欧美主播一区二区三区美女