Hive中的列轉行(Pivot)和行轉列(Unpivot)是兩種不同的數據轉換操作,它們在處理數據時有著不同的目的和效果。
CASE語句結合GROUP BY和聚合函數(如SUM、AVG等)來實現列轉行的操作。這種轉換通常用于將數據從長格式轉換為寬格式,使得數據分析更加方便。示例:
假設有一個名為sales_data的表,其中包含以下列:product_id、date和revenue。我們想要將date列的值轉換為多個新的列,例如year、month和day,并將這些新列的值匯總為每個product_id的總revenue??梢允褂靡韵翲ive SQL語句實現:
SELECT product_id,
SUM(CASE WHEN YEAR(date) = 2021 THEN revenue ELSE 0 END) AS revenue_2021,
SUM(CASE WHEN YEAR(date) = 2022 THEN revenue ELSE 0 END) AS revenue_2022,
SUM(CASE WHEN YEAR(date) = 2023 THEN revenue ELSE 0 END) AS revenue_2023
FROM sales_data
GROUP BY product_id;
LATERAL VIEW和EXPLODE函數來實現行轉列的操作。這種轉換通常用于將數據從寬格式轉換為長格式,以便進行更詳細的數據分析。示例:
假設有一個名為employee_data的表,其中包含以下列:employee_id、department和salary。我們想要將每個員工的工資信息轉換為多個新的列,例如year、month和day,并將這些新列的值匯總為每個員工的總收入??梢允褂靡韵翲ive SQL語句實現:
SELECT employee_id,
SUM(CASE WHEN YEAR(salary_date) = 2021 THEN salary ELSE 0 END) AS salary_2021,
SUM(CASE WHEN YEAR(salary_date) = 2022 THEN salary ELSE 0 END) AS salary_2022,
SUM(CASE WHEN YEAR(salary_date) = 2023 THEN salary ELSE 0 END) AS salary_2023
FROM employee_data
LATERAL VIEW INLINE(EXPLODE(ARRAY(
STRUCT('2021', salary_date, salary),
STRUCT('2022', salary_date, salary),
STRUCT('2023', salary_date, salary)
))) t AS year, month, salary
GROUP BY employee_id;
總結: