Hive LATERAL VIEW OUTER EXPLODE explained with code examples
Apache Hive is a data warehouse system for running SQL-like queries over large datasets stored in distributed file systems. One of its most useful features is the ability to expand array and map columns into rows with the LATERAL VIEW OUTER EXPLODE clause. In this article, we will explore how this clause works and provide code examples to illustrate its usage.
Introduction to LATERAL VIEW
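In Hive, the LATERAL VIEW clause is used together with a table-generating function (UDTF) such as EXPLODE. The UDTF is applied to each row of the base table or subquery and may produce zero or more output rows. LATERAL VIEW then joins those output rows back to the input row that produced them, forming a virtual table with the alias you supply. By default, an input row for which the UDTF produces no output (for example, an empty array) is dropped from the result. Adding the OUTER keyword keeps such rows and fills the generated columns with NULL, much like a left outer join. LATERAL VIEW OUTER EXPLODE is therefore the form to use when an array or map column must be expanded into multiple rows without losing rows whose collection is empty or NULL.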
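The join behaviour can be seen without creating any table. The following is a minimal sketch, assuming a Hive version that allows SELECT without a FROM clause (0.13 or later); the aliases "base", "t", "id" and "n" are made up for illustration.

```sql
-- Two input rows, built inline so that no table is needed.
-- EXPLODE(array(10, 20)) emits two rows for every input row,
-- and LATERAL VIEW joins them back to that input row.
SELECT base.id, t.n
FROM (
  SELECT 1 AS id
  UNION ALL
  SELECT 2 AS id
) base
LATERAL VIEW OUTER EXPLODE(array(10, 20)) t AS n;
```

The result has four rows, in no guaranteed order: (1, 10), (1, 20), (2, 10) and (2, 20).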
Exploding Arrays
Arrays are a common data structure used to store a collection of elements. In Hive, you can explode an array column using the LATERAL VIEW OUTER EXPLODE clause. Let's consider a table called "students" with two columns: "name" and "subjects" where "subjects" is an array column containing the subjects that each student is enrolled in.
CREATE TABLE students (
name STRING,
subjects ARRAY<STRING>
);
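-- INSERT ... SELECT is used because many Hive versions reject array() inside a VALUES clause.
INSERT INTO TABLE students
SELECT 'Alice', array('Math', 'Science')
UNION ALL
SELECT 'Bob', array('English', 'History')
UNION ALL
SELECT 'Charlie', array('Physics', 'Chemistry', 'Biology');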
To explode the "subjects" array column, we can use the following query:
SELECT name, subject
FROM students
LATERAL VIEW OUTER EXPLODE(subjects) subjectsTable AS subject;
The output of this query will be a table with two columns: "name" and "subject", where each row represents a student and the subject they are enrolled in. For example:
name | subject |
---|---|
Alice | Math |
Alice | Science |
Bob | English |
Bob | History |
Charlie | Physics |
Charlie | Chemistry |
Charlie | Biology |
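None of the sample rows has an empty array, so the result above would look the same without the OUTER keyword. The difference only shows when the function produces no rows for an input row. The sketch below follows the pattern used in the Hive manual and forces that situation by exploding an empty array literal; the aliases "t" and "s" are arbitrary.

```sql
-- Plain LATERAL VIEW: EXPLODE(array()) yields no rows,
-- so every student is dropped and the result is empty.
SELECT name, s
FROM students
LATERAL VIEW EXPLODE(array()) t AS s;

-- LATERAL VIEW OUTER: every student is kept,
-- and the generated column s is NULL.
SELECT name, s
FROM students
LATERAL VIEW OUTER EXPLODE(array()) t AS s;
```

The same applies to a student whose "subjects" column is empty or NULL: with OUTER the student still appears once, with NULL as the subject.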
Exploding Maps
Maps are another commonly used data structure that store key-value pairs. In Hive, you can explode a map column using the LATERAL VIEW OUTER EXPLODE clause. Let's consider a table called "scores" with two columns: "name" and "marks" where "marks" is a map column containing the subject and the corresponding marks for each student.
CREATE TABLE scores (
name STRING,
marks MAP<STRING, INT>
);
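-- INSERT ... SELECT is used because many Hive versions reject map() inside a VALUES clause.
INSERT INTO TABLE scores
SELECT 'Alice', map('Math', 95, 'Science', 85)
UNION ALL
SELECT 'Bob', map('English', 80, 'History', 90)
UNION ALL
SELECT 'Charlie', map('Physics', 75, 'Chemistry', 85, 'Biology', 90);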
To explode the "marks" map column, we can use the following query:
-- The value alias is "mark" so that it does not clash with the "marks" column of scores.
SELECT name, subject, mark AS marks
FROM scores
LATERAL VIEW OUTER EXPLODE(marks) marksTable AS subject, mark;
The output of this query will be a table with three columns: "name", "subject", and "marks", where each row represents a student, the subject they have marks for, and the corresponding marks. For example:
name | subject | marks |
---|---|---|
Alice | Math | 95 |
Alice | Science | 85 |
Bob | English | 80 |
Bob | History | 90 |
Charlie | Physics | 75 |
Charlie | Chemistry | 85 |
Charlie | Biology | 90 |
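Once the map has been exploded, the generated columns behave like ordinary columns and can be filtered or aggregated. As an illustrative sketch against the "scores" table above, the query below computes an average per student. The alias "mark" and the output name "avg_mark" are chosen for this example only.

```sql
-- Average of all map values per student.
-- "mark" is used as the value alias to avoid a clash with the marks column.
SELECT name, AVG(mark) AS avg_mark
FROM scores
LATERAL VIEW OUTER EXPLODE(marks) marksTable AS subject, mark
GROUP BY name;
```

Because of OUTER, a student with an empty map would still appear in the result, with a NULL average.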
Using LATERAL VIEW OUTER with Other UDTFs
EXPLODE is itself a built-in table-generating function (UDTF), and LATERAL VIEW OUTER is not limited to it: any UDTF, including a user-defined one, can take its place. A UDTF is applied to each input row and can emit zero or more output rows. Let's consider a hypothetical UDTF called "split_words" that takes a string column and a delimiter and emits one row per piece of the string.
CREATE FUNCTION split_words AS 'SplitWordsUDTF'
USING JAR 'split_words.jar';
CREATE TABLE text (
id INT,
content STRING
);
INSERT INTO text VALUES
(1, 'Hello, world!'),
(2, 'Hive is awesome.');
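SELECT id, word
FROM text
LATERAL VIEW OUTER split_words(content, ',') splitTable AS word;
In this example, the hypothetical UDTF "split_words" is invoked directly in the LATERAL VIEW OUTER clause. It is not wrapped in EXPLODE, because Hive does not allow one UDTF to be nested inside another, and EXPLODE expects an array or a map rather than the row output of a UDTF. The function takes the "content" column of the "text" table and emits one row per piece delimited by ','. The resulting table has two columns: "id" and "word", where each row pairs an extracted piece with the corresponding "id". The exact rows depend on how the UDTF is implemented; for the first input row, the output could look like this: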
id | word |
---|---|
1 | Hello |
1 | world! |
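When no custom UDTF is available, a similar result can be sketched with Hive's built-in split() function. It is an ordinary function that returns an array of strings, so its result can be passed to EXPLODE. The query below assumes the "text" table defined above.

```sql
-- split() returns ARRAY<STRING>; its second argument is a regular expression.
-- EXPLODE then turns each array element into its own row.
SELECT id, word
FROM text
LATERAL VIEW OUTER EXPLODE(split(content, ',')) splitTable AS word;
```

Note that split() does not trim whitespace, so the second piece of 'Hello, world!' keeps its leading space. The second input row contains no comma and comes back as a single row holding the whole string.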