Hive Data Types | Manan Younas

Hive Data Types Overview

Category	Data Type	Description
Primitive	TINYINT	1-byte signed integer
Primitive	SMALLINT	2-byte signed integer
Primitive	INT	4-byte signed integer
Primitive	BIGINT	8-byte signed integer
Primitive	FLOAT	Single precision floating point
Primitive	DOUBLE	Double precision floating point
Primitive	BOOLEAN	True/False values
Primitive	STRING	Character sequence
Primitive	TIMESTAMP	Date and time values
Primitive	BINARY	Byte sequence
Complex	ARRAY	Ordered collection of elements of the same type
Complex	MAP	Key-value pairs
Complex	STRUCT	Container of named fields
Complex	UNIONTYPE	Value that holds exactly one of several declared types

Additional Notes:

Primitive types are basic data types that cannot be broken down further
Complex types are composed of multiple primitive types or other complex types
All numeric types are signed
STRING has no declared length limit; use VARCHAR (1 to 65,535 characters) or CHAR (up to 255 characters) when a bound is needed
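
To make the primitive types above concrete, here is a minimal HiveQL sketch. The table and column names (sensor_readings, sensor_id, and so on) are hypothetical and chosen only for illustration. It assumes a Hive version that allows SELECT without a FROM clause.

```sql
-- Hypothetical table: every name here is illustrative, not from a real schema.
CREATE TABLE sensor_readings (
    sensor_id    INT,        -- 4-byte signed integer
    reading_seq  BIGINT,     -- 8-byte signed integer
    temperature  DOUBLE,     -- double precision floating point
    is_valid     BOOLEAN,    -- true/false
    label        STRING,     -- character sequence with no declared length
    recorded_at  TIMESTAMP   -- date and time
);

-- CAST converts between primitive types.
-- When a conversion does not succeed, Hive returns NULL instead of raising an error.
SELECT CAST('42'  AS INT) AS parsed_int,
       CAST('abc' AS INT) AS not_a_number;  -- not_a_number is NULL
```

Because a failed CAST returns NULL silently, it is worth checking for unexpected NULLs after converting loosely typed input.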

Complex Data Types in Hive

ARRAY

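Arrays in Hive are ordered collections whose elements all share one data type. As in most programming languages, elements are accessed by a zero-based index (scores[0] is the first element), and an array can be expanded into one row per element when you need to iterate over it.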

graph LR
    A[Array: scores] --> B[Index 0: 85]
    A --> C[Index 1: 92]
    A --> D[Index 2: 78]
    style A fill:#f9f,stroke:#333
    style B fill:#bbf,stroke:#333
    style C fill:#bbf,stroke:#333
    style D fill:#bbf,stroke:#333
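
The array in the diagram can be sketched in HiveQL as follows. The table name student_scores and its columns are assumptions made for this example. Hive has no literal syntax for complex types in INSERT ... VALUES, so the row is built with the array() constructor inside a SELECT; older Hive releases may also require a FROM clause that points at a one-row table.

```sql
-- Hypothetical table holding one array of scores per student.
CREATE TABLE student_scores (
    student_id INT,
    scores     ARRAY<INT>
);

-- Build the array with the array() constructor function.
INSERT INTO TABLE student_scores
SELECT 1, array(85, 92, 78);

-- Index access is zero-based; an out-of-range index yields NULL.
SELECT scores[0]                  AS first_score,  -- 85
       size(scores)               AS score_count,  -- 3
       array_contains(scores, 92) AS has_92        -- true
FROM student_scores;

-- "Iteration": explode() turns each array element into its own row.
SELECT student_id, score
FROM student_scores
LATERAL VIEW explode(scores) exploded AS score;
```

The last query returns three rows for student 1, one for each score.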

MAP

Maps are collections of key-value pairs in which each key is unique. Keys must be of a primitive type, while values can be of any type. They're useful for storing related data pairs such as configurations or attributes.

graph LR
    M[Map: attributes] --> D1[department]
    M --> L1[location]
    D1 --> V1[IT]
    L1 --> V2[NYC]
    style M fill:#f9f,stroke:#333
    style D1 fill:#bbf,stroke:#333
    style L1 fill:#bbf,stroke:#333
    style V1 fill:#dfd,stroke:#333
    style V2 fill:#dfd,stroke:#333
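
A minimal HiveQL sketch of the map shown in the diagram follows. The table name employee_attributes and the column emp_id are hypothetical. As with arrays, the value is built with a constructor function, here map(), inside a SELECT; older Hive releases may need a FROM clause on the INSERT.

```sql
-- Hypothetical table: one map of free-form attributes per employee.
CREATE TABLE employee_attributes (
    emp_id     INT,
    attributes MAP<STRING, STRING>
);

-- map() takes alternating keys and values.
INSERT INTO TABLE employee_attributes
SELECT 1, map('department', 'IT', 'location', 'NYC');

-- Look up values by key; a key that is not present yields NULL.
SELECT attributes['department'] AS department,    -- IT
       attributes['floor']      AS floor,         -- NULL (no such key)
       size(attributes)         AS attribute_count -- 2
FROM employee_attributes;

-- map_keys() and map_values() return the keys and values as arrays
-- (their order is not guaranteed).
SELECT map_keys(attributes)   AS attribute_names,
       map_values(attributes) AS attribute_values
FROM employee_attributes;
```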

STRUCT

Structs are containers that can hold named fields of different data types. They're similar to objects in programming languages and useful for organizing related fields.

Here's a practical example of how STRUCT stores data in a nested structure:

CREATE TABLE employees (
    id INT,
    info STRUCT<
        personal:STRUCT<
            name:STRING,
            age:INT,
            email:STRING
        >,
        work:STRUCT<
            department:STRING,
            position:STRING,
            salary:DOUBLE
        >
    >
);

-- Example data (one row, shown in JSON notation for readability)
{
    "id": 101,
    "info": {
        "personal": {
            "name": "John Smith",
            "age": 30,
            "email": "john.smith@company.com"
        },
        "work": {
            "department": "Engineering",
            "position": "Senior Developer",
            "salary": 95000.00
        }
    }
}

Here's a visual representation using Mermaid diagram syntax:

graph TD
    E[employees] --> ID[id: INT]
    E --> INFO[info: STRUCT]
    INFO --> P[personal: STRUCT]
    INFO --> W[work: STRUCT]
    
    P --> P1[name: STRING]
    P --> P2[age: INT]
    P --> P3[email: STRING]
    
    W --> W1[department: STRING]
    W --> W2[position: STRING]
    W --> W3[salary: DOUBLE]
    
    P1 --> PV1["John Smith"]
    P2 --> PV2[30]
    P3 --> PV3["john.smith@company.com"]
    
    W1 --> WV1["Engineering"]
    W2 --> WV2["Senior Developer"]
    W3 --> WV3[95000.00]
    
    style E fill:#f9f,stroke:#333
    style INFO fill:#bbf,stroke:#333
    style P fill:#dfd,stroke:#333
    style W fill:#dfd,stroke:#333

In this example, the STRUCT type creates a hierarchical structure where related data is organized in logical groups. The outer STRUCT 'info' contains two inner STRUCTs: 'personal' and 'work', each containing specific attributes about an employee. This organization makes it easier to manage and query related data fields together.
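
To show how the grouped fields are queried together, here is a short sketch against the employees table defined above. Nested STRUCT fields are reached with dot notation; the salary threshold and the column aliases are arbitrary choices for illustration.

```sql
-- Read fields from both inner STRUCTs using dot notation.
SELECT id,
       info.personal.name   AS name,
       info.personal.email  AS email,
       info.work.department AS department,
       info.work.salary     AS salary
FROM employees
WHERE info.work.salary > 90000;  -- nested fields also work in filters

-- Selecting an inner STRUCT returns the whole group as a single value.
SELECT id, info.work
FROM employees;
```

With the sample row shown earlier, the first query would return employee 101, since the salary of 95000.00 is above the threshold.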