Introducing the Apache Pinot Terraform Provider

Introduction:

As technology evolves, so does the need for efficient, reliable and consistent infrastructure. In this post, I will be introducing the Apache Pinot Terraform provider - a powerful tool for streamlining the process of managing your Apache Pinot cluster configuration.

Over the past few months, I've had the pleasure of collaborating with my friend Hagen and other fantastic contributors. In this post, I will share our experience and insights on what makes the Apache Pinot Terraform Provider a game-changer!

What is Apache Pinot?

Apache Pinot is a column-oriented, open-source, distributed data store designed to execute OLAP queries with low latency. It was originally developed at LinkedIn for building real-time visualisations and analytics products on massive-scale, fast-moving datasets, and it is frequently used for real-time data exploration. Its key benefits include:

  • real-time analytics with low latency
  • scalability, allowing users to process massive volumes of data effectively and quickly
  • flexibility, with support for multiple data formats, including CSV, JSON, Avro, Parquet and more

Why create a Terraform Provider?

Before the Terraform provider existed, Schemas, Tables and other objects within the Pinot Controller had to be created manually via ClickOps, or through a custom script calling the Controller API.

Terraform offers a familiar, standardised way to implement Infrastructure as Code. This provides several advantages for engineers working with and configuring Apache Pinot, including:

Simplifying infrastructure management:

Terraform has become the go-to choice for managing infrastructure as code. It simplifies infrastructure management and streamlines configuration, deployment and scaling.

Easing adoption and Integration:

An official Terraform provider makes it easier for more users to adopt Apache Pinot and to integrate it into their existing tech stacks.

Enhancing Automation Capabilities:

Utilising Terraform's automation features can improve the efficiency of managing Apache Pinot deployments and configurations across clusters.

Getting started:

As an example, I will outline the entire process of setting up a Schema with the Pinot Terraform Provider.

Assumptions:

  • You have the Terraform CLI installed on your machine. If not, you can follow these steps: Install Terraform
  • You have a running instance of Apache Pinot. There are various ways to spin up Pinot; I will use the Batch Quickstart with Docker from this page: link
  • I will be using VS Code for editing the files.

Steps:

  1. Create a directory to store your Terraform files:
    mkdir apache-pinot-terraform
  2. Open the directory in your text editor and create a main.tf file in it.
  3. First, we need to define the required providers, which in this case is only pinot. Add the following to main.tf:
terraform {
  required_providers {
    pinot = {
      source  = "azaurus1/pinot"
      version = "0.7.1"
    }
  }
}
  4. Now save main.tf and, in a terminal, run terraform init. This downloads the provider and creates a .terraform directory along with a .terraform.lock.hcl file.
  5. Next, we will add some provider-specific configuration:
provider "pinot" {
  controller_url = "http://localhost:9000"
  auth_token     = "NOT_NEEDED"
}

controller_url is the address and port of your Apache Pinot Controller instance. auth_token is required, but if authentication is not enabled you can provide any string. If you have authentication enabled, you can also provide:

auth_token     = "YWRtaW46dmVyeXNlY3JldA"
auth_type      = "bearer"

auth_type is optional; if it is not set, the provider defaults to basic. Setting it to bearer allows the provider to interact with managed Apache Pinot instances that require Bearer tokens to access the controller.
  6. Next, we'll add our Schema definition to main.tf:

resource "pinot_schema" "block_schema" {
  schema_name = "ethereum_block_headers"
  date_time_field_specs = [{
    data_type   = "LONG",
    name        = "block_timestamp",
    format      = "1:MILLISECONDS:EPOCH",
    granularity = "1:MILLISECONDS",
  }]
  dimension_field_specs = [{
    name      = "block_number",
    data_type = "INT",
    not_null  = true
    },
    {
      name      = "block_hash",
      data_type = "STRING",
      not_null  = true
  }]
  metric_field_specs = [{
    name      = "block_difficulty",
    data_type = "INT",
    not_null  = true
  }]
}

This defines a Terraform resource that creates a schema named ethereum_block_headers. A schema is required before a table can be created.
  7. Now we will run terraform plan. This creates an execution plan and lets you preview what changes will be made:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # pinot_schema.block_schema will be created
  + resource "pinot_schema" "block_schema" {
      + date_time_field_specs = [
          + {
              + data_type   = "LONG"
              + format      = "1:MILLISECONDS:EPOCH"
              + granularity = "1:MILLISECONDS"
              + name        = "block_timestamp"
            },
        ]
      + dimension_field_specs = [
          + {
              + data_type = "INT"
              + name      = "block_number"
              + not_null  = true
            },
          + {
              + data_type = "STRING"
              + name      = "block_hash"
              + not_null  = true
            },
        ]
      + metric_field_specs    = [
          + {
              + data_type = "INT"
              + name      = "block_difficulty"
              + not_null  = true
            },
        ]
      + schema_name           = "ethereum_block_headers"
    }

Plan: 1 to add, 0 to change, 0 to destroy.

  8. Once we've reviewed the plan and are happy with the changes, we can run terraform apply. This shows the plan again and prompts for approval; enter yes to make the changes:
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
  9. Assuming everything was set up correctly, you will see:
pinot_schema.block_schema: Creating...
pinot_schema.block_schema: Creation complete after 1s

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

The above message indicates that no issues were encountered and the schema has been created. We can find it in the Apache Pinot ZooKeeper browser.
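The walk-through above writes the controller URL and token directly into main.tf. As a minimal sketch of one alternative, the provider block from step 5 could be replaced with one that reads both values from Terraform input variables, with the token marked as sensitive so that Terraform redacts it in plan output. The variable names pinot_controller_url and pinot_auth_token are hypothetical; the provider arguments are the same ones used earlier in this post.

```hcl
# Hypothetical variable names, chosen for illustration only.
variable "pinot_controller_url" {
  type        = string
  description = "Address and port of the Apache Pinot Controller"
  default     = "http://localhost:9000"
}

variable "pinot_auth_token" {
  type        = string
  description = "Token passed to the provider as auth_token"
  sensitive   = true # redacted in plan and apply output
}

# Replaces the provider block from step 5; keep only one
# un-aliased provider "pinot" block in the configuration.
provider "pinot" {
  controller_url = var.pinot_controller_url
  auth_token     = var.pinot_auth_token
  auth_type      = "bearer"
}
```

With this in place, the token can be supplied from the environment by setting TF_VAR_pinot_auth_token before running terraform plan or terraform apply, which keeps it out of version control. Note that sensitive values can still end up in the Terraform state file, so the state should be stored securely.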

Features and Benefits:

There are three main benefits to using the Apache Pinot Terraform Provider:

  1. Infrastructure Automation: You can utilise Terraform's powerful Infrastructure as Code capabilities to automate the setup, configuration and deployment of Apache Pinot.
  2. Simplify the management of Apache Pinot Clusters: Using the provider makes it easy to Create, Update or Delete configured Apache Pinot clusters within your own Infrastructure.
  3. Scalability Support: The Apache Pinot Terraform provider allows you to easily configure new Pinot nodes in response to changing demand and utilisation.
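For teams that run more than one cluster, one possible approach is Terraform's provider aliases, which let a single configuration target several controllers. The sketch below is an illustration rather than a recommended layout: the controller URLs, alias names, local value names and resource names are all hypothetical, and the schema fields are copied from the example earlier in this post.

```hcl
# Hypothetical controller URLs; replace with your own.
provider "pinot" {
  alias          = "staging"
  controller_url = "http://pinot-staging.example.internal:9000"
  auth_token     = "NOT_NEEDED"
}

provider "pinot" {
  alias          = "production"
  controller_url = "http://pinot-production.example.internal:9000"
  auth_token     = "NOT_NEEDED"
}

# Field specs shared by both clusters, defined once.
locals {
  block_date_time_specs = [{
    data_type   = "LONG"
    name        = "block_timestamp"
    format      = "1:MILLISECONDS:EPOCH"
    granularity = "1:MILLISECONDS"
  }]
  block_dimension_specs = [
    { name = "block_number", data_type = "INT", not_null = true },
    { name = "block_hash", data_type = "STRING", not_null = true },
  ]
  block_metric_specs = [
    { name = "block_difficulty", data_type = "INT", not_null = true },
  ]
}

# The provider meta-argument selects which cluster each resource targets.
resource "pinot_schema" "block_schema_staging" {
  provider              = pinot.staging
  schema_name           = "ethereum_block_headers"
  date_time_field_specs = local.block_date_time_specs
  dimension_field_specs = local.block_dimension_specs
  metric_field_specs    = local.block_metric_specs
}

resource "pinot_schema" "block_schema_production" {
  provider              = pinot.production
  schema_name           = "ethereum_block_headers"
  date_time_field_specs = local.block_date_time_specs
  dimension_field_specs = local.block_dimension_specs
  metric_field_specs    = local.block_metric_specs
}
```

Keeping the field specs in locals means a schema change is made once and applied to both clusters in the same plan. Separate workspaces or separate root modules per environment are equally valid choices if you prefer to apply changes to each cluster independently.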

Use Cases:

At the time of writing, there are four main use cases for the Apache Pinot Terraform provider:

  1. Managing Users
  2. Managing Schemas
  3. Managing Tables
  4. Reading various objects from the Controller as Terraform data sources (Cluster info, Users, Instances, Segments, etc.)

Future Plans and Improvements:

At the time of writing, planned improvements mainly consist of additional Terraform resources, including:

  • Tenants
  • Cluster info
  • Instances
  • Segments

If you have any requests for improvements or features, you can add them as an issue to the GitHub repo: Repo

Conclusion:

In conclusion, the Apache Pinot Terraform Provider offers a solution for managing your data infrastructure efficiently. By using the provider, you will be well on your way to unlocking new levels of reliability, consistency and scalability.

Finally, we'd like to emphasise that our community is open, welcoming contributions from everyone interested in improving the Apache Pinot Terraform Provider. Whether you're a seasoned developer or just starting, your ideas and feedback are invaluable. You can find the source code on our GitHub repository. We welcome you to explore and contribute!

To learn more about the Apache Pinot Terraform Provider and its available resources, visit our provider's Terraform Registry page.