Getting started with Apache Kafka®

To get started with the service:

  1. Complete the prerequisites in Before you start.

  2. Create your cluster.

  3. Create a topic.

  4. Connect to your cluster.

Before you start

Your primary tool for interacting with DoubleCloud is the console. Set up your account and a command-line client before moving on.

  1. Go to the Clusters overview page in the console.

  2. Log in to DoubleCloud if you already have an account, or sign up if you're opening the console for the first time.

    Warning

    The steps below show how to set up kcat from a Docker image and kafkacat from a DEB-based Linux distribution's repository, but you can use other tools of your choice.

    For other connection options, see Connect to an Apache Kafka® cluster.

    Pull the kcat image from Docker Hub. This tutorial uses version 1.7.1, but you can use the latest one:

    docker pull edenhill/kcat:1.7.1
    

    Install kafkacat from your Linux distribution's repository:

    sudo apt install kafkacat
    
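    Either way, you can verify the installation by printing the tool's version; the -V flag works for both kcat and kafkacat, and nothing here is specific to DoubleCloud:

    docker run --rm edenhill/kcat:1.7.1 -V

    kafkacat -V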

Create your cluster

A cluster in the Managed Service for Apache Kafka® is one or more broker hosts where topics and their partitions are located.

Warning

During the trial period, you can create clusters with up to 8 cores, 32 GB RAM, and 400 GB storage. If you need to raise the quotas, don't hesitate to contact our support.

  1. Go to the Clusters overview page in the console.

  2. Click Create cluster in the upper-right corner of the page.

    1. Select Apache Kafka®.

    2. Choose a provider:

      1. Under Resources:

        • Select the s1-c2-m4 preset for CPU, RAM capacity, and storage space to create a cluster with minimal configuration.

          Understand your Apache Kafka® resource preset

          A resource preset has the following structure:

          <CPU platform>-c<number of CPU cores>-m<number of gigabytes of RAM>
          

          There are three available CPU platforms:

          • g - ARM Graviton

          • i - Intel (x86)

          • s - AMD (x86)

          For example, the i1-c2-m8 preset is an Intel (x86) platform with a 2-core CPU and 8 gigabytes of RAM.

          You can see the availability of CPU platforms across our Managed Service for Apache Kafka® areas and regions. A short shell sketch after these steps shows how a preset name breaks down.

        • Select the number of zones and brokers. In this tutorial, we create a cluster with 1 zone and 1 broker.

      2. Under Basic settings:

        • Enter the cluster Name, for example, quickstart-cluster.

        • From the Version drop-down list, select the Apache Kafka® version the cluster will use. For most clusters, we recommend using the latest version.

      3. Under Networking → VPC, specify in which DoubleCloud VPC to locate your cluster. Use the default value in the previously selected region if you don't need to create this cluster in a specific network.

      4. Click Submit.

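The preset-name layout described above can be illustrated with a tiny shell sketch. It's purely illustrative and not part of the service; the preset string is just an example:

# Split a resource preset name into its parts
preset="s1-c2-m4"
IFS='-' read -r platform cores ram <<< "$preset"
echo "platform: $platform, CPU cores: ${cores#c}, RAM (GB): ${ram#m}"
# prints: platform: s1, CPU cores: 2, RAM (GB): 4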

Your cluster will appear with the Creating status on the Clusters page in the console. Setting everything up may take some time. When the cluster is ready, its status changes to Alive.

Click the cluster to open its information page.


To create an Apache Kafka® cluster with the API, use the ClusterService Create method. The required parameters for a functional cluster are:

  • project_id - the ID of your project. You can get this value on your project's information page.

  • cloud_type - aws or gcp.

  • region_id - for this quickstart, use eu-central-1 for AWS or europe-west3 for GCP.

  • name - quickstart-cluster.

  • resources - specify the following from the doublecloud.kafka.v1.Cluster model:

    • resource_preset_id - for this quickstart, specify s1-c2-m4.

    • disk_size - 34359738368 bytes (32 GB).

    • broker_count - 1.

    • zone_count - 1.

  • You can also enable schema registry for your cluster: use the schema_registry_config object within the ClusterService Create method.

Kafka cluster request in JSON format
{
   "project_id": "<your Project ID>",
   "cloud_type": "aws",
   "region_id": "eu-central-1",
   "name": "quickstart-cluster",
   "resources": {
      "kafka": {
            "resource_preset_id": "s1-c2-m4",
            "disk_size": "34359738368",
            "broker_count": 1,
            "zone_count": 3
      }
   }
}
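
As a sketch of how such a request could be sent: assuming you save the body as create-cluster.json, that the gRPC endpoint is api.double.cloud:443 with server reflection enabled, and that authentication uses a bearer token (all of these are assumptions to verify against the API reference), a grpcurl call would look like this:

# Send the JSON body from a file as a gRPC request (sketch; see assumptions above)
grpcurl \
      -H "Authorization: Bearer <your API key>" \
      -d @ \
      api.double.cloud:443 doublecloud.kafka.v1.ClusterService/Create \
      < create-cluster.json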

Note

The DoubleCloud service creates the superuser admin and its password automatically. You can find both the User and the Password in the Overview tab on the cluster information page.
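
The connection commands later in this tutorial need the broker address and these credentials several times. To avoid pasting them repeatedly, you can keep them in shell variables; the variable names here are arbitrary, chosen for this tutorial:

# Store connection details for reuse (names are arbitrary)
export KAFKA_BROKER="<broker FQDN>:9091"
export KAFKA_USER="admin"
export KAFKA_PASSWORD="<cluster password>"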

Create a topic

After you've created a cluster, you also need to create a topic for messages:

  1. On the cluster's page, go to the Topics tab.

  2. Click Create.

  3. Under Topic Settings, specify the connection properties:

    • Cleanup policy - Delete. This policy deletes log segments when their retention time or log size reaches the limit.

    • Compression Type - Uncompressed. We don't need compression for this tutorial, so let's disable it.

    • Retention Bytes - 1048576 (1 MB).

    • Retention Ms - 600000 (10 minutes).

  4. Specify the Basic Settings:

    • Name

      A topic's name. Let's call it first-topic.

    • Partitions

      The number of the topic's partitions. Set it to 1 to create the simplest topic.

    • Replication factor

      Specifies the number of copies of a topic in a cluster. This parameter's value shouldn't exceed the overall number of brokers in the cluster. Set it to 1.

    Review the topic settings to make sure everything is correct.


  5. Click Submit.
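
Once the topic is created, you can check that it's visible to Apache Kafka® clients, for example with kcat's metadata listing mode (the -L flag); the broker address and password are the same values used in the connection commands below:

# List cluster metadata, including topics and their partitions
docker run --name kcat --rm -i edenhill/kcat:1.7.1 \
      -L \
      -b <broker FQDN>:9091 \
      -X security.protocol=SASL_SSL \
      -X sasl.mechanisms=SCRAM-SHA-512 \
      -X sasl.username="admin" \
      -X sasl.password="<cluster password>"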

To create a topic with the API, use the TopicService Create method and pass the following parameters:

  • cluster_id - the ID of the cluster in which you want to create a topic. To find the cluster ID, get a list of clusters in the project.

  • topic_spec - let's configure the required topic specifications:

    • name - specify the topic name, first-topic.

    • partitions - set the minimum number of partitions for this quickstart, 1.

    • replication_factor - go for the basic option here as well, specify 1.

    • topic_config_3 - use the doublecloud.kafka.v1.TopicConfig3 model to set further topic configuration for Apache Kafka® version 3 and above:

      • cleanup_policy - set the cleanup policy for the topic, in this case CLEANUP_POLICY_DELETE.

      • compression_type - we don't need compression for this tutorial, specify COMPRESSION_TYPE_UNCOMPRESSED.

      • retention_bytes - 1048576 (1 MB).

      • retention_ms - 600000 (10 minutes).
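
Mirroring the cluster request above, a topic creation request body might look as follows. The field layout here is assumed from the parameter list rather than taken from the API reference, so double-check it against the TopicService documentation:

{
   "cluster_id": "<your cluster ID>",
   "topic_spec": {
      "name": "first-topic",
      "partitions": 1,
      "replication_factor": 1,
      "topic_config_3": {
         "cleanup_policy": "CLEANUP_POLICY_DELETE",
         "compression_type": "COMPRESSION_TYPE_UNCOMPRESSED",
         "retention_bytes": "1048576",
         "retention_ms": "600000"
      }
   }
}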

Connect to your cluster

When you have a cluster and a topic in it, connect to the cluster and transfer a text message between a consumer and a producer:

  1. Run a command that contains a connection string to create a consumer. You can use the Connection string from the Overview tab on your cluster information page. The command has the following structure:

    docker run --name kcat --rm -i -t edenhill/kcat:1.7.1 \
          -C \
          -b <broker FQDN>:9091 \
          -t <topic name> \
          -X security.protocol=SASL_SSL \
          -X sasl.mechanisms=SCRAM-SHA-512 \
          -X sasl.username="admin" \
          -X sasl.password="<cluster password>" \
          -Z
    
    If you installed kafkacat from the repository, the equivalent command is:

    kafkacat -C \
          -b <broker FQDN>:9091 \
          -t <topic name> \
          -X security.protocol=SASL_SSL \
          -X sasl.mechanisms=SCRAM-SHA-512 \
          -X sasl.username="admin" \
          -X sasl.password="<cluster password>" \
          -Z
    

    You will see the following status message:

    % Reached end of topic first-topic [0] at offset 0
    
  2. Execute the following command in a separate terminal instance to create a producer and push the data:

    curl https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/hits_sample.json | docker run --name kcat --rm -i edenhill/kcat:1.7.1 \
          -P \
          -b <broker FQDN>:9091 \
          -t <topic name> \
          -k key \
          -X security.protocol=SASL_SSL \
          -X sasl.mechanisms=SCRAM-SHA-512 \
          -X sasl.username="<username>" \
          -X sasl.password="<password>"
    
    If you use kafkacat, the equivalent command is:

    curl https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/hits_sample.json | kafkacat \
          -P \
          -b <broker FQDN>:9091 \
          -t <topic name> \
          -k key \
          -X security.protocol=SASL_SSL \
          -X sasl.mechanisms=SCRAM-SHA-512 \
          -X sasl.username="<username>" \
          -X sasl.password="<password>"
    
  3. If you've completed all the steps successfully, the terminal with the consumer will show the uploaded data:

     {
          "Hit_ID": 40668,
          "Date": "2017-09-09",
          "Time_Spent": "730.875",
          "Cookie_Enabled": 0,
          "Redion_ID": 11,
          "Gender": "Female",
          "Browser": "Chrome",
          "Traffic_Source": "Social network",
          "Technology": "PC (Windows)"
     }
    % Reached end of topic first-topic [0] at offset 1102
    
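If you want a quicker end-to-end check without downloading the sample dataset, you can produce a single message from stdin and then consume exactly one message; -c (message count) and -e (exit at end of topic) are standard kcat flags:

# Produce one test message
echo "hello from the quickstart" | kafkacat -P \
      -b <broker FQDN>:9091 \
      -t first-topic \
      -X security.protocol=SASL_SSL \
      -X sasl.mechanisms=SCRAM-SHA-512 \
      -X sasl.username="admin" \
      -X sasl.password="<cluster password>"

# Consume one message and exit
kafkacat -C -c 1 -e \
      -b <broker FQDN>:9091 \
      -t first-topic \
      -X security.protocol=SASL_SSL \
      -X sasl.mechanisms=SCRAM-SHA-512 \
      -X sasl.username="admin" \
      -X sasl.password="<cluster password>"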

Now you have an Apache Kafka® cluster with a working consumer and producer. See the links below to continue exploring:

See also