Create a dataset

Console

API

Click Create → Dataset in the upper-right corner of your workbook page.
In the Connections section, click + Add and select one of your connections from the list. You will see a list of tables that the Visualization service can visualize in the Tables section.

By default, the table from the connection will automatically appear in the dataset's workspace.

You'll see the preview of the dataset columns in the Preview section below.
(optional) You can drag and drop additional the tables to the workspace on the right to create links between them.
Click Save in the upper-right corner of your dataset page.

Note

This configuration is a segment in the overall description of Visualization assets in the workbook. For the complete tutorial, see Use Visualization API.

To create a dataset, use the Dataset configuration:

name - your dataset's name. Must be unique within the workbook.

sources - the data sources for the dataset:

id - the ID of the connection you want to use to ingest data into the dataset.
title - specify the connection title.
connection_ref - specify the connection name.
spec - describe the specification using the Connection configuration:
avatars - (optional) Avatar configuration is analogous to FROM section in SQL.

For a single data source, you can send the null value. In this case, an avatar will be created automatically and its ID will be equal to data source ID.

fields - describe every dataset field.

Tip

For multi-field datasets, this description can become excessively labor-intensive. To save time and effort, use the AdviseDatasetFieldsRequest to get the list of fields at your data source:

AdviseDatasetFieldsRequest API example

kind: advise_dataset_fields
connection_ref:
   kind: id_ref
   id: '%my_connection_id%'
partial_dataset:
   fields: []
   avatars: null
   sources:
   - id: main
      title: <your_connection_title>
      connection_ref: <your_connection_name> # this string doesn't affect the request
      spec:
         kind: clickhouse
         raw_sql_level: subselect
         cache_ttl_sec: null
         host: <your_ClickHouse_host>
         port: 443
         username: user
         secure: true

AdviseDatasetFields Python example

View this example on GitHub

from doublecloud.visualization.v1.workbook_pb2 import Dataset
from doublecloud.visualization.v1.workbook_service_pb2 import AdviseDatasetFieldsRequest


def advise_dataset_fields(svc, workbook_id: str, sources: list, connection_name: str):
   """
   Function helps to automatically define all fields, their names/IDs and types
   based on underlying data source (table, view, SQL query, etc.).
   ID of fields will be equal to column names.
   You can define them manually or use this handler to simplify for popular cases
   """
   dataset = Dataset()
   dataset.config.struct_value.update(
      {
            "fields": [],
            "avatars": None,
            "sources": sources,
      }
   )

   return svc.AdviseDatasetFields(
      AdviseDatasetFieldsRequest(
            workbook_id=workbook_id,
            connection_name=connection_name,
            partial_dataset=dataset,
      )
   )

In response, you'll get the complete description of all the data fields at the source in the required format.

API example

- name: ds_sales
dataset:
   sources:
   - id: main
      title: PostreSQL
      connection_ref: my_postgres_connection
      spec:
         kind: postgres
         raw_sql_level: off
         cache_ttl_sec: null
         host: example.com
         port: 443
         username: user
         database_name: my_db_name
   avatars: null
   fields:
   - title: category
      id: category
      cast: string
      description: null
      hidden: false
      aggregation: none
      calc_spec:
      kind: direct
      avatar_id: main
      field_name: category
   - title: customer_id
      id: customer_id
      cast: string
      description: null
      hidden: false
      aggregation: none
      calc_spec:
      kind: direct
      avatar_id: main
      field_name: customer_id
   - title: date
      id: date
      cast: date
      description: null
      hidden: false
      aggregation: none
      calc_spec:
      kind: direct
      avatar_id: main
      field_name: date
   - title: order_id
      id: order_id
      cast: string
      description: null
      hidden: false
      aggregation: none
      calc_spec:
      kind: direct
      avatar_id: main
      field_name: order_id
   - title: postal_code
      id: postal_code
      cast: string
      description: null
      hidden: false
      aggregation: none
      calc_spec:
      kind: direct
      avatar_id: main
      field_name: postal_code
   - title: profit
      id: profit
      cast: float
      description: null
      hidden: false
      aggregation: none
      calc_spec:
      kind: direct
      avatar_id: main
      field_name: profit
   - title: region
      id: region
      cast: string
      description: null
      hidden: false
      aggregation: none
      calc_spec:
      kind: direct
      avatar_id: main
      field_name: region
   - title: sales
      id: sales
      cast: float
      description: null
      hidden: false
      aggregation: none
      calc_spec:
      kind: direct
      avatar_id: main
      field_name: sales
   - title: segment
      id: segment
      cast: string
      description: null
      hidden: false
      aggregation: none
      calc_spec:
      kind: direct
      avatar_id: main
      field_name: segment
   - title: sub_category
      id: sub_category
      cast: string
      description: null
      hidden: false
      aggregation: none
      calc_spec:
      kind: direct
      avatar_id: main
      field_name: sub_category
   - title: The sum_sales
      id: sum_sales
      cast: integer
      description: null
      hidden: false
      aggregation: none
      calc_spec:
      kind: id_formula
      formula: SUM([sales])

Create a dataset

See also