Create a dataset
-
Click Create → Dataset in the upper-right corner of your workbook page.
-
In the Connections section, click + Add and select one of your connections from the list. You will see a list of tables that the Visualization service can visualize in the Tables section.
By default, the table from the connection will automatically appear in the dataset's workspace.
You'll see the preview of the dataset columns in the Preview section below.
-
(optional) You can drag and drop additional the tables to the workspace on the right to create links between them.
-
Click Save in the upper-right corner of your dataset page.
Note
This configuration is a segment in the overall description of Visualization assets in the workbook. For the complete tutorial, see Use Visualization API.
To create a dataset, use the Dataset configuration:
-
name
- your dataset's name. Must be unique within the workbook. -
sources
- the data sources for the dataset:-
id
- the ID of the connection you want to use to ingest data into the dataset. -
title
- specify the connection title. -
connection_ref
- specify the connection name. -
spec
- describe the specification using the Connection configuration: -
avatars
- (optional) Avatar configuration is analogous toFROM
section in SQL.For a single data source, you can send the
null
value. In this case, an avatar will be created automatically and its ID will be equal to data source ID. -
fields
- describe every dataset field.Tip
For multi-field datasets, this description can become excessively labor-intensive. To save time and effort, use the AdviseDatasetFieldsRequest to get the list of fields at your data source:
AdviseDatasetFieldsRequest API example
kind: advise_dataset_fields connection_ref: kind: id_ref id: '%my_connection_id%' partial_dataset: fields: [] avatars: null sources: - id: main title: <your_connection_title> connection_ref: <your_connection_name> # this string doesn't affect the request spec: kind: clickhouse raw_sql_level: subselect cache_ttl_sec: null host: <your_ClickHouse_host> port: 443 username: user secure: true
AdviseDatasetFields Python example
from doublecloud.visualization.v1.workbook_pb2 import Dataset from doublecloud.visualization.v1.workbook_service_pb2 import AdviseDatasetFieldsRequest def advise_dataset_fields(svc, workbook_id: str, sources: list, connection_name: str): """ Function helps to automatically define all fields, their names/IDs and types based on underlying data source (table, view, SQL query, etc.). ID of fields will be equal to column names. You can define them manually or use this handler to simplify for popular cases """ dataset = Dataset() dataset.config.struct_value.update( { "fields": [], "avatars": None, "sources": sources, } ) return svc.AdviseDatasetFields( AdviseDatasetFieldsRequest( workbook_id=workbook_id, connection_name=connection_name, partial_dataset=dataset, ) )
In response, you'll get the complete description of all the data fields at the source in the required format.
-
API example
- name: ds_sales
dataset:
sources:
- id: main
title: PostreSQL
connection_ref: my_postgres_connection
spec:
kind: postgres
raw_sql_level: off
cache_ttl_sec: null
host: example.com
port: 443
username: user
database_name: my_db_name
avatars: null
fields:
- title: category
id: category
cast: string
description: null
hidden: false
aggregation: none
calc_spec:
kind: direct
avatar_id: main
field_name: category
- title: customer_id
id: customer_id
cast: string
description: null
hidden: false
aggregation: none
calc_spec:
kind: direct
avatar_id: main
field_name: customer_id
- title: date
id: date
cast: date
description: null
hidden: false
aggregation: none
calc_spec:
kind: direct
avatar_id: main
field_name: date
- title: order_id
id: order_id
cast: string
description: null
hidden: false
aggregation: none
calc_spec:
kind: direct
avatar_id: main
field_name: order_id
- title: postal_code
id: postal_code
cast: string
description: null
hidden: false
aggregation: none
calc_spec:
kind: direct
avatar_id: main
field_name: postal_code
- title: profit
id: profit
cast: float
description: null
hidden: false
aggregation: none
calc_spec:
kind: direct
avatar_id: main
field_name: profit
- title: region
id: region
cast: string
description: null
hidden: false
aggregation: none
calc_spec:
kind: direct
avatar_id: main
field_name: region
- title: sales
id: sales
cast: float
description: null
hidden: false
aggregation: none
calc_spec:
kind: direct
avatar_id: main
field_name: sales
- title: segment
id: segment
cast: string
description: null
hidden: false
aggregation: none
calc_spec:
kind: direct
avatar_id: main
field_name: segment
- title: sub_category
id: sub_category
cast: string
description: null
hidden: false
aggregation: none
calc_spec:
kind: direct
avatar_id: main
field_name: sub_category
- title: The sum_sales
id: sum_sales
cast: integer
description: null
hidden: false
aggregation: none
calc_spec:
kind: id_formula
formula: SUM([sales])