Upload CSV to your Managed ClickHouse® cluster
What these instructions are about
This article describes how to upload CSV files to your Managed ClickHouse® cluster with:

- GUI-based IDEs
- clickhouse-client
- an Amazon S3 URL
- sample connection scripts in Python and Go

Before you start, download the sample data file (sample_data.csv) used throughout this article.
Upload with GUI-based IDEs
We tested these scenarios in the following environments:

- Windows 10 Enterprise 1909:
  - JetBrains DataGrip: 2021.2
  - DBeaver Community: 21.2.0
JetBrains DataGrip natively supports importing CSV files into ClickHouse®. You can accomplish the whole process without any third-party plugins or extensions:
- In the Database pane on the left of the main workspace window, right-click the database schema where you want to upload your CSV table.
- Click Import data from file...
- Locate the file you want to upload and click OK.
- In the Import file window, DataGrip fills in the import settings automatically. Review them, adjust if necessary, and click Import.
- Your table will appear in the Tables folder.
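To verify the upload, you can run a quick query in DataGrip's query console. This is a minimal check, and the table name below is hypothetical: use whatever name the import dialog assigned.

```sql
-- Replace sample_data with the name the import dialog assigned
SELECT * FROM sample_data LIMIT 10;
```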
DBeaver Database Manager natively supports importing CSV files into ClickHouse®. You can accomplish the whole process without any third-party plugins or extensions:
- Right-click the name of the ClickHouse® database where you want to import data.
- Choose Import Data from the menu. You will see the Data transfer wizard.
- Make sure CSV is selected at the Import source step and click Next.
- Locate the CSV file you want to import and click Open.
- Make sure the filename is correct and click Next.
- Ensure that the database name in the Target container field is correct and click Next.
- Default data load settings are sufficient for most cases. Click Next.
- Review the settings. If you need to change any of them, return to the relevant step by clicking the corresponding line in the wizard's left pane.
- Click Proceed to import the CSV table.
clickhouse-client
We tested these instructions in the following environments:

- Ubuntu 20.04 LTS:
  - Bash: 5.1.16
  - clickhouse-client: 22.7.3.5
  - Docker: 20.10.12
Before you start
- If you haven't already, create a Managed ClickHouse® cluster.
- Download the sample data file (sample_data.csv).
- Install the ClickHouse® client.
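For reference, each row of the sample file carries nine fields: an order date, three location columns, the sales rep, the item, and the sales figures. A representative row, reconstructed from the query output shown later in this article (the file's exact formatting may differ):

```csv
2075-01-23,Night City,City Center,Corporate Plaza,Kivell,Binder,50,19.99,999.5
```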
Upload your data
Uploading a CSV file with the native clickhouse-client application is a sequence of manual steps: nothing is automated, but you keep complete control over every step of the process.
In short, you need to complete the following steps:
- Connect to your cluster.
- Create a table in the cluster that follows the structure of the CSV file you want to import.
- Disconnect from the cluster.
- Use a shell command to upload the contents of your CSV file into your Managed ClickHouse® cluster.
Let's follow every step in detail.
- Connect to your cluster with a connection string and create a table to import your CSV data to:

  - Go to the Clusters page.
  - Select the name of the cluster you want to connect to.
  - Under Connection strings, find the Native interface string and click Copy.
  - Open your terminal and run a command to connect to your cluster:

    Docker:

    ```bash
    docker run --network host --rm -it clickhouse/<Native interface connection string>
    ```

    The complete Docker command structure:

    ```bash
    docker run --network host --rm -it \
      clickhouse/clickhouse-client \
      --host <FQDN of your cluster> \
      --secure \
      --user <cluster user name> \
      --password <cluster user password> \
      --port 9440
    ```

    Native clickhouse-client (paste the string you copied):

    ```bash
    <Native interface connection string>
    ```
- Create a table to import your CSV data to.

  The table must have the same structure as your CSV file. For this scenario, create a simple table containing performance statistics for a group of sales representatives. Let's call it region_sales:

  ```sql
  CREATE TABLE region_sales ON CLUSTER default (
      `order_date` Date,
      `city` String,
      `district` String,
      `borough` String,
      `rep` String,
      `item` String,
      `sold` Int,
      `price` Float,
      `total` Float
  )
  ENGINE = ReplicatedMergeTree()
  ORDER BY (order_date)
  ```
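  To double-check that the table exists before loading any data, you can ask ClickHouse® to print its definition back:

  ```sql
  SHOW CREATE TABLE region_sales
  ```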
- Exit the cluster command interface by typing exit, since you need access to local files in the next step.
- Import the data from the CSV file you downloaded earlier into the table on your cluster. You need to specify the path to your CSV file, connect to the cluster, and perform an INSERT query. To do this, run the following command in your terminal:

  Native clickhouse-client:

  ```bash
  cat <path to sample_data.csv> | <Native interface connection string> \
  --query="INSERT INTO region_sales FORMAT CSV"
  ```

  Docker:

  ```bash
  cat <path to sample_data.csv> | docker run --network host --rm -i \
  clickhouse/<Native interface connection string> \
  --query="INSERT INTO region_sales FORMAT CSV"
  ```

  The complete Docker command structure:

  ```bash
  cat <path to sample_data.csv> | docker run --network host --rm -i \
    clickhouse/clickhouse-client \
    --host <FQDN of your cluster> \
    --secure \
    --user <cluster user name> \
    --password <cluster user password> \
    --port 9440 \
    --query="INSERT INTO region_sales FORMAT CSV"
  ```

  A successful INSERT produces no terminal output; you'll verify the data in the next step.
- Check your imported data:

  Native clickhouse-client:

  ```bash
  <Native interface connection string> --query="SELECT * FROM region_sales LIMIT 10"
  ```

  Docker:

  ```bash
  docker run --network host --rm -it \
  clickhouse/<Native interface connection string> \
  --query="SELECT * FROM region_sales LIMIT 10"
  ```

  The complete Docker command structure:

  ```bash
  docker run --network host --rm -it \
    clickhouse/clickhouse-client \
    --host <FQDN of your cluster> \
    --secure \
    --user <cluster user name> \
    --password <cluster user password> \
    --port 9440 \
    --query="SELECT * FROM region_sales LIMIT 10"
  ```

  You should see the data from your CSV file in the terminal output:

  ```
  2075-01-23 Night City City Center Corporate Plaza Kivell Binder 50 19.99 999.5
  2075-02-26 Night City City Center Corporate Plaza Gill Pen 27 19.99 539.73
  2075-03-15 Night City Heywood Japantown Sorvino Pencil 56 2.99 167.44
  2075-04-01 Night City Watson Kabuki Jones Binder 60 4.99 299.4
  2075-04-18 Night City City Center Corporate Plaza Andrews Pencil 75 1.99 149.25
  2075-05-05 Night City City Center Corporate Plaza Jardine Pencil 90 4.99 449.1
  2075-05-22 Night City Heywood Japantown Thompson Pencil 32 1.99 63.68
  2075-06-01 Night City Watson Kabuki Jones Pencil 95 1.99 189.05
  2075-06-08 Night City Watson Kabuki Jones Binder 60 8.99 539.4
  2075-06-25 Night City City Center Corporate Plaza Morgan Pencil 90 4.99 449.1
  ```
Amazon S3 URL
Uploading a CSV file from an Amazon S3 bucket gives you extra flexibility with DoubleCloud Managed Service for ClickHouse®. Like the previous scenario, it's a sequence of manual steps: nothing is automated, but you keep complete control over the process.
In short, you need to complete the following steps:
- Connect to your cluster.
- Create a table in the cluster that follows the CSV file structure you want to import.
- Run a query that inserts the contents of your CSV file from an Amazon S3 bucket into your Managed ClickHouse® cluster.
Before you start this scenario, it's essential to familiarize yourself with the path-style URLs used by Amazon S3:

```
https://<Bucket-name>.s3.<Region>.amazonaws.com/<Key-name>
```

- <Region> is the identifier of the AWS region.
- <Bucket-name> is the name of the bucket that contains the file.
- <Key-name> is the name of the file you want to link to.

For example, there's a bucket named doublecloud-docs located in Frankfurt (eu-central-1), and we want to access the sample_data.csv file in the /data-sets folder. The path-style URL will be as follows:

```
https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/sample_data.csv
```
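Before pointing ClickHouse® at this URL, you can confirm that the object is reachable with a plain HTTP HEAD request. This optional check assumes the bucket allows public reads:

```bash
curl -I https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/sample_data.csv
```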
Now let's follow every step in detail.
- Install the ClickHouse® client.

- Connect to your cluster with a connection string:

  - Go to the Clusters page.
  - Under Connection strings, find the Native interface string and click Copy.
  - Open your terminal and connect to the cluster using the ClickHouse® client.
- Create a table to import your CSV data to. The table must have the same structure as your CSV file. For this tutorial, create a simple table containing performance statistics for a group of sales representatives. Call it nc_sales:

  ```sql
  CREATE TABLE nc_sales ON CLUSTER default (
      `order_date` Date,
      `city` String,
      `district` String,
      `borough` String,
      `rep` String,
      `item` String,
      `sold` Int,
      `price` Float,
      `total` Float
  )
  ENGINE = ReplicatedMergeTree()
  ORDER BY (order_date)
  ```
- Insert the data from your CSV file located in the Amazon S3 bucket.

  To do this, use the s3 table function of ClickHouse®. You can also preview the file with it before inserting, as shown after the argument list below. The function's syntax looks like this:

  ```
  s3('<URL>', 'CSV', '<structure>')
  ```

  - <URL> is the path-style URL of your file.
  - CSV is the format of the file you want to import.
  - <structure> is the structure of the target table.
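  Before inserting anything, you can run the function on its own as a read-only preview, using the same arguments as the INSERT query below:

  ```sql
  SELECT * FROM s3(
      'https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/sample_data.csv',
      'CSV',
      'order_date Date, city String, district String, borough String, rep String, item String, sold Int, price Float, total Float'
  )
  LIMIT 5
  ```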
  Pay attention to the <URL> argument in the following example: the INSERT query incorporates the path-style URL of the CSV file:

  ```sql
  INSERT INTO nc_sales
  SELECT * FROM s3(
      'https://doublecloud-docs.s3.eu-central-1.amazonaws.com/data-sets/sample_data.csv',
      'CSV',
      'order_date Date, city String, district String, borough String, rep String, item String, sold Int, price Float, total Float'
  )
  ```
- Use the following query to check your imported data:

  ```sql
  SELECT * FROM nc_sales LIMIT 20
  ```
You should see the data from your CSV file in the terminal output:
```
┌─order_date─┬─city───────┬─district────┬─borough─────────┬─rep──────┬─item────┬─sold─┬─price─┬───total─┐
│ 2075-01-23 │ Night City │ City Center │ Corporate Plaza │ Kivell   │ Binder  │   50 │ 19.99 │   999.5 │
│ 2075-02-26 │ Night City │ City Center │ Corporate Plaza │ Gill     │ Pen     │   27 │ 19.99 │  539.73 │
│ 2075-03-15 │ Night City │ Heywood     │ Japantown       │ Sorvino  │ Pencil  │   56 │  2.99 │  167.44 │
│ 2075-04-01 │ Night City │ Watson      │ Kabuki          │ Jones    │ Binder  │   60 │  4.99 │   299.4 │
│ 2075-04-18 │ Night City │ City Center │ Corporate Plaza │ Andrews  │ Pencil  │   75 │  1.99 │  149.25 │
│ 2075-05-05 │ Night City │ City Center │ Corporate Plaza │ Jardine  │ Pencil  │   90 │  4.99 │   449.1 │
│ 2075-05-22 │ Night City │ Heywood     │ Japantown       │ Thompson │ Pencil  │   32 │  1.99 │   63.68 │
│ 2075-06-01 │ Night City │ Watson      │ Kabuki          │ Jones    │ Pencil  │   95 │  1.99 │  189.05 │
│ 2075-06-08 │ Night City │ Watson      │ Kabuki          │ Jones    │ Binder  │   60 │  8.99 │   539.4 │
│ 2075-06-25 │ Night City │ City Center │ Corporate Plaza │ Morgan   │ Pencil  │   90 │  4.99 │   449.1 │
│ 2075-07-12 │ Night City │ Watson      │ Kabuki          │ Howard   │ Binder  │   29 │  1.99 │   57.71 │
│ 2075-07-29 │ Night City │ Watson      │ Kabuki          │ Parent   │ Binder  │   81 │ 19.99 │ 1619.19 │
│ 2075-08-11 │ Night City │ Watson      │ Kabuki          │ Parent   │ Pen     │   15 │ 19.99 │  299.85 │
│ 2075-08-15 │ Night City │ Watson      │ Kabuki          │ Jones    │ Pencil  │   35 │  4.99 │  174.65 │
│ 2075-09-01 │ Night City │ City Center │ Corporate Plaza │ Smith    │ Desk    │    2 │   125 │     250 │
│ 2075-09-02 │ Night City │ City Center │ Corporate Plaza │ Jardine  │ Pencil  │   36 │  4.99 │  179.64 │
│ 2075-09-18 │ Night City │ Watson      │ Kabuki          │ Jones    │ Pen Set │   16 │ 15.99 │  255.84 │
│ 2075-10-05 │ Night City │ City Center │ Corporate Plaza │ Morgan   │ Binder  │   28 │  8.99 │  251.72 │
│ 2075-10-22 │ Night City │ Watson      │ Kabuki          │ Jones    │ Pen     │   64 │  8.99 │  575.36 │
│ 2075-11-25 │ Night City │ City Center │ Corporate Plaza │ Kivell   │ Pen Set │   96 │  4.99 │  479.04 │
└────────────┴────────────┴─────────────┴─────────────────┴──────────┴─────────┴──────┴───────┴─────────┘
```
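To confirm that the whole file was ingested, you can also count the rows:

```sql
SELECT count(*) FROM nc_sales
```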
Sample connection scripts
We tested these scripts in the following environments:

- Ubuntu 20.04:
  - Python 3: 3.10.6, pip: 22.0.2
  - Go: 1.13.8
Python
These instructions allow you to create a script that performs all the actions needed to import your CSV file with a single terminal command, while retaining complete control over the process.
In this example, we'll use the clickhouse-driver package for Python.
In short, you need to complete the following steps:
- Install the necessary software.
- Write a script to upload your data to the cluster.
- Upload the data to your cluster.
Let's follow every step in detail.
- Install the software and dependencies:

  ```bash
  sudo apt update && sudo apt install -y python3 python3-pip
  ```

- Install the ClickHouse® driver for Python:

  ```bash
  pip install clickhouse-driver
  ```
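  Before writing the full upload script, you can check that the driver reaches your cluster at all. A minimal connectivity sketch, using the same connection parameters as the script below:

  ```python
  from clickhouse_driver import Client

  # Same connection parameters as in csv_uploader_test.py below
  client = Client(
      '<FQDN of any ClickHouse® host on your cluster>',
      user='<Cluster user name>',
      password='<Cluster user password>',
      secure=True)

  # Prints [(1,)] if the connection works
  print(client.execute('SELECT 1'))
  ```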
- Write the script to upload your data to the cluster. For the purpose of this scenario, create a simple table that will contain performance statistics for a group of sales representatives. Let's call it RegionSales:

  csv_uploader_test.py

  ```python
  import csv
  from datetime import date

  # Declare the Client interface class from the ClickHouse® driver
  from clickhouse_driver import Client

  # Provide the necessary information to connect to your cluster
  CH_Client = Client(
      '<FQDN of any ClickHouse® host on your cluster>',
      user='<Cluster user name>',
      password='<Cluster user password>',
      secure=True)

  # Create a table on your cluster
  CH_Client.execute(
      'CREATE TABLE RegionSales ON CLUSTER default ('
      '`OrderDate` Date, '
      '`City` String, '
      '`District` String, '
      '`Borough` String, '
      '`Rep` String, '
      '`Item` String, '
      '`Sold` Int, '
      '`Price` Float, '
      '`Total` Float) ENGINE = ReplicatedMergeTree() '
      'PARTITION BY (City) ORDER BY (OrderDate)')

  # Create a generator that parses CSV rows and converts each field
  # to the Python type the corresponding column expects
  def row_reader():
      with open('<Path to your CSV file>') as csv_file:
          for row in csv.reader(csv_file):
              yield [date.fromisoformat(row[0]), row[1], row[2], row[3],
                     row[4], row[5], int(row[6]), float(row[7]), float(row[8])]

  # Use the generator to load values into the table on your cluster
  CH_Client.execute('INSERT INTO RegionSales VALUES', row_reader())
  ```
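  The script above streams the whole generator through a single INSERT, which keeps memory use low. For very large files you may prefer explicit batches, so that a mid-file failure loses less work. A sketch of that variant, reusing row_reader() and CH_Client from the script above (the batch size is an arbitrary example):

  ```python
  from itertools import islice

  BATCH_SIZE = 10_000  # hypothetical value; tune to your row width and memory budget

  rows = row_reader()
  while True:
      batch = list(islice(rows, BATCH_SIZE))
      if not batch:
          break
      CH_Client.execute('INSERT INTO RegionSales VALUES', batch)
  ```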
- Run the script to upload the data to your cluster:

  ```bash
  python3 csv_uploader_test.py
  ```

- Check your imported data:

  ```bash
  <Native interface connection string> --query="SELECT * FROM RegionSales"
  ```
You should see the data from your CSV file in the terminal output.
Go
These instructions allow you to create a script that performs all the actions needed to import your CSV file with a single terminal command, while retaining complete control over the process.
In short, you need to complete the following steps:
- Install the necessary software.
- Write a script to upload your data to the cluster.
- Upload the data to your cluster.
Let's follow every step in detail.
- Install the necessary software and dependencies:

  - Install the Go language interpreter:

    ```bash
    sudo apt update && sudo apt install -y golang-go
    ```

  - Install Git:

    ```bash
    sudo apt update && sudo apt install -y git
    ```

  - Install clickhouse-go and the packages it depends on:

    ```bash
    go get -v -u github.com/ClickHouse/clickhouse-go/v2 \
      github.com/google/uuid \
      github.com/pierrec/lz4/v4
    ```
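    On Go 1.16 and later, go get expects to run inside a Go module. If you're starting in an empty directory, initialize one first; the module name here is just an example:

    ```bash
    go mod init csv_uploader && go mod tidy
    ```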
- Write the script to upload your data to the cluster. For the purpose of this scenario, create a simple table that will contain performance statistics for a group of sales representatives. Let's call it RegionSales:

  csv_uploader_test.go

  ```go
  package main

  import (
      "database/sql"
      "encoding/csv"
      "fmt"
      "io"
      "log"
      "os"
      "strconv"
      "time"

      _ "github.com/ClickHouse/clickhouse-go/v2"
  )

  func main() {
      host := "<FQDN of any ClickHouse® host on your cluster>"
      user := "<Cluster user name>"
      password := "<Cluster user password>"

      // Build the DSN for the secure native interface port
      dsn := fmt.Sprintf("clickhouse://%s:%s@%s:9440?secure=true", user, password, host)
      client, err := sql.Open("clickhouse", dsn)
      if err != nil {
          log.Fatal(err)
      }
      if err := client.Ping(); err != nil {
          panic(err)
      }

      // Create a table that follows the structure of the CSV file
      if _, err := client.Exec(`CREATE TABLE IF NOT EXISTS RegionSales ON CLUSTER default (
          OrderDate Date,
          City String,
          District String,
          Borough String,
          Rep String,
          Item String,
          Sold Int,
          Price Float,
          Total Float
      ) ENGINE = ReplicatedMergeTree()
      PARTITION BY (City) ORDER BY (OrderDate)`); err != nil {
          panic(err)
      }

      // Send all rows as a single batch within one transaction scope
      scope, err := client.Begin()
      if err != nil {
          panic(err)
      }
      batch, err := scope.Prepare("INSERT INTO RegionSales")
      if err != nil {
          panic(err)
      }
      for row := range readCSV() {
          if _, err := batch.Exec(row...); err != nil {
              panic(err)
          }
      }
      if err := scope.Commit(); err != nil {
          panic(err)
      }
  }

  // readCSV parses the file in a separate goroutine and streams
  // typed rows through a channel
  func readCSV() chan []interface{} {
      c := make(chan []interface{})
      go func() {
          defer close(c)
          file, err := os.Open("<FILENAME>")
          if err != nil {
              panic(err)
          }
          defer file.Close()
          csvReader := csv.NewReader(file)
          csvReader.TrimLeadingSpace = true
          for {
              record, err := csvReader.Read()
              if err == io.EOF {
                  break
              }
              if err != nil {
                  panic(err)
              }
              // Convert string fields to the types the columns expect
              orderDate, err := time.Parse("2006-01-02", record[0])
              if err != nil {
                  panic(err)
              }
              sold, err := strconv.Atoi(record[6])
              if err != nil {
                  panic(err)
              }
              price, err := strconv.ParseFloat(record[7], 32)
              if err != nil {
                  panic(err)
              }
              total, err := strconv.ParseFloat(record[8], 32)
              if err != nil {
                  panic(err)
              }
              c <- []interface{}{
                  orderDate, record[1], record[2], record[3], record[4], record[5],
                  int32(sold), float32(price), float32(total),
              }
          }
      }()
      return c
  }
  ```
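  A note on the design: readCSV() parses the file in its own goroutine and streams typed rows over a channel, so the batch insert in main never has to hold the whole file in memory. Closing the channel after the last record is what lets the insert loop finish and commit the batch.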
- Upload the data to your cluster:

  ```bash
  go run csv_uploader_test.go
  ```

- Check your imported data:

  Docker:

  ```bash
  docker run --network host --rm -it clickhouse/<Native interface connection string> \
  --query="SELECT * FROM RegionSales"
  ```

  The complete Docker command structure:

  ```bash
  docker run --network host --rm -it \
    clickhouse/clickhouse-client \
    --host <FQDN of your cluster> \
    --secure \
    --user <cluster user name> \
    --password <cluster user password> \
    --port 9440 \
    --query="SELECT * FROM RegionSales"
  ```

  Native clickhouse-client:

  ```bash
  <Native interface connection string> --query="SELECT * FROM RegionSales"
  ```
You should see the data from your CSV file in the terminal output.