# ClickHouse Hadoop

Integrate ClickHouse natively with Hive; currently only writing is supported. This connects Hadoop's massive data storage and deep processing power with the high performance of ClickHouse.

## Build the Project

```bash
mvn package -Phadoop26 -DskipTests
```

## Run the test cases

A clickhouse-server must be running on localhost for the test cases to pass.

## Usage

### Create ClickHouse table

```sql
CREATE TABLE hive_test
(
    c1 String,
    c2 Float64,
    c3 String
)
ENGINE = MergeTree()
PARTITION BY c3
ORDER BY c1
```

### Create Hive External Table

Before starting the Hive CLI, set the environment variable `HIVE_AUX_JARS_PATH`:

```bash
export HIVE_AUX_JARS_PATH=<path-to-project>/target/clickhouse-hadoop-<version>.jar
```

Then start the `hive-cli` and create the Hive external table:

```sql
CREATE EXTERNAL TABLE default.ck_test
(
    c1 string,
    c2 double,
    c3 string
)
STORED BY 'data.bytedance.net.ck.hive.ClickHouseStorageHandler'
TBLPROPERTIES (
    'clickhouse.conn.urls' = 'jdbc:clickhouse://<host1>:<port1>,jdbc:clickhouse://<host2>:<port2>',
    'clickhouse.table.name' = 'hive_test'
);
```

### Data Ingestion

In `hive-cli`:

```sql
INSERT INTO default.ck_test
SELECT c1, c2, c3
FROM default.source_table
WHERE part = 'part_val';
```
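
### Verify the Ingestion

To confirm that the rows actually landed in ClickHouse, they can be queried back on the server side. Below is a minimal sketch, assuming the `hive_test` table defined above and a `clickhouse-client` session on the server; it is an illustrative check, not part of the connector itself.

```sql
-- Count the ingested rows per partition (c3 is the partition key of hive_test).
SELECT c3, count() AS row_count
FROM hive_test
GROUP BY c3
ORDER BY c3;
```

Because the table is partitioned by `c3`, the per-partition counts should line up with the partitions selected from `default.source_table` in the ingestion step.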