您的足迹:首页 > Hadoop >【原创】部署分布式Drill集群

【原创】部署分布式Drill集群

Deploying Drill in a Cluster

部署分布式Drill集群

描述

要运行分布式环境,需执行如下步骤:

1、 在每台节点上安装dirll

2、 配置cluster ID并添加Zookeeper的信息

3、 使用drill连接数据源

4、 启动drill

准备

在安装drill之前请检查以下软件是否已经安装,服务是否能正常使用

1、 oracle jdk version7,尽量使用1.7版本,高版本会有警告,低版本也不行。

2、 已经安装并正确配置了zookeeper

3、 正确配置并正常运行的Hadoop分布式集群(Recommended)

4、 DNS(Recommended),也就是/etc/hosts文件已经配置好。

安装drill

在每个节点上执行如下步骤:

1、 下载drill

Curl http://getdrill.org/drill/download/apache-drill-0.8.0.tar.gz

2、 解压dirll,并创建drill的安装目录

tar xzf apache-drill-.tar.gz --strip=1 -C /opt/drill

3、 如果你使用外部jar 文件,在drill的安装目录的conf文件夹里找到drill-env.sh并编辑,将Hadoop的安装目录添加进去,也就是HADOOP_HOME,换成你的Hadoop_home

export HADOOP_HOME="~/hadoop/hadoop-0.20.2/"

4、 编辑drill-override.conf文件,创建一个唯一的cluster ID,并提供zookeeper的主机名和端口,用于drill向zookeeper注册,被zookeeper管理。

5、 每个节点的dirll配置相同的cluster ID,这样每个drill就可以共享相同的ID,zookeeper的默认端口为2181。Zookeeper的主机名和端口号在zk.connect里,。

示例

drill.exec: {
  cluster-id: "drill-cluster",
  zk.connect: "master:2181,slave1:2181,slave2:2181",
  debug.error_on_leak: false,
  buffer.size: 6,
  functions: ["org.apache.drill.expr.fn.impl", "org.apache.drill.udfs"]
}

使用drill连接数据源

你可以使用各种类型的数据源,请参考Connect Apache Drill to Data Sources有详细的各种数据配置说明。

启动dirll

请按照以下步骤操作,来启动drill。

1、 切换到drill的安装目录,执行如下命令启动drill,在每个drill节点都执行这个命令。

bin/drillbit.sh start

2、 执行以下命令,调用SQLLINE并启动drill(我们已经在文件里配置了zookeeper)

bin/sqlline -u jdbc:drill:

如果不能连接到drill并调用sqlline时,会有如下提示:

Example: 0: jdbc:drill:zk=:

那么,这时候就需要你手动指定zookeeper的地址了

bin/sqlline -u jdbc:drill:zk=master:2181,slave1:2181,slave2:2181

3、 执行如下命令,以验证所有dirll是否都已经被加入到集群。

0: jdbc:drill:zk=: select * from sys.drillbits;

Dirll提供了所有已经加入的drill。如下所示:

0: jdbc:drill:> select * from sys.drillbits;
+------------+------------+--------------+------------+------------+
|  hostname  | user_port  | control_port | data_port  |  current   |
+------------+------------+--------------+------------+------------+
| master     | 31010      | 31011        | 31012      | true       |
| slave2     | 31010      | 31011        | 31012      | false      |
| slave1     | 31010      | 31011        | 31012      | false      |
+------------+------------+--------------+------------+------------+
3 rows selected (0.304 seconds)

查询数据示例

现在你可以使用drill查询数据了,drill在安装目录的example-data目录里,自带了示例数据,请参考文档:Querying Parquet Files.

Region File

0: jdbc:drill:> select * from dfs.`/usr/local/drill/sample-data/region.parquet`;

+-------------+------------+------------+

| R_REGIONKEY |   R_NAME   | R_COMMENT  |

+-------------+------------+------------+

| 0           | AFRICA     | lar deposits. blithe |

| 1           | AMERICA    | hs use ironic, even  |

| 2           | ASIA       | ges. thinly even pin |

| 3           | EUROPE     | ly final courts cajo |

| 4           | MIDDLE EAST | uickly special accou |

+-------------+------------+------------+

5 rows selected (1.112 seconds)

Nation File

0: jdbc:drill:> select * from dfs.`/usr/local/drill/sample-data/nation.parquet`;
+-------------+------------+-------------+------------+
| N_NATIONKEY |   N_NAME   | N_REGIONKEY | N_COMMENT  |
+-------------+------------+-------------+------------+
| 0           | ALGERIA    | 0           |  haggle. carefully f |
| 1           | ARGENTINA  | 1           | al foxes promise sly |
| 2           | BRAZIL     | 1           | y alongside of the p |
| 3           | CANADA     | 1           | eas hang ironic, sil |
| 4           | EGYPT      | 4           | y above the carefull |
| 5           | ETHIOPIA   | 0           | ven packages wake qu |
| 6           | FRANCE     | 3           | refully final reques |
| 7           | GERMANY    | 3           | l platelets. regular |
| 8           | INDIA      | 2           | ss excuses cajole sl |
| 9           | INDONESIA  | 2           |  slyly express asymp |
| 10          | IRAN       | 4           | efully alongside of  |
| 11          | IRAQ       | 4           | nic deposits boost a |
| 12          | JAPAN      | 2           | ously. final, expres |
| 13          | JORDAN     | 4           | ic deposits are blit |
| 14          | KENYA      | 0           |  pending excuses hag |
| 15          | MOROCCO    | 0           | rns. blithely bold c |
| 16          | MOZAMBIQUE | 0           | s. ironic, unusual a |
| 17          | PERU       | 1           | platelets. blithely  |
| 18          | CHINA      | 2           | c dependencies. furi |
| 19          | ROMANIA    | 3           | ular asymptotes are  |
| 20          | SAUDI ARABIA | 4           | ts. silent requests  |
| 21          | VIETNAM    | 2           | hely enticingly expr |
| 22          | RUSSIA     | 3           |  requests against th |
| 23          | UNITED KINGDOM | 3           | eans boost carefully |
| 24          | UNITED STATES | 1           | y final packages. sl |
+-------------+------------+-------------+------------+
25 rows selected (0.292 seconds)

Querying JSON Files 查询json数据

在drill的安装目录里已经集成了json的示例数据employee.json,通过以下sql可以查询json数据。

示例:

0: jdbc:drill:> SELECT * FROM cp.`employee.json` LIMIT 5;
+-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
| employee_id | full_name  | first_name | last_name  | position_id | position_title |  store_id  | department_id | birth_date | hire_date  |   salary   | supervisor_id | education_level | marital_status |   gender   | management_role |
+-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
| 1           | Sheri Nowmer | Sheri      | Nowmer     | 1           | President      | 0          | 1             | 1961-08-26 | 1994-12-01 00:00:00.0 | 80000.0    | 0             | Graduate Degree | S              | F          | Senior Management |
| 2           | Derrick Whelply | Derrick    | Whelply    | 2           | VP Country Manager | 0          | 1             | 1915-07-03 | 1994-12-01 00:00:00.0 | 40000.0    | 1             | Graduate Degree | M              | M          | Senior Management |
| 4           | Michael Spence | Michael    | Spence     | 2           | VP Country Manager | 0          | 1             | 1969-06-20 | 1998-01-01 00:00:00.0 | 40000.0    | 1             | Graduate Degree | S              | M          | Senior Management |
| 5           | Maya Gutierrez | Maya       | Gutierrez  | 2           | VP Country Manager | 0          | 1             | 1951-05-10 | 1998-01-01 00:00:00.0 | 35000.0    | 1             | Bachelors Degree | M              | F          | Senior Management |
| 6           | Roberta Damstra | Roberta    | Damstra    | 3           | VP Information Systems | 0          | 2             | 1942-10-08 | 1994-12-01 00:00:00.0 | 25000.0    | 1             | Bachelors Degree | M              | F          | Senior Management |
+-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
5 rows selected (1.809 seconds)

SELECT * FROM a CSV File

0: jdbc:drill:zk=local> select * from dfs.`/Users/brumsby/drill/plays.csv`;

+-----------------------------------+
|              columns              |
+-----------------------------------+
| ["1599","As You Like It"]         |
| ["1601","Twelfth Night"]          |
| ["1594","Comedy of Errors"]       |
| ["1595","Romeo and Juliet"]       |
| ["1596","The Merchant of Venice"] |
| ["1610","The Tempest"]            |
| ["1599","Hamlet"]                 |
+-----------------------------------+
7 rows selected (0.089 seconds)

可以查询具体的某列值

0: jdbc:drill:zk=local> select columns[0], columns[1] from dfs.`/Users/brumsby/drill/plays.csv`;

+------------+------------------------+
|   EXPR$0   |         EXPR$1         |
+------------+------------------------+
| 1599       | As You Like It         |
| 1601       | Twelfth Night          |
| 1594       | Comedy of Errors       |
| 1595       | Romeo and Juliet       |
| 1596       | The Merchant of Venice |
| 1610       | The Tempest            |
| 1599       | Hamlet                 |
+------------+------------------------+
7 rows selected (0.137 seconds)

可以给某列起个别名

0: jdbc:drill:zk=local> select columns[0] as `Year`, columns[1] as Play 
from dfs.`/Users/brumsby/drill/plays.csv`;

+------------+------------------------+
|    Year    |    Play                |
+------------+------------------------+
| 1599       | As You Like It         |
| 1601       | Twelfth Night          |
| 1594       | Comedy of Errors       |
| 1595       | Romeo and Juliet       |
| 1596       | The Merchant of Venice |
| 1610       | The Tempest            |
| 1599       | Hamlet                 |
+------------+------------------------+
7 rows selected (0.113 seconds)


0: jdbc:drill:zk=local> select columns[0] as `Year`, columns[1] as Play 
from dfs.`/Users/brumsby/drill/plays.csv` where columns[0]>1599;

+------------+---------------+
|    Year    |      Play     |
+------------+---------------+
| 1601       | Twelfth Night |
| 1610       | The Tempest   |
+------------+---------------+
2 rows selected (0.201 seconds)

Example of Querying a TSV File

 USE dfs;
SELECT COLUMNS[0] AS Ngram,
       COLUMNS[1] AS Publication_Date,
       COLUMNS[2] AS Frequency
FROM `/Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo.tsv`
WHERE ((columns[0] = 'Zoological Journal of the Linnean')
AND (columns[2] > 250)) LIMIT 10;

输出结果为:

+------------------------------------+-------------------+------------+
 |               Ngram                | Publication_Date  | Frequency  |
 +------------------------------------+-------------------+------------+
 | Zoological Journal of the Linnean  | 1993              | 297        |
 | Zoological Journal of the Linnean  | 1997              | 255        |
 | Zoological Journal of the Linnean  | 2003              | 254        |
 | Zoological Journal of the Linnean  | 2007              | 284        |
 | Zoological Journal of the Linnean  | 2008              | 257        |
 +------------------------------------+-------------------+------------+
 5 rows selected (1.175 seconds)

Query the GZ File Directly

SELECT COLUMNS[0], 
        COLUMNS[1], 
        COLUMNS[2] 
 FROM dfs.`/Users/drilluser/Downloads/googlebooks-eng-all-5gram-20120701-zo.tsv.gz` 
 WHERE ((columns[0] = 'Zoological Journal of the Linnean') 
 AND (columns[2] > 250)) 
 LIMIT 10;

Querying Hive 查询hive表数据

启动hive

hive

通过hive的shell创建表

hive> create table customers(FirstName string, LastName string, Company string, Address string, City string, County string, State string, Zip string, Phone string, Fax string, Email string, Web string) row format delimited fields terminated by ',' stored as textfile;

加载数据

hive> load data local inpath '/usr/local/customers.csv' overwrite into table customers;
Loading data to table default.customers
Table default.customers stats: [numFiles=1, numRows=0, totalSize=77681, rawDataSize=0]
OK
Time taken: 0.863 seconds


以上翻译自drill官网,属译文原创,转载请注明出处。

本博客所有文章如无特别注明均为原创。作者:数据为王复制或转载请以超链接形式注明转自 数据为王
原文地址《【原创】部署分布式Drill集群

相关推荐


  • blogger

发表评论

路人甲 表情
看不清楚?点图切换 Ctrl+Enter快速提交

网友评论(0)