A brief introduction to javahive-e
This article discusses javahive-e and the related knowledge points; hopefully you will find it useful.
Contents:
- 1. How to export Hive query results to a txt file
- 2. How to tell whether a Hive table is partitioned, and get the partition column names
- 3. How to execute Hive commands or HiveQL from Java
- 4. Common Hive commands
- 5. Does Hive require writing Java code?
- 6. Fixing the problem where Hive does not load modified config files such as hive-site.xml
How to export Hive query results to a txt file
Recently, while using Hive, I needed to export query results to the local filesystem. The HQL syntax is:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
After exporting the query results to a local file, I ran into trouble trying to load it into Excel: I did not know which field delimiter Hive uses when exporting, and opening the file in a text editor showed garbled characters.
After digging through the official documentation, I finally found that Hive uses the ^A character as the field delimiter. The original text:
Data written to the filesystem is serialized as text with columns separated by ^A
In Python you can split fields with line.split('\x01') or line.split('\001'); note the single quotes.
In Java, use split("\\u0001").
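As a minimal sketch (the sample row here is made up for illustration), splitting one exported line in Java looks like this:

// Split a line exported by Hive on the ^A (control-A) field delimiter.
public class HiveLineSplit {
    public static void main(String[] args) {
        String line = "1\u0001alice\u00012009-09-01"; // a made-up exported row
        String[] fields = line.split("\u0001");
        for (String field : fields) {
            System.out.println(field);
        }
    }
}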
If you really do need the query results in a local file, it is better to use the hive command:
bin/hive -e "select * from test" > res.csv
or:
bin/hive -f sql.q > res.csv
where the file sql.q contains the query you want to execute.
The same issue comes up with Hive streaming. When a streaming job writes its output directly to an HDFS directory and you then create a Hive external table over that directory, Hive expects \001 as the separator between the streaming key and value. So when combining streaming output with a Hive external table, set the streaming job's final output delimiter to \001, and declare the external table's field delimiter as '\001' as well. Note: use \001 here, not ^A or \x01. For example:
create external table site_user_mapping_info_month(uid String,tag string,project_ids string,site_interests string) PARTITIONED BY(year String, month String) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LOCATION '/project/site_grouping/site_user_mapping_info_month';
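To illustrate the point above (a hedged sketch, not from the original; the class name and the tab-separated input format are assumptions), a streaming-style mapper whose final output uses \001 between key and value might look like:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Minimal streaming-style mapper: reads tab-separated lines from stdin and
// writes key + control-A + value to stdout, matching the external table's
// '\001' field delimiter.
public class StreamingMapper {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.split("\t", 2); // assumed input: key<TAB>value
            String key = parts[0];
            String value = parts.length > 1 ? parts[1] : "";
            System.out.println(key + "\u0001" + value);
        }
    }
}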
How to tell whether a Hive table is partitioned, and get the partition column names
There is no single statement that answers this directly; you have to script it.
show create table tableName — if the output contains a PARTITIONED BY clause, the table is partitioned.
To get the partitions:
show partitions tableName — this lists all partitions of the table.
As a shell script it looks like this:
tablePartition=`hive -e "show create table tableName" 2>&1 | grep -i "partitioned"`
if [ -n "$tablePartition" ]
then
Partitions=`hive -e "show partitions tableName"`
fi
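If you are already connected to Hive over JDBC, a similar check can be done from Java. The sketch below is hedged: the table name is a placeholder, and it assumes the describe formatted output layout, in which partition columns are listed after a '# Partition Information' header.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class PartitionCheck {
    // Returns the partition column names of tableName; an empty list means
    // the table is not partitioned.
    static List<String> partitionColumns(Connection conn, String tableName) throws Exception {
        List<String> partitionCols = new ArrayList<String>();
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("describe formatted " + tableName);
        boolean inPartitionSection = false;
        while (rs.next()) {
            String col = rs.getString(1) == null ? "" : rs.getString(1).trim();
            if (col.startsWith("# Partition Information")) {
                inPartitionSection = true; // partition columns follow this header
            } else if (inPartitionSection) {
                if (col.startsWith("# Detailed Table Information")) {
                    break; // end of the partition section
                }
                if (col.startsWith("#") || col.isEmpty()) {
                    continue; // skip the '# col_name' header line and blanks
                }
                partitionCols.add(col); // a partition column name
            }
        }
        return partitionCols;
    }
}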
How to execute Hive commands or HiveQL from Java
Executing Hive commands or HiveQL from Java here does not mean having a Hive client connect to HiveServer (or HiveServer2) over JDBC to run queries; it simply means running the hive command on a server where HiveServer is deployed. This is of course a simple matter: when we do quick data-analysis experiments with Hive, we usually just enter the Hive shell and run HiveQL directly.
Running HiveQL interactively can only print results to the screen or store them in a temporary table. If you want to write the results to a file, or analyze them further in a program, that is why you would execute Hive commands from Java.
Since Java 1.5, ProcessBuilder can start a Process that executes a command or application in the runtime environment (before 1.5 you would use Runtime). See the Java documentation for ProcessBuilder. The calling code is as follows:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

String sql = "show tables; select * from test_tb limit 10";
List<String> command = new ArrayList<String>();
command.add("hive");
command.add("-e");
command.add(sql);
List<String> results = new ArrayList<String>();
ProcessBuilder hiveProcessBuilder = new ProcessBuilder(command);
Process hiveProcess = hiveProcessBuilder.start();
BufferedReader br = new BufferedReader(new InputStreamReader(
        hiveProcess.getInputStream()));
String data = null;
while ((data = br.readLine()) != null) {
    results.add(data);
}
Here, command can be any other Hive command; it does not have to be HiveQL.
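One caveat worth noting (this is an addition, not part of the original): if hive writes a lot to stderr and nothing reads it, the child process can block on a full pipe buffer. A minimal sketch of one way around that is to merge stderr into stdout before starting the process:

// Merge stderr into stdout so a full stderr pipe cannot block the hive
// process, then wait for it to exit and check the status code.
hiveProcessBuilder.redirectErrorStream(true);
Process p = hiveProcessBuilder.start();
BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = null;
while ((line = reader.readLine()) != null) {
    results.add(line);
}
int exitCode = p.waitFor(); // non-zero means the hive command failed
                            // (waitFor throws InterruptedException; handle or declare it)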
Common Hive commands
# Gather host information
export password='qwe'
export your_ip=$(ip ad|grep inet|grep -v inet6|grep -v 127.0.0.1|awk '{print $2}'|cut -d/ -f1)
export your_hosts=$(cat /etc/hosts |grep $(echo $your_ip)|awk '{print $2}')
# Install MySQL (MariaDB)
echo "mysql-server-5.5 mysql-server/root_password password $password" | debconf-set-selections
echo "mysql-server-5.5 mysql-server/root_password_again password $password" | debconf-set-selections
apt-get -y install mariadb-server python-pymysql --force-yes
echo "[mysqld]
bind-address = $your_ip
default-storage-engine = innodb
innodb_file_per_table
max_connections = 4096
collation-server = utf8_general_ci
character-set-server = utf8" | tee /etc/mysql/conf.d/openstack.cnf
sed -i "s/127.0.0.1/0.0.0.0/g" /etc/mysql/mariadb.conf.d/50-server.cnf
service mysql restart
# Create the hive user and grant privileges
mysql -uroot -p$password <<EOF
CREATE DATABASE hive;
CREATE USER 'hive' IDENTIFIED BY "$password";
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
flush privileges;
EOF
# Add Hive to the environment PATH
hive_flag=$(grep "hive" /etc/profile)
if [ ! -n "$hive_flag" ]; then
sed -i "s/\$PATH:/\$PATH:\/opt\/apache-hive-2.3.2-bin\/bin:/g" /etc/profile
else
echo "Already exist!"
fi
# Apply the environment variables
source /etc/profile
# Update the Hive configuration
echo "$(grep "JAVA_HOME=" /etc/profile)
$(grep "HADOOP_HOME=" /etc/profile)
export HIVE_HOME=/opt/apache-hive-2.3.2-bin
export HIVE_CONF_DIR=/opt/apache-hive-2.3.2-bin/conf" |tee /opt/apache-hive-2.3.2-bin/conf/hive-env.sh
sed -i "s/hadoop3/$your_hosts/g" /opt/apache-hive-2.3.2-bin/conf/hive-site.xml
# Create the following directories in HDFS and grant full permissions
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /user/hive/tmp
hdfs dfs -mkdir -p /user/hive/log
hdfs dfs -chmod -R 777 /user/hive/warehouse
hdfs dfs -chmod -R 777 /user/hive/tmp
hdfs dfs -chmod -R 777 /user/hive/log
mkdir -p /user/hive/tmp
# Initialize the Hive metastore schema
schematool -dbType mysql -initSchema
# Hive installation ends here
#######################
# Create a Hive table
create table film
(name string,
time string,
score string,
id int,
time1 string,
score1 string,
name2 string,
score2 string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ';'
STORED AS TEXTFILE;
# Load a local text file into Hive
load data local inpath '/root/my.txt' overwrite into table film;
# Miscellaneous Hive notes
create table patition_table(name string,salary float,gender string) partitioned by (dt string,dep string) row format delimited fields terminated by ',' stored as textfile;
create database movie;
create table movie(name string,data string,record int);
# Drop the table
DROP TABLE if exists movies;
# Create the table
CREATE TABLE movies(
name string,
data string,
record int
) COMMENT 'Data records for movies released in 2014' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
load data local inpath 'dat0204.log' into table movies;
# Use the dfs command inside the Hive shell
hive> dfs -ls /user/hive/warehouse/wyp;
select * from movies;
hive -e "select * from test" res.csv
or:
hive -f sql.q > res.csv
where the file sql.q contains the query you want to execute.
# Export to the local filesystem
hive> insert overwrite local directory '/home/wyp/wyp'
    > select * from wyp;
# Export to HDFS
This is just as simple as exporting to the local filesystem; the following statement does it:
hive> insert overwrite directory '/home/wyp/hdfs'
    > select * from wyp;
The exported data is saved under /home/wyp/hdfs on HDFS. Note that this HQL differs from the local-filesystem export only by dropping the LOCAL keyword, and the data ends up in a different kind of location.
# Save the extracted data into a temporary table
insert overwrite table movies
Load from local: load data local inpath '/Users/tifa/Desktop/1.txt' into table test;
Load from HDFS: load data inpath '/user/hadoop/1.txt' into table test_external;
Overwrite the previous data: load data inpath '/user/hadoop/1.txt' overwrite into table test_external;
Does Hive require writing Java code?
If your project is a Java project, you will use Hive's Java API (JDBC), as in the following code:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import org.apache.log4j.Logger;
/**
 * Hive Java API example
 *
 * Start Hive's remote service first; on the command line run: hive --service hiveserver > /dev/null 2> /dev/null &
 *
 * @author 吖大哥
 *
 */
public class HiveJdbcCli {
private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
private static String url = "jdbc:hive://hadoop3:10000/default";
private static String user = "hive";
private static String password = "mysql";
private static String sql = "";
private static ResultSet res;
private static final Logger log = Logger.getLogger(HiveJdbcCli.class);
public static void main(String[] args) {
Connection conn = null;
Statement stmt = null;
try {
conn = getConn();
stmt = conn.createStatement();
// Step 1: drop the table if it exists
String tableName = dropTable(stmt);
// Step 2: create it if it does not exist
createTable(stmt, tableName);
// Step 3: show the created table
showTables(stmt, tableName);
// Run the describe table operation
describeTables(stmt, tableName);
// Run the load data into table operation
loadData(stmt, tableName);
// Run the select * query
selectData(stmt, tableName);
// Run a regular hive query (count)
countData(stmt, tableName);
} catch (ClassNotFoundException e) {
e.printStackTrace();
log.error(driverName + " not found!", e);
System.exit(1);
} catch (SQLException e) {
e.printStackTrace();
log.error("Connection error!", e);
System.exit(1);
} finally {
try {
if (conn != null) {
conn.close();
conn = null;
}
if (stmt != null) {
stmt.close();
stmt = null;
}
} catch (SQLException e) {
e.printStackTrace();
}
}
}
private static void countData(Statement stmt, String tableName)
throws SQLException {
sql = "select count(1) from " + tableName;
System.out.println("Running:" + sql);
res = stmt.executeQuery(sql);
System.out.println("执行“regular hive query”运行结果:");
while (res.next()) {
System.out.println("count ------" + res.getString(1));
}
}
private static void selectData(Statement stmt, String tableName)
throws SQLException {
sql = "select * from " + tableName;
System.out.println("Running:" + sql);
res = stmt.executeQuery(sql);
System.out.println("执行 select * query 运行结果:");
while (res.next()) {
System.out.println(res.getInt(1) + "\t" + res.getString(2));
}
}
private static void loadData(Statement stmt, String tableName)
throws SQLException {
String filepath = "/home/hadoop01/data";
sql = "load data local inpath '" + filepath + "' into table "
+ tableName;
System.out.println("Running:" + sql);
stmt.execute(sql);
}
private static void describeTables(Statement stmt, String tableName)
throws SQLException {
sql = "describe " + tableName;
System.out.println("Running:" + sql);
res = stmt.executeQuery(sql);
System.out.println("执行 describe table 运行结果:");
while (res.next()) {
System.out.println(res.getString(1) + "\t" + res.getString(2));
}
}
private static void showTables(Statement stmt, String tableName)
throws SQLException {
sql = "show tables '" + tableName + "'";
System.out.println("Running:" + sql);
res = stmt.executeQuery(sql);
System.out.println("执行 show tables 运行结果:");
if (res.next()) {
System.out.println(res.getString(1));
}
}
private static void createTable(Statement stmt, String tableName)
throws SQLException {
sql = "create table "
+ tableName
+ " (key int, value string) row format delimited fields terminated by '\t'";
stmt.execute(sql);
}
private static String dropTable(Statement stmt) throws SQLException {
// name of the table to create
String tableName = "testHive";
sql = "drop table " + tableName;
stmt.execute(sql);
return tableName;
}
private static Connection getConn() throws ClassNotFoundException,
SQLException {
Class.forName(driverName);
Connection conn = DriverManager.getConnection(url, user, password);
return conn;
}
}
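Note that the driver class and URL above target the old HiveServer1. If your cluster runs HiveServer2 instead, the connection setup changes as follows (host, port, and credentials here are placeholders carried over from the example above):

// HiveServer2 uses a different driver class and a jdbc:hive2:// URL.
Class.forName("org.apache.hive.jdbc.HiveDriver");
Connection conn = DriverManager.getConnection(
        "jdbc:hive2://hadoop3:10000/default", "hive", "mysql");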
Fixing the problem where Hive does not load modified config files such as hive-site.xml
When configuring Hive, the modified hive-site.xml and other config files were not being picked up; Hive kept loading the default configuration.
Fix:
Add the following to Hadoop's hadoop-env.sh: export HADOOP_CLASSPATH=$HIVE_HOME/conf:$HADOOP_CLASSPATH
Another problem: running other Hadoop subprojects kept failing with class-not-found errors, and the related jars had to be copied into Hadoop's lib directory, which keeps growing as a result.
I still have not found how to add other jars to the classpath properly; suggestions online say to use export HADOOP_CLASSPATH="", but that did not seem to work.
hive --config /root/etc/hive (note: /root/etc/hive is the directory containing hive-site.xml)
HiveQL statements end with a semicolon (";") and may span multiple lines.
Inside the Hive shell you can operate on HDFS files with the dfs command:
dfs -ls /user/hive/warehouse;
Updates, indexes, and transactions are not supported.
Table and column names are case-insensitive.
View and set session variables: set fs.default.name [=hdfs://zhaoxiang:9000];
Add a jar: add jar hivejar.jar;
Create a function: create temporary function udfTest as 'com.cstore.udfExample';
[To use a UDF in Pig, first register the jar file with a register statement; after that you can call it by its fully qualified Java class name, or give the UDF a name with a define statement:
register pigjar.jar;
define UPPER org.pigjar.string.UPPER();
B = foreach a generate UPPER($0); ]
You can run the Hive shell from the local command line:
$ hive -e 'select * from userinfo' (run a HiveQL statement)
$ hive --config /hive-0.9.0/conf (reload with a different configuration directory)
$ hive --service hiveserver 50000 (start the Hive server on port 50000)
create table cite(citing int, cited int) row format delimited fields terminated by ',' stored as textfile; // or: stored as sequencefile
load data (local) inpath 'cite75_99.txt' overwrite into table cite; // without LOCAL the path defaults to HDFS
select * from cite limit 10;
show tables;
describe cite;
select count(1) from cite; // count(1) is equivalent to count(*) in SQL
create table cite_count (cited int, count int);
insert overwrite table cite_count select cited , count(citing) from cite group by cited;
select * from cite_count where count > 10 limit 10;
drop table cite_count;
create table page_view(viewTime int, userid bigint,
page_url string, referrer_url string,
ip string comment 'ip address of user')
comment 'this is the page view table'
partitioned by (dt string, country string) // note: table columns must not overlap the partition columns
clustered by (userid) into 32 buckets // bucketing
row format delimited
fields terminated by ','
collection items terminated by '\002'
map keys terminated by '\003'
lines terminated by '\n'
stored as textfile;
// Sampling
select avg(viewTime) from page_view tablesample(bucket 1 out of 3 [on id]);
// Create an external table at an existing location; dropping an external table deletes only the metadata, and the table's data must be removed manually
create external table page_view(viewTime int, userid bigint,
page_url string, referrer_url string,
ip string comment 'ip address of user')
location 'path/to/existing/table/in/HDFS';
// Alter a table
alter table page_view rename to pv;
alter table pv add columns (newcol string);
alter table pv drop partition (dt='2009-09-01');
show tables 'page_.*';
load data local inpath 'page_view.txt'
overwrite into table page_view
partition (dt='2009-09-01',country='US');
Run Unix commands inside the Hive shell by prefixing them with an exclamation mark (!) and ending with a semicolon (;):
hive> ! ls;
hive> ! head hive_result;
// Run queries:
insert overwrite table query_result query; // e.g.: insert overwrite table query_result select * from page_view where country='US';
insert overwrite (local) directory '/hdfs-dir(local-dir)/query_result' query;
select country, count(distinct userid) from page_view group by country;
// Subqueries: a subquery may appear only in the FROM clause
select teacher, max(class_num) from
(select teacher, count(classname) as class_num from classinfo group by teacher) subquery
group by teacher;
// Joins
select pv.*, u.*, f.friends from page_view pv
join user u on (pv.userid = u.id)
join friends_list f on (u.id = f.uid);
// Multi-table insert
create table mutil1 as select id, name from userinfo;
create table mutil2 like mutil1;
from userinfo insert overwrite table mutil1 select id, name
insert overwrite table mutil2 select count(distinct id), name group by name;
// Create a view
create view teacher_classnum as select teacher, count(classname) from classinfo group by teacher;
This concludes the introduction to javahive-e. Hopefully you found the information you were looking for.