A brief introduction to javahive-e
This article discusses javahive-e and the related knowledge points; hopefully you will find it useful.
Contents:
- 1. How to export Hive query results to a txt file
- 2. How to tell whether a Hive table is partitioned, and get the partition column names
- 3. How to execute Hive commands or HiveQL from Java
- 4. Common Hive commands
- 5. Does Hive require writing Java code?
- 6. Fixing the problem where Hive does not load modified config files such as hive-site.xml
How to export Hive query results to a txt file
Recently, while using Hive, I needed to export query results to the local filesystem. The HQL syntax is:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
After exporting the query results to a local file, I ran into trouble trying to load it into Excel: I did not know which field delimiter Hive uses when exporting, and opening the file in a text editor showed garbled characters.
After digging through the official documentation, I finally found that Hive uses the ^A character as the field delimiter. The original text:
Data written to the filesystem is serialized as text with columns separated by ^A
In Python you can split fields with line.split('\x01') or line.split('\001'); note the single quotes.
In Java, use split("\\u0001").
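As a minimal sketch (the sample row here is made up for illustration), splitting one exported line in Java looks like this:

// Split a line exported by Hive on the ^A (control-A) field delimiter.
public class HiveLineSplit {
    public static void main(String[] args) {
        String line = "1\u0001alice\u00012009-09-01"; // a made-up exported row
        String[] fields = line.split("\u0001");
        for (String field : fields) {
            System.out.println(field);
        }
    }
}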
If you really do need the query results in a local file, it is better to use the hive command:
bin/hive -e "select * from test" > res.csv
or:
bin/hive -f sql.q > res.csv
where the file sql.q contains the query you want to execute.
The same issue comes up with Hive streaming. When a streaming job writes its output directly to an HDFS directory and you then create a Hive external table over that directory, Hive expects \001 as the separator between the streaming key and value. So when combining streaming output with a Hive external table, set the streaming job's final output delimiter to \001, and declare the external table's field delimiter as '\001' as well. Note: use \001 here, not ^A or \x01. For example:
create external table site_user_mapping_info_month(uid String,tag string,project_ids string,site_interests string) PARTITIONED BY(year String, month String) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LOCATION '/project/site_grouping/site_user_mapping_info_month';
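To illustrate the point above (a hedged sketch, not from the original; the class name and the tab-separated input format are assumptions), a streaming-style mapper whose final output uses \001 between key and value might look like:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Minimal streaming-style mapper: reads tab-separated lines from stdin and
// writes key + control-A + value to stdout, matching the external table's
// '\001' field delimiter.
public class StreamingMapper {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.split("\t", 2); // assumed input: key<TAB>value
            String key = parts[0];
            String value = parts.length > 1 ? parts[1] : "";
            System.out.println(key + "\u0001" + value);
        }
    }
}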
How to tell whether a Hive table is partitioned, and get the partition column names
There is no single statement that answers this directly; you have to script it.
show create table tableName — if the output contains a PARTITIONED BY clause, the table is partitioned.
To get the partitions:
show partitions tableName — this lists all partitions of the table.
As a shell script it looks like this:
tablePartition=`hive -e "show create table tableName" 2>&1 | grep -i "partitioned"`
if [ -n "$tablePartition" ]
then
Partitions=`hive -e "show partitions tableName"`
fi
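If you are already connected to Hive over JDBC, a similar check can be done from Java. The sketch below is hedged: the table name is a placeholder, and it assumes the describe formatted output layout, in which partition columns are listed after a '# Partition Information' header.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class PartitionCheck {
    // Returns the partition column names of tableName; an empty list means
    // the table is not partitioned.
    static List<String> partitionColumns(Connection conn, String tableName) throws Exception {
        List<String> partitionCols = new ArrayList<String>();
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("describe formatted " + tableName);
        boolean inPartitionSection = false;
        while (rs.next()) {
            String col = rs.getString(1) == null ? "" : rs.getString(1).trim();
            if (col.startsWith("# Partition Information")) {
                inPartitionSection = true; // partition columns follow this header
            } else if (inPartitionSection) {
                if (col.startsWith("# Detailed Table Information")) {
                    break; // end of the partition section
                }
                if (col.startsWith("#") || col.isEmpty()) {
                    continue; // skip the '# col_name' header line and blanks
                }
                partitionCols.add(col); // a partition column name
            }
        }
        return partitionCols;
    }
}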
How to execute Hive commands or HiveQL from Java
Executing Hive commands or HiveQL from Java here does not mean having a Hive client connect to HiveServer (or HiveServer2) over JDBC to run queries; it simply means running the hive command on a server where HiveServer is deployed. This is of course a simple matter: when we do quick data-analysis experiments with Hive, we usually just enter the Hive shell and run HiveQL directly.
Running HiveQL interactively can only print results to the screen or store them in a temporary table. If you want to write the results to a file, or analyze them further in a program, that is why you would execute Hive commands from Java.
Since Java 1.5, ProcessBuilder can start a Process that executes a command or application in the runtime environment (before 1.5 you would use Runtime). See the Java documentation for ProcessBuilder. The calling code is as follows:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

String sql = "show tables; select * from test_tb limit 10";
List<String> command = new ArrayList<String>();
command.add("hive");
command.add("-e");
command.add(sql);
List<String> results = new ArrayList<String>();
ProcessBuilder hiveProcessBuilder = new ProcessBuilder(command);
Process hiveProcess = hiveProcessBuilder.start();
BufferedReader br = new BufferedReader(new InputStreamReader(
        hiveProcess.getInputStream()));
String data = null;
while ((data = br.readLine()) != null) {
    results.add(data);
}
Here, command can be any other Hive command; it does not have to be HiveQL.
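One caveat worth noting (this is an addition, not part of the original): if hive writes a lot to stderr and nothing reads it, the child process can block on a full pipe buffer. A minimal sketch of one way around that is to merge stderr into stdout before starting the process:

// Merge stderr into stdout so a full stderr pipe cannot block the hive
// process, then wait for it to exit and check the status code.
hiveProcessBuilder.redirectErrorStream(true);
Process p = hiveProcessBuilder.start();
BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = null;
while ((line = reader.readLine()) != null) {
    results.add(line);
}
int exitCode = p.waitFor(); // non-zero means the hive command failed
                            // (waitFor throws InterruptedException; handle or declare it)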
Common Hive commands
# Gather host information
export password='qwe'
export your_ip=$(ip ad|grep inet|grep -v inet6|grep -v 127.0.0.1|awk '{print $2}'|cut -d/ -f1)
export your_hosts=$(cat /etc/hosts |grep $(echo $your_ip)|awk '{print $2}')
# Install MySQL (MariaDB)
echo "mysql-server-5.5 mysql-server/root_password password $password" | debconf-set-selections
echo "mysql-server-5.5 mysql-server/root_password_again password $password" | debconf-set-selections
apt-get -y install mariadb-server python-pymysql --force-yes
echo "[mysqld]
bind-address = $your_ip
default-storage-engine = innodb
innodb_file_per_table
max_connections = 4096
collation-server = utf8_general_ci
character-set-server = utf8" | tee /etc/mysql/conf.d/openstack.cnf
sed -i "s/127.0.0.1/0.0.0.0/g" /etc/mysql/mariadb.conf.d/50-server.cnf
service mysql restart
# Create the hive user and grant privileges
mysql -uroot -p$password <<EOF
CREATE DATABASE hive;
CREATE USER 'hive' IDENTIFIED BY "$password";
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
flush privileges;
EOF
# Add Hive to the environment PATH
hive_flag=$(grep "hive" /etc/profile)
if [ ! -n "$hive_flag" ]; then
sed -i "s/\$PATH:/\$PATH:\/opt\/apache-hive-2.3.2-bin\/bin:/g" /etc/profile
else
echo "Already exist!"
fi
# Apply the environment variables
source /etc/profile
# Update the Hive configuration
echo "$(grep "JAVA_HOME=" /etc/profile)
$(grep "HADOOP_HOME=" /etc/profile)
export HIVE_HOME=/opt/apache-hive-2.3.2-bin
export HIVE_CONF_DIR=/opt/apache-hive-2.3.2-bin/conf" |tee /opt/apache-hive-2.3.2-bin/conf/hive-env.sh
sed -i "s/hadoop3/$your_hosts/g" /opt/apache-hive-2.3.2-bin/conf/hive-site.xml
# Create the following directories in HDFS and grant full permissions
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /user/hive/tmp
hdfs dfs -mkdir -p /user/hive/log
hdfs dfs -chmod -R 777 /user/hive/warehouse
hdfs dfs -chmod -R 777 /user/hive/tmp
hdfs dfs -chmod -R 777 /user/hive/log
mkdir -p /user/hive/tmp
# Initialize the Hive metastore schema
schematool -dbType mysql -initSchema
# Hive installation ends here
#######################
# Create a Hive table
create table film
(name string,
time string,
score string,
id int,
time1 string,
score1 string,
name2 string,
score2 string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ';'
STORED AS TEXTFILE;
# Load a local text file into Hive
load data local inpath '/root/my.txt' overwrite into table film;
# Miscellaneous Hive notes
create table patition_table(name string,salary float,gender string) partitioned by (dt string,dep string) row format delimited fields terminated by ',' stored as textfile;
create database movie;
create table movie(name string,data string,record int);
# Drop the table
DROP TABLE if exists movies;
# Create the table
CREATE TABLE movies(
name string,
data string,
record int
) COMMENT 'Data records for movies released in 2014' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
load data local inpath 'dat0204.log' into table movies;
# Use the dfs command inside the Hive shell
hive> dfs -ls /user/hive/warehouse/wyp;
select * from movies;
hive -e "select * from test" res.csv
or:
hive -f sql.q > res.csv
where the file sql.q contains the query you want to execute.
# Export to the local filesystem
hive> insert overwrite local directory '/home/wyp/wyp'
    > select * from wyp;
# Export to HDFS
This is just as simple as exporting to the local filesystem; the following statement does it:
hive> insert overwrite directory '/home/wyp/hdfs'
    > select * from wyp;
The exported data is saved under /home/wyp/hdfs on HDFS. Note that this HQL differs from the local-filesystem export only by dropping the LOCAL keyword, and the data ends up in a different kind of location.
# Save the extracted data into a temporary table
insert overwrite table movies
Load from local: load data local inpath '/Users/tifa/Desktop/1.txt' into table test;
Load from HDFS: load data inpath '/user/hadoop/1.txt' into table test_external;
Overwrite the previous data: load data inpath '/user/hadoop/1.txt' overwrite into table test_external;
Does Hive require writing Java code?
If your project is a Java project, you will use Hive's Java API (JDBC), as in the following code:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import org.apache.log4j.Logger;
/**
 * Hive Java API example
 *
 * Start Hive's remote service first; on the command line run: hive --service hiveserver > /dev/null 2> /dev/null &
 *
 * @author 吖大哥
 *
 */
public class HiveJdbcCli {
private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
private static String url = "jdbc:hive://hadoop3:10000/default";
private static String user = "hive";
private static String password = "mysql";
private static String sql = "";
private static ResultSet res;
private static final Logger log = Logger.getLogger(HiveJdbcCli.class);
public static void main(String[] args) {
Connection conn = null;
Statement stmt = null;
try {
conn = getConn();
stmt = conn.createStatement();
// Step 1: drop the table if it exists
String tableName = dropTable(stmt);
// Step 2: create it if it does not exist
createTable(stmt, tableName);
// Step 3: show the created table
showTables(stmt, tableName);
// Run the describe table operation
describeTables(stmt, tableName);
// Run the load data into table operation
loadData(stmt, tableName);
// Run the select * query
selectData(stmt, tableName);
// Run a regular hive query (count)
countData(stmt, tableName);
} catch (ClassNotFoundException e) {
e.printStackTrace();
log.error(driverName + " not found!", e);
System.exit(1);
} catch (SQLException e) {
e.printStackTrace();
log.error("Connection error!", e);
System.exit(1);
} finally {
try {
if (conn != null) {
conn.close();
conn = null;
}
if (stmt != null) {
stmt.close();
stmt = null;
}
} catch (SQLException e) {
e.printStackTrace();
}
}
}
private static void countData(Statement stmt, String tableName)
throws SQLException {
sql = "select count(1) from " + tableName;
System.out.println("Running:" + sql);
res = stmt.executeQuery(sql);
System.out.println("执行“regular hive query”运行结果:");
while (res.next()) {
System.out.println("count ------" + res.getString(1));
}
}
private static void selectData(Statement stmt, String tableName)
throws SQLException {
sql = "select * from " + tableName;
System.out.println("Running:" + sql);
res = stmt.executeQuery(sql);
System.out.println("执行 select * query 运行结果:");
while (res.next()) {
System.out.println(res.getInt(1) + "\t" + res.getString(2));
}
}
private static void loadData(Statement stmt, String tableName)
throws SQLException {
String filepath = "/home/hadoop01/data";
sql = "load data local inpath '" + filepath + "' into table "
+ tableName;
System.out.println("Running:" + sql);
stmt.execute(sql);
}
private static void describeTables(Statement stmt, String tableName)
throws SQLException {
sql = "describe " + tableName;
System.out.println("Running:" + sql);
res = stmt.executeQuery(sql);
System.out.println("执行 describe table 运行结果:");
while (res.next()) {
System.out.println(res.getString(1) + "\t" + res.getString(2));
}
}
private static void showTables(Statement stmt, String tableName)
throws SQLException {
sql = "show tables '" + tableName + "'";
System.out.println("Running:" + sql);
res = stmt.executeQuery(sql);
System.out.println("执行 show tables 运行结果:");
if (res.next()) {
System.out.println(res.getString(1));
}
}
private static void createTable(Statement stmt, String tableName)
throws SQLException {
sql = "create table "
+ tableName
+ " (key int, value string) row format delimited fields terminated by '\t'";
stmt.execute(sql);
}
private static String dropTable(Statement stmt) throws SQLException {
// name of the table to create
String tableName = "testHive";
sql = "drop table " + tableName;
stmt.execute(sql);
return tableName;
}
private static Connection getConn() throws ClassNotFoundException,
SQLException {
Class.forName(driverName);
Connection conn = DriverManager.getConnection(url, user, password);
return conn;
}
}
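Note that the driver class and URL above target the old HiveServer1. If your cluster runs HiveServer2 instead, the connection setup changes as follows (host, port, and credentials here are placeholders carried over from the example above):

// HiveServer2 uses a different driver class and a jdbc:hive2:// URL.
Class.forName("org.apache.hive.jdbc.HiveDriver");
Connection conn = DriverManager.getConnection(
        "jdbc:hive2://hadoop3:10000/default", "hive", "mysql");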
Fixing the problem where Hive does not load modified config files such as hive-site.xml
When configuring Hive, the modified hive-site.xml and other config files were not being picked up; Hive kept loading the default configuration.
Fix:
Add the following to Hadoop's hadoop-env.sh: export HADOOP_CLASSPATH=$HIVE_HOME/conf:$HADOOP_CLASSPATH
Another problem: running other Hadoop subprojects kept failing with class-not-found errors, and the related jars had to be copied into Hadoop's lib directory, which keeps growing as a result.
I still have not found how to add other jars to the classpath properly; suggestions online say to use export HADOOP_CLASSPATH="", but that did not seem to work.
hive --config /root/etc/hive (note: /root/etc/hive is the directory containing hive-site.xml)
HiveQL statements end with a semicolon (";") and may span multiple lines.
Inside the Hive shell you can operate on HDFS files with the dfs command:
dfs -ls /user/hive/warehouse;
Updates, indexes, and transactions are not supported.
Table and column names are case-insensitive.
View and set session variables: set fs.default.name [=hdfs://zhaoxiang:9000];
Add a jar: add jar hivejar.jar;
Create a function: create temporary function udfTest as 'com.cstore.udfExample';
[To use a UDF in Pig, first register the jar file with a register statement; after that you can call it by its fully qualified Java class name, or give the UDF a name with a define statement:
register pigjar.jar;
define UPPER org.pigjar.string.UPPER();
B = foreach a generate UPPER($0); ]
You can run the Hive shell from the local command line:
$ hive -e 'select * from userinfo' (run a HiveQL statement)
$ hive --config /hive-0.9.0/conf (reload with a different configuration directory)
$ hive --service hiveserver 50000 (start the Hive server on port 50000)
create table cite(citing int, cited int) row format delimited fields terminated by ',' stored as textfile; // or: stored as sequencefile
load data (local) inpath 'cite75_99.txt' overwrite into table cite; // without LOCAL the path defaults to HDFS
select * from cite limit 10;
show tables;
describe cite;
select count(1) from cite; // count(1) is equivalent to count(*) in SQL
create table cite_count (cited int, count int);
insert overwrite table cite_count select cited , count(citing) from cite group by cited;
select * from cite_count where count > 10 limit 10;
drop table cite_count;
create table page_view(viewTime int, userid bigint,
page_url string, referrer_url string,
ip string comment 'ip address of user')
comment 'this is the page view table'
partitioned by (dt string, country string) // note: table columns must not overlap the partition columns
clustered by (userid) into 32 buckets // bucketing
row format delimited
fields terminated by ','
collection items terminated by '\002'
map keys terminated by '\003'
lines terminated by '\n'
stored as textfile;
// Sampling
select avg(viewTime) from page_view tablesample(bucket 1 out of 3 [on id]);
// Create an external table at an existing location; dropping an external table deletes only the metadata, and the table's data must be removed manually
create external table page_view(viewTime int, userid bigint,
page_url string, referrer_url string,
ip string comment 'ip address of user')
location 'path/to/existing/table/in/HDFS';
// Alter a table
alter table page_view rename to pv;
alter table pv add columns (newcol string);
alter table pv drop partition (dt='2009-09-01');
show tables 'page_.*';
load data local inpath 'page_view.txt'
overwrite into table page_view
partition (dt='2009-09-01',country='US');
Run Unix commands inside the Hive shell by prefixing them with an exclamation mark (!) and ending with a semicolon (;):
hive> ! ls;
hive> ! head hive_result;
// Run queries:
insert overwrite table query_result query; // e.g.: insert overwrite table query_result select * from page_view where country='US';
insert overwrite (local) directory '/hdfs-dir(local-dir)/query_result' query;
select country, count(distinct userid) from page_view group by country;
// Subqueries: a subquery may appear only in the FROM clause
select teacher, max(class_num) from
(select teacher, count(classname) as class_num from classinfo group by teacher) subquery
group by teacher;
// Joins
select pv.*, u.*, f.friends from page_view pv
join user u on (pv.userid = u.id)
join friends_list f on (u.id = f.uid);
// Multi-table insert
create table mutil1 as select id, name from userinfo;
create table mutil2 like mutil1;
from userinfo insert overwrite table mutil1 select id, name
insert overwrite table mutil2 select count(distinct id), name group by name;
// Create a view
create view teacher_classnum as select teacher, count(classname) from classinfo group by teacher;
This concludes the introduction to javahive-e. Hopefully you found the information you were looking for.