Reposted from: http://stackoverflow.com/questions/13331722/how-to-sort-numerically-in-hadoops-shuffle-sort-phase
Assuming you are using Hadoop Streaming, you need to use the KeyFieldBasedComparator class.
- -D mapred.output.key.comparator.class=org.apach[......]
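The command above is cut off, so the full option list is not shown here. To illustrate why a numeric comparator is needed at all, the plain-Java sketch below contrasts the default string ordering of Streaming keys with a numeric ordering on the first tab-separated field. It uses no Hadoop classes, and `KeyOrderDemo` and its methods are hypothetical names. The numeric ordering is roughly what KeyFieldBasedComparator is expected to produce when given a sort-style option such as `-k1,1n`. That option string is an assumption, not part of the truncated command.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical demo (no Hadoop dependency): default string ordering of keys
 * versus numeric ordering on the first tab-separated field.
 */
final class KeyOrderDemo {
    // Default-style ordering: compare whole lines as strings.
    // (Java compares UTF-16 code units, Hadoop Text compares raw bytes;
    // for ASCII keys the two orders agree.)
    static List<String> sortLexicographically(List<String> lines) {
        List<String> out = new ArrayList<>(lines);
        out.sort(Comparator.naturalOrder());
        return out;
    }

    // Numeric ordering on field 1, in the spirit of "sort -k1,1n".
    // List.sort is stable, so equal keys keep their input order.
    static List<String> sortByFirstFieldNumerically(List<String> lines) {
        List<String> out = new ArrayList<>(lines);
        out.sort(Comparator.comparingLong(KeyOrderDemo::firstField));
        return out;
    }

    // Extracts the first tab-separated field; throws on non-numeric keys
    // in this sketch.
    private static long firstField(String line) {
        int tab = line.indexOf('\t');
        String key = tab < 0 ? line : line.substring(0, tab);
        return Long.parseLong(key.trim());
    }

    public static void main(String[] args) {
        List<String> keys = List.of("10\ta", "9\tb", "100\tc", "2\td");
        System.out.println(sortLexicographically(keys));       // 10, 100, 2, 9
        System.out.println(sortByFirstFieldNumerically(keys)); // 2, 9, 10, 100
    }
}
```

With string ordering, "100" sorts before "2" and "9", which is exactly the problem the question is about.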
Reposted from: "Writing Hive Custom Aggregate Functions (UDAF): Part II"
Now that we have Eclipse configured (see Part I) for UDAF development, it's time to write our first UDAF. Searching for custom UDAFs, most people have probably already come across the followi[......]
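Hive evaluates an aggregate in phases: map-side instances iterate over raw rows and hand off a partial result, and reduce-side instances merge those partials before producing the final value. The plain-Java model below mirrors the method names of Hive's classic `UDAFEvaluator` style (`init`, `iterate`, `terminatePartial`, `merge`, `terminate`) for a mean. It does not use any Hive classes, and `MeanEvaluator` and `Partial` are hypothetical types chosen for illustration, not the reposted article's code.

```java
/**
 * Hypothetical, Hive-free model of a UDAF that computes the mean of doubles.
 */
final class MeanEvaluator {
    /** Partial aggregate passed from the map side to the reduce side. */
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    private double sum;
    private long count;

    // Reset state before a new group.
    void init() {
        sum = 0;
        count = 0;
    }

    // Map side: consume one raw value; nulls are skipped, as SQL AVG does.
    boolean iterate(Double value) {
        if (value != null) {
            sum += value;
            count++;
        }
        return true;
    }

    // Map side: hand off the current partial state.
    Partial terminatePartial() {
        return new Partial(sum, count);
    }

    // Reduce side: fold in a partial produced by another instance.
    boolean merge(Partial other) {
        if (other != null) {
            sum += other.sum;
            count += other.count;
        }
        return true;
    }

    // Final result; null for an empty group, again like SQL AVG.
    Double terminate() {
        return count == 0 ? null : sum / count;
    }

    public static void main(String[] args) {
        MeanEvaluator mapA = new MeanEvaluator();
        mapA.init();
        mapA.iterate(1.0);
        mapA.iterate(2.0);

        MeanEvaluator mapB = new MeanEvaluator();
        mapB.init();
        mapB.iterate(6.0);
        mapB.iterate(null); // ignored

        MeanEvaluator reducer = new MeanEvaluator();
        reducer.init();
        reducer.merge(mapA.terminatePartial());
        reducer.merge(mapB.terminatePartial());
        System.out.println(reducer.terminate()); // prints 3.0
    }
}
```

The partial carries both sum and count rather than a running average, because averages of partial averages are wrong when the partials cover different numbers of rows.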
In Hive, when implementing a custom UDF/UDAF/UDTF with the GenericU**F classes, you often have to specify the output type, which requires obtaining an ObjectInspector.
For primitive types:
PrimitiveObjectInspectorFactory.javaStringObjectInspector
For composite types such as List, it takes two steps:
ObjectInspectorFactory
.getStandardListObjectInspector(PrimitiveObjectInspectorFa[......]
In Hadoop, the commonly used TextInputFormat treats the newline character as the record delimiter.
In practice, however, a single record often spans multiple lines, for example:
<doc>
....
</doc>
In this case, TextInputFormat has to be extended to support such records.
First, let's look at the original implementation:
public class TextInputFormat extends FileInputFormat<LongWritable, Text> {[......]
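The excerpt stops before the extended class itself. As a hedged sketch of the idea only, the plain-Java class below shows the loop a custom RecordReader might run: skip lines until a `<doc>` line, then accumulate lines until the matching `</doc>` and emit the whole block as one record. `DocRecordSplitter` is a hypothetical name and uses no Hadoop API. It assumes each tag sits on its own line and that records do not nest; a real RecordReader would also have to handle records that straddle input-split boundaries, which this sketch ignores.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Hadoop-free sketch: turns <doc>...</doc> blocks into records.
 */
final class DocRecordSplitter {
    private static final String START = "<doc>";
    private static final String END = "</doc>";

    private final BufferedReader in;

    DocRecordSplitter(BufferedReader in) {
        this.in = in;
    }

    /** Returns the next complete record (tags included), or null at end of input. */
    String next() throws IOException {
        StringBuilder record = null;
        String line;
        while ((line = in.readLine()) != null) {
            String trimmed = line.trim();
            if (record == null) {
                // Outside a record: ignore lines until a start tag appears.
                if (trimmed.equals(START)) {
                    record = new StringBuilder(line);
                }
            } else {
                // Inside a record: keep the line, stop at the end tag.
                record.append('\n').append(line);
                if (trimmed.equals(END)) {
                    return record.toString();
                }
            }
        }
        // An unterminated trailing record is dropped in this sketch.
        return null;
    }

    /** Convenience helper: splits an in-memory string into records. */
    static List<String> readAll(String text) {
        DocRecordSplitter splitter =
                new DocRecordSplitter(new BufferedReader(new StringReader(text)));
        List<String> out = new ArrayList<>();
        try {
            for (String r = splitter.next(); r != null; r = splitter.next()) {
                out.add(r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Feeding it two `<doc>` blocks separated by a stray line yields two records, with the stray line discarded.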
In Hive, if you use an External Table or a Partition, the data path may not be under your own Hive warehouse directory.
-- Get the actual HDFS path of the table
desc formatted my_table;
-- Get the actual HDFS path of a partition
desc formatted my_table partition (pt='20140804');
[......]