https://www.infoq.cn/article/fjebconxd2sz9wloykfo[......]
Category Archives: Big Data Technology
How Flink dual-stream joins work
https://developer.huawei.com/consumer/cn/forum/topic/0202775562683000448[......]
Getting formatted date strings for today and yesterday in Presto
format_datetime(current_date, 'YYYY-MM-dd'),
format_datetime(DATE_ADD('day', -1, current_date), 'YYYY-MM-dd'),
format_datetime(DATE_ADD('day', -2, current_date), 'YYYY-MM-dd')
[......]
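To sanity-check what the expressions above should return, the same date arithmetic can be mirrored in plain Java. This is only a sketch: the class `PrestoDateStringsDemo` and the method `dayString` are hypothetical names, not part of Presto. One caveat: Presto's `format_datetime` follows Joda-Time pattern letters, where `YYYY` means year of era, while in `java.time` the letter `Y` means week-based year, so the Java side uses `uuuu` instead.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that mirrors DATE_ADD('day', n, current_date) + format_datetime.
final class PrestoDateStringsDemo {
    // 'uuuu' is the calendar year in java.time; 'YYYY' would be the week-based year here.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    // Shifts the given date by offsetDays (negative = past) and formats it as yyyy-MM-dd.
    static String dayString(LocalDate today, int offsetDays) {
        return today.plusDays(offsetDays).format(FMT);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        System.out.println(dayString(today, 0));   // today
        System.out.println(dayString(today, -1));  // yesterday
        System.out.println(dayString(today, -2));  // the day before yesterday
    }
}
```

Passing the reference date in as a parameter, rather than calling `LocalDate.now()` inside the helper, keeps the result reproducible when comparing against a query run on a known day.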
Customizing FileOutputFormat for a Hadoop MapReduce job
Requirement: the Reduce phase must write its output in a special format.
Example: load the Reducer's results into a Guava BloomFilter.
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.h[......]
How to sort numerically in Hadoop quickly
Reposted from: http://stackoverflow.com/questions/13331722/how-to-sort-numerically-in-hadoops-shuffle-sort-phase
Assuming you are using Hadoop Streaming, you need to use the KeyFieldBasedComparator class.
- -D mapred.output.key.comparator.class=org.apach[......]
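Why a numeric comparator is needed at all can be seen in a small standalone sketch. Streaming keys are text, and text compares character by character, so "10" sorts before "9". The code below is plain Java, not Hadoop code; the class `NumericKeySortDemo` and both method names are hypothetical and only illustrate the difference between the two orderings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo: text ordering versus numeric ordering of the same keys.
final class NumericKeySortDemo {
    // Compares keys as strings, character by character.
    static List<String> sortAsText(List<String> keys) {
        List<String> out = new ArrayList<>(keys);
        out.sort(Comparator.naturalOrder());
        return out;
    }

    // Parses each key as a long before comparing, giving numeric order.
    static List<String> sortAsNumbers(List<String> keys) {
        List<String> out = new ArrayList<>(keys);
        out.sort(Comparator.comparingLong(Long::parseLong));
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("10", "9", "100", "2");
        System.out.println(sortAsText(keys));    // [10, 100, 2, 9]
        System.out.println(sortAsNumbers(keys)); // [2, 9, 10, 100]
    }
}
```

The second ordering is what a numerically configured key comparator is meant to produce during the shuffle and sort phase.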