Tag Archives: Group By

Getting the Top K Records per Group After a Group By in Hive

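As the title says: in Hive, once you have used Group By, you cannot then sort within each group and take the Top K records. We can get this behavior with a UDF + DISTRIBUTE BY + SORT BY.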
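To make the idea concrete, here is a minimal Python sketch that simulates what a single reducer would see. It is not Hive code: the `Rank` class stands in for a hypothetical stateful UDF, `top_k_per_group` is an illustrative helper, and the HiveQL in the comment assumes a UDF registered as `rank` and a table named `my_table`, neither of which comes from the original post.

```python
# Rough HiveQL shape being simulated (assumed names: rank UDF, my_table):
#   SELECT user, category, value
#   FROM (SELECT user, category, value, rank(user) AS r
#         FROM (SELECT user, category, value FROM my_table
#               DISTRIBUTE BY user SORT BY user, value DESC) a) b
#   WHERE r < K;


class Rank:
    """Hypothetical stateful UDF: counts rows seen for the current key."""

    def __init__(self):
        self.last_key = None
        self.counter = 0

    def evaluate(self, key):
        # Reset the counter whenever the key changes; this only works
        # because rows with the same key arrive next to each other.
        if key != self.last_key:
            self.last_key = key
            self.counter = 0
        else:
            self.counter += 1
        return self.counter


def top_k_per_group(rows, k):
    """Keep the k rows with the largest value for each user.

    rows: iterable of (user, category, value) tuples with numeric values.
    """
    # Stand-in for DISTRIBUTE BY user SORT BY user, value DESC:
    # all rows of one user are adjacent, largest value first.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    rank = Rank()
    # Ranks start at 0, so "rank < k" keeps exactly k rows per user.
    return [row for row in ordered if rank.evaluate(row[0]) < k]


if __name__ == "__main__":
    sample = [
        ("alice", "a", 3),
        ("bob", "x", 10),
        ("alice", "b", 7),
        ("alice", "c", 5),
        ("bob", "y", 2),
    ]
    for row in top_k_per_group(sample, 2):
        print(row)
```

The counter approach depends on the ordering: DISTRIBUTE BY sends all rows for one user to the same reducer, and SORT BY orders them there, so the UDF only needs to remember the previous key.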

Reference: EXTRACT TOP N RECORDS IN EACH GROUP IN HADOOP/HIVE

Assume you have a table with three columns: user, category and value. For each user, you want to select[……]
