MySQL中,Group By后,取每组的Top N条记录

转载自:http://www.geeklab.info/2012/11/mysql-how-to-get-top-n-rows-for-each-group/

We have the following table and data:

 

CREATE TEMPORARY TABLE girls(

name text,

haircolor text,

score INT

);

INSERT INTO girls VALUES ('Megan','brunette',9);

INSERT INTO girls VALUES ('Tiffany','brunette',5);

INSERT INTO girls VALUES ('Kimberly','brunette',7);

INSERT INTO girls VALUES ('Hester','blonde',10);

INSERT INTO girls VALUES ('Caroline','blonde',5);

SELECT * from girls;

+----------+-----------+-------+

| name | haircolor | score |

+----------+-----------+-------+

| Megan | brunette | 9 |

| Tiffany | brunette | 5 |

| Kimberly | brunette | 7 |

| Hester | blonde | 10 |

| Caroline | blonde | 5 |

+----------+-----------+-------+

5 rows in set (0.00 sec)

 

Session Variables

MySQL, at least on the versions I've checked, does not support ROW_NUMBER() function that can assign a sequence number within a group, the MySQL session variables can be used to build a workaround. Session variables do not need to be declared first and can be used to do calculations and perform actions based on them. They appear to require initialization. For instance:

@previous_haircolor := haircolor

This code is executed for each row and stores the value of haircolor column to @current_haircolor variable.

@girl_rank := IF(@previous_haircolor = haircolor, @girl_rank + 1, 1)

In this code, if @previous_haircolor equals the haircolor on the current row we increment rank, otherwise set it to 1. For the first row @previous_haircolor is NULL, so rank is also initialized to 1.

For correct ranking, we need to have ORDER BY haircolor,score DESC

So if we run:

SELECT @previous_haircolor := null; /* Initialize first, or girl_rank will always be 1 */

SELECT

name,haircolor,score,

@girl_rank := IF(@previous_haircolor = haircolor, @girl_rank + 1, 1) AS girl_rank,

@previous_haircolor := haircolor

FROM girls

ORDER BY haircolor,score DESC

We get the list of girls ranked by their score within the haircolor group:

+----------+-----------+-------+-----------+----------------------------------+

| name | haircolor | score | girl_rank | @previous_haircolor := haircolor |

+----------+-----------+-------+-----------+----------------------------------+

| Hester | blonde | 10 | 1 | blonde |

| Caroline | blonde | 5 | 2 | blonde |

| Megan | brunette | 9 | 1 | brunette |

| Kimberly | brunette | 7 | 2 | brunette |

| Tiffany | brunette | 5 | 3 | brunette |

+----------+-----------+-------+-----------+----------------------------------+

5 rows in set (0.00 sec)

When we have a rank assigned to each girl within her haircolor group, we can request the wanted range:

-- Get top 2 for each haircolor

SELECT name, haircolor, score

FROM (/*subquery above*/) ranked

WHERE girl_rank <= 2;

So the full query is:

SELECT @previous_haircolor := null; /* Initialize first, or girl_rank will always be 1 */

SELECT name, haircolor, score

FROM (SELECT

name,haircolor,score,

@girl_rank := IF(@previous_haircolor = haircolor, @girl_rank + 1, 1) AS girl_rank,

@previous_haircolor := haircolor

FROM girls

ORDER BY haircolor,score DESC) ranked

WHERE girl_rank <= 2;

Using oracle, SQL Server and PostgreSQL: ROW_NUMBER()

 

In Oracle, SQL Server and PostgreSQL (version 8.4 and higher) you can achieve the same functionality using ROW_NUMBER function:

SELECT name,haircolor,score

FROM

(SELECT name,haircolor,score

ROW_NUMBER() OVER (PARTITION BY haircolor ORDER BY score DESC) as girl_rank

FROM girls) ranked

WHERE girl_rank <= 2;

Leave a Reply

Your email address will not be published. Required fields are marked *