Thursday, March 14, 2013

How to calculate the grouped dense rank function in mysql sql queries?

Mysql database does not have the built-in support for dense rank functionality. Databases like oracle, Netezza supports the dense rank analytical function. In this post, we will see how to get the dense rank values. Consider the following employees table records as an example:

dept_id, salary
10, 10000
10, 20000
10, 20000
10, 30000
20, 50000
20, 60000

The requirement is to group the department id and then generate the dense rank values on salary in ascending order. The output should look as

dept_id, salary, denseRank
10, 10000, 1
10, 20000, 2
10, 20000, 2
10, 30000, 4
20, 50000, 1
20, 60000, 2

1. For those who are familiar with oracle, I am providing the sql query to generate dense rank values in oracle database.

SELECT dept_id,
       dense_rank() over (
            partition by dept_id
            order by salary
            ) RowNumber
FROM  emp;

2. Let see how to calculate the dense rank values in mysql sql queries. Mysql supports using of variables in sql queries. Now we use the variable support to generate the dense rank values. The following query generates the dense rank values grouping on dept_id and sorting on salary:

set @pk1 ='';
set @rn1 =1;
set @sal ='';
set @val =1;

SELECT  dept_id,
  SELECT  dept_id,
          @rn1 := if(@pk1=dept_id, if(@sal=salary, @rn1, @rn1+@val),1) as denseRank,
          @val := if(@pk1=dept_id, if(@sal=salary, @val+1, 1),1) as value,
          @pk1 := dept_id,
          @sal := salary     
    SELECT  dept_id,
    FROM    emp
    ORDER BY dept_id,salary
) A
) B;

Understanding the Dense Rank Values Mysql Sql Query:
  • Declare four variables using the set operator. The pk1 variable is to track the changes in department id, sal variable is to track the changes in salary column, val variable is count number of times the salary repeats and the ran1 variable is to calculate the rank.
  • Sort the rows on the grouped columns and then the ranking columns. Here grouped column is dept_id and ranking column is salary. This order by should be done in inner query.
  • In the outer query, compare whether the current row department id with previous row department id. If the department id is newer one or changes, then assign rn1 with 1, otherwise compare the previous row salary with current row salary. If the salary changes, then increment the rank by the number of times the previous salary repeated. Otherwise assign the same rank value. The val variable stores how many times the salary value is repeated. The previous department id value is stored in the pk1 variable and the previous row salary is stored in sal variable.
  • Finally select the required columns in the outermost query.

No comments:

Post a Comment