Bright and early this morning, Andrew Dunstan asked me how to number partitions. Oh, and it has to be fast!
We went back and forth a bit, and here's what we came up with.
Let's imagine we have a table like this:
CREATE TABLE empsalary(
depname varchar,
empno bigint,
salary int,
enroll_date date
);
Let's populate it with some data:
INSERT INTO empsalary
VALUES
('develop',10, 5200, '2007/08/01'),
('sales', 1, 5000, '2006/10/01'),
('personnel', 5, 3500, '2007/12/10'),
('sales', 4, 4800, '2007/08/08'),
('sales', 6, 5500, '2007/01/02'),
('personnel', 2, 3900, '2006/12/23'),
('develop', 7, 4200, '2008/01/01'),
('develop', 9, 4500, '2008/01/01'),
('sales', 3, 4800, '2007/08/01'),
('develop', 8, 6000, '2006/10/01'),
('develop', 11, 5200, '2007/08/15');
We'll write up a mini-report, ranking people by salary in their department:
SELECT depname, empno, salary
FROM empsalary
ORDER BY depname, salary DESC;
Not too complicated. Out of an overwhelming sense of order, we would like to
have those departments numbered. The obvious tool is to use windowing
functions, but how?
How about a sequence? Do this once per session:
CREATE TEMP SEQUENCE depno MINVALUE 0;
Next, initialize it before each time we do the query. No harm done setting it
twice.
SELECT setval('depno'::regclass,0);
Now, the fun part:
SELECT CASE WHEN row_number() OVER w = 1
THEN nextval('depno'::regclass)
ELSE currval('depno'::regclass)
END AS "depno",
depname, empno, salary
FROM
empsalary
WINDOW w AS (
PARTITION BY depname
ORDER BY salary DESC
)
ORDER BY depname, salary DESC; /* Just in case */
Tada!
depno | depname | empno | salary
-------+-----------+-------+--------
4 | develop | 8 | 6000
4 | develop | 10 | 5200
4 | develop | 11 | 5200
4 | develop | 9 | 4500
4 | develop | 7 | 4200
5 | personnel | 2 | 3900
5 | personnel | 5 | 3500
6 | sales | 6 | 5500
6 | sales | 1 | 5000
6 | sales | 3 | 4800
6 | sales | 4 | 4800
(11 rows)
[The next day, with comments from Regina et all...] The above is what happens when you post too late at night. I'd forgotten to reset the sequence. With the sequence reset, here's how it would look.
depno | depname | empno | salary
-------+-----------+-------+--------
1 | develop | 8 | 6000
1 | develop | 10 | 5200
1 | develop | 11 | 5200
1 | develop | 9 | 4500
1 | develop | 7 | 4200
2 | personnel | 2 | 3900
2 | personnel | 5 | 3500
3 | sales | 6 | 5500
3 | sales | 1 | 5000
3 | sales | 4 | 4800
3 | sales | 3 | 4800
And it turns out (thanks, commenters!) you don't actually need a sequence. We'll test both for
performance. Here's one way to do it in standard SQL.
SELECT dense_rank() OVER w AS "depno",
depname, empno, salary
FROM
empsalary
WINDOW w AS (
ORDER BY depname
)
ORDER BY depname, salary DESC;
Results as above.
so in other words, can you do same thing without sequence ?
SELECT dense_rank() OVER (ORDER BY depname) AS depno,
depname, empno, salary
FROM
empsalary
ORDER BY depname, salary DESC;
Also I'm puzzled why your numbering starts at 4.
I'll try that on a large data set and see how it works. Meanwhile, I've fixed the output, explained how I got it, and we'll test for performance.