This was first published on https://blog.dbi-services.com/oracle-rownum-vs-rownumber-and-12c-fetch-first (2014-05-05)
Republishing here for new followers. The content is related to the the versions available at the publication date
Prior to Oracle 12c, there were two ways to do ‘top-n’ queries:
Top-n queries are usually required for result pagination. The application shows the first 10 rows, then the next 10 on the following page, etc. In these queries, user interactions are often stateless, so we cannot just open a cursor and fetch 10 rows on each user request.
In both case you will need a subquery. In the former case you need it because otherwise the order by will be applied after the rownum filter. In the latter case you need it because you cannot use analytic functions in the where clause.
But there is a big difference, which is very important to get the optimal plan, and this is what I will expose here before showing the new 12c syntax.
PLAN_TABLE_OUTPUT SQL_ID 7x2wat0fhwdn9, child number 0 ------------------------------------- select * from ( select * from test where contract_id=500 order by start_validity ) where rownum <=10 order by start_validity Plan hash value: 2207676858 -------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 10 | 14 | |* 1 | COUNT STOPKEY | | 1 | | 10 | 14 | | 2 | VIEW | | 1 | 10 | 10 | 14 | | 3 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 10 | 10 | 14 | |* 4 | INDEX RANGE SCAN | TEST_PK | 1 | | 16 | 4 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=10) 4 - access("CONTRACT_ID"=500)This is good. The optimizer knows that we retrieve only 10 rows and the index access was chosen for that because it returns rows in the right order without the need to sort them.
PLAN_TABLE_OUTPUT SQL_ID gmmz59f64zftf, child number 0 ------------------------------------- select * from ( select test.*,row_number()over(order by start_validity) rn from test where contract_id=500 ) where rn <=10 order by start_validity Plan hash value: 1795822849 ------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 10 | 4397 | |* 1 | VIEW | | 1 | 10 | 10 | 4397 | |* 2 | WINDOW SORT PUSHED RANK| | 1 | 2000 | 10 | 4397 | |* 3 | TABLE ACCESS FULL | TEST | 1 | 2000 | 2000 | 4397 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RN"<=10) 2 - filter(ROW_NUMBER() OVER ( ORDER BY "START_VALIDITY")<=10) 3 - filter("CONTRACT_ID"=500)This is not good. A full table scan reads all rows, whereas an index can give those 10 rows quickly. Reading and sorting thousands of rows instead of just reading a dozen. So why the index scan was not chosen? In that case I usually force the plan to the one that I expect, so that I can see which bad estimation has driven the CBO choice. Let’s see the execution plan when forcing range scan :
PLAN_TABLE_OUTPUT SQL_ID axxrwar6s9275, child number 0 ------------------------------------- select * from ( select /*+ index(TEST) */ test.*,row_number()over(order by start_validity) rn from test where contract_id=500 ) where rn<=10 order by start_validity Plan hash value: 1912639229 -------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 10 | 15 | |* 1 | VIEW | | 1 | 10 | 10 | 15 | |* 2 | WINDOW NOSORT STOPKEY | | 1 | 2000 | 10 | 15 | | 3 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 2000 | 11 | 15 | |* 4 | INDEX RANGE SCAN | TEST_PK | 1 | 2000 | 11 | 4 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RN"<=10) 2 - filter(ROW_NUMBER() OVER ( ORDER BY "START_VALIDITY")<=10) 4 - access("CONTRACT_ID"=500)I didn’t show the cost here. The estimated number of rows is sufficient to understand. The optimizer has estimated the cost of the index access without knowing that we retreive at most 10 rows. When using rownum < 10, the optimizer implicitly add a first_rows(10). But not with row_number().
With the row_number() we have to do it ourselves:
PLAN_TABLE_OUTPUT SQL_ID 32m4yadk1ypsn, child number 0 ------------------------------------- select /*+ FIRST_ROWS(10) */ * from ( select test.*,row_number()over(order by start_validity) rn from test where contract_id=500 ) where rn<=10 order by start_validity Plan hash value: 1912639229 -------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 10 | 15 | |* 1 | VIEW | | 1 | 10 | 10 | 15 | |* 2 | WINDOW NOSORT STOPKEY | | 1 | 10 | 10 | 15 | | 3 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 10 | 11 | 15 | |* 4 | INDEX RANGE SCAN | TEST_PK | 1 | | 11 | 4 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RN"<=10) 2 - filter(ROW_NUMBER() OVER ( ORDER BY "START_VALIDITY")<=10) 4 - access("CONTRACT_ID"=500)Now the index range scan is chosen, with the right cardinality estimation.
So which solution it the best one? I prefer row_number() for several reasons:
As you can see, Oracle does not say that rownum is de-supported. Just that there is another function with ‘superior support’. Which is the same I presume.
So, I use row_number() in 11g or the ANSI syntax in 12c but in both cases I have to add the FIRST_ROWS(n) hint in orde rto get the right plan. Warning: don’t use the old FIRST_ROWS hint anymore which was rule based and is deprecated. The right way to tell Oracle that you will fetch only n rows is the FIRST_ROWS(n) hint.
Now here is the same query with the 12c syntax.
PLAN_TABLE_OUTPUT SQL_ID 49m5a3f33cmd0, child number 0 ------------------------------------- select /*+ FIRST_ROWS(10) */ * from test where contract_id=500 order by start_validity fetch first 10 rows only Plan hash value: 1912639229 -------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 10 | 15 | |* 1 | VIEW | | 1 | 10 | 10 | 15 | |* 2 | WINDOW NOSORT STOPKEY | | 1 | 10 | 10 | 15 | | 3 | TABLE ACCESS BY INDEX ROWID| TEST | 1 | 10 | 11 | 15 | |* 4 | INDEX RANGE SCAN | TEST_PK | 1 | | 11 | 4 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber" <=10) 2 - filter(ROW_NUMBER() OVER ( ORDER BY "TEST"."START_VALIDITY") <=10 ) 4 - access("CONTRACT_ID"=500)The 12c syntax has been translated with row_number() predicates and I added the FIRST_ROWS(10) hint in order to have the right estimations.
Well written and nicely condensed overview!
Would you recommend to set the hint also when you use row_number with a partioned by object? The problem there is that you can’t reasonbly say how many rows will be returned. Only that it will be a specific number per partioning group.
example
select * from ( select test.*,row_number()over(partition by contract_id order by start_validity) rn from test ) where rn =1 order by start_validity
Hi Sven, Thanks for your feedback. Good question. Here, the rownum is there to filter rows but not to stop after n rows (no STOPKEY in the plan). So you should not use FIRST_ROWS hint. I’ve quickly check the execution plan with a first_rows(10) and it’s very bad. Some operations cost for 10 rows, some others for all rows. Big inconsistencies. I put that in my blog todo list Regards, Franck.
Hi, thanks for your post. I’m just not sure, what is the purpose of 2nd ORDER BY expression in your 1st example query:
select * from ( select * from test where contract_id=500 order by start_validity ) where rownum <=10 order by start_validity
Hi Tarquinius, Yes you’re right, there is no reason for it. I think it’s there because it was there in the customer query for the problem I reproduced here. Regards, Franck.