This was first published on https://blog.dbi-services.com/oracle-tuning-silver-bullet-add-an-order-by-to-make-your-query-faster (2015-04-01)
Republishing here for new followers. The content relates to the versions available at the publication date.
You have read all of Cary Millsap's work about Oracle database performance tuning. You know that there are no silver bullets. Reducing the response time requires a methodical approach in order to analyze the response time with the goal of eliminating all unnecessary work. But I'll show something completely opposite here. A performance tuning silver bullet. Do more work in order to run it faster: just add an ORDER BY to your query and it's faster. I've made a very simple test case so that you can test it by yourself. I set my sqlplus environment and connect remotely to my database:
SQL> set pagesize 1000 linesize 100 echo on timing on
SQL> connect xxx/xxx@//xxx/xxx
Connected.
I create a DEMO table which is very simple: 10000 rows with one small column.
SQL> create table DEMO pctfree 0 as select mod(rownum,2) x from dual connect by 10000>=level;
Table created.
Elapsed: 00:00:00.03
And I run a very simple query which does a full scan of that small table, padding the output to make it a significant size:
SQL> set autotrace trace
SQL> select lpad(x,2000,x) from DEMO;
10000 rows selected.
Elapsed: 00:00:21.52
It took 22 seconds to execute. I've run it with 'autotrace trace' so that I don't have the overhead of sqlplus displaying all the rows: rows are fetched but not displayed, and I measure only the time to get the result. Let's run it again to be sure there is no cache effect:
SQL> select lpad(x,2000,x) from DEMO;
10000 rows selected.
Elapsed: 00:00:25.61
It still takes more than 20 seconds to retrieve the 10000 rows. As I am running with autotrace, I get the execution plan and basic statistics:
Execution Plan
----------------------------------------------------------
Plan hash value: 4000794843

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |   126K|     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| DEMO | 10000 |   126K|     4   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        683  consistent gets
          0  physical reads
          0  redo size
   20163693  bytes sent via SQL*Net to client
       7818  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed
It can't get any simpler than that: a full table scan retrieving all the rows.
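By the way, the 668 SQL*Net roundtrips line up with the sqlplus default arraysize of 15. A quick back-of-the-envelope check (plain Python arithmetic, nothing Oracle-specific assumed):

```python
import math

rows = 10000
arraysize = 15  # sqlplus default fetch size

# Each fetch call returns up to `arraysize` rows; sqlplus also issues
# one extra roundtrip for the parse/execute exchange, which is why the
# autotrace figure is one higher than the pure fetch count.
fetches = math.ceil(rows / arraysize)
print(fetches)      # 667 fetch calls
print(fetches + 1)  # 668 roundtrips, matching the autotrace statistic
```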
And now, let’s apply my silver bullet: add an ORDER BY at the end and run it again:
SQL> a order by x
  1* select lpad(x,2000,x) from DEMO order by x
SQL> /
10000 rows selected.
Elapsed: 00:00:07.06
Here it is. The response time is now about 7 seconds: roughly 3x faster. If you don't believe it, just test it. And thanks to autotrace I can check that I'm doing the same full table scan, but with an additional sort:
Execution Plan
----------------------------------------------------------
Plan hash value: 903288357

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   126K|     5  (20)| 00:00:01 |
|   1 |  SORT ORDER BY     |      | 10000 |   126K|     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEMO | 10000 |   126K|     4   (0)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
         35  consistent gets
          0  physical reads
          0  redo size
     118324  bytes sent via SQL*Net to client
       7818  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10000  rows processed
The plan clearly shows that I'm doing additional work: Oracle has to sort the rows before sending them. And the statistics show that I'm still returning 10000 rows and that one in-memory sort was done. Adding an ORDER BY made this query faster…
Ok, at that point you have probably realized that this post is the exact opposite of the tuning method I usually recommend, and you have checked today's date. Please don't hesitate to comment if you have an idea about the reasons (plural, because there are at least two of them) why the second query is much faster. If you attended my tuning workshop you should already know the main reason. If you are a long-time reader of Jonathan Lewis's blog then you should also get the second one.
All the good answers are in the comments. Of course something was wrong in the introduction. We did introduce additional work with the SORT operation, but it is minimal. And there was unnecessary work before (the roundtrips) that had a bigger overhead. Besides the April Fool's joke, the goal of this post was to show that it's not enough to make a query faster. You must know why you made it faster, or you may just be hiding a problem that will come back later. And that's not too difficult if you use the right tools. Here, a simple autotrace showing only 11 session statistics had all the necessary information.
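One of the two reasons discussed below is SQL*Net de-duplication of repeated values. As a toy model only (real SQL*Net compression is more involved than this), consider how many values in the result set simply repeat the previous one and could be sent as a cheap back-reference: sorting the DEMO data maximizes those repeats.

```python
# Toy model of de-duplication: a value identical to the previous one
# could be transmitted as a back-reference instead of being repeated.
def repeats(batch):
    return sum(1 for prev, cur in zip(batch, batch[1:]) if cur == prev)

rows = [i % 2 for i in range(10000)]  # same data as the DEMO table: 0,1,0,1,...

print(repeats(rows))          # 0    -> alternating values: nothing to suppress
print(repeats(sorted(rows)))  # 9998 -> sorted: almost every value repeats the previous one
```

This mirrors the bytes-sent statistics above: 20163693 bytes without the ORDER BY versus 118324 bytes with it.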
Almost had me. One reason: fewer bytes transferred because of compression at the SQL*Net level? I think Tom and Jonathan have blogged about this. Not sure why there are fewer logical IOs, though.
Hi Jim, you quickly identified the point that is probably the least known, and the symptom of the second point (LIO). Congratulations. I'm sure the root cause of the second point will come very quickly. Regards, Franck.
For the sake of completeness, the link to Jonathan's article: https://jonathanlewis.wordpress.com/2010/05/07/sqlnet-compression/
Thanks Martin for the link. This is why I selected lpad(x,2000,x): to have some volume. But there is still the fact that the number of LIOs is higher without the ORDER BY. You can compare: select rownum from DEMO; with: select rownum from DEMO order by 1;
I think your arraysize is 15 (10000 rows, 668 roundtrips; not a surprise in sqlplus). For the query without the ORDER BY, each fetch has to re-access the buffer that was used in the last fetch, resulting in a consistent read. For the ordered query, the complete table data was read in a single pass and passed to the PGA, in which the sort operation took place; the sorted result was then sent to the client (and that's not a CR).
Well, no one else is biting, so here goes… Small sqlplus arraysize + lots of rows per block => each fetch can revisit the block from the previous fetch. Not 100% sure why sorting the data first minimises the effect of a small arraysize, though. Might guess at block pinning or result-set caching, but I suspect I will just embarrass myself on April Fools'.
Hi Martin and Jim, you're right. The default fetch size is too small on most clients, and because a buffer cannot stay pinned across user calls, each fetch has to revisit it at the cost of a logical read. I created the table with very short rows to amplify the effect: many sets of 15 rows fit in one block. Sorting buffers all of that in the PGA, which is not shared and doesn't carry the logical-read overhead we have on a shared structure. As far as I know there is no way to get the same buffering without a SORT, which could help when we cannot change the application fetch size.
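This explanation can be checked against the autotrace numbers. The block count below is an assumption chosen to make the arithmetic line up (a 10000-row table of tiny rows created with pctfree 0 is plausibly around 16 blocks); the rest follows from the arraysize:

```python
import math

rows, arraysize = 10000, 15
fetches = math.ceil(rows / arraysize)  # 667 fetch calls

# Without the ORDER BY, the current block cannot stay pinned across
# user calls, so almost every fetch re-reads it: consistent gets grow
# with the number of fetches rather than the number of blocks.
table_blocks = 16  # assumption: small table, rows packed with pctfree 0
print(table_blocks + fetches)  # 683, matching the first autotrace run

# With the ORDER BY, one pass reads the table into the PGA and later
# fetches hit the private sort buffer: only 35 consistent gets.
```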
My guess: the order by forces Oracle to ignore arraysize and fetch the entire set in order to sort it. Without the order by the SQL*Plus default arraysize will have an impact.
Hi Neil, arraysize is not ignored; it's still the size of each fetch. But the fetches read from a buffer in the PGA rather than from shared buffers in the SGA, so there are no additional logical reads.
I have not worked with Oracle for a while, but this highlights a gap in my knowledge/understanding. Can anyone enlighten me? If the session's PGA is too small and data spills to disk (temp), will the number of [direct] IOs to/from TEMP show up in any autotrace/dbms_xplan metrics [other than the sorts (disk) counter] or in an AWR report? Real-Time SQL Monitoring displays both memory and temp usage, so the metrics are available, but I'm not sure whether they are measured at the sqlid/session/parallel-slave level.
Hi Jim, Those reads will show up in session statistics as ‘physical reads direct temporary tablespace’ which are included into ‘physical reads direct’ which are included into ‘physical reads’ and they will match the related writes (‘physical writes direct temporary tablespace’). And you will see them as DISK_READS in V$SQL. Those are in number of blocks. Regards, Franck.