This was first published on https://blog.dbi-services.com/oracle-tuning-silver-bullet-add-an-order-by-to-make-your-query-faster (2015-04-01)
Republishing here for new followers. The content relates to the versions available at the publication date.
You have read all of Cary Millsap's work about Oracle database performance tuning. You know that there are no silver bullets. Reducing the response time requires a methodical approach in order to analyze the response time with the goal of eliminating all unnecessary work. But I'll show something completely opposite here. A performance tuning silver bullet. Do more work in order to run it faster: just add an ORDER BY to your query and it's faster. I've made a very simple test case so that you can test it by yourself. I set my sqlplus environment and connect remotely to my database:
SQL> set pagesize 1000 linesize 100 echo on timing on
SQL> connect xxx/xxx@//xxx/xxx
Connected.
I create a DEMO table which is very simple: 10000 rows with one small column.
SQL> create table DEMO pctfree 0 as select mod(rownum,2) x from dual connect by 10000>=level;
Table created.
Elapsed: 00:00:00.03
And I run a very simple query which does a full scan of that small table, padding the output to make it a significant size:
SQL> set autotrace trace
SQL> select lpad(x,2000,x) from DEMO;
10000 rows selected.
Elapsed: 00:00:21.52
It took 22 seconds to execute. I've run it with 'autotrace trace' so that I don't have the overhead of sqlplus displaying all the rows: rows are fetched but not displayed, and I measure only the time to get the result. Let's run it again to be sure there is no cache effect:
SQL> select lpad(x,2000,x) from DEMO;
10000 rows selected.
Elapsed: 00:00:25.61
It still takes more than 20 seconds to retrieve the 10000 rows. As I am running with autotrace, I get the execution plan and basic statistics:
Execution Plan
----------------------------------------------------------
Plan hash value: 4000794843

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |   126K|     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| DEMO | 10000 |   126K|     4   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        683  consistent gets
          0  physical reads
          0  redo size
   20163693  bytes sent via SQL*Net to client
       7818  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed
It can't get any simpler than that: a full table scan retrieving all the rows.
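By the way, the 668 SQL*Net roundtrips line up with the sqlplus default arraysize of 15. A quick back-of-the-envelope check (plain Python arithmetic, nothing Oracle-specific assumed):

```python
import math

rows = 10000
arraysize = 15  # sqlplus default fetch size

# Each fetch call returns up to `arraysize` rows; sqlplus also issues
# one extra roundtrip for the parse/execute exchange, which is why the
# autotrace figure is one higher than the pure fetch count.
fetches = math.ceil(rows / arraysize)
print(fetches)      # 667 fetch calls
print(fetches + 1)  # 668 roundtrips, matching the autotrace statistic
```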
And now, let’s apply my silver bullet: add an ORDER BY at the end and run it again:
SQL> a order by x
  1* select lpad(x,2000,x) from DEMO order by x
SQL> /
10000 rows selected.
Elapsed: 00:00:07.06
Here it is. The response time is now about 7 seconds: roughly 3x faster. If you don't believe it, just test it. And thanks to autotrace I can check that I'm doing the same full table scan, but with an additional sort:
Execution Plan
----------------------------------------------------------
Plan hash value: 903288357

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   126K|     5  (20)| 00:00:01 |
|   1 |  SORT ORDER BY     |      | 10000 |   126K|     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEMO | 10000 |   126K|     4   (0)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
         35  consistent gets
          0  physical reads
          0  redo size
     118324  bytes sent via SQL*Net to client
       7818  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      10000  rows processed
The plan clearly shows that I'm doing additional work: Oracle has to sort the rows before sending them. And the statistics show that I'm still returning 10000 rows and that one in-memory sort was done. Adding an ORDER BY made this query faster…
Ok, at that point you have probably realized that this post is the exact opposite of the tuning method I usually recommend, and you have checked today's date. Please don't hesitate to comment if you have an idea about the reasons (plural, because there are at least two of them) why the second query is much faster. If you attended my tuning workshop you should already know the main reason. If you are a long-time reader of Jonathan Lewis's blog then you should also get the second one.
All the good answers are in the comments. Of course something was wrong in the introduction. We did introduce additional work with the SORT operation, but it is minimal. And there was unnecessary work before (the roundtrips) that had a bigger overhead. Besides the April Fool's joke, the goal of this post was to show that it's not enough to make a query faster. You must know why you made it faster, or you may just be hiding a problem that will come back later. And that's not too difficult if you use the right tools. Here, a simple autotrace showing only 11 session statistics had all the necessary information.
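One of the two reasons discussed below is SQL*Net de-duplication of repeated values. As a toy model only (real SQL*Net compression is more involved than this), consider how many values in the result set simply repeat the previous one and could be sent as a cheap back-reference: sorting the DEMO data maximizes those repeats.

```python
# Toy model of de-duplication: a value identical to the previous one
# could be transmitted as a back-reference instead of being repeated.
def repeats(batch):
    return sum(1 for prev, cur in zip(batch, batch[1:]) if cur == prev)

rows = [i % 2 for i in range(10000)]  # same data as the DEMO table: 0,1,0,1,...

print(repeats(rows))          # 0    -> alternating values: nothing to suppress
print(repeats(sorted(rows)))  # 9998 -> sorted: almost every value repeats the previous one
```

This mirrors the bytes-sent statistics above: 20163693 bytes without the ORDER BY versus 118324 bytes with it.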
Almost had me. One reason: fewer bytes transferred because of compression at the SQL*Net level? I think Tom and Jonathan have blogged about this. Not sure why there are fewer logical IOs, though.
Hi Jim, you quickly identified the point that is probably the least known, and the symptom of the second point (LIO). Congratulations. I'm sure the root cause of the second point will come very quickly. Regards, Franck.
For the sake of completeness, the link to Jonathan's article: https://jonathanlewis.wordpress.com/2010/05/07/sqlnet-compression/
Thanks Martin for the link. This is why I selected lpad(x,2000,x): to have some volume. But there is still the fact that the number of LIOs is higher without the ORDER BY. You can compare: select rownum from DEMO; with: select rownum from DEMO order by 1;
I think your arraysize is 15 (10000 rows, 668 roundtrips; not a surprise in sqlplus). For the query without the ORDER BY, each fetch has to re-access the buffer that was used in the last fetch, resulting in a consistent read. For the ordered query, the complete table data was read in a single pass and passed to the PGA, in which the sort operation took place; the sorted result was then sent to the client (and that's not a CR).
Well, no one else is biting, so here goes… Small sqlplus arraysize + lots of rows per block => each fetch can revisit the block from the previous fetch. Not 100% sure why sorting the data first minimises the effect of a small arraysize, though. Might guess at block pinning or result-set caching, but I suspect I will just embarrass myself on April Fools'.
Hi Martin and Jim, you're right. The default fetch size is too small on most clients, and because a buffer cannot stay pinned across user calls, each fetch has to revisit it at the cost of a logical read. I created the table with very short rows to amplify the effect: many sets of 15 rows fit in one block. Sorting buffers all of that in the PGA, which is not shared and doesn't carry the logical-read overhead we have on a shared structure. As far as I know there is no way to get the same buffering without a SORT, which could help when we cannot change the application fetch size.
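This explanation can be checked against the autotrace numbers. The block count below is an assumption chosen to make the arithmetic line up (a 10000-row table of tiny rows created with pctfree 0 is plausibly around 16 blocks); the rest follows from the arraysize:

```python
import math

rows, arraysize = 10000, 15
fetches = math.ceil(rows / arraysize)  # 667 fetch calls

# Without the ORDER BY, the current block cannot stay pinned across
# user calls, so almost every fetch re-reads it: consistent gets grow
# with the number of fetches rather than the number of blocks.
table_blocks = 16  # assumption: small table, rows packed with pctfree 0
print(table_blocks + fetches)  # 683, matching the first autotrace run

# With the ORDER BY, one pass reads the table into the PGA and later
# fetches hit the private sort buffer: only 35 consistent gets.
```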
My guess: the order by forces Oracle to ignore arraysize and fetch the entire set in order to sort it. Without the order by the SQL*Plus default arraysize will have an impact.
Hi Neil, arraysize is not ignored; it's still the size of each fetch. But the fetches read from a buffer in the PGA rather than from shared buffers in the SGA, so there are no additional logical reads.
I have not worked with Oracle for a while, but this highlights a gap in my knowledge/understanding. Can anyone enlighten me? If the session's PGA is too small and data spills to disk (temp), will the number of [direct] IOs to/from TEMP show up in any autotrace/dbms_xplan metrics [other than the sorts (disk) counter] or in an AWR report? Real-Time SQL Monitoring displays both memory and temp usage, so the metrics are available, but I'm not sure whether they are measured at the sqlid/session/parallel-slave level.
Hi Jim, Those reads will show up in session statistics as ‘physical reads direct temporary tablespace’ which are included into ‘physical reads direct’ which are included into ‘physical reads’ and they will match the related writes (‘physical writes direct temporary tablespace’). And you will see them as DISK_READS in V$SQL. Those are in number of blocks. Regards, Franck.