This was first published on https://blog.dbi-services.com/adaptive-plan-how-much-can-statistics-collector-buffer (2016-05-04)
Republishing here for new followers. The content relates to the versions available at the publication date.
The 12c adaptive plan prepares two join methods (Hash Join and Nested Loops), activates the one that has the better cost for the estimated cardinality, and computes the point of inflection: the cardinality at which the best cost switches to the other join method. At execution time, rows are buffered by a STATISTICS COLLECTOR operation in order to see whether the point of inflection is reached. If it isn't, the plan continues as planned. If it is, the alternative join method is activated. But buffering has a limit…
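The run-time decision can be sketched as follows. This is a conceptual illustration only, not Oracle's actual code: the function name and the way rows arrive are made up for the sketch.

```python
# Conceptual sketch of the adaptive join decision (not Oracle internals).
# The statistics collector buffers rows from the outer row source until
# either the source is exhausted or the inflection point is crossed.

def adaptive_join(outer_rows, inflection_point):
    """Return the join method activated at run time."""
    buffered = []
    for row in outer_rows:
        buffered.append(row)
        if len(buffered) > inflection_point:
            # More rows than the cost crossover point: hash join is cheaper.
            return "HASH JOIN"
    # Cardinality stayed below the inflection point: keep nested loops.
    return "NESTED LOOPS"

print(adaptive_join(range(10), inflection_point=814))       # NESTED LOOPS
print(adaptive_join(range(200000), inflection_point=814))   # HASH JOIN
```

This matches the behavior observed below: with the point of inflection at 814, the 200000-row DEMO1 overflows the collector and the plan switches to hash join.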
@ChrisAntognini @FranckPachot True, Chris. Do you know the internal algorithm or an underscore parameter that controls the buffering limit?
— Stefan Koehler (@OracleSK) May 3, 2016
Let’s try to find this limit empirically.
I create a table with enough rows:
SQL> create table demo1 (n constraint demo1pk primary key,x1) as
  2  select 0 , cast('x' as varchar2(4000)) from dual;

Table created.

SQL> insert --+ append
  2  into demo1 select 1e7+rownum ,'x' from xmltable('1 to 200000');

200000 rows created.

and a second table to join:
SQL> create table demo2 (n constraint demo2pk primary key,x2) as
  2  select 0 , 'x' from dual;

Table created.

I filled the DEMO1 table in two steps: first a CTAS with one row, so that online statistics gathering favors nested loops, then an insert of a lot of rows, because I want to fill the adaptive plan buffer. DEMO2 is a small table, but I want the FULL TABLE SCAN on it to be a bit more expensive, or hash join will always be chosen. I do that by faking the number of blocks:
SQL> exec dbms_stats.set_table_stats(user,'DEMO2',numblks=>3000,no_invalidate=>false);

PL/SQL procedure successfully completed.

If I check the execution plan, I see that NESTED LOOPS is chosen because the estimated number of rows is small (artificially set to 1 row):
SQL> explain plan for
  2  select max(x1),max(x2) from demo1 join demo2 using(n);

Explained.

SQL> select * from table(dbms_xplan.display(format=>'+adaptive'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 604214593

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |         |     1 |     8 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                |         |     1 |     8 |            |          |
|- * 2 |  HASH JOIN                    |         |     1 |     8 |     4   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                |         |     1 |     8 |     4   (0)| 00:00:01 |
|   4 |     NESTED LOOPS               |         |     1 |     8 |     4   (0)| 00:00:01 |
|-  5 |      STATISTICS COLLECTOR      |         |       |       |            |          |
|   6 |       TABLE ACCESS FULL        | DEMO1   |     1 |     4 |     3   (0)| 00:00:01 |
| * 7 |      INDEX UNIQUE SCAN         | DEMO2PK |     1 |       |     0   (0)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID| DEMO2   |     1 |     4 |     1   (0)| 00:00:01 |
|-  9 |    TABLE ACCESS FULL           | DEMO2   |     1 |     4 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEMO1"."N"="DEMO2"."N")
   7 - access("DEMO1"."N"="DEMO2"."N")

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

But the plan is adaptive and can switch to HASH JOIN if more rows than expected are encountered by the STATISTICS COLLECTOR.
I run it and gather runtime statistics:
SQL> alter session set statistics_level=all;

Session altered.

SQL> select max(x1),max(x2) from demo1 join demo2 using(n);

MAX(X1)
--------------------------------------------------------------------------------
M
-
x
x
And here is the adaptive plan: Hash Join is activated because we actually have a lot of rows (200000):
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  d2y436sr1cx3r, child number 0
-------------------------------------
select max(x1),max(x2) from demo1 join demo2 using(n)

Plan hash value: 740165205

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |      1 |        |      1 |00:00:02.04 |     372 |    423 |    483 |       |       |          |         |
|   1 |  SORT AGGREGATE     |       |      1 |      1 |      1 |00:00:02.04 |     372 |    423 |    483 |       |       |          |         |
|*  2 |   HASH JOIN         |       |      1 |      1 |      1 |00:00:02.04 |     372 |    423 |    483 |    11M|  4521K| 1262K (1)|    4096 |
|   3 |    TABLE ACCESS FULL| DEMO1 |      1 |      1 |    200K|00:00:00.20 |     369 |    360 |      0 |       |       |          |         |
|   4 |    TABLE ACCESS FULL| DEMO2 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEMO1"."N"="DEMO2"."N")
   7 - access("DEMO1"."N"="DEMO2"."N")

Note
-----
   - this is an adaptive plan
The point of inflection is 814:
SQL> column tracefile new_value tracefile
SQL> alter session set tracefile_identifier='cbo_trace';

Session altered.

SQL> select tracefile from v$process where addr=(select paddr from v$session where sid=sys_context('userenv','sid'));

TRACEFILE
--------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/cdb/CDB/trace/CDB_ora_3979_cbo_trace.trc

SQL> host > &tracefile.
SQL> exec dbms_sqldiag.dump_trace(p_sql_id=>'d2y436sr1cx3r',p_child_number=>0,p_component=>'Compiler',p_file_id=>'');

PL/SQL procedure successfully completed.

SQL> host grep -E "^DP" &tracefile. | tail
DP - distinct placement
DP: Found point of inflection for NLJ vs. HJ: card = 814.00
So here, 814 rows were buffered and the plan switched to HASH JOIN. I want to know how many rows can be buffered, so I want to increase the point of inflection. Easy: if the cost of the DEMO2 full table scan is higher, then the NESTED LOOPS will be cheaper than the HASH JOIN even with more rows. Let's fake the DEMO2 statistics to show a larger table:
SQL> exec dbms_stats.set_table_stats(user,'DEMO2',numblks=>4000,no_invalidate=>false);

PL/SQL procedure successfully completed.
And let’s run that again:
SQL> select max(x1),max(x2) from demo1 join demo2 using(n);

MAX(X1)
--------------------------------------------------------------------------------
M
-
x
x

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  d2y436sr1cx3r, child number 0
-------------------------------------
select max(x1),max(x2) from demo1 join demo2 using(n)

Plan hash value: 604214593

------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |      1 |00:00:01.57 |     374 |    360 |
|   1 |  SORT AGGREGATE               |         |      1 |      1 |      1 |00:00:01.57 |     374 |    360 |
|   2 |   NESTED LOOPS                |         |      1 |      1 |      1 |00:00:01.57 |     374 |    360 |
|   3 |    NESTED LOOPS               |         |      1 |      1 |      1 |00:00:01.57 |     373 |    360 |
|   4 |     TABLE ACCESS FULL         | DEMO1   |      1 |      1 |    200K|00:00:00.20 |     369 |    360 |
|*  5 |     INDEX UNIQUE SCAN         | DEMO2PK |    200K|      1 |      1 |00:00:00.42 |       4 |      0 |
|   6 |    TABLE ACCESS BY INDEX ROWID| DEMO2   |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("DEMO1"."N"="DEMO2"."N")

Note
-----
   - this is an adaptive plan

27 rows selected.

SQL> column tracefile new_value tracefile
SQL> alter session set tracefile_identifier='cbo_trace';

Session altered.
SQL> select tracefile from v$process where addr=(select paddr from v$session where sid=sys_context('userenv','sid'));

TRACEFILE
--------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/cdb/CDB/trace/CDB_ora_4083_cbo_trace.trc

SQL> host > &tracefile.
SQL> exec dbms_sqldiag.dump_trace(p_sql_id=>'d2y436sr1cx3r',p_child_number=>0,p_component=>'Compiler',p_file_id=>'');

PL/SQL procedure successfully completed.

SQL> host grep -E "^DP" &tracefile. | tail
DP - distinct placement
DP: Found point of inflection for NLJ vs. HJ: card = 1086.00
Read it from the end:
Using dichotomy, I've scripted similar tests to find the point where reaching the point of inflection does not trigger a plan switch. 'JOIN' is the join method chosen (from dbms_xplan.display_cursor after execution), 'INFLECTION POINT' is the one gathered from the 10053 trace, and 'STATBLKS' is the numblks I set for DEMO2 in order to vary the point of inflection.
JOIN   INFLECTION POINT HASH_AREA_SIZE  BUFFER STATBLKS LPAD
NESTED           271889        1000000 2175117  1000000    1
NESTED           135823        1000000 1086590   500000    1
NESTED            67789        1000000  542319   250000    1
NESTED            33885        1000000  271087   125000    1
NESTED            16943        1000000  135551    62500    1
NESTED             8471        1000000   67775    31250    1
NESTED             4238        1000000   33904    15625    1
NESTED             2120        1000000   16960     7813    1
NESTED             1060        1000000    8480     3907    1
HASH                532        1000000    4256     1954    1
HASH                796        1000000    6368     2930    1
HASH                928        1000000    7424     3418    1
HASH                994        1000000    7952     3662    1
HASH               1026        1000000    8208     3784    1
NESTED             1044        1000000    8352     3845    1
HASH               1036        1000000    8288     3814    1
HASH               1040        1000000    8320     3829    1
NESTED             1042        1000000    8336     3837    1
HASH               1040        1000000    8320     3833    1
HASH               1040        1000000    8320     3835    1
NESTED             1042        1000000    8336     3836    1

I've added some variations on hash_area_size (my bad guess that it makes sense to buffer up to that amount, because this is what will go to the hash area if hash join is finally chosen) and on the DEMO1 row size (by varying an lpad on column X1). For the moment, when the point of inflection is 1040 or less a plan switch occurs, and when it is 1042 or more no plan switch occurs.
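The dichotomy itself is a plain bisection over STATBLKS. A sketch of the driver logic follows; the `probe` function is hypothetical (in the real script it would run set_table_stats with the candidate numblks, execute the query, and check dbms_xplan.display_cursor and the 10053 trace):

```python
# Bisection over the faked DEMO2 numblks to find the largest value for which
# the plan still switches. The probe is a stand-in for the real SQL test.

def find_boundary(probe, lo, hi):
    """Largest value in (lo, hi) for which probe() still reports a switch.

    Invariant: probe(lo) is True (switch occurs), probe(hi) is False.
    """
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid   # still switches: the boundary is higher
        else:
            hi = mid   # no switch any more: the boundary is lower
    return lo

# Hypothetical probe: pretend the switch stops above statblks = 3835,
# roughly the boundary observed in the table above.
simulated = lambda statblks: statblks <= 3835

print(find_boundary(simulated, 1954, 1000000))   # 3835
```

Each probe is expensive (a full parse, execution, and trace dump), so bisection keeps the number of test runs to a few dozen instead of thousands.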
But there are probably other parameters influencing because:
@FranckPachot @OracleSK I just modified/run an example I have and I can confirm to see a switch with an inflection point at 21216.
— Christian Antognini (@ChrisAntognini) May 4, 2016
Any idea welcome…
Thanks to Chris Antognini, it appears that the parameter influencing the number of rows buffered is not the actual size of the row. The number of rows to buffer is calculated from the theoretical size of the columns, which is very bad in my opinion, given the number of applications that declare column sizes at their maximum. And I see no reason why it has to be like that: rows are of variable size, and sizing buffers from the column definition is not a good idea. That reminds me of the JDBC fetch size, very well described by Sigrid Keydana.
Here is the limit for different sizes of the varchar2:
JOIN   INFLECTION POINT HASH_AREA_SIZE BUFFER SIZE STATBLKS VARCHAR SIZE
NESTED           155345          65536     3572954   572641            1
NESTED            74899          65536     3894796   276092           30
NESTED            63551          65536     3940220   234258           40
NESTED            33289          65536     4061376   122802          100
NESTED            12865          65536     4142848    47453          300
NESTED             7973          65536     4162422    29409          500
NESTED             4090          65536     4179980    15079         1000
NESTED             2072          65536     4189584     7635         2000
NESTED             1388          65536     4194536     5110         3000
So it looks like Oracle allocates a buffer of a few MB (about 4MB here), calculates how many rows can fit there given their column definition, and limits the buffering to that number of rows. The nonsense, in my opinion, is that a size calculated from the column definition could be calculated at parse time, when the point of inflection is determined. It makes no sense to set a point of inflection higher than the number of rows that can be buffered.
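A quick arithmetic check of the table above supports this: dividing BUFFER SIZE by the inflection point gives a per-row size that tracks the declared VARCHAR size plus a near-constant 22 bytes (presumably per-row overhead and the other column). This is an observation fitted to the measurements, not a documented formula:

```python
# Measurements from the table above: (inflection point, buffer size, varchar size).
measurements = [
    (155345, 3572954,    1),
    ( 74899, 3894796,   30),
    ( 63551, 3940220,   40),
    ( 33289, 4061376,  100),
    ( 12865, 4142848,  300),
    (  7973, 4162422,  500),
    (  4090, 4179980, 1000),
    (  2072, 4189584, 2000),
    (  1388, 4194536, 3000),
]

for ip, buffer_bytes, varchar_size in measurements:
    per_row = buffer_bytes / ip          # bytes per buffered row
    overhead = per_row - varchar_size    # ~22 bytes for every declared size
    print(f"varchar2({varchar_size:4d}): {per_row:8.2f} bytes/row,"
          f" overhead {overhead:5.2f}")
```

For every declared size the overhead comes out between 22.0 and 22.1 bytes per row, which is consistent with a buffer sized in bytes and a row slot sized from the column declaration.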
Salut Franck
Try replacing the VARCHAR2(4000) with something much smaller…
HTH Chris
Just a question that brings me to a quick guess: where is the buffer stored? I mean, it's in a memory pool somewhere, so my guess is that the buffer size may depend on things like NUM_CPU or the size of another memory pool. If you test on a small machine you probably get a smaller buffer, and fewer rows fit in the buffer.
Salut Franck
This comment refers to the “Update 30 mins later” part. I wrote it because, I must admit, I don't agree with some of your opinions. And, first of all, I apologize if I'm prolix. But in that way I hope that the readers of your blog get a different point of view when they read this text.
Let me start by making clear that the query optimizer and the execution engine are two separate components. Simply put, the query optimizer is responsible for generating optimal execution plans and the execution engine is responsible for running them. I know, that's obvious. But, IMO, this is a very important point. In fact, the separation of concerns makes sure that modifications in one component don't (necessarily) require modifications in the other. So, just because the implementation of the execution engine in 12.1 has some limitations (not being able to check an inflection point that is “too high”), it doesn't mean that the implementation of the query optimizer should be limited as well. With the current state of affairs, when the execution engine is improved, no modification in the query optimizer will be necessary. In fact, the query optimizer already generates an optimal plan (from its point of view).
Having said that, let me comment on a couple of your opinions based on my limited knowledge (I would love to know more, e.g. by being able to read the code, but that's something that currently isn't possible…).
> The nonsense in my opinion is that size calculated from column definition can be
> calculated at parse time, when the point of inflection is determined.
For computing the inflection point the query optimizer uses object statistics. From what we can see, there is no specific improvement in the cost estimator. And that is a good thing. In fact, including other sources of information (e.g. the maximum size of a column) requires additional code. And more code means more complexity, and therefore more bugs. In software development, every time you add complexity you have to ask what the advantages of doing so are. In this case, IMO, none. In fact, no matter how complex the costing model is, there is no way to know how much data will be returned to the statistics collector. The reason is quite simple: if the row source operation that the statistics collector calls applies a filter (something that's common), the query optimizer can't know how large the columns of the rows that fulfill the filter are (ok, with dynamic sampling it could be possible to get that information).
In summary, IMO, it's fine that the query optimizer estimates the inflection point only based on object statistics, as it usually does.
> The number of rows to buffer is calculated from the theoretical size of the columns.
> Which is very bad in my opinion given the number of applications that declares column
> size at their maximum. And I see no reason why this has to be set like that.
The data returned by the function implementing the row source operation called by the statistics collector can't efficiently use return by value (too much data). So, it ought to be return by reference. Even though I never saw the code, IMO what happens is that the caller (the statistics collector) allocates the memory which is used to buffer the data. Then, it calls the row source operation that has to provide the data and requests a certain number of rows to be written at a specific location in memory. Note that with such an implementation the row source operation that returns data is completely unaware of the statistics collector and, therefore, no modification in its code was necessary to introduce the feature. And that is a huge advantage when you have hundreds of row source operations that would otherwise be impacted. But let's get back to the buffer. How much memory does the statistics collector have to allocate *before* calling the row source operation that returns data? The only safe “guess” is the maximum possible size. And, of course, if too much memory (whatever that means) is required, the statistics collector can completely avoid doing the buffering and, therefore, considering the alternate join method. Simply put, it aborts the check of the inflection point and picks the default join method as the final one.
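The caller-allocates-worst-case pattern described here can be sketched like this. It is a hypothetical illustration, not Oracle's code: since the collector must reserve memory before it knows what the rows actually contain, the only safe per-row slot is the sum of the declared maximum column sizes.

```python
# Hypothetical sketch of a caller-allocated row buffer (not Oracle internals).
# The buffer is sized before any row is fetched, so each row gets a fixed
# slot equal to the declared maximum row size, however small the actual rows.

def allocate_row_buffer(declared_col_sizes, n_rows):
    slot = sum(declared_col_sizes)           # worst-case bytes per row
    return bytearray(slot * n_rows), slot    # memory reserved up front

def write_rows(buf, slot, rows):
    """The row source writes variable-size rows into fixed-size slots."""
    for i, row in enumerate(rows):
        data = row.encode()
        assert len(data) <= slot, "row larger than its declared maximum"
        buf[i * slot : i * slot + len(data)] = data

# A VARCHAR2(4000) column forces a 4000-byte slot even for 1-byte values:
buf, slot = allocate_row_buffer([4000], 1000)
write_rows(buf, slot, ["x"] * 1000)
print(len(buf))   # -> 4000000
```

This also illustrates why a table declared with VARCHAR2(4000) exhausts the buffer after so few rows in the tests above, even though every actual value is a single byte.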
Of course such an implementation can be improved. But, again, what are the pros and cons of doing so? The cons are the same I mentioned before (more code, more complexity, more bugs, …) and, maybe, given the number of changes that would be required, the feature would never have been implemented. And, honestly, the only pro (being able to check a higher inflection point) isn't soooo important. Especially if the code was correctly implemented :-).
In summary, IMO, the current implementation can be improved but it’s not that bad.
Also notice that aborting the adaptive join selection sooner in the process (i.e. during optimization) isn’t saving much overhead. In fact, the inflection point should be computed anyway and, if the statistics collector decides to abort, it can do so with very limited work.
Best, Chris
Hi Chris, Thanks a lot for your feedback which explains very good reasons for the implementation choice. Regards, Franck.
Hi Franck, We can agree that the IP is calculated during execution time and the allocation of the memory required to store the rows for that calculation is done just before the execution of the chosen plan. Let’s see how complex the calculation can be and why, potentially, the way it is implemented is not that bad. The goal here is to have the most efficient number of rows buffered to calculate the best IP.
We clearly see that the overhead during execution can be significant. Another point is the value we could use for “EstimatedRowNeeded”. So I agree with Chris that the implementation is not perfect but works quite well. And thank you Franck for raising this fantastic point. Cheers, jko

Hello,
Two thoughts on the subject: it takes observation in applications to see the inflection happening. My experience is that plans can change, and this is a great capability as such. So the questions are: how often do we observe the wrong join method being applied, and under which circumstances does it happen?
Regarding parameters: we already have many of them. I'd only ask for additional documentation of such a parameter if there is a material impact on an application. Then it's really needed. The customer support team will advise in such situations.
Another point of view: parsing is the simplest part here. Cost estimation is not simple (just think of problems involving many row sources: finding a plan in a finite yet reasonable number of processing steps), and finally there is implementing the access. Join efficiency is part of execution: just think of the different join efficiency by join method for row vs. column store.
I’d be seriously interested to see application examples for this one.
Kind regards Thomas
Bonjour Franck,
According to my little practical experience on systems running 12c (I just started working on a 12c upgrade), the statistics collector is associated with a certain buffer size. This buffer serves multiple purposes, like collecting statistics on the buffered data from the portion of the plan that feeds the STATISTICS COLLECTOR. These collected statistics can be: the number of rows returned, the number of distinct values, the number of unique values, the maximum and minimum values, etc. The buffer also plays an important role in the adaptive plan by preventing a subsequent portion of the initial plan from starting to process before the query engine is sure that it will definitely opt for the initial plan. I tend to compare this couple (STATISTICS COLLECTOR, buffer size) to the HASH JOIN BUFFERED operation in a parallel run, with which Oracle ensures that only 2 DFOs can be active per DFO tree; but in the parallel run there is no limit to the buffering other than the physical TEMP space.
The third important point which I believe is related to this buffer size is this: it is only when the size of the rows buffered by the STATISTICS COLLECTOR exceeds the buffer size that the SQL engine will start challenging the initially selected plan. I wrote “challenging” because exceeding the buffer size doesn't mean that the initial plan will be systematically ignored. In other words, it is not because the buffered rows exceed the buffer size that a switch of the join method will occur. When this limit is exceeded, Oracle will start verifying certain criteria of the initial plan using the above-mentioned statistics collected by the STATISTICS COLLECTOR, and depending on this verification will either switch or decide to go with the initial plan.
That’s how I have understood the situation.
Best regards Mohamed