This was first published on https://blog.dbi-services.com/adaptive-plan-how-much-can-statistics-collector-buffer (2016-05-04)
Republishing here for new followers. The content relates to the versions available at the publication date.
The 12c adaptive plan prepares two join methods (Hash Join and Nested Loops), activates the one that has the better cost for the estimated cardinality, and computes the point of inflection: the cardinality at which the best cost switches to the other join method. At execution time, rows are buffered by a STATISTICS COLLECTOR operation in order to see whether the point of inflection is reached. If it isn't, the plan continues as planned. If it is, the alternative join method is activated. But buffering has a limit…
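The run-time decision can be sketched as follows. This is a conceptual illustration only, not Oracle's actual code: the function name and the way rows arrive are made up for the sketch.

```python
# Conceptual sketch of the adaptive join decision (not Oracle internals).
# The statistics collector buffers rows from the outer row source until
# either the source is exhausted or the inflection point is crossed.

def adaptive_join(outer_rows, inflection_point):
    """Return the join method activated at run time."""
    buffered = []
    for row in outer_rows:
        buffered.append(row)
        if len(buffered) > inflection_point:
            # More rows than the cost crossover point: hash join is cheaper.
            return "HASH JOIN"
    # Cardinality stayed below the inflection point: keep nested loops.
    return "NESTED LOOPS"

print(adaptive_join(range(10), inflection_point=814))       # NESTED LOOPS
print(adaptive_join(range(200000), inflection_point=814))   # HASH JOIN
```

This matches the behavior observed below: with the point of inflection at 814, the 200000-row DEMO1 overflows the collector and the plan switches to hash join.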
@ChrisAntognini @FranckPachot True, Chris. Do you know the internal algorithm or an underscore parameter that controls the buffering limit?
— Stefan Koehler (@OracleSK) May 3, 2016
Let’s try to find this limit empirically.
I create a table with enough rows:
SQL> create table demo1 (n constraint demo1pk primary key,x1) as
  2  select 0 , cast('x' as varchar2(4000)) from dual;

Table created.

SQL> insert --+ append
  2  into demo1 select 1e7+rownum ,'x' from xmltable('1 to 200000');

200000 rows created.

and a second table to join:
SQL> create table demo2 (n constraint demo2pk primary key,x2) as
  2  select 0 , 'x' from dual;

Table created.

I filled the DEMO1 table in two steps: first a CTAS with one row, so that online statistics gathering favors nested loops, then an insert of a lot of rows, because I want to fill the adaptive plan buffer. DEMO2 is a small table, but I want the FULL TABLE SCAN on it to be a bit more expensive, or hash join will always be chosen. I do that by faking the number of blocks:
SQL> exec dbms_stats.set_table_stats(user,'DEMO2',numblks=>3000,no_invalidate=>false);

PL/SQL procedure successfully completed.

If I check the execution plan, I see that NESTED LOOPS is chosen because the estimated number of rows is small (artificially set to 1 row):
SQL> explain plan for
  2  select max(x1),max(x2) from demo1 join demo2 using(n);

Explained.

SQL> select * from table(dbms_xplan.display(format=>'+adaptive'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 604214593

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |         |     1 |     8 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                |         |     1 |     8 |            |          |
|- * 2 |  HASH JOIN                    |         |     1 |     8 |     4   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                |         |     1 |     8 |     4   (0)| 00:00:01 |
|   4 |     NESTED LOOPS               |         |     1 |     8 |     4   (0)| 00:00:01 |
|-  5 |      STATISTICS COLLECTOR      |         |       |       |            |          |
|   6 |       TABLE ACCESS FULL        | DEMO1   |     1 |     4 |     3   (0)| 00:00:01 |
| * 7 |      INDEX UNIQUE SCAN         | DEMO2PK |     1 |       |     0   (0)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID| DEMO2   |     1 |     4 |     1   (0)| 00:00:01 |
|-  9 |    TABLE ACCESS FULL           | DEMO2   |     1 |     4 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEMO1"."N"="DEMO2"."N")
   7 - access("DEMO1"."N"="DEMO2"."N")

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

But the plan is adaptive and can switch to HASH JOIN if more rows than expected are encountered by the STATISTICS COLLECTOR.
I run it and gather runtime statistics:
SQL> alter session set statistics_level=all;

Session altered.

SQL> select max(x1),max(x2) from demo1 join demo2 using(n);

MAX(X1)
--------------------------------------------------------------------------------
M
-
x
x
And here is the adaptive plan: Hash Join is activated because we actually have a lot of rows (200000):
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  d2y436sr1cx3r, child number 0
-------------------------------------
select max(x1),max(x2) from demo1 join demo2 using(n)

Plan hash value: 740165205

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |      1 |        |      1 |00:00:02.04 |     372 |    423 |    483 |       |       |          |         |
|   1 |  SORT AGGREGATE     |       |      1 |      1 |      1 |00:00:02.04 |     372 |    423 |    483 |       |       |          |         |
|*  2 |   HASH JOIN         |       |      1 |      1 |      1 |00:00:02.04 |     372 |    423 |    483 |    11M|  4521K| 1262K (1)|    4096 |
|   3 |    TABLE ACCESS FULL| DEMO1 |      1 |      1 |    200K|00:00:00.20 |     369 |    360 |      0 |       |       |          |         |
|   4 |    TABLE ACCESS FULL| DEMO2 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEMO1"."N"="DEMO2"."N")
   7 - access("DEMO1"."N"="DEMO2"."N")

Note
-----
   - this is an adaptive plan
The point of inflection is 814:
SQL> column tracefile new_value tracefile
SQL> alter session set tracefile_identifier='cbo_trace';

Session altered.

SQL> select tracefile from v$process where addr=(select paddr from v$session where sid=sys_context('userenv','sid'));

TRACEFILE
--------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/cdb/CDB/trace/CDB_ora_3979_cbo_trace.trc

SQL> host > &tracefile.
SQL> exec dbms_sqldiag.dump_trace(p_sql_id=>'d2y436sr1cx3r',p_child_number=>0,p_component=>'Compiler',p_file_id=>'');

PL/SQL procedure successfully completed.

SQL> host grep -E "^DP" &tracefile. | tail
DP - distinct placement
DP: Found point of inflection for NLJ vs. HJ: card = 814.00
So here, 814 rows were buffered and the plan switched to HASH JOIN. I want to know how many rows can be buffered, so I want to increase the point of inflection. Easy: if the cost of the DEMO2 full table scan is higher, then the NESTED LOOPS will be cheaper than the HASH JOIN even with more rows. Let's fake the DEMO2 statistics to show a larger table:
SQL> exec dbms_stats.set_table_stats(user,'DEMO2',numblks=>4000,no_invalidate=>false);

PL/SQL procedure successfully completed.
And let’s run that again:
SQL> select max(x1),max(x2) from demo1 join demo2 using(n);

MAX(X1)
--------------------------------------------------------------------------------
M
-
x
x

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  d2y436sr1cx3r, child number 0
-------------------------------------
select max(x1),max(x2) from demo1 join demo2 using(n)

Plan hash value: 604214593

------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |      1 |00:00:01.57 |     374 |    360 |
|   1 |  SORT AGGREGATE               |         |      1 |      1 |      1 |00:00:01.57 |     374 |    360 |
|   2 |   NESTED LOOPS                |         |      1 |      1 |      1 |00:00:01.57 |     374 |    360 |
|   3 |    NESTED LOOPS               |         |      1 |      1 |      1 |00:00:01.57 |     373 |    360 |
|   4 |     TABLE ACCESS FULL         | DEMO1   |      1 |      1 |    200K|00:00:00.20 |     369 |    360 |
|*  5 |     INDEX UNIQUE SCAN         | DEMO2PK |    200K|      1 |      1 |00:00:00.42 |       4 |      0 |
|   6 |    TABLE ACCESS BY INDEX ROWID| DEMO2   |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("DEMO1"."N"="DEMO2"."N")

Note
-----
   - this is an adaptive plan

27 rows selected.

SQL> column tracefile new_value tracefile
SQL> alter session set tracefile_identifier='cbo_trace';

Session altered.
SQL> select tracefile from v$process where addr=(select paddr from v$session where sid=sys_context('userenv','sid'));

TRACEFILE
--------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/cdb/CDB/trace/CDB_ora_4083_cbo_trace.trc

SQL> host > &tracefile.
SQL> exec dbms_sqldiag.dump_trace(p_sql_id=>'d2y436sr1cx3r',p_child_number=>0,p_component=>'Compiler',p_file_id=>'');

PL/SQL procedure successfully completed.

SQL> host grep -E "^DP" &tracefile. | tail
DP - distinct placement
DP: Found point of inflection for NLJ vs. HJ: card = 1086.00
Read it from the end:
Using dichotomy, I've scripted similar tests to find the point where reaching the point of inflection does not trigger a plan switch. 'JOIN' is the join method chosen (from dbms_xplan.display_cursor after execution), 'INFLECTION POINT' is the one gathered from the 10053 trace, and 'STATBLKS' is the numblks I set for DEMO2 in order to vary the point of inflection.
JOIN   INFLECTION POINT HASH_AREA_SIZE  BUFFER STATBLKS LPAD
NESTED           271889        1000000 2175117  1000000    1
NESTED           135823        1000000 1086590   500000    1
NESTED            67789        1000000  542319   250000    1
NESTED            33885        1000000  271087   125000    1
NESTED            16943        1000000  135551    62500    1
NESTED             8471        1000000   67775    31250    1
NESTED             4238        1000000   33904    15625    1
NESTED             2120        1000000   16960     7813    1
NESTED             1060        1000000    8480     3907    1
HASH                532        1000000    4256     1954    1
HASH                796        1000000    6368     2930    1
HASH                928        1000000    7424     3418    1
HASH                994        1000000    7952     3662    1
HASH               1026        1000000    8208     3784    1
NESTED             1044        1000000    8352     3845    1
HASH               1036        1000000    8288     3814    1
HASH               1040        1000000    8320     3829    1
NESTED             1042        1000000    8336     3837    1
HASH               1040        1000000    8320     3833    1
HASH               1040        1000000    8320     3835    1
NESTED             1042        1000000    8336     3836    1

I've added some variations on hash_area_size (my bad guess that it makes sense to buffer up to that amount, because this is what will go to the hash area if hash join is finally chosen) and on the DEMO1 row size (by varying an lpad on column X1). For the moment, when the point of inflection is 1040 or less a plan switch occurs, and when it is 1042 or more no plan switch occurs.
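The dichotomy itself is a plain bisection over STATBLKS. A sketch of the driver logic follows; the `probe` function is hypothetical (in the real script it would run set_table_stats with the candidate numblks, execute the query, and check dbms_xplan.display_cursor and the 10053 trace):

```python
# Bisection over the faked DEMO2 numblks to find the largest value for which
# the plan still switches. The probe is a stand-in for the real SQL test.

def find_boundary(probe, lo, hi):
    """Largest value in (lo, hi) for which probe() still reports a switch.

    Invariant: probe(lo) is True (switch occurs), probe(hi) is False.
    """
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid   # still switches: the boundary is higher
        else:
            hi = mid   # no switch any more: the boundary is lower
    return lo

# Hypothetical probe: pretend the switch stops above statblks = 3835,
# roughly the boundary observed in the table above.
simulated = lambda statblks: statblks <= 3835

print(find_boundary(simulated, 1954, 1000000))   # 3835
```

Each probe is expensive (a full parse, execution, and trace dump), so bisection keeps the number of test runs to a few dozen instead of thousands.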
But there are probably other parameters influencing because:
@FranckPachot @OracleSK I just modified/run an example I have and I can confirm to see a switch with an inflection point at 21216.
— Christian Antognini (@ChrisAntognini) May 4, 2016
Any idea welcome…
Thanks to Chris Antognini, it appears that the parameter influencing the number of rows buffered is not the actual size of the row. The number of rows to buffer is calculated from the theoretical size of the columns, which is very bad in my opinion, given the number of applications that declare column sizes at their maximum. And I see no reason why it has to be like that: rows are of variable size, and sizing buffers from the column definition is not a good idea. That reminds me of the JDBC fetch size, very well described by Sigrid Keydana.
Here is the limit for different sizes of the varchar2:
JOIN   INFLECTION POINT HASH_AREA_SIZE BUFFER SIZE STATBLKS VARCHAR SIZE
NESTED           155345          65536     3572954   572641            1
NESTED            74899          65536     3894796   276092           30
NESTED            63551          65536     3940220   234258           40
NESTED            33289          65536     4061376   122802          100
NESTED            12865          65536     4142848    47453          300
NESTED             7973          65536     4162422    29409          500
NESTED             4090          65536     4179980    15079         1000
NESTED             2072          65536     4189584     7635         2000
NESTED             1388          65536     4194536     5110         3000
So it looks like Oracle allocates a buffer of a few MB (about 4MB here), calculates how many rows can fit there given their column definition, and limits the buffering to that number of rows. The nonsense, in my opinion, is that a size calculated from the column definition could be calculated at parse time, when the point of inflection is determined. It makes no sense to set a point of inflection higher than the number of rows that can be buffered.
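A quick arithmetic check of the table above supports this: dividing BUFFER SIZE by the inflection point gives a per-row size that tracks the declared VARCHAR size plus a near-constant 22 bytes (presumably per-row overhead and the other column). This is an observation fitted to the measurements, not a documented formula:

```python
# Measurements from the table above: (inflection point, buffer size, varchar size).
measurements = [
    (155345, 3572954,    1),
    ( 74899, 3894796,   30),
    ( 63551, 3940220,   40),
    ( 33289, 4061376,  100),
    ( 12865, 4142848,  300),
    (  7973, 4162422,  500),
    (  4090, 4179980, 1000),
    (  2072, 4189584, 2000),
    (  1388, 4194536, 3000),
]

for ip, buffer_bytes, varchar_size in measurements:
    per_row = buffer_bytes / ip          # bytes per buffered row
    overhead = per_row - varchar_size    # ~22 bytes for every declared size
    print(f"varchar2({varchar_size:4d}): {per_row:8.2f} bytes/row,"
          f" overhead {overhead:5.2f}")
```

For every declared size the overhead comes out between 22.0 and 22.1 bytes per row, which is consistent with a buffer sized in bytes and a row slot sized from the column declaration.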
Salut Franck
Try replacing the VARCHAR2(4000) with something much smaller…
HTH Chris
Just a question that brings me to a quick guess: where is the buffer stored? I mean, it's in a memory pool somewhere, so my guess is that the buffer size may depend on things like NUM_CPU or the size of another memory pool. If you test on a small machine you probably get a smaller buffer, and fewer rows fit in the buffer.
Salut Franck
This comment refers to the “Update 30 mins later” part. I wrote it because, I must admit, I don't agree with some of your opinions. And, first of all, I apologize if I'm prolix. But in that way I hope that the readers of your blog get a different point of view when they read this text.
Let me start by making clear that the query optimizer and the execution engine are two separate components. Simply put, the query optimizer is responsible for generating optimal execution plans and the execution engine is responsible for running them. I know, that's obvious. But, IMO, this is a very important point. In fact, the separation of concerns makes sure that modifications in one component don't (necessarily) require modifications in the other. So, just because the implementation of the execution engine in 12.1 has some limitations (not being able to check an inflection point that is “too high”), it doesn't mean that the implementation of the query optimizer should be limited as well. With the current state of affairs, when the execution engine is improved, no modification in the query optimizer will be necessary. In fact, the query optimizer already generates an optimal plan (from its point of view).
Having said that, let me comment on a couple of your opinions based on my limited knowledge (I would love to know more, e.g. by being able to read the code, but that's something that currently isn't possible…).
> The nonsense in my opinion is that size calculated from column definition can be
> calculated at parse time, when the point of inflection is determined.
For computing the inflection point the query optimizer uses object statistics. From what we can see, there is no specific improvement in the cost estimator. And that is a good thing. In fact, including other sources of information (e.g. the maximum size of a column) requires additional code. And more code means more complexity, and therefore more bugs. In software development, every time you add complexity you have to ask what the advantages of doing so are. In this case, IMO, none. In fact, no matter how complex the costing model is, there is no way to know how much data will be returned to the statistics collector. The reason is quite simple: if the row source operation that the statistics collector calls applies a filter (something that's common), the query optimizer can't know how large the columns of the rows that fulfill the filter are (ok, with dynamic sampling it could be possible to get that information).
In summary, IMO, it's fine that the query optimizer estimates the inflection point only based on object statistics, as it usually does.
> The number of rows to buffer is calculated from the theoretical size of the columns.
> Which is very bad in my opinion given the number of applications that declares column
> size at their maximum. And I see no reason why this has to be set like that.
The data returned by the function implementing the row source operation called by the statistics collector can't efficiently use return by value (too much data). So, it ought to be return by reference. Even though I never saw the code, IMO what happens is that the caller (the statistics collector) allocates the memory which is used to buffer the data. Then, it calls the row source operation that has to provide the data and requests a certain number of rows to be written at a specific location in memory. Note that with such an implementation the row source operation that returns data is completely unaware of the statistics collector and, therefore, no modification in its code was necessary to introduce the feature. And that is a huge advantage when you have hundreds of row source operations that would otherwise be impacted. But let's get back to the buffer. How much memory does the statistics collector have to allocate *before* calling the row source operation that returns data? The only safe “guess” is the maximum possible size. And, of course, if too much memory (whatever that means) is required, the statistics collector can completely avoid doing the buffering and, therefore, considering the alternate join method. Simply put, it aborts the check of the inflection point and picks the default join method as the final one.
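The caller-allocates-worst-case pattern described here can be sketched like this. It is a hypothetical illustration, not Oracle's code: since the collector must reserve memory before it knows what the rows actually contain, the only safe per-row slot is the sum of the declared maximum column sizes.

```python
# Hypothetical sketch of a caller-allocated row buffer (not Oracle internals).
# The buffer is sized before any row is fetched, so each row gets a fixed
# slot equal to the declared maximum row size, however small the actual rows.

def allocate_row_buffer(declared_col_sizes, n_rows):
    slot = sum(declared_col_sizes)           # worst-case bytes per row
    return bytearray(slot * n_rows), slot    # memory reserved up front

def write_rows(buf, slot, rows):
    """The row source writes variable-size rows into fixed-size slots."""
    for i, row in enumerate(rows):
        data = row.encode()
        assert len(data) <= slot, "row larger than its declared maximum"
        buf[i * slot : i * slot + len(data)] = data

# A VARCHAR2(4000) column forces a 4000-byte slot even for 1-byte values:
buf, slot = allocate_row_buffer([4000], 1000)
write_rows(buf, slot, ["x"] * 1000)
print(len(buf))   # -> 4000000
```

This also illustrates why a table declared with VARCHAR2(4000) exhausts the buffer after so few rows in the tests above, even though every actual value is a single byte.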
Of course such an implementation can be improved. But, again, what are the pros and cons of doing so? The cons are the same I mentioned before (more code, more complexity, more bugs, …) and, maybe, given the number of changes that would be required, the feature would never have been implemented. And, honestly, the only pro (being able to check a higher inflection point) isn't soooo important. Especially if the code was correctly implemented :-).
In summary, IMO, the current implementation can be improved but it’s not that bad.
Also notice that aborting the adaptive join selection sooner in the process (i.e. during optimization) isn’t saving much overhead. In fact, the inflection point should be computed anyway and, if the statistics collector decides to abort, it can do so with very limited work.
Best, Chris
Hi Chris, Thanks a lot for your feedback which explains very good reasons for the implementation choice. Regards, Franck.
Hi Franck, We can agree that the IP is calculated during execution time and the allocation of the memory required to store the rows for that calculation is done just before the execution of the chosen plan. Let’s see how complex the calculation can be and why, potentially, the way it is implemented is not that bad. The goal here is to have the most efficient number of rows buffered to calculate the best IP.
We clearly see that the overhead during execution can be significant. Another point is the value we could use for “EstimatedRowNeeded”. So I agree with Chris that the implementation is not perfect but works quite well. And thank you Franck for raising this fantastic point. Cheers, jko

Hello,
Two thoughts on the subject: it takes observation in applications to see the inflection happening. My experience is that plans can change, and this is a great capability as such. So the questions are: how often do we observe the wrong join method being applied, and under which circumstances does it happen?
Regarding parameters: we already have many of them. I'd only ask for additional documentation of such a parameter if there is a material impact on an application. Then it's really needed. The customer support team will advise in such situations.
Another point of view: parsing is the simplest part here. Cost estimation is not simple (just think of problems involving many row sources: finding a plan in a finite yet reasonable number of processing steps), and finally there is implementing the access. Join efficiency is part of execution: just think of the different join efficiency by join method for row vs. column store.
I’d be seriously interested to see application examples for this one.
Kind regards Thomas
Bonjour Franck,
According to my little practical experience on systems running 12c (I just started working on a 12c upgrade), the statistics collector is associated with a certain buffer size. This buffer serves multiple purposes, like collecting statistics on the buffered data from the portion of the plan that feeds the STATISTICS COLLECTOR. These collected statistics can be: the number of rows returned, the number of distinct values, the number of unique values, the maximum and minimum values, etc. The buffer also plays an important role in the adaptive plan by preventing a subsequent portion of the initial plan from starting to process before the query engine is sure that it will definitely opt for the initial plan. I tend to compare this couple (STATISTICS COLLECTOR, buffer size) to the HASH JOIN BUFFERED operation in a parallel run, with which Oracle ensures that only 2 DFOs can be active per DFO tree; but in the parallel run there is no limit to the buffering other than the physical TEMP space.
The third important point which I believe is related to this buffer size is this: it is only when the size of the rows buffered by the STATISTICS COLLECTOR exceeds the buffer size that the SQL engine will start challenging the initially selected plan. I wrote “challenging” because exceeding the buffer size doesn't mean that the initial plan will be systematically ignored. In other words, it is not because the buffered rows exceed the buffer size that a switch of the join method will occur. When this limit is exceeded, Oracle will start verifying certain criteria of the initial plan using the above-mentioned statistics collected by the STATISTICS COLLECTOR, and depending on this verification will either switch or decide to go with the initial plan.
That’s how I have understood the situation.
Best regards Mohamed