This was first published on https://blog.dbi-services.com/do-the-block-size-matter (2015-06-23)
Republishing here for new followers. The content relates to the versions available at the publication date.
The default block size in Oracle is 8k. This is by far the most common. Sometimes people create a database with a 16k block size for data warehouses. You can also find some 32k block sizes, but they are less common, which also means less tested and more bugs. In our tuning workshop we recommend the default, 8k, for most cases. I'll show here 3 tests of what a different block size changes for full table scans and index access.
I have defined a cache size for the non-default block sizes I want to use:
SQL> show parameter db%_cache_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
db_cache_size                        big integer 0
db_2k_cache_size                     big integer 0
db_4k_cache_size                     big integer 0
db_8k_cache_size                     big integer 0
db_16k_cache_size                    big integer 112M
db_32k_cache_size                    big integer 112M
db_flash_cache_size                  big integer 0
db_keep_cache_size                   big integer 0
db_recycle_cache_size                big integer 0

I'm creating 3 tablespaces with 8k, 16k and 32k block sizes. I create them as uniform with a 1M extent size because we are supposed to use a large block size for large tables and I don't want the side effects of the smaller first extents of auto extent allocation.
SQL> create tablespace DEMO08K datafile '/oracle/u01/oradata/DEMO08K.dbf' size 1024M extent management local uniform size 1M blocksize 8k;
Tablespace created.
SQL> create tablespace DEMO16K datafile '/oracle/u01/oradata/DEMO16K.dbf' size 1024M extent management local uniform size 1M blocksize 16k;
Tablespace created.
SQL> create tablespace DEMO32K datafile '/oracle/u01/oradata/DEMO32K.dbf' size 1024M extent management local uniform size 1M blocksize 32k;
Tablespace created.

and then I create 3 identical tables, one in each tablespace:
SQL> create table TAB08K (id constraint PK_08K primary key,n,x) tablespace DEMO08K as select rownum id , ora_hash(rownum,10) n , lpad(ora_hash(rownum,100),20,'0') x from xmltable('1 to 10000000');
Table created.
SQL> create table TAB16K (id constraint PK_16K primary key,n,x) tablespace DEMO16K as select rownum id , ora_hash(rownum,10) n , lpad(ora_hash(rownum,100),20,'0') x from xmltable('1 to 10000000');
Table created.
SQL> create table TAB32K (id constraint PK_32K primary key,n,x) tablespace DEMO32K as select rownum id , ora_hash(rownum,10) n , lpad(ora_hash(rownum,100),20,'0') x from xmltable('1 to 10000000');
Table created.

My tables have 10 million rows, two number columns and one larger varchar2:
SQL> select table_name,num_rows,avg_row_len,blocks from user_tables where table_name like 'TAB__K';

TABLE_NAME   NUM_ROWS AVG_ROW_LEN     BLOCKS
---------- ---------- ----------- ----------
TAB08K       10000000          30      48459
TAB16K       10000000          30      23997
TAB32K       10000000          30      11933

Of course, a larger block size needs a smaller number of blocks, but the total size is roughly the same. Here I have small rows, so this is where the fixed size of the block header can make the most difference.
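Those BLOCKS figures pass a back-of-the-envelope check: each block loses a roughly fixed header plus PCTFREE, and each row costs its average length plus a few bytes of row header and row-directory slot. The overhead constants below are assumptions for illustration, not measured values:

```python
import math

NUM_ROWS = 10_000_000
AVG_ROW_LEN = 30       # from USER_TABLES above
ROW_OVERHEAD = 5       # assumed: ~3-byte row header + 2-byte row directory slot
BLOCK_OVERHEAD = 100   # assumed fixed block header size
PCTFREE = 0.10         # default free space reserved for updates

def estimated_blocks(block_size):
    usable = (block_size - BLOCK_OVERHEAD) * (1 - PCTFREE)
    rows_per_block = int(usable // (AVG_ROW_LEN + ROW_OVERHEAD))
    return math.ceil(NUM_ROWS / rows_per_block)

for bs, observed in [(8192, 48459), (16384, 23997), (32768, 11933)]:
    print(f"{bs:>5}: estimated {estimated_blocks(bs)} blocks, observed {observed}")
```

The estimates land within a few percent of the observed counts, and you can see that doubling the block size slightly more than halves the block count because the fixed header is amortized over more rows.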
So, the common idea is that larger block size helps to do larger i/o calls when doing full table scan…
SQL> set timing on arraysize 5000 autotrace trace
SQL> select * from TAB08K;

10000000 rows selected.

Elapsed: 00:00:32.53

Execution Plan
----------------------------------------------------------
Plan hash value: 1209268626

-------------------------------------------------------------------------
| Id  | Operation                 | Name   | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |        |    10M|   286M|  8462   (1)|
|   1 |  TABLE ACCESS STORAGE FULL| TAB08K |    10M|   286M|  8462   (1)|
-------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
      50174  consistent gets
      48175  physical reads
          0  redo size
  348174002  bytes sent via SQL*Net to client
      22489  bytes received via SQL*Net from client
       2001  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
   10000000  rows processed

SQL> set autotrace off
SQL> select name,value from v$mystat join v$statname using(statistic#) where name like 'physical read%' and value>0;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total IO requests                                         403
physical read total multi block requests                                379
physical read total bytes                                         394821632
physical reads                                                        48196
physical reads cache                                                     23
physical reads direct                                                 48173
physical read IO requests                                               403
physical read bytes                                               394821632
physical reads direct temporary tablespace                                1

I've read 48175 8k blocks with 403 i/o calls.
Now doing the same from the table stored in the 16k blocksize tablespace:
SQL> select * from TAB16K;

10000000 rows selected.

Elapsed: 00:00:31.04

Execution Plan
----------------------------------------------------------
Plan hash value: 2288178481

-------------------------------------------------------------------------
| Id  | Operation                 | Name   | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |        |    10M|   286M|  4378   (2)|
|   1 |  TABLE ACCESS STORAGE FULL| TAB16K |    10M|   286M|  4378   (2)|
-------------------------------------------------------------------------

SQL> select name,value from v$mystat join v$statname using(statistic#) where name like 'physical read%' and value>0;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total IO requests                                         397
physical read total multi block requests                                375
physical read total bytes                                         391012352
physical reads                                                        23876
physical reads cache                                                     21
physical reads direct                                                 23855
physical read IO requests                                               397
physical read bytes                                               391012352
physical reads direct temporary tablespace                                1

I've read 23855 16k blocks with 397 i/o calls. It's not a lot better.
SQL> select * from TAB32K;

10000000 rows selected.

Elapsed: 00:00:29.61

Execution Plan
----------------------------------------------------------
Plan hash value: 1240330363

-------------------------------------------------------------------------
| Id  | Operation                 | Name   | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |        |    10M|   286M|  2364   (3)|
|   1 |  TABLE ACCESS STORAGE FULL| TAB32K |    10M|   286M|  2364   (3)|
-------------------------------------------------------------------------

SQL> select name,value from v$mystat join v$statname using(statistic#) where name like 'physical read%' and value>0;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total IO requests                                         398
physical read total multi block requests                                373
physical read total bytes                                         388890624
physical reads                                                        11886
physical reads cache                                                     24
physical reads direct                                                 11862
physical read IO requests                                               398
physical read bytes                                               388890624
physical reads direct temporary tablespace                                1

I've read 11862 32k blocks with 398 i/o calls.
Conclusion: we do roughly the same amount of i/o when doing a full table scan, whatever the block size, because Oracle reads multiple blocks per i/o call. Note that db_file_multiblock_read_count is defined in blocks of the default block size, but the i/o size is adapted for tablespaces with a non-default block size. For example, db_file_multiblock_read_count=128 when the default block size is 8k will always read 1MB.
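The arithmetic of that adaptation can be sketched as follows (assuming the 8k default block size and db_file_multiblock_read_count=128 mentioned above):

```python
DEFAULT_BLOCK_SIZE = 8192
MBRC = 128                           # db_file_multiblock_read_count
MAX_IO = DEFAULT_BLOCK_SIZE * MBRC   # target i/o size: 1MB

for block_size in (8192, 16384, 32768):
    blocks_per_read = MAX_IO // block_size
    print(f"{block_size:>5}-byte blocks: up to {blocks_per_read} blocks "
          f"({blocks_per_read * block_size} bytes) per multiblock read")
```

So a 32k tablespace is read at most 32 blocks at a time: the i/o call size stays the same, which is why the three full scans above all did roughly 400 i/o requests.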
I already have an index on the primary key. Let’s add some more indexes:
SQL> create index ID_08K on TAB08K(x) tablespace DEMO08K ;
Index created.
SQL> create index ID_16K on TAB16K(x) tablespace DEMO16K ;
Index created.
SQL> create index ID_32K on TAB32K(x) tablespace DEMO32K ;
Index created.
SQL> create bitmap index BI_08K on TAB08K(n) tablespace DEMO08K ;
Index created.
SQL> create bitmap index BI_16K on TAB16K(n) tablespace DEMO16K ;
Index created.
SQL> create bitmap index BI_32K on TAB32K(n) tablespace DEMO32K ;
Index created.

and check their size:
SQL> select index_name,num_rows,blevel,leaf_blocks from user_indexes where table_name like 'TAB__K' order by 1;

INDEX_NAME   NUM_ROWS     BLEVEL LEAF_BLOCKS
---------- ---------- ---------- -----------
BI_08K           3211          2        1606
BI_16K           1562          1         781
BI_32K            759          1         380
ID_08K       10000000          2       44643
ID_16K       10000000          2       22027
ID_32K       10000000          2       10929
PK_08K       10000000          2       22132
PK_16K       10000000          2       10921
PK_32K       10000000          2        5425

Of course the number of blocks is lower when the block size is bigger. And because branch blocks are larger, you may have a smaller depth. But look: on my 10000000-row table the depth is the same for the regular indexes: 2 branch levels. Only the bitmap indexes, because they are very small, need one less branch level here.
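The reason the depth barely moves is that the number of branch levels grows with the logarithm of the leaf block count, in a base (the branch fanout) that itself grows with the block size. A rough model, assuming about 16 bytes per branch entry (an assumed figure, not a measured one):

```python
import math

BRANCH_ENTRY_BYTES = 16   # assumed: short key prefix + child block address

def branch_levels(block_size, leaf_blocks):
    fanout = block_size // BRANCH_ENTRY_BYTES   # branch entries per branch block
    return math.ceil(math.log(leaf_blocks, fanout))

# Leaf block counts from USER_INDEXES above
print(branch_levels(8192, 44643))    # ID_08K -> 2
print(branch_levels(32768, 10929))   # ID_32K -> 2
print(branch_levels(8192, 1606))     # BI_08K -> 2
print(branch_levels(32768, 380))     # BI_32K -> 1
```

With a fanout in the hundreds, a 10-million-row index sits comfortably at 2 branch levels for any of these block sizes; only the tiny bitmap indexes are small enough for the bigger blocks to shave off a level, matching the BLEVEL column above.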
But think about it. Index depth mostly matters for OLTP, where you get rows by their primary key. Yet people say that smaller blocks are better for OLTP… Data warehouses often have bitmap indexes, but do you care about having smaller bitmap indexes?
Anyway, let’s test a large range scan:
SQL> select * from TAB08K where id between 1 and 100000;

100000 rows selected.

Elapsed: 00:00:00.44

Execution Plan
----------------------------------------------------------
Plan hash value: 2790916815

-----------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |   100K|  2929K|   707   (1)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TAB08K |   100K|  2929K|   707   (1)|
|*  2 |   INDEX RANGE SCAN                  | PK_08K |   100K|       |   225   (1)|
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=1 AND "ID"<=100000)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        878  consistent gets
        679  physical reads
          0  redo size
    3389860  bytes sent via SQL*Net to client
       1589  bytes received via SQL*Net from client
        101  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

SQL> set autotrace off
SQL> select name,value from v$mystat join v$statname using(statistic#) where name like 'physical read%' and value>0;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total IO requests                                         705
physical read total bytes                                           5775360
physical reads                                                          705
physical reads cache                                                    705
physical read IO requests                                               705
physical read bytes                                                 5775360

We have read 100000 rows through the index. The index is very well clustered. I've done 705 i/o calls to get those rows from 8k blocks.
Now with 16k blocks:
SQL> select * from TAB16K where id between 1 and 100000;

100000 rows selected.

Elapsed: 00:00:00.37

Execution Plan
----------------------------------------------------------
Plan hash value: 1432239150

-----------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |   100K|  2929K|   352   (1)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TAB16K |   100K|  2929K|   352   (1)|
|*  2 |   INDEX RANGE SCAN                  | PK_16K |   100K|       |   113   (1)|
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=1 AND "ID"<=100000)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        537  consistent gets
        337  physical reads
          0  redo size
    3389860  bytes sent via SQL*Net to client
       1589  bytes received via SQL*Net from client
        101  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

SQL> set autotrace off
SQL> select name,value from v$mystat join v$statname using(statistic#) where name like 'physical read%' and value>0;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total IO requests                                         363
physical read total bytes                                           5734400
physical reads                                                          363
physical reads cache                                                    363
physical read IO requests                                               363
physical read bytes                                                 5734400

The number of i/o calls has been divided by two.
SQL> select * from TAB32K where id between 1 and 100000;

100000 rows selected.

Elapsed: 00:00:00.35

Execution Plan
----------------------------------------------------------
Plan hash value: 3074346038

-----------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |   100K|  2929K|   177   (1)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TAB32K |   100K|  2929K|   177   (1)|
|*  2 |   INDEX RANGE SCAN                  | PK_32K |   100K|       |    58   (2)|
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">=1 AND "ID"<=100000)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        369  consistent gets
        169  physical reads
          0  redo size
    3389860  bytes sent via SQL*Net to client
       1589  bytes received via SQL*Net from client
        101  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

SQL> set autotrace off
SQL> select name,value from v$mystat join v$statname using(statistic#) where name like 'physical read%' and value>0;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total IO requests                                         195
physical read total bytes                                           5750784
physical reads                                                          195
physical reads cache                                                    195
physical read IO requests                                               195
physical read bytes                                                 5750784

With 32k blocks, it's once again divided by two.
Conclusion: when doing single block reads from a well clustered index, we do fewer i/o calls with larger blocks. Because we need contiguous rows (we are range scanning a well clustered index), larger blocks put more of the rows we want into each block we read.
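The halving of i/o calls matches a simple model: with perfect clustering, the number of table blocks to read is just the rows wanted divided by rows per block (derived here from the BLOCKS column of USER_TABLES above), plus some index leaf reads on top:

```python
NUM_ROWS, WANTED = 10_000_000, 100_000

# (block size, table blocks from USER_TABLES, observed i/o requests)
runs = [(8192, 48459, 705), (16384, 23997, 363), (32768, 11933, 195)]

estimated = {}
for block_size, table_blocks, observed in runs:
    rows_per_block = NUM_ROWS / table_blocks
    estimated[block_size] = WANTED / rows_per_block
    print(f"{block_size:>5}: ~{estimated[block_size]:.0f} table blocks to read, "
          f"{observed} i/o requests observed")
```

The estimate gives roughly 485 / 240 / 119 table blocks; the observed counts are higher because they also include the index leaf and branch blocks, but the factor-of-two pattern between consecutive block sizes is exactly the one the model predicts.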
Here is a query with WHERE X='00000000000000000000', fetching only 30 rows. The index on X – which I've populated with a hash value of rownum – has a bad clustering factor.
----------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |    30 |   900 |    16   (0)|
|   1 |  COUNT STOPKEY               |        |       |       |            |
|   2 |   TABLE ACCESS BY INDEX ROWID| TAB08K |    30 |   900 |    16   (0)|
|*  3 |    INDEX RANGE SCAN          | ID_08K | 99010 |       |     3   (0)|
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("X"='00000000000000000000')

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total IO requests                                          16
physical read total bytes                                            131072
physical reads                                                           16
physical reads cache                                                     16
physical read IO requests                                                16
physical read bytes                                                  131072

The query returned about 30 rows. Because of the bad clustering factor we had to read a block for every two rows on average. Let's see the same with 32k blocks.
---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |    30 |   900 |     7   (0)| 00:00:01 |
|   1 |  COUNT STOPKEY               |        |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TAB32K |    30 |   900 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | ID_32K | 99010 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("X"='00000000000000000000')

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total IO requests                                          33
physical read total bytes                                            442368
physical reads                                                           33
physical reads cache                                                     33
physical read IO requests                                                33
physical read bytes                                                  442368

More i/o calls here, even with the larger block size. Conclusion: a larger block size is bad when we need only a few rows, especially from a badly clustered index: more i/o calls, larger i/o size, and the large blocks take more space in the buffer cache.
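The buffer cache angle can be quantified from the two runs above: each of the ~30 fetched rows drags a whole block into the cache, so the bytes cached per useful row grow with the block size. Figures taken from the 'physical read bytes' statistics:

```python
FETCHED_ROWS = 30

# block size -> (i/o requests, physical read bytes) from the two runs above
runs = {8192: (16, 131072), 32768: (33, 442368)}

waste = {}
for block_size, (reads, bytes_read) in runs.items():
    waste[block_size] = bytes_read // FETCHED_ROWS
    print(f"{block_size:>5}: {reads} reads, ~{waste[block_size]} bytes "
          f"cached per fetched row")
```

Roughly 4KB cached per 30-byte row with 8k blocks, and over 14KB per row with 32k blocks: more than a 3x increase in cache footprint for the same answer.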
The question is: do you get better performance in a data warehouse with a larger block size? For full table scans, no, it's the same and has always been the same. For index range scans retrieving a large number of rows, then yes, the access by index may be faster. But think about it. Retrieving lots of rows by index is always bad, whatever the block size is. If you want to do something about it, look at the design: define better indexes (covering all selected columns), partition the table, compress the table, use Parallel Query… There is a lot of tuning to do before thinking about block size.
With a smaller block size the optimizer will favor a full table scan for those cases, and today the gap between full table scan and index access is not so large. There is Exadata SmartScan, In-Memory. Even without options, multiblock read is faster since 11g when doing serial direct reads. All that makes full table scans faster. And index access has also been improved: you have the batched access by rowid to compensate for a bad clustering factor.
And if you think about having a non-default block size for some tablespaces, are you sure that the gain you expect will not be ruined by a bad sizing of the buffer cache? With non-default blocksize tablespaces you have to manage their buffer cache manually.
A larger block size may have been recommended for very large tablespaces in order to avoid having too many datafiles (their maximum size is a number of blocks), but today you can create bigfile tablespaces for them, so that is not a reason anymore.
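To put numbers on that, here is a sketch assuming the commonly cited limits of 2^22-1 blocks per smallfile datafile and 2^32-1 blocks per bigfile datafile:

```python
SMALLFILE_MAX_BLOCKS = 2**22 - 1   # ~4 million blocks per smallfile datafile
BIGFILE_MAX_BLOCKS = 2**32 - 1     # ~4 billion blocks per bigfile datafile

for block_size in (8192, 32768):
    small_gb = SMALLFILE_MAX_BLOCKS * block_size / 2**30
    big_tb = BIGFILE_MAX_BLOCKS * block_size / 2**40
    print(f"{block_size:>5}: smallfile max ~{small_gb:.0f} GB, "
          f"bigfile max ~{big_tb:.0f} TB")
```

An 8k smallfile datafile maxes out around 32GB, which is what used to push very large databases toward bigger blocks; an 8k bigfile tablespace already allows around 32TB, so the block size no longer needs to carry that burden.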
There is one reason to have a larger block size: when you have very large rows, you can avoid row chaining. But once again, are you sure it is a problem (i.e. do you often select the columns at the end of the row)? And maybe you should review the design first.
There was another reason to have a larger block size for tablespaces containing large LOBs (I know that the 'L' already stands for 'Large', but I mean LOBs larger than the default block size). Today you should use SecureFiles, and we get better performance with them. But that's for another blog post.
When people come with those kinds of rules of thumb, I usually try to see if it is something they have thought about, or just a blind idea. For example, when they want to rebuild indexes, I ask them which PCTFREE they set for that, because rebuilding without knowing the PCTFREE you want to achieve is pointless. And when they want to create a tablespace with a non-default block size, I ask them how they have calculated the buffer cache to allocate for that blocksize. The whole size of the table? Then why do you want to do fewer i/o calls if everything is in cache? A small size? Then are you sure that the lower number of i/o calls will compensate for the smaller cache you can have compared with the default buffer pool?
In most cases, just keep the default block size; you probably have a lot of other things to tune. I'll now tell you a short story. I was a junior not-yet-DBA in 1998, implementing a banking software (COBOL, Tuxedo, HP-UX, Oracle 7). The application design had a lot to review – things like comments stored in CHAR(2000), for example. Then I saw a senior consultant recommend 'The' solution: increase the block size. The customer accepted, we did it, and everything was immediately better. Of course, in order to do that you have to export/import the database, so everything was reorganized and a lot of design problems were hidden for a few days. After a while, the performance issues came back, and we had to continue the optimization work. Being on a larger block size did not change anything about that. This is where I learned exactly which kind of consultant I don't want to be.
Thank you for a nice post reviewing some of the impacts (or not!) of having tablespaces with a larger block size than the default. There is another aspect of this which is often overlooked. Any row read into the database buffer cache (DBC) is actually a block. If you have any OLTP-type access to these tables (i.e. you generally want just the one row in that block) or small range scans where the clustering factor is high (so again, you only want one or two of the records in any given table block fetched into the DBC), with the larger block size you are "wasting" more memory. The section of the SGA you have allocated to that non-standard block size could well be less efficient than using the standard block size. That is also where physical clustering of data can reap large benefits. With a data warehouse you are probably aiming more for full segment scans than single row lookups or small range scans – and you have shown that the block size makes little or no difference to this. So the use of non-standard, larger block sizes is more likely to be detrimental to overall system performance :-). It is also worth mentioning that mixed block sizes, at least in older versions of Oracle (10 and 11 – I think), can cause some CBO oddities too.
Franck, I love your blog and always enjoy science shooting down myths like block size magic. That said, I'd like to point out that the mention of Smart Scan in this post begs the explanation that Smart Scan is technology that sits outside any discussion of block size, because it is largely ignorant of blocks (except HCC CUs). It operates on ASM units and returns tuples to the PGA of foreground processes.
Again, I want to say: this is a great post.
I’d like to add 2 sentences from the documentation regarding “Multiple Blocksizes”: Oracle9i Database Concepts – Release 2 (9.2) – Part Number A96524-01 http://docs.oracle.com/cd/B10501_01/server.920/a96524/c04space.htm#11420
Multiple block sizes are useful primarily when transporting a tablespace from an OLTP database to an enterprise data warehouse. This facilitates transport between databases of different block sizes.
Martin W, thanks for your comment. Yes, the cost-per-row of 'table access by rowid' is higher with a large block size, and it's good to realize that it's not only disk reads but also memory waste (which in turn leads to more disk reads for other queries). Even in a data warehouse, there are those 'table access by rowid' operations on the fact table, coming from bitmap index combinations.
Martin B, yes, thanks – it's always good to remember what the aim of a feature is. Multiple block sizes are not there to address performance problems.
Kevin, thanks a lot for your feedback. Yes, I named SmartScan and In-Memory to say that the trend is to lower the cost of full table scans even for small subsets of data (however this concerns only a very small subset of Oracle customers…). Good to mention that Exadata SmartScan has nothing to do with blocks and buffers. Several limitations come from that: results cannot be shared (so direct path reads only – which means a checkpoint before), it cannot apply undo to get a consistent buffer, it is not optimal to get only the first rows, etc. OLTP workloads need to read blocks, shared in the buffer cache…
Kind Regards, Franck.
Hi Franck, now I may add some geeky stuff, as the statement "For example, db_file_multiblock_read_count=128 when default block size is 8k will always read 1MB" is not entirely correct in the case of real I/O ;-))
The I/O request of 1 MB (to the OS/SCSI layer) is usually scattered into "max_sectors_kb" pieces. Frits did some great analysis about this, as usual: https://fritshoogland.wordpress.com/2014/11/28/physical-io-on-linux/
Regards Stefan
Two considerations : Row Chaining (long rows with many columns) Compression
Don’t bigger blocksizes compress better?
Hi Stefan, thanks for the link to Frits' post. Good to know.
Hi Hemant and anonymous, yes, basic compression may be better on a larger block size, which depends a lot on the data you have. Fortunately, dbms_compression.get_compression_ratio has a 'scratchtbsname' parameter, so you can do that compression estimation on a tablespace with a different block size.
Regards, Franck.
Hi Franck, great article – as usual. Since adding links is one of my favourite hobbies: Richard Foote on large index block sizes: https://richardfoote.wordpress.com/2008/03/18/store-indexes-in-larger-block-size-the-multiblock-read-myth-karma-police/. I think Jonathan Lewis advocated the use of the 8K standard more than once – because it's better tested than anything else – but I failed to find a link in this case.
Regards Martin (No. 3 in this comment section)
Good to see a decent example of what people often discuss and never try/prove in a scientific way. Thank you for sharing.
Hi Franck,
You say : “db_file_multiblock_read_count is defined as the number of blocks of default block size, but the i/o size is adapted for tablespace with non default block size. For example, db_file_multiblock_read_count=128 when default block size is 8k will always read 1MB.”.
Do you mean that for a tablespace using a block size of 32K, the db_file_multiblock_read_count will be “decreased” from 128 to 32 to keep the 1MB per I/O ?
Hi Yann, Yes this is what I mean. db_file_multiblock_read_count=128 on a 8k default block size will read at maximum 32 blocks at a time from a 32k block size tablespace. Regards, Franck
I haven't tested this (other block sizes than 8k), but from my research on 8k blocks, I got the impression that buffered multiblock i/o (scattered read) obeys the 1MB limit, but direct read multiblock i/o (direct path read) doesn't. I was able to submit a 32MB i/o using direct path reads.
Hi all, very nice discussion on non-standard block sizes. So pretty much the discussion rules out this feature for scattered reads and advises it only for index blocks, if I understood correctly. One question I still have: would it help inserts where batch jobs insert tens of millions of rows during a load? Would using a non-standard tablespace for such tables, with increased INITRANS and the parallel option, help?
Hi Muthu, that can be interesting to test, but it depends on many parameters: parallel inserts? increasing or scattered values? large index columns?
Hi Franck,
Thanks for a great article. So the takeaway is that a bigger block size brings in a larger number of index entries per block, which could be helpful for certain types of applications. But I think the database block size issue relates more to the file system block size and device block size. Alignment of the various block sizes can give you a performance gain that cannot be ignored.
Hi, yes, block sizes must be aligned. But for that, 8k is usually ok. And to do larger I/O you don't need a bigger block size, because you have multiblock reads.
Hi Franck,
I saw your blog, it's very helpful. I have a question: I am running an Oracle query and inserting the data into an Oracle table (source and target are both Oracle), processing around 20 million rows (45 columns); the insert into the target table takes 9 hours. My hardware configuration is a 64-bit Linux machine with 2 CPUs and 32 GB RAM.
I wanted to check with you whether this hardware is good enough to process this volume of data.