This was first published on https://blog.dbi-services.com/mapping-in-memory-cu-to-values-imcu-pruning (2015-10-13)
Republishing here for new followers. The content is related to the the versions available at the publication date
In the previous post we have seen how In-Memory Compression Unit map to the physical persistent storage – table extents. Let’s see now how they are mapped to column values, allowing to optimize the scan in a similar way as Exadata Storage Indexes. The examples below will illustrate IMCU pruning and IMCU skip filtering. Here are the slides I use for another example:
Here are the 4 IMCUs I have from the previous post.
SQL> select objd,tsn,startdba,extent_cnt,block_cnt,load_scn,itl_cnt,total_rows,invalid_rows from V$IM_SMU_HEAD; OBJD TSN STARTDBA EXTENT_CNT BLOCK_CNT LOAD_SCN ITL_CNT TOTAL_ROWS INVALID_ROWS ---------- ---------- ---------- ---------- ---------- ---------------- ---------- ---------- ------------ 107958 6 1572875 22 5 941292582 255 491079 0 107958 6 1573890 8 126 941292582 255 561354 0 107958 6 1574914 8 126 941292582 255 524160 0 107958 6 1575938 7 126 941292582 255 423407 0The important values to remember are the number of rows in each IMCUs. In total I have 2 million rows.
The table was created with:
create table DEMO inmemory tablespace USERS as select rownum num,mod(rownum,10) ten from xmltable('1 to 2000000');I have two columns in my table:
Each IMCU stores the minimum and maximum value for each column in each IMCU and we can see them from V$IM_COL_CU.
The min and max are stored in row format in the same way as in the column statistics. When we know the datatype, we can use the dbms_stats.convert_raw_value to display them. It’s a procedure, not a function, and in 12c I like to create an inline function to use it easily:
SQL> break on head_piece_address skip 1 duplicates SQL> with function row2num(x raw) return number as n number; begin dbms_stats.convert_raw_value(x,n); return n; end; 2 select head_piece_address,column_number,length/1024/1024,dictionary_entries,row2num(minimum_value),row2num(maximum_value) from V$IM_COL_CU order by 1,2,3,4 3 / HEAD_PIECE_ADDRE COLUMN_NUMBER LENGTH/1024/1024 DICTIONARY_ENTRIES ROW2NUM(MINIMUM_VALUE) ROW2NUM(MAXIMUM_VALUE) ---------------- ------------- ---------------- ------------------ ---------------------- ---------------------- 0000000067FFFDD8 1 4.84460545 491079 1 491079 0000000067FFFDD8 2 .234272003 10 0 9 00000000686FFDD8 1 5.66581059 561354 491080 1052433 00000000686FFDD8 2 .267781258 10 0 9 00000000626FFDA8 1 5.6811142 524160 1052434 1576593 00000000626FFDA8 2 .250045776 10 0 9 00000000620FFDA8 1 4.5891037 423407 1576594 2000000 00000000620FFDA8 2 .202003479 10 0 9
For the first column NUM:
And the second column TEN:
First, I check the IM statistics for my session:
SQL> select name,value from v$mystat join v$statname using(statistic#) where ( name like 'IM%row%' or name like 'IM%block%' ) and value>0; NAME VALUE ---------------------- --------- IM scan rows 2000000 IM scan rows valid 2000000 IM scan rows projected 2000000This what I’ve read previously when populating the IMCS: 2 million rows. The following will be done in the same session, so statistics cumulates, but I’ll add the delta values in front.
Now I’m running a query that has to scan all rows
SQL> select distinct ten from DEMO ; ... 10 rows selected.
I’ve read all 2 million rows and have find 10 distinct values from the TEN column.
Now checking my session statistics again:
SQL> select name,value from v$mystat join v$statname using(statistic#) where ( name like 'IM%row%' or name like 'IM%block%' ) and value>0; NAME VALUE ---------------------- ---------- IM scan rows 4000000 +2000000 IM scan rows valid 4000000 +2000000 IM scan rows projected 2000040 +40Among the 2 million rows I had to actually read the 2 million rows because I’ve no filter here. The result of the projection is 10 rows (my ‘distinct’ results) but the projection has been made for each IMCU and this is why we see 40 here.
Let’s run a query that filters only 100000 rows that are within the second IMCU – the one having values from 491080 to 1052433 for column 1 (see above).
SQL> select distinct ten from DEMO where num between 500000 and 600000; ... 10 rows selected.I get different figures now:
SQL> select name,value from v$mystat join v$statname using(statistic#) where ( name like 'IM%row%' or name like 'IM%block%' ) and value>0; NAME VALUE ---------------------- ---------- IM scan rows 6000000 +2000000 IM scan rows valid 4561354 +561354 IM scan rows optimized 1438646 +1438646 IM scan rows projected 2000050 +10 IM scan CUs pruned 3 +3 IM scan segments minmax 4 +4The 2 millions rows are split among ‘valid’ and ‘optimized’. Thanks to the minimum and maximum values stored for each IMCU, pruning has occurred. Only the second IMCU has been scanned and filtered because it’s the only one that can contain values from 500000 to 600000. It contains 561354 rows and this is what is counted as ‘valid’ rows where filtering has to be done. Pruning has saved the processing of the 1438646 rows that are in the other IMCU because we are sure they do not verify the predicate. If you look at the projected rows, there are only 10 because it’s the number of distinct values coming from the only one IMCU processed.
The ‘minmax’ optimization has been used on the 4 IMCUs and 3 of them have been ‘pruned’. This is the ‘no mail’ flag on the mailbox icon in my slides: you know that there is no mail without having to open the mail box.
Now running the query with a larger range that spans three IMCUs.
SQL> select distinct ten from DEMO where num between 500000 and 2000000; ... 10 rows selected.From what we have seen above, only the 1st IMCUs can be pruned now:
SQL> select name,value from v$mystat join v$statname using(statistic#) where ( name like 'IM%row%' or name like 'IM%block%' ) and value>0; NAME VALUE ---------------------- ---------- IM scan rows 8000000 +2000000 IM scan rows valid 6070275 +1508921 IM scan rows optimized 1929725 +491079 IM scan rows projected 2000080 +30 IM scan CUs optimized read 2 +2 IM scan CUs pruned 4 +1 IM scan segments minmax 8 +4We have scanned 3 IMCU here, totalizing 561354+524160+423407=1508921 rows and we saved the scan of the 1st IMCU (491079 rows). We did projection on the 3 IMCUs which return 10 rows each. The 4 IMCUs have been considered for ‘minmax’ and only one has been pruned.
But there is more here. ‘IM scan CUs optimized read’ appeared here. The filter scan is done to get the vector of rows that match the predicate. But from min/max we know that the 3rd and 4th IMCUs have all rows matching because both their min and max are between 500000 and 2000000. This is filter skipping, accounted with ‘IM scan CUs optimized read’
If you look above at the V$IM_COL_CU view we have not only the minimum and maximum value. We have the number of dictionary entries which looks like the number of distinct values. Why is it called like that? Because the distinct values are stored in a dictionary. A guess is that it can be used for dictionary encryption (each value is replaced by a number which is the position of that data item in a dictionary).
But for sure, if that dictionary covers all possible values, then some equality filtering can be done from there. Let’s look if we can do IMCU pruning for a query that select no rows:
SQL> select distinct ten from DEMO where where ten=10; ... no rows selected.
SQL> select name,value from v$mystat join v$statname using(statistic#) where ( name like 'IM%row%' or name like 'IM%block%' ) and value>0; NAME VALUE ---------------------- ---------- IM scan rows 10000000 +2000000 IM scan rows valid 8070275 +2000000 IM scan rows optimized 1929725 IM scan rows projected 2000080 IM scan CUs pruned 8 +4 IM scan segments minmax 12 +4
The 4 IMCUs were candidate for ‘minmax’ but no pruning occured. From those statistics, it seems that the dictionary entries are not used for pruning.