From an operational prospective, you can improve Analysis Services performance by keeping partition sizes reasonable, setting partition data slices, defining appropriate aggregations on all partitions, and running the Optimize Schema Wizard in the cube editor.
You can divide a cube into multiple partitions if you are using SQL Server 2000 Enterprise Edition (or SQL Server 7.0 Enterprise Edition) to increase query and processing performance. You should consider increasing the number of partitions if a partition file exceeds 5 GB or 20 million records. To determine the size of each partition, review the partition files in the Data folder.
When you partition a cube, you should define the data slice for each partition using the Partition Wizard. The Partition Wizard does not require you to set this data slice when you create a partition. As a result, it is possible (even easy) to create a partition without setting the data slice. In fact if you simply click through the Partition Wizard pages taking the default values, you end up with a partition without a data slice being set. You should verify that each partition of each cube in each Analysis Services database has a data slice defined. The only exception to this is if you have a cube with only one partition-in that case the data slice should not be set, because you want all of the cube data to be placed in that one partition. To verify that a data slice is defined, edit the partition in Analysis Manager and then step through the Partition Wizard pages. Defining the data slice enables Analysis Services to quickly eliminate irrelevant partitions from query processing. The data slice identifies the actual subset of data contained in each partition. Unless Analysis Services knows the range of data contained in each partition, it must query each partition, which negates much of the query performance benefit of partitions. To draw an analogy with SQL Server, creating a partition without a data slice is like creating a partitioned view without the CHECK clause. While you can do it, you force the query optimizer to scan all of the partitions in the view because you haven’t given it enough meta data to figure out what partition to access when a query is issued. While the Analysis Service’s runtime engine does not use a relational query optimizer (it has its own component that accomplishes a similar operation), it uses the data slice in roughly the same way: as meta data to tell it which partitions to scan if an aggregate cannot be
used or is not available. If you partition a cube by month, and have 36 months worth of data (in 36 partitions), and if you don’t specify the data slice, then the runtime engine must scan all 36 partitions to answer a query. If you specify the data slice, it could potentially only have to scan 1/36th the amount data, with an obvious improvement in
performance.. Setting a data slice also causes Analysis Services to add a join and a WHERE clause to the SQL statement used for retrieving data from the source database during processing. The WHERE clause limits the data retrieved by the SQL statement to the data that belongs in the data slice. For example, if you say that a partition’s data slice
is June 2003, then Analysis Services adds a join to the time dimension and adds the WHERE clause: WHERE = ‘June’ AND = ‘2003’
or whatever the appropriate member/level names are. If you do not define a data slice and you have multiple partitions, Analysis Services does not restrict the data that is retrieved from the source database. Without the data slice, if you just happen to have July 2003 data in the June partition. By specifying the data slice, the system can add these JOIN and WHERE clauses that assist in maintaining the integrity of the data. If Analysis Services is loading data for each partition from separate tables in the source database, this may yield some performance benefits. However, you should not disable the automatic generation
of the WHERE clause unless you are totally sure that the relational database partitioning is 100% correct when loading data. You must be 100% certain of the correctness of the data in all partitions of all cubes on the server in order to use this setting safely. Without the WHERE clause for your protection, double-counting data (or many-times counting of data) may occur, which could lead to server crashes if inconsistent data is processed. If you disable the generation of the WHERE clause, you assume all responsibility for enforcing data integrity between the data source and the data slice.
The most effective technique you can use to improve overall query responsiveness (assuming that the Analysis Services computer has sufficient memory and hard disk resources) is to design effective aggregations of fact data in each cube partition. However, too many aggregations will unnecessarily increase processing time without significantly improving query performance. When you use multiple partitions to increase query and processing performance, it is possible to deploy a new partition with no aggregations. While deploying partitions with different aggregation designs is a common optimization technique, deploying a partition with no aggregations generally indicates an error in deployment that can cause performance problems. You should verify that a minimum number of aggregations exist in each partition. You can quickly determine whether aggregations are defined on a partition by looking at the combined sizes of the .agg.flex.data and .agg.rigid.data files for each partition. Designing too many aggregations will slow processing and designing too few aggregations will slow
querying. Ensure that all partitions have a minimum number of aggregations – perhaps 10%.
Running the Optimize Schema tool on a cube eliminates unnecessary joins between dimension tables and fact tables, if certain conditions are met. By default, when you first create a cube, Analysis Services constructs a SQL query against the fact table that is a “N+1”-way join (where N is the number of dimensions). From the join, the Analysis Services query extracts the lowest-level key. From that key, Analysis Services begins the aggregation process. Having a 6-way join is typically not a major performance problem in most modern relational database systems. However, if your cube has 15 or 20 dimensions, the resulting multi-table join may suffer significant performance problems. Regardless of the number of dimensions in your cube, the Analysis Services query to the relational database is resolved faster and the data flows into Analysis Services more quickly during processing if you eliminate some of these joins. Fortunately, there is a common design technique that can greatly help the situation. Many star or snowflake schema designs are constructed in such a way that the foreign key that points from the fact table to the lowest level dimension table is not some random number, but it is also the member key itself. If that is true, then Analysis Services can “optimize away the join” and pull the member key directly from the fact table instead of using a join to the lowest-level dimension table. These conditions are checked for join removal are
- The dimension must be a shared dimension.
- The dimension must have been processed before you optimize the cube schema.
- The member key column for the lowest level of the dimension must contain the keys that relate the fact table and the dimension table, and this must be the only key necessary to relate the fact table to the dimension table.
- The keys in the member key column for the lowest level of the dimension must be unique.
- The lowest level of the dimension must be represented in the cube; that is, the level’s Disabled property must be set to No. The level can be hidden.
If these conditions are met with respect to a dimension used in a cube, and the cube’s schema is optimized using the Optimize Schema command, Analysis Services composes a query that does not contain a join to the dimension table in the database when processing the cube. If these conditions are met for all dimensions in the cube, the Analysis server needs to read only the fact table to process the cube. Processing time reductions often can be substantial when this optimization technique is used. Cube schema optimization applies to all partitions of the cube, whether the partitions are processed independently or as a group. So, as a general rule, after you have designed the schema for a cube, you should run the Optimize Schema command. It removes the joins that meet the foregoing conditions. Next, you should determine which dimensions were not eliminated from the join and then determine how to meet the required conditions to eliminate the dimension table from the join. If you have partitioned your cube and specified the data slice, the dimension table used for the data slice cannot be eliminated. This join is set to protect you so that no additional, non-data slice data is included in the partition. If you do optimize away a dimension, you should be aware that the inner join that you have just eliminated had a side effect that may expose problems with your source data. The inner join to the dimension table eliminates fact table records that do not have matching dimension table records (this is what an inner join will do). This means that when you remove the inner join and start using the fact table member keys, you may start seeing processing errors that you were not getting before. When Analysis Services processes a record in the fact table that does not have a corresponding entry in the appropriate dimension table, Analysis Services generates an error. If you recreate a cube, add a dimension to a cube, or remove and then re-add a dimension, you must rerun the Optimize Schema command to re-optimize the cube. New dimensions are always added unoptimized.