the data warehouse
The Data Warehouse Technology
A database that has the same characteristics as a data warehouse is usually smaller and is focused on the data for one division or one workgroup within an enterprise.
There are three different (and somewhat contradictory) views of the place of the data mart in the world of data warehousing.
1. The data warehouse gathers all the information from the various legacy systems. Specialized data marts are then created with a subset of the information in the data warehouse. These data marts are easier to use because they only have the particular information the specific user group needs. The use of several data marts also allows the querying load to be spread among several different computers. This can reduce network traffic.
2. Free-standing data marts are created, independent from a data warehouse. The information for the data mart probably comes from just one legacy system. It is quicker and cheaper to build a separate data mart instead of building an enterprise-wide data warehouse with data marts derived from it.
3. The data mart is the prototype or the first step of a data warehousing process. An enterprise picks the division or group that would most benefit from data-based knowledge. A data mart is built with that group's data. Additional types of information are added to the data mart as time goes on until it is turned into a data warehouse.
The movement of data from one environment to another.
This happens when data is brought from a legacy system into a data warehouse.
The process of finding hidden patterns and relationships in the data.
Analyzing data involves the recognition of significant patterns. Human analysts can see patterns in small data sets. Specialized data mining tools are able to find patterns in large amounts of data. These tools are also able to analyze significant relationships that exist only when several dimensions are viewed at the same time.
Users can ask data questions using standard queries when they know what they're looking for. Queries can be written for questions like this: "Which of our out-of-town customers have given us the most business in the last year?"
Data mining is needed when the user's questions are more vague or general in nature. Data mining questions would include: "What attributes characterize the customers that gave us the most business in the past year?"
Data quality assurance is one of the greatest challenges in the process of data warehousing. If the data-based knowledge generated by the data warehouse is to be trusted, the data entered into the warehouse must be complete and accurate - "garbage in, garbage out".
A significant portion of time in the development process should be set aside for setting up the data quality assurance process and implementing whatever data cleansing is needed..
In a production environment, there should be a data quality report generated after each data warehouse import. There should be provision for rolling back an import if data quality testing indicates that the data is unacceptable.
The process of visioning, planning, building, using, managing, maintaining, and enhancing data warehouse and data marts
Whether we're building a data warehouse, a data mart, or both, we are taking part in a complex, on-going process. The emphasis in the data-based knowledge business needs to be kept on the process. That's why you're reading a glossary of "data warehousing terminology" instead of a glossary of "data warehouse terminology".
There are many steps in the data warehousing process -
Consider also all the actions that take place as a part of the data warehousing process -
One of the perspectives that can be used to analyze the data in an OLAP cube.
When you are browsing the data in a cube, you can view the data from the perspective of different combinations of dimensions.
For a Sales database, the dimensions could include Product, Time, Store, and Promotion.
Dimensions contain one or more hierarchies, which have levels for drilling up and drilling down in the the cube. When a dimension has just one hierarchy (which is quite common ), people often refer to the dimension itself having levels.
In a start schema, a table which contains the data for one of the cube's dimensions.
The dimension table has a primary key which is used to connect it to the fact table.
The dimension table has one field for each level of each hierarchy contained in the dimension. The data values in these fields become the members of each of the dimension's levels.
The dimension table has as many attribute fields as possible. These fields describe individual characteristics of the dimension.
If there are multiple hierarchies in the dimension, there is one level field for each distinct level in each of the hierarchies. If the hierarchies share some levels in common, they are represented by a single field For Calendar and Fiscal hierarchies in a Time dimension, the level fields could be Fiscal Year, Calendar Year, Fiscal Quarter, Calendar Quarter, Month, and Day.
For the Product dimension table, some of the attribute fields could include Description, Product Number, Product Type, Department, Package Size, Weight, Shelf Length, etc.
The dimension tables in a star schema are intentionally de-normalized. The level fields and the attribute fields contain data that is duplicated in many of the records. This normally does not add a significant amount to the amount of storage space needed in the database, because the overall size of each dimension table is very small when compared to the size of the fact table.
Drill Down and Drill Up
The ability to move between levels of the hierarchy when viewing data with an OLAP browser.
Drill Down - Changing the view of the data to a greater level of detail.
Drill Up - Changing the view of the data to a higher level of aggregation.
Multidimensional analysis (OLAP) tools organize the data in two primary ways: in multiple dimensions and in hierarchies.
Drilling down and drilling up allow an analyst to move down and up the hierarchies to see how the information at the various levels is related. After looking at the sales totals for a store's departments, the analyst may want to drill down to see the individual sales for each employee in one of the departments. Then the analyst may choose to drill up to view how this store's total sales compare to other stores in the same region.
DTS (Data Transformation Services)
An ETL tool provided as a part of Microsoft SQL Server.
DTS was first released with SQL Server 7.0. It provides a design environment for creating data transformation applications.
