Data Warehousing and Reporting
Replication is often used in data warehousing and reporting applications to:
- Consolidate data so it can be transformed and moved into the data warehousing environment.
- Distribute data to read-only databases for reporting.
- Distribute data to an online analytical processing (OLAP) database.
Although replication does not replicate Microsoft SQL Server 2008 Analysis Services (SSAS) objects (for example, dimensions or cubes), it is often used to distribute data from online transaction processing (OLTP) databases to staging databases and databases that are used for reporting, decision support and analysis purposes.
The following diagram shows a typical scenario, with data replicated from an online processing server to both a reporting server and a staging server for OLAP and ROLAP analysis.
Adventure Works Cycles Example
Adventure Works Cycles is a fictional manufacturing company used to demonstrate database concepts and scenarios.
Adventure Works Cycles uses data warehousing and reporting in a number of departments, including Manufacturing and Human Resources.
Manufacturing stores historical data on manufacturing defects and a number of other quality and performance metrics.
Data is replicated from servers at the manufacturing facility to a staging server at the company headquarters. From there the data is transformed and loaded into OLAP cubes for analysis.
Human Resources currently produces reports using a third-party application. They plan to replace this application with Reporting Services. They would also like to expand their reporting capabilities and add the ability to perform these types of analysis:
- Compensation and benefits analysis, including analyzing the impact of international currency exchange rates.
- Headcount planning.
- Payroll cost simulations and forecasting.
They will bring a new server online to handle the increased need for reports through the company. Data will be replicated from Human Resources and other departments to this central read-only reporting server.
Common Requirements for This Scenario
Data warehousing and reporting applications typically have the following requirements, which an appropriate replication solution must address:
- The system must maintain transactional consistency.
- The system should have low latency: updates at the online processing server should reach the staging and reporting servers quickly.
- The system should have high throughput: it should handle the replication of a large number of transactions.
- Replication processing should require minimal overhead on the online processing server.
- Data changes flow in one direction, from the online processing server to the staging and reporting servers.
- The data required at the staging and reporting servers might be a subset of the data available at the online processing server.
The Type of Replication to Use for This Scenario
SQL Server uses a publishing industry metaphor to describe the components of the replication system. The components include the Publisher, Subscribers, publications and articles, and subscriptions.
In the diagram above, the online processing server is the Publisher. Some or all of the data at the online processing server is included in two publications (one for staging and one for reporting), with each table of data being an article (articles can also be other database objects, such as stored procedures). The staging server and reporting server are Subscribers to one of the publications, with each server receiving schema and data as a subscription. SQL Server offers different types of replication for different application requirements: snapshot replication, transactional replication, and merge replication. This scenario is best implemented with transactional replication, which is well suited to handle the requirements outlined in the previous section.
By design, transactional replication addresses the principal requirements for this scenario:
- Transactional consistency
- Low latency
- High Throughput
- Minimal overhead
The primary option to consider for this scenario is filtering. Transactional replication allows you to filter columns and rows, so the tables at the staging and reporting servers contain only the data required by your application.
Integrating Data from Multiple Sites (Server)
Many companies have regional offices or entities that collect and process data that must be sent to a central location.
- Inventory data can be “rolled up” or consolidated from a number of servers at local warehouses into a central server at corporate headquarters.
- Information from autonomous business divisions within a company can be sent to a central server.
- Order processing information from dispersed locations can be consolidated.
In some cases, data is also sent from the central site to remote sites. This data is typically intended to be read-only data at the remote site, such as a set of product inventory tables that are only updated at a central site.
The following diagram shows a typical scenario, in which data is rolled up from remote sites. Read-only data is also sent to each remote site.