Data Warehousing Services
Data Warehouse Architecture
Creating data architecture and warehouse models requires specialist knowledge of warehousing design techniques. The underlying design of a data warehouse and how to plan for the different relationships between data affect not only the storage capacity of the warehouse but also its efficiency at retrieving data and providing correct, speedy information. Warehouses often follow the 'star schema' model which uses dimensional modelling techniques, such as those conceived and authored by Dr. Ralph Kimball. His methodologies, generally perceived as the warehousing standard, have helped Clare Nicklin design and implement many warehouses and data marts for blue chip companies. In her previous role, Clare delivered 4 live warehouses in 3 years, including business intelligence layers and OLAP.
Data warehouses are often derived from multiple data sources, which then need to be integrated into a single set of information. In order to do this, data developers need to specify how to access the source systems i.e. operational databases, extract the required data, transform the data into a useable format (removing data redundancy or errors), load the transformed data into the warehouse model and ensure it is formatted in the correct way. This data integration process, often referred to as Extract, Transform and Load (ETL), is a massive part of any warehouse development which requires in depth knowledge of the dynamics of data.
When several different sources of data are being used, there is potential to duplicate data records. Within every ETL process for data integration, data must be assessed for potential duplication and redundancy. If such a problem exists, complicated de-duplication routines are necessary to uphold the quality of the data. Data is also often riddled with blanks which may need special treatment or accidentally contains incorrect values that are outside of a certain acceptable range and need correcting. All such issues need addressing before data can be drawn from a fully functional data warehouse, for example to ensure BI statistics are not miscounted or contacts being sent marketing mailshots are not mailed twice.
If a data warehouse contains any kind of contact information, there is likely to be a policy controlling how the data is used and who it may be released to. Beyond company policies, legal requirements also exist such as the Data Protection Act to ensure privacy and the protection of sensitive data.
Special security systems are often required to secure sensitive data and personal information. Data warehouses and their access/reporting systems need to be regulated and controlled. Individuals should be granted access singularly or as a member of a permission group. It is likely that different data users will need different access levels to different data warehouses (or different areas within a single warehouse). Permissions systems can become complicated and need a lot of advanced planning, so methods should be put in place to ensure permissions can be created and maintained routinely through metadata.
Clare Nicklin has been involved in the management of data warehousing projects at every stage of the project lifecycle from conception to team employment to product release to maintenance and support. Effective project management must include effective project planning to agree project scope with the stakeholders, breakdown the project into necessary steps and milestones and assign timelines and deadlines. Other relevant experience in the project management of MI systems includes the defintion and implementation of stringent change control and version control methods, including the regular backup of warehouse metadata.