Wednesday, November 21, 2007

How To: Advanced Queryable Compression (Continued from previous post)

Reducing I/O for a processing Pipeline is critical when processing high volume of data. By reducing your I/O you reduce the load on the network, Hard Drives, and CPU memory.

The following is a research paper from Vijayshankar Raman called:
“How to Wring a Table Dry: Entropy of Relations and Querying of Compressed Relations”.

Here is Vijayshankar Raman's website:

Summary Coverage:
•Numeric Column compression
•Skew compression
•Correlating columns within a Tuple
•Incidental Tuple Ordering
•Tuple Correlations
•Delta Coding
•Huffman Code Scan Operations
•Segregated Coding
•Query Short Circuiting
•Results in 8x – 40x compression which still can be query-able without being decompressed

This is a continuation from a previous post:

Tuesday, November 20, 2007

Meta Data VS. Meta Model VS. Data Repository

To clarify some thoughts and clear up some confusion, here is some definitions...

Definition: Meta Data
Information about information. Basically its documentation about information used or contained by various systems. Example: Data Dictionaries & Data Model Diagrams.

This is on excellent thing to do at the beginning of a project, no matter how small a project it is. See my thoughts on data modeling in previous postings.

Pipeline (ETL) processes can also use Meta Data to inform it of what is coming into the pipeline and how to process it.

Definition: Meta Data Model (Meta Model)
A flexable data model schema that can store the definition of the data it contains and may include relationship rules and constraints. This could be as simple name-value pair storage design to an elaberate rule based model such as the system tables (example: sysobjects) that SQL Server uses to store the definition of user defined tables and their constraints.

I have several comments about Meta Data Models in previous postings. And its to this particular definition of which I speak of.

Definition: Data Repository
A storage facility that stores data about data. Used by major corporations to track its many data sources. Its stores the data dictionaries, data models, and data flow. Repositories usually use a Meta Data Model schema design for its storage to organize the data.

Large corporate data warehouses may find this to be useful in keeping all those data dictionaries, diagrams, etc... organized. Usually not necessary for smaller businesses.

Monday, November 05, 2007

Free Code: Stored Procedure Code Generator

The above link provides the following:
You will find that 80% of all stored procedures in a transaction database are all very predictable and can be auto generated. Using a code generator will save you time and create symmetry within your database which increases predictability and stability for the overall system. This will allow you to focus on the other 20% of the stored procedures which are more complex and critical to the system.

I have found that creating stored procedures that create stored procedures was an easy way to do code generation for your database. Creating a script that depended on Visual Studio is just too bulky and slow to load when you want quick results. This solution I can take with me any were without the need of a complex environment.

Feature List:
1. Creates Get Stored Procedures
2. Creates Set Stored Procedures (Performs Insert or Update process depending on conditions)
3. Creates Delete Stored Procedures