Friday, February 03, 2006

Best Practices for SQL Design Patterns

I’m a firm believer in creating symmetry through design patterns where it is applicable. Using good modeling techniques and naming conventions on data and process definitions will enable easier code generation. 80% of an application can be code generated based on templates that enforce design patterns. The remaining 20% may need to vary a little from the standards when it is too “costly” to do so. The following naming and coding suggestions have proved to be the most practical for me in starting any project.

Table Name convention In the past I‘ve seen a several databases that had some difficult naming conventions that increased the cost of ownership due to confusion. The databases that have followed the below suggestions have lowered the cost of ownership by making it easy to understand.

Pascal naming convention: The first letter of every word is capitalized.

Good Example: Person, PersonAddress, GBITax
Bad Example: personaddress, Person_Address, personAddress, tbl_PersonAddress

Do not pluralize the table name: It is best not to place an 'S' at the end of every table. It can be assumed that a table contains many entries.

Good Example: Person, Country, UserSignup, DetailRecord
Bad Example: Persons, DetailRecords

General Rules for placing Prefix to tables: Prefixes can be useful for classifying tables into business areas when the number of tables becomes too unwieldy to deal with in a single database. Usually one should start asking the question of whether or not they should split the database into multiple databases when this starts to happen, but there are cases when this may not be the desired direction. In this case you should start thinking about Prefixes. You can use abbreviations also long as they are consistent.

Good Example: CatalogItem, UtilMessageLog, PromoRule
Bad Example: tblPerson

Do not use the ‘_’ as a spacer: From a developers and analyst standpoint these little guys can be quite annoying. And with the Pascal naming convention the use of ‘_’ for readability becomes redundant.

Column Name convention 
Pascal naming convention: The first letter of every word is capitalized.

Good Example: PersonID, FirstName, ZipCode, SSN
Bad Example: Personid, Firstname, lastName

Do not place the name of the data type in column name: It is not a good idea to place the name of the data type in the column name. If you change the data type of the column, then your column name is misleading. And changing the name of a column is full of troubling changes. There are a few exceptions: Date is one example.

Good Example: StartBatchDate, FirstName, USDAmount
Bad Example: ti_event_id, vcFirstName

Usage of Abbreviations: Abbreviations are good for reducing the size of a column name. One should always be consistent and only use abbreviations that are commonly understood. Remember that the name of the column should be self-descriptive. Avoid over-use of abbreviations which can make it difficult to understand and support. When in doubt spell it out.

Good Example: StartBatchDate, UserNum, SubscriptionDesc
Bad Example: ti_event_id, vcFirstName, SbDc, X

Schema Structure
Use Constraints:
It is sad to say that there are many databases that exist in this world that do not use primary key (PK), foreign key (FK), and alternate key (AK) constraints. Unless there are technical reasons for not having these, you should always use them, if for nothing else, to insure referential integrity. The use of data integrity constraints also makes the database design self documenting (Example: Schema extraction process into Visio or Erwin.)

Name Your Indexes: If you name your indexes instead of having SQL Server to generate the name dynamically, you will have an easier time altering those indexes in the future. You will then explicitly know the name of the index rather then having to dynamically lookup the index name to alter it. If you dynamically generate the name of the index, it will be different from environment to environment (Example Environments: Development, Test, & Production).

Stored Procedure Naming Convention
Name Format: Don’t use SP as a prefix for your stored procedures. SP in SQL Server stands for System Procedure (Example: SP_HelpText). {VERIFY} Also, if you use the SP prefix, SQL Server will check the master database first and then the application database. So it is also a performance issue as well.

Good Example: GetPerson, ImportProductKey, CountPIDs, CommitPayments, GetProductReport
Bad Example: SP_GetPerson

Pascal naming convention: The first letter of every word is capitalized.

Good Example: GetPerson, ImportProductKey, CountPIDs, CommitPayments
Bad Example: Get_Person, IProductKey, COUNTPIDS

Examples of names: Get; Set; Ins (Insert); Upd (Update); Del (Delete); Import; Extract; Load; Transform; Export; Count; Process; Commit;

Abbreviation / Acronym dictionary: You should have a list of the most commonly used abbreviations. It’s not a hard and fast rule that a person can’t use an abbreviation that is not on the list. It is just a means to help create symmetry into the system. This list should be in a database design / development standards document.

Examples:
Abbreviation NameNum - Number
Ind - Indicator
Dtm - Date and Time (Millisecond)
Id - Identifier
Cnt - Count
Amt - Amount
Acct - Account
USD - United States Dollar Currency Code

Stored Procedure Argument Naming Convention
This is where coding preferences really start to vary from developer to developer. The concepts here are just helpful tips that I have found very useful as a SQL Developer.

Argument Prefix: @Arg @Arg. Looking through a complex stored procedure and attempting to quickly find where the stored procedure arguments are being used can be time consuming. The use of @Arg Prefix can ease this issue. I’ve found myself avoiding lots of coding mistakes by having done so.

Example: GetPerson(@ArgPersonID int)

Coding Stored Procedures
The Return statement: Avoided using the return statement to return any values other then error codes if even that. One should use Output parameters where possible for returning values. (This is for stored procedures. Functions are a different beast all together.)

Set NoRowCount On: Place Set NoRowCount On at the beginning of every stored procedure to reduce the noise coming back to Query Analyzer.

Error Trapping: Always check for errors at every statement and log these errors to a MessageLog table. Every stored procedure must do this.

Transactions: K.I.S.S. is the general rule here. Keeping it simple will avoid some of the pitfalls many fall into with complex rollback designs. Always think of a unit of work in which several processes must live and die together. If any process within that unit of work fails, then the unit of work should be rolled back. If a unit of work is extensive it would be a good idea to break it up into multiple units of work with each unit of work isolated in its own stored procedure. That stored procedure can have multiple stored procedures calls, but try to keep the design flat so as not to have a stored procedure that calls a stored procedure that calls a stored procedures etc… . Avoid combining units of work into one gigantic transaction. People from all walks of life will walk over to your desk and slap you with a wet noodle.

Avoid Deadlocks: To avoid deadlocks, get in the habit of accessing tables in the same order through out your system. This will alleviate the most common cause of deadlocks.

InLine Code Documentation: Document the “You Know What” out of the stored procedure within the code. No excuses or exceptions here.

Stored Procedure Code Generator: This is a very useful tool in creating fundamental (Basic) insert, update, delete, get, and set stored procedures. I’ve called them fundamental stored procedures as they are generated per table. These stored procedures are mainly used by the UI. I’ve found that generating these stored procedures increases the symmetry and stability and reduces human error and derivation. This kind of tool can be used to generate 80% of all stored procedures where a UI is needed. The other 20% are complex stored procedures that need more LTC (Loving Tender Care) from the developers.

Once you’ve generated these stored procedures, your better off maintaining them manually, Though some generators allow for “Re-Entry” in which manual code changes are kept while regenerating the rest of the stored procedure.

I do have a stored procedure code generator that I’ve created over the years. It needs a little bit of updating, but works fine as-is and can save lots of time and money. It will generate very clean and supportable code. There are many code generators available. I’ve created my own, because I like total control of my tools and I’ve had it before many code generators existed.

9 comments:

Anonymous said...

Nice article. I like the idea of having an abbreviation dictionary.
For stored procedure names, what do you think of using the prefix "usp"? I know it gets redundant but it's better than using "sp".

Anonymous said...

Why in the hell would you use 'usp' as a replacement for 'sp'. Are you serious?

Don't use ANYTHING redundent- thats the point. Stored procedures are clearly stored procedures, and will be used clearly as stored procedures in any dev platform; without the need to add prefixes

Anonymous said...

"usp" is used to differentiate between system and user created stored procedures

Anonymous said...

Interesting suggestions: I have found that over the years I have adopted most of these. Here is where I vary some...
I don't use "arg" for any variable, stored procedure or otherwise. I generally use a 1 or 2 character indication of what I consider the datatype of the variable to represent: @iVar is (obviously) some kind of an integer, just like @rVar is REAL. @vcArg is VARCHAR, and so on.

When I am doing stored procedures, I always add commented lines in the beginning with the procedures parameters declared and SET with the default value. When I have to dip into the procedure to debug, etc. I simply uncomment those lines and proceed.

As to naming stored procedures, I initially started with sp (and sp_) but found it redundant. Now I use something to indicate the purpose of what the procedure is intended for: rpt (report), etl (data collection / loading) are the main ones I have to deal with (I work in a data warehouse and reporting environment so I don't have much to offer beyond these).

Most of the other conventions I already follow (first letter CAP'd, etc). One of the things I have found is that whenever I am designing a warehouse; I reuse the original names from the sources. I tend to collect data from multiple systems with the same table name. I reuse all of the same names (table and column names) no matter what. I can then refer to the "source" documentation to describe the content of what I have. I can "slap" an etl_{table_name} on the ETL procedure, and a rpt_{table_name} on any simple reporting procedure I need.

Anyway, that's my $0.02.

Anonymous said...

For stored procedures, instead of using prefixes, if you do, like "usp_", Get, Set, Ins, etc., what are your thoughts on using suffixes instead, example TableNameGet or TableNameCollection. This way in your object explorer you can view all stored procedures related to a single table group together.

You mentioned code generation, are there any in particular that you like/recommend that are hopefully free :) Found http://www.ssmstoolspack.com/, but source code isn't avaiable, so can't tailor them to what I need exactly.

Anonymous said...

Great article, thanks. Just wondering why the use of underscores are discouraged. I certainly do not find them annoying, but I do agree that the use of them should be limited. The rule I follow is that where an acronym is used an underscore is permissible to separate the capitals of the acronym from the capital of the next word, eg HR_Department

Anonymous said...

Thanks for the article.

I agree, I don't think the use of underscores are or are not a best practice. It is a preference. Devs get to stare at the same stuff day in and day out. We all become a bit "anal" as we know it is important to be efficient. Easier to read is more effecient. A godd developer will be anal and very consistent.

When a best practice narrows developer options, it is not very "best" anymore.

I encourage underscores as a manager as my team finds First_Name easier to read then FirstName, especially in a big application. I also don't agree that plurals are to be aviods. The have a place, Event is much differnet than Events. Also Event, although not a keyword in SQL, has implications to program code. Consistent use of plurals indicates alot to my team.

Important part, best practice is consistency across all of your applications.

Unknown said...

I'm incrementally building our business applications database, project by project. Your article is excellent as I have arrived upon the same best practices almost verbatim, although I plan to add a few changes after reading this. One comment I'd like to make is the possible use of schemas to help organize your database even more. It's great if you are using a single database for multiple applications. I use sensible abbreviations for each project and CREATE SCHEMA [project abbreviation] for them. It's pretty easy to query every database object in sys.objects that relates to that particular project.

cyrusrynlee said...

When you say "best practice," it simply ,means that there are no fixed rules on how to deal with some problem but what's chosen is known to work for someone but may likely fail when adopted by others. Don't feel guilty whenever you go against "best practices" - that is if your actions will benefit your team, reduce the cost of the business operation and at the same time improve the security of your applications.