NPI: TaxonomySession

When working with the new taxonomy support in SharePoint 2010, the TaxonomySession class is high in the food chain. The taxonomy functionality is provided through the new Service Application architecture, and a central piece in accessing service applications is the SPServiceContext class.

When looking at the constructors of the TaxonomySession, it’s a complete mystery why one can only be created from an SPSite and not from an SPServiceContext, especially since there is an internal constructor that takes an SPServiceContext.

For most uses this may not be a problem, but think about deployment. During the deployment process you may want to populate a number of term sets because they will be used by site collections created later. However, you cannot create a term set without first having a site to create the TaxonomySession from. Sure, you could get around this by creating a dummy site collection, but why should you have to jump through these hoops?

Does anybody have a rational explanation for why the internal constructor isn’t exposed? This is just one of many examples of how SharePoint forces a certain way of programming on developers.

Non programmable interfaces

As you start digging into SharePoint, you will often find that what you really want access to isn’t available. SharePoint relies on a lot of internal classes and APIs. This wouldn’t be that bad if everybody had to play by the same rules, but there are two levels of SharePoint developers: those who work for Microsoft, and those who don’t.

If you inspect Microsoft.SharePoint.dll using Reflector, you will see that a lot of other SharePoint DLLs are permitted to access these internal types.

Don’t be surprised if something is available through a standard page but not through an API. These are the non-programmable interfaces, or NPIs.

Sites vs Site Collections

When building a collaboration system, you will soon face a choice between using sites or site collections. In general I favor multiple site collections, i.e. each project or department gets its own site collection.

Storage Quotas

Most organizations don’t have an unlimited amount of storage, so keeping data growth under control is important. In SharePoint it’s possible to impose a quota at the site collection level, but not at the site level. If you choose one big site collection, everybody will be dipping into the same pool. When a site collection is approaching its quota, the site collection owners receive email notifications.
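The difference between one shared pool and per-collection quotas can be illustrated with a toy model. This is a minimal Python sketch, not the SharePoint object model. The `Quota` class, the status names and all the numbers are hypothetical. The model only mirrors the idea of a warning level, which triggers the owner notification, and a maximum level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    """Hypothetical quota model; sizes are in megabytes."""
    warning_mb: int  # owners are notified at or above this level
    maximum_mb: int  # in this model, no more content fits at or above this level


def quota_status(used_mb: int, quota: Quota) -> str:
    """Classify storage usage against a quota."""
    if used_mb >= quota.maximum_mb:
        return "over_limit"
    if used_mb >= quota.warning_mb:
        return "warning"
    return "ok"


if __name__ == "__main__":
    usage = {"ProjectA": 950, "ProjectB": 150, "ProjectC": 100}

    # One site collection per project: only ProjectA's owners get a warning.
    per_project = Quota(warning_mb=800, maximum_mb=1000)
    for name, used in usage.items():
        print(name, quota_status(used, per_project))

    # One big site collection with the combined budget: ProjectA's growth
    # goes unnoticed, because only the total is measured against the pool.
    pooled = Quota(warning_mb=2400, maximum_mb=3000)
    print("pooled", quota_status(sum(usage.values()), pooled))
```

With separate site collections, the owners of the project that is actually growing are the ones who get notified. With one pool, nobody hears anything until the combined total reaches the warning level.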

Security

In SharePoint, groups such as visitors, members and owners are defined at the site collection level. With separate site collections, it’s easy to keep track of all the groups, and you know they are all related to the content stored in that site collection.

By using multiple sites in the same site collection, you would have to start breaking security inheritance at various levels and adding extra groups, e.g. Project 1 members, Project 2 members, etc. If you have a couple of hundred projects, you end up with several hundred SharePoint groups as well, and keeping tabs on what people can do at various levels becomes tricky. When a site is deleted, the groups supporting it should also be removed, which requires custom code.

If information sharing is important for your organization, some additional steps are needed if you use multiple site collections. One option is to give everyone read permission through a web application policy. The downside appears when a site collection needs to hide its information, e.g. because its members are working on market-sensitive information such as a corporate merger. A web application policy applies to every site collection in the web application, so it cannot be withdrawn for just one. A different approach is to have a system that synchronizes all of the visitor groups, e.g. using a timer job. Each site collection should then have a mechanism that allows it to opt out of this synchronization.
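The logic of such a synchronization job with an opt-out can be sketched in a few lines. This is a minimal Python model, not timer job code. `SiteCollection`, its `opt_out` flag and `sync_visitor_groups` are hypothetical names. The set of organization-wide readers is assumed to come from somewhere central.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SiteCollection:
    """Hypothetical stand-in for a site collection and its visitors group."""
    url: str
    visitors: Set[str] = field(default_factory=set)
    opt_out: bool = False  # hypothetical flag set by owners who must keep content private


def sync_visitor_groups(collections: List[SiteCollection], readers: Set[str]) -> List[str]:
    """Add every organization-wide reader to each participating visitors group.

    Returns the URLs of the site collections that were changed. Collections
    that have opted out are skipped entirely.
    """
    changed = []
    for collection in collections:
        if collection.opt_out:
            continue
        missing = readers - collection.visitors
        if missing:
            collection.visitors |= missing
            changed.append(collection.url)
    return changed
```

Running the function a second time changes nothing, which is what you want from a job that runs on a schedule. A real job would also have to decide what to do about people who have been removed from the reader list. This sketch only adds members.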

Content databases

A SharePoint web application can have multiple content databases, and each content database is one SQL database. This allows the amount of data to scale while keeping each SQL database at a size that is manageable for disaster recovery. Using administration tools (e.g. the Move-SPSite cmdlet), it’s possible to move a site collection between content databases. This is not possible for a single site. This functionality can be important in balancing your load across several databases.

Navigation

One area where multiple site collections become a challenge is navigation. Within a site collection, sub sites can automatically appear in the navigation controls, and these entries are only shown if the current user has access to them (security trimming).

Since site collections don’t have a hierarchy and don’t know about each other, there is no automatic support for cross-site-collection navigation. You will have to come up with your own system, e.g. using the object model to add entries to the navigation system, or reading the navigation configuration from some other location.
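The second option, a central navigation list that is trimmed per user, can be sketched as follows. This is a minimal Python illustration with hypothetical names (`NavEntry`, `audience`, `build_navigation`). It does not read SharePoint navigation. It only shows the filtering step that imitates security trimming.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class NavEntry:
    """One entry in a hypothetical central navigation list."""
    title: str
    url: str
    audience: FrozenSet[str]  # groups assumed to have access to the target


def build_navigation(entries: Iterable[NavEntry], user_groups: FrozenSet[str]) -> List[Tuple[str, str]]:
    """Return (title, url) pairs the user may see, imitating security trimming."""
    return [(e.title, e.url) for e in entries if e.audience & user_groups]
```

Unlike built-in security trimming, the `audience` here is a stored copy of who has access, so it can drift from the real permissions. Either keep it synchronized or check permissions on the target site collection when rendering.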

Site queries

One web part that can be very handy is the Content Query Web Part, which lets you perform a query that rolls up content from across the sites in your site collection. The fewer sites a site collection contains, the less work such roll-up queries have to do.

Provisioning challenges

If you go for the multiple site collection approach, additional questions need answering.

  • Who should be able to create new site collections?
  • Should new site collections require approval? If so, by whom?
  • A site collection will have a lifespan. How does one determine the end and what does one do then?
  • Should the requestor become a site collection administrator? This allows them to do anything, including screwing up the whole site collection. This level of access may only be acceptable for certain individuals.

Environments

Now that you have decided that SharePoint is the solution to all your troubles, you have to get some hardware to build your magnificent system on. The number of environments will not surprise anyone with a certain level of software development experience, but the number of machines needed may surprise you. With each machine, the amount of storage needed also increases, so be prepared.

Development

SharePoint developers perform the following ritual many times a day:

  1. Compile code
  2. Retract old code
  3. Recycle the web server process
  4. Deploy new code
  5. Test out your change.

Between steps 2 and 4 the system is unavailable. Visual Studio 2010 performs steps 1 to 4 automatically.

Each developer needs their own machine; trying to share one between several developers is not practical. Since the web server process gets recycled all the time, it can take a considerable amount of time to load all your code before you can test your changes. Once I was able to start a page refresh, go for a slash, and still have time to get comfortable before the page had loaded. It might be worth investing in some fast disk drives.

A developer machine should contain the following:

  • SharePoint configured in farm mode, not single server
  • SQL Server
  • Active Directory domain

Since developers need to be the masters of their universe, putting all this into an isolated virtual machine is a practical approach. The Active Directory requirement becomes more important as the number of developers grows. With isolated domains, all the machines can have the same name and belong to the same domain. This ensures a common deployment experience for all developers, with no need to create additional configuration files when adding more developers (machines).

As for hardware, I’ve tried various approaches. My favorite is actually my 8GB laptop. This I can take with me and work on wherever I may be. I have also tried the approach of logging into a virtual development server. In the environment I was using, I didn’t feel like it was much faster than my laptop. This was also because the virtual hosting environment was hosting all sorts of other things; it was overloaded to say the least. If you manage to have a dedicated hosting environment, it may work out really well. But for now my laptop rules.

Testing

The testing environment is the first environment that starts to resemble the actual physical architecture. The test environment should be deployed across multiple machines: one SQL Server, one application server and two web front-end servers.

Since this is the first place where we are using multiple machines, problems not discovered on the developer machines will surface. It’s important that the developers have full access to the environment so they can troubleshoot.

If you manage to get your code working after it has moved between environments, things are looking up. At least you then know that nobody has hard-coded the developer URLs. This may sound stupid, but I have seen it happen. ‘Works on my machine’ takes on a whole new meaning.
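Inside SharePoint code the usual remedy is to derive URLs from the current context (e.g. SPContext.Current) instead of from configuration at all. Where an absolute URL really is needed, looking it up by environment keeps it out of the code. The Python sketch below shows that lookup. The host names, the `SP_ENVIRONMENT` variable and `site_url` are hypothetical placeholders.

```python
import os
from typing import Optional

# Hypothetical per-environment base URLs; the host names are placeholders.
BASE_URLS = {
    "dev": "http://sharepoint.dev.example",
    "test": "http://sharepoint.test.example",
    "staging": "http://sharepoint.staging.example",
    "production": "http://sharepoint.example",
}


def site_url(relative_path: str, environment: Optional[str] = None) -> str:
    """Build an absolute URL from a server-relative path for one environment.

    The environment comes from the argument, else from the hypothetical
    SP_ENVIRONMENT variable, else defaults to 'dev'.
    """
    env = environment or os.environ.get("SP_ENVIRONMENT", "dev")
    base = BASE_URLS[env]  # raises KeyError for an unknown environment name
    return base.rstrip("/") + "/" + relative_path.lstrip("/")
```

The same code then runs unchanged in every environment, and only the setting that selects the environment differs between machines.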

Staging

This is the environment where final testing and approval take place before the code is deployed to production. In this environment developers should have very limited control. The people maintaining the production environment should be the ones in charge of this environment. The test environment was the playground for the developers; the staging environment is the playground for the operational staff. Service pack and patch management testing can be performed in this environment.

In an ideal world the staging environment should be a copy of production. How that looks will depend heavily on what parts of SharePoint you are using. Since we don’t live in a perfect world, it’s not going to happen very often. Reducing the amount of allocated storage is an easy way to reduce cost; you need to be able to store a certain amount of data for testing purposes, but probably not the entire production system. The servers will most likely be virtualized, so you can turn them off when they are not in use.

Production

This is just included for completeness; it shouldn’t require much explanation.

Multiple release development

Now that you have established the environments for your current release, it’s time to think about the next version. Developing and testing a new release takes time, and during that time issues may arise in your production environment. To fix bugs in production, you need test and staging environments that still match the current release, so you shouldn’t use them for developing the next release. Separate environments should be established.

If you are a gambling man, you can hope nothing happens to your current production environment as long as development takes place on your next version.

Code vs XML

As a SharePoint developer you can very often choose between code and XML. In my early days as a SharePoint developer, I favored XML, but as I have grown older and wiser, my preference has shifted to doing as much as possible through code. There are two things that you don’t have in XML:

  • Conditional statements
  • Exception handling

From code it’s very easy to check if a list, view, user or whatever you are creating already exists before you try to create it. If your code can handle that something may already exist, the road to creating upgradeable code is very short.
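The check-before-create idea can be sketched independently of SharePoint. In the 2010 object model you would look the list up in the web’s list collection first, for example with SPListCollection.TryGetList. The Python below models that list collection as a plain dictionary keyed by title. `ensure_list` and the template string are hypothetical.

```python
from typing import Dict, Tuple


def ensure_list(lists: Dict[str, dict], title: str, template: str) -> Tuple[dict, bool]:
    """Return the list called `title`, creating it only if it does not exist.

    `lists` is a hypothetical stand-in for a site's list collection, keyed by
    title. The boolean tells the caller whether anything was created.
    """
    existing = lists.get(title)
    if existing is not None:
        return existing, False
    created = {"title": title, "template": template}
    lists[title] = created
    return created, True
```

Because a second call is harmless, the same provisioning code can run on a fresh site, on a site left half-provisioned by a failed attempt, and on an existing site being upgraded.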

At some point, something will fail. Keeping as much as possible in code makes exception handling easier. In order to understand this we must first see how features work.

Features generally contain one or more XML elements for SharePoint to process. Features may also have a feature receiver that can perform whatever you code it to do. During feature activation, the XML files are processed first, then your feature receiver is called.

If the feature receiver fails, the XML processing may already have succeeded. The net result is that some things may remain in the system even though the feature didn’t fully activate. You may correct your error and try again, but then the XML processing may fail because of remnants from the previous attempt. Using a pure code approach allows you to adapt to a lot of different scenarios.
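The failure scenario above can be reproduced in a toy model. This is a simplified Python simulation, not SharePoint’s activation pipeline. The site is a set of item names. The "XML" step creates items without checking for existing ones, and the "code" step checks first. All function names are hypothetical.

```python
from typing import Callable, List, Set

Receiver = Callable[[Set[str]], None]


class ProvisioningError(Exception):
    """Raised when a declarative element collides with an existing item."""


def process_xml_elements(site: Set[str], elements: List[str]) -> None:
    """Simplified model of declarative provisioning: there are no existence
    checks, so an item left over from an earlier attempt makes it fail."""
    for name in elements:
        if name in site:
            raise ProvisioningError(f"{name} already exists")
        site.add(name)


def activate_with_xml(site: Set[str], elements: List[str], receiver: Receiver) -> None:
    process_xml_elements(site, elements)  # XML elements are processed first...
    receiver(site)                        # ...then the feature receiver runs


def activate_with_code(site: Set[str], elements: List[str], receiver: Receiver) -> None:
    for name in elements:
        if name not in site:  # conditional creation tolerates remnants
            site.add(name)
    receiver(site)
```

If a buggy receiver fails after the XML step, the item stays behind and the retry with the fixed receiver fails in `process_xml_elements`. The code-only variant simply skips what already exists and completes.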

There are some things that you cannot do from code, e.g. list definitions. For those cases I try to split functionality across features and keep each one as pure as possible.

Feature guidelines

A feature is basically a set of files stored in a folder. Each feature is scoped at a certain level (Farm, Web Application, Site Collection, Site) and can be activated or deactivated on demand. Internally, features are referenced using a GUID, but on the file system they have a more human-readable folder name. The challenge is that all features are stored in the same folder, so the name must be globally unique.

If you wanted to create some Administration links for your solution, you couldn’t call it ‘AdminLinks’ since Microsoft already has taken that name. Creating a feature with a generic name is not a good idea either because somebody else might also be using that name.

The way to get around this is to introduce a naming convention for your features. If you use Visual Studio 2010, the default name of features is <wsp file>_<feature name>. My personal preference is to use a three part dot notation for most cases.

<system>.<area>.<feature>

For each of the parts I have further conventions.

<system>

This represents the project the code is being developed for. If you are developing a system for The Ministry of Silly Walks, the abbreviation MSW could be appropriate.

<area>

This represents the various functional areas.

COL = Collaboration
PUB = Publishing
SEA = Search
REC = Record Center
MYS = My Site

If left empty it’s applicable across multiple areas.

<feature>

Finally we come to the name of the feature. This should be something that illustrates the function of the feature. Here there are also certain names that I reserve for specific use.

Farm: Things to do at the farm level
WebApplication: Things to do at the web application level
Host: Things to do at the root site collection level (inspired by MySite Host)
Styling: Provisions the files required for visual styling/branding.

Examples:

MSW.COL.WebApplication
MSW.COL.Host
MSW.COL.DocumentLibrary : Customized collaboration document library.

Experiencing name collisions using such an approach should be rare.
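A convention is easier to keep when a build step can check it. The Python sketch below validates folder names against the three-part convention. It assumes that an omitted area is written as a two-part name (<system>.<feature>), because the text leaves the exact form open. `parse_feature_name` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Area codes from the convention above.
AREAS = {"COL", "PUB", "SEA", "REC", "MYS"}

# <system>.<area>.<feature>; an omitted area is assumed to be written as
# <system>.<feature> (an assumption, not part of the stated convention).
FEATURE_NAME = re.compile(
    r"^(?P<system>[A-Z]{2,})\."
    r"(?:(?P<area>[A-Z]{3})\.)?"
    r"(?P<feature>[A-Za-z][A-Za-z0-9]*)$"
)


def parse_feature_name(name: str) -> Tuple[str, Optional[str], str]:
    """Split a feature folder name into (system, area, feature) or raise ValueError."""
    match = FEATURE_NAME.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow <system>.<area>.<feature>")
    area = match.group("area")
    if area is not None and area not in AREAS:
        raise ValueError(f"unknown area {area!r} in {name!r}")
    return match.group("system"), area, match.group("feature")
```

A name such as AdminLinks, or Visual Studio’s default <wsp file>_<feature name> form, is rejected, which is exactly the kind of generic name the convention is meant to avoid.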

Once the name has been established, what other configuration changes should one consider?

Deployment Path is where you enter your custom name based on an appropriate naming convention.

Title and Description are always useful to fill out.

Activate on default only applies to features scoped at the farm or web application level. If it is enabled on a web application scoped feature, every web application created after the feature was deployed will get the feature activated. If you are writing code that is just so cool that everybody just has to have it, then go ahead and keep it enabled. If you would like to give users the choice of activating your functionality, then let them activate it explicitly. They may of course deactivate it afterwards, but by then the damage may already have been done.

IsHidden determines whether a feature is shown in the feature list UI. Features cannot be limited to a single type of site; they are globally available. Features designed for a specific environment, e.g. My Site, are also available to team sites, search centers and anything else. Try to hide as many features as possible to reduce the potential for confusion. If a feature will always be activated as a result of site provisioning, there is very little point in making it visible to end users.

Version is a new attribute in 2010. Features can now be upgraded from one version to another. If nothing is set, the system will return version 0.0.0.0. Configuring an initial version just makes things clearer; it shows that you have made a conscious decision about it.
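The comparison behind an upgrade decision can be sketched as follows. This illustrates only the version logic, not SharePoint’s upgrade engine. It ignores the UpgradeActions element, where the actual upgrade steps are declared. It assumes four-part a.b.c.d versions, and the helper names are hypothetical.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int, int]


def parse_version(text: Optional[str]) -> Version:
    """Parse an 'a.b.c.d' version; a missing version reads as 0.0.0.0."""
    if not text:
        return (0, 0, 0, 0)
    try:
        major, minor, build, revision = (int(part) for part in text.split("."))
    except ValueError:
        raise ValueError(f"expected a.b.c.d, got {text!r}") from None
    return (major, minor, build, revision)


def needs_upgrade(instance_version: Optional[str], definition_version: Optional[str]) -> bool:
    """True when an activated instance is older than the deployed definition.

    Tuples compare part by part as integers, so 1.0.0.10 is newer than 1.0.0.9,
    which a plain string comparison would get wrong.
    """
    return parse_version(instance_version) < parse_version(definition_version)
```

A feature deployed without a version and later given 1.0.0.0 reads as an upgrade from 0.0.0.0. Setting the initial version explicitly avoids that ambiguity.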

 

You may disagree with me on several of the points raised here, and that is fine as long as you have at least given them some thought.

Packaging

When it comes to SharePoint, there is just one way of packaging up your code: SharePoint solutions (WSP files). For 2010, the developer support in Visual Studio makes this a reasonably effortless exercise. For 2007 you have to use a third-party tool such as WSPBuilder or roll your own system. This led people to use all sorts of fun combinations of MSI, XCOPY and CMD files.

For 2010 there is absolutely no excuse for not using WSPs. If you are considering a third-party vendor that is not using WSPs, give them a pass and move on to the next candidate. This may sound overly harsh to some, but there are some very good reasons for this view.

Using WSP’s you get the following benefits:

  • A standardized way of deploying code to your system.
  • A container for all your customizations.
  • The customizations are applied to all the servers in your farm.
  • If you add new machines to the farm at a later time, the customizations are pushed out to those.
  • It’s easy to remove your customizations from all the servers in the farm.

For a developer it may not make much of a difference, but for the people maintaining the system it sure does.

Now that you know why it’s a good idea, how do you structure them? The following information is not targeted at those developing products, but rather those that are creating SharePoint systems containing customizations across several of the functional areas.

In Visual Studio 2010, one SharePoint Solution equals one Visual Studio Project; not Visual Studio Solution.

I prefer starting with the Empty SharePoint project and build on that.

My initial breakdown is as follows:

  • One WSP for common functionality
  • One WSP for each functional area in the system
    Search, Publishing, Collaboration, Record Center, My Site, etc.
  • One WSP for each system integration
    HR, Third party Document/Record Management system, CRM, ERP, etc.

Collaboration is a very generic term, so in some cases it might be appropriate to split this into several WSP files.

The idea behind this breakdown is the ability to deploy updates to parts of the system without affecting the others, since Search and My Site may have very different change rates. Dependencies between the packages should be limited as much as possible, and ideally packages should only depend on the common package. Integration packages may sensibly depend on the relevant functional packages. For example, a dependency between an HR package and the My Site package may be reasonable.
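The dependency rule can be checked mechanically once the dependencies are written down somewhere. The Python sketch below assumes each package is tagged as common, area or integration, and the package names like MSW.Search are hypothetical. It also treats integration-to-integration and common-to-anything dependencies as violations. That stricter reading is an assumption, not a rule stated above.

```python
from typing import Dict, List, Set

# Hypothetical package kinds used by this check.
COMMON, AREA, INTEGRATION = "common", "area", "integration"

# Which kinds each kind may depend on (an assumed, strict reading of the rule).
ALLOWED: Dict[str, Set[str]] = {
    COMMON: set(),
    AREA: {COMMON},
    INTEGRATION: {COMMON, AREA},
}


def dependency_violations(kinds: Dict[str, str], deps: Dict[str, Set[str]]) -> List[str]:
    """Return 'package -> dependency' strings for every disallowed dependency."""
    problems = []
    for package, targets in deps.items():
        allowed = ALLOWED[kinds[package]]
        for target in sorted(targets):
            if kinds[target] not in allowed:
                problems.append(f"{package} -> {target}")
    return problems
```

Running a check like this during the build catches a Search package that quietly starts depending on My Site before the two packages can no longer be deployed independently.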