Document dependencies, traceability and Impact Analysis – tales from the front line…

Well, it’s been a very busy last few months. Between getting our 1.2.1 release out and working closely with our key accounts it’s been full on.

So, I’ve come up for air in the last few days and I wanted to take the opportunity to share some thoughts relating to medium to large programs in IT and some challenges I’ve been seeing at first hand.

Many of the people we work with are involved in programs that typically coordinate between 5 and 15 projects. These roles are variously referred to as ‘program managers’ or sometimes ‘program PMOs’ (Program/Project Management Offices). They look to VisibleThread to help assess the quality of documents for these initiatives.

For instance, one of the large programs comprises 15 separate projects spanning 3 separate locations globally. In this case, we have 15 vision statements, 15 BRDs (Business Requirements Definitions) and will in due course have multiples of 15 for other types of documents including FRDs, Arch specs, Test Plans etc.

This program may well ultimately yield 15 x 5 (1 for each type) documents, i.e. 75 separate docs. If we assume that each doc averages 24 pages, we’re looking at 1800 pages of content. This is, believe it or not, roughly the size of the King James Bible! (well, at least this edition: http://www.christianbook.com/21st-century-king-james-bible-hardcover/9780963051233/pd/53495 )

Consider this from an individual author’s perspective. A project manager authoring her own project BRD (assuming it’s one of the 15) will likely be conscious of the considerations involved in that specific project but less conscious of sibling document sets for adjacent projects that are part of the program.

The challenge of assessing risk within the content is obvious: different authors, different audiences, different levels of detail and potentially large volumes of content all make gap analysis and interdependency analysis very difficult indeed.

Working with people on the ground, this has been one of the biggest issues at the overall PMO or Program Management role level. One customer mentioned an example where the Far East team had a definition of ‘client’ that was subtly different from the US version. This inconsistency was not unraveled until well into the UAT phase, and the consequent fallout was material in the context of the overall program, resulting in cost due to rework and a significant delay in the delivery timeline.

Dependencies typically go in two directions, horizontally across sibling documents (eg: Proj ‘A’ BRD relating to Proj ‘B’ BRD) and vertically, down to different document types (eg: Project ‘A’ Vision doc relating to Project ‘A’ FRD) and different sections in these docs. These are clearly non-trivial challenges and very difficult in the context of medium to large program initiatives.

Can’t we use a Traceability Matrix?

Many people within the business analysis community tend to suggest using a Trace Matrix as one way to tackle these issues. A fairly simple image of such a matrix (taken from Wikipedia) is shown to the right. Effectively, a trace matrix seeks to identify the connections between numbered requirements.

In my experience, trying to use a trace matrix for the above issue is next to impossible. Three factors make this so: 1.) the effort to create and maintain an up-to-date trace matrix as the underlying document content changes is unduly onerous; 2.) a trace matrix does not scale visually beyond the simplest of initiatives, certainly not to 1800 pages of content; and finally, and perhaps most importantly, 3.) expecting senior stakeholders such as program managers to go to the effort of maintaining such a trace matrix is ‘a bridge too far’ in most organisations and will simply not happen.

So, enter the ‘discovery’ view within VisibleThread. Discovery takes a different approach. Rather than forcing an explicit marking of ‘traceable’ relationships, it instead automatically scans a body of documents and calculates the occurrence of ‘nouns’ in the context of where they occur within document content.

How does this work?

The VT Server uses NLP (natural language processing) to automatically scan raw documents and store occurrences. The dashboard then displays these occurrences cross referencing the document content.
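To make the idea concrete, here is a minimal sketch of building a concept-by-document occurrence table. It is not VisibleThread's implementation: it assumes a hand-picked list of domain terms (the hypothetical `CONCEPTS`) and simple pattern matching, whereas a real NLP pipeline would discover the nouns itself. The function names and sample text are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical domain vocabulary. A real system would discover nouns
# with an NLP tagger instead of relying on a fixed list.
CONCEPTS = ["account", "trade", "dealer", "letter of credit"]


def count_concepts(text, concepts=CONCEPTS):
    """Count whole-word, case-insensitive occurrences of each concept."""
    counts = Counter()
    for concept in concepts:
        # Allow an optional trailing "s" so simple plurals are counted too.
        pattern = r"\b" + re.escape(concept) + r"s?\b"
        counts[concept] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def build_occurrence_table(documents, concepts=CONCEPTS):
    """Map each concept to a {document name: occurrence count} dictionary."""
    table = {concept: {} for concept in concepts}
    for name, text in documents.items():
        counts = count_concepts(text, concepts)
        for concept in concepts:
            table[concept][name] = counts[concept]
    return table


# Made-up sample content standing in for real documents.
documents = {
    "BRD": "The dealer updates the account. Each account has a trading threshold.",
    "Tech Spec": "The service validates each trade against the threshold.",
}
table = build_occurrence_table(documents)
```

Each row of `table` corresponds to one concept and each inner key to one document, which is roughly the shape of data a concept-versus-document grid needs.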

Let’s say we are looking at documents concerning an extension to an existing trading system that will affect business rules around how accounts are validated for trading thresholds. In this case you would expect to see a reasonable distribution of certain key domain concepts like ‘trade’, ‘dealer’, ‘letter of credit’, ‘account’ etc. spread across certain documents in certain distribution frequencies. In particular I would expect to see quite a bit of content and rules relating to ‘account’. The BRD, FRD and use cases that touch account manipulation eg: ‘Update Account’ and associated test cases would all be expected to have relatively high occurrences of the concept ‘account’.

Looking at this screenshot, we see the list of ‘discovered’ nouns. The grid in the center shows the correlation between concepts (nouns) like ‘account’ and the documents in which they occur. Each document is represented by a column adjacent to the checkbox. Here we see 5 columns representing a BRD doc, a tech spec doc and 3 use case docs.

In our case, only 2 occurrences of ‘account’ are encountered, each indicated by a green dot in the specific document column. Those occurrences are *only* in the BRD doc, meaning we have no reference to ‘account’ in the detailed tech spec doc or in any of the use case docs, a potentially serious gap. In this way, we can spot gaps/issues without having to trudge through the entire set of documents. For program managers at high levels, this ‘surgical’ view, focusing top down on inherently important concepts, means risky gaps and defects can be identified and eliminated.
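The gap check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the occurrence counts are invented to mirror the ‘account’ example (the ‘trade’ row is purely made up for contrast), and `find_gaps` is a hypothetical helper, not a VisibleThread function.

```python
def find_gaps(table, source_doc):
    """Return concepts that occur in source_doc but in no other document."""
    gaps = []
    for concept, per_doc in table.items():
        in_source = per_doc.get(source_doc, 0) > 0
        elsewhere = any(
            count > 0 for doc, count in per_doc.items() if doc != source_doc
        )
        if in_source and not elsewhere:
            gaps.append(concept)
    return sorted(gaps)


# Invented counts: 'account' appears only in the BRD, as in the example.
occurrences = {
    "account": {"BRD": 2, "Tech Spec": 0, "Use Case 1": 0,
                "Use Case 2": 0, "Use Case 3": 0},
    "trade": {"BRD": 4, "Tech Spec": 6, "Use Case 1": 3,
              "Use Case 2": 0, "Use Case 3": 1},
}

print(find_gaps(occurrences, "BRD"))  # prints ['account']
```

A concept flagged this way is a prompt for a human to look closer, not proof of a defect; some terms legitimately belong in only one document.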

Were these types of gaps to remain undiscovered in our ‘bible’ (as they were in the above ‘client’ example), we would be in for a rude awakening downstream. Thankfully, with VisibleThread we are seeing how we can avoid these issues in very practical and intuitive ways, marrying NLP with strong visualisation techniques.

all the best,


PS: Over the years I have encountered some insightful people attempting to use mind maps to understand dependency relationships. It is certainly a good idea on the surface but, just like the trace matrix, the mind map itself (albeit electronic in nature) must be maintained separately, outside of the content. As with trace matrices, this is hard to do once the body of content becomes in any way large.
