Testing Center of Excellence

Many organizations use a Testing Center of Excellence (TCoE) to establish standard processes and procedures, promote best practices, and use common tools to provide high-quality testing services at low cost.  I recently had a chance to think about the use of a TCoE in a US federal agency, specifically, how to define an operating model that allows the TCoE to evolve over time and support the agency’s emerging testing needs.  Here are some thoughts.

First, use a Governance board composed of stakeholders who represent the customers of the TCoE.  The board preserves and strengthens stakeholder confidence in the TCoE via two complementary routes.  One, it educates the TCoE about the needs of its customers.  For example, it may ask the TCoE vendor to staff test engagements with testers who have prior knowledge of a particular domain.  Two, its members champion the use of the TCoE within their own organizations.  This is particularly important when there is resistance to the TCoE within the agency.

Second, use a small Governance team focused on the long-term governance tasks distinct from the Execution team focused on the day-to-day testing tasks.  As shown in the graphic, creating two distinct teams allows proper focus in several important areas:

  1. The Governance team takes the lead to “collaborate & establish” standard processes and procedures, best practices, and common tools for the TCoE and makes them available in a centralized repository.  The collaboration is primarily between the two teams and the stakeholders.  If the idea of a TCoE within an agency is new, consider integrating change management principles to ease the transition.
  2. The Execution team “learns and adopts” the artifacts produced in the previous step with active guidance from the Governance team to ensure they know how to tailor them to the specific needs of their customers.
  3. The Execution team “implements and reports” the typical life cycle activities in a test engagement such as plan, prepare, execute, report, and closeout using the outputs produced in step 1 and learned and adopted in step 2.
  4. And finally, the Governance team “measures and improves” TCoE artifacts using results from the TCoE test engagements.  Without this step, the critical feedback loop never occurs, leading to ossified practices that don’t address the emerging needs of the agency.

In practice, the Governance team, where one exists, is usually tasked with producing test templates or managing the document repository, which does not add much value to the TCoE.  Similarly, testers rarely evaluate their own practices objectively, as they are busy testing.  At one of the agencies, we actively sought feedback from our customers about the test engagements, used a Kanban board for a common understanding of everyone’s test engagements, held regular briefings to ensure best practices spread quickly within the team, continually evaluated new tools to support emerging needs, and used a “QA of QA” model to evaluate work products produced by our own testing team to ensure standard processes and procedures were followed.  All this without a separate Governance team.

However, a well-planned operating model can help a TCoE overcome these challenges in a much more systematic way.  And one more thing: how large a Governance team should be depends on many factors, but I think spending up to 5% of the TCoE budget on Governance is a reasonable start.

Hopefully, this post gives you an idea or two to help you with your own TCoE.  Feel free to leave any comments or questions or write to me directly at anish.sharma at bpsconsulting.com.


Recovering Corrupted LoadRunner Results

Mastering a tool like LoadRunner requires familiarity not only with its features and functions, but also its quirks and workarounds.  For example, LoadRunner occasionally runs into a problem while a test scenario is running and any attempt to open the results file throws an error.  We have identified recovery procedures that can save many hours of retesting.  Here are four methods, from simple to complex, that you can try.  Be sure to back up your .lrr folder before attempting any of them.

The first method is to force re-collation of the results.  To do this, change “Completed” from 0 to 1 (as needed) in Collate.txt, then invoke the Collate Results controller action.
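In practice the edit is a one-line change; the snippet below is only an illustration of that flag, since the exact layout of Collate.txt varies by LoadRunner version:

```
; Collate.txt, inside the scenario's .lrr results folder (layout varies by version)
Completed=1
```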

The second method is to copy the .eve and .map files and log folders from each load generator to the controller’s .lrr folder, then run Collate Results.

The third method is to manually repair the .lrr file to add missing section separator lines, fix the FullData flag, and add a stop time or scenario duration as needed.

The fourth method is to run the same test for a short duration to record control information, then replace the results with those from the original run.  This requires saving the new results in a separate folder, patching the start time to match the original full-length run, replacing the .eve and .map files and log folder with the original results from the load generators, and copying the data from the original .lrr folder to the new one.  It should now be possible to analyze the results.

We love sharing our expertise in LoadRunner.  If you are interested in detailed procedures, please feel free to contact us using our Contact Form.


Agile: It’s Not Just for Software

It seems Agile software development practices are being applied to everything these days. There is even Agile Parenting! It reminds us of the object-oriented craze in the late 80s and a publication titled “My Cat is Object-Oriented”. So are we going too far, or is Agile good for more than just software?

For large organizations, including federal government agencies, we believe Agile has a lot to offer beyond software development. Let’s look at IT project management, for example. While Agile promotes a shift of emphasis away from documentation, there are still plenty of documents required for large IT projects. These include project charters, acquisition plans, project plans, system security plans, training documents, and much more. Following waterfall methods, these documents would be delivered as near-final drafts for stakeholder review and input. But we have found that an Agile approach with frequent delivery (sprints) and daily stand-up meetings (scrum meetings) works wonders for speeding the delivery of project documents and constantly closing the gap between results and expectations to further the success of the project. Consequently, we have adopted Agile practices for all of our projects, whether they include software development or not. Could Agile practices help you succeed too?


Vendor Independence: Theory and Practice

This topic comes up periodically, usually just before or after introducing a new major commercial off-the-shelf (COTS) product into a solution.  Forward-thinking managers or software architects become concerned about avoiding “vendor lock-in” that could result in higher license costs, increased dependence on obsolete products, or an expensive software migration in the future.  Unfortunately, this is a case where the cure can be worse than the problem.
The knee-jerk response is to write an “isolation layer” that in theory reduces the dependency on vendor-specific application programming interfaces (APIs).  But this merely aggravates the problem: now you are dependent not only on the product (because you inevitably can’t hide the vendor-specific functionality) but also on a raft of custom software that comprises the isolation layer.  This custom isolation layer ultimately has the same characteristics you sought to avoid in the vendor API in regard to future cost and support concerns.  Meanwhile, you have doubled your cost and schedule by effectively undertaking a standards-definition effort within the scope of your development project.

In practice, there are at least three better alternatives.  The easiest is to resign yourself to using the proprietary API.  At least this interface has been thought out and refined over the years by the product vendor…generally better than what you can expect from a system integrator operating on constrained project budgets and schedules in an area that is not its core competency.  If the day comes when you need to change products, only then do you write the isolation layer…but now it translates from the legacy product API to the new product API.  This way you only incur the cost when you have to, and you’ll still spend less time and money because you have two well-defined interfaces to bridge.

An even better approach is to use the natural utility classes in your application to serve as the isolation layer.  You have to write this code anyway, and if written correctly you can isolate the impact to a relatively small set of code.  The third and perhaps best approach is to leverage a standards-based API.  This buys you real isolation, and with a little luck it is something that the vendor or a third-party has already implemented.  The downside is that you may sacrifice some proprietary functionality when using the standards-based API, but you can always extend this approach with one of the other two alternatives if really needed.
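The utility-class idea can be sketched in a few lines of Java.  Everything here is invented for illustration: "AcmeSearchClient" is a stand-in for a hypothetical vendor API, and "SearchService" is the application's own utility class that confines the vendor-specific call to one place.

```java
// Stand-in for a hypothetical vendor API (not a real product).
class AcmeSearchClient {
    String acmeQuery(String expr) {
        return "results for " + expr;
    }
}

// The application's own utility class.  The rest of the codebase calls
// search(), so a future product swap touches only this class -- no
// separate standards-definition effort required up front.
class SearchService {
    private final AcmeSearchClient client = new AcmeSearchClient();

    public String search(String terms) {
        return client.acmeQuery(terms); // vendor-specific call confined here
    }
}

public class IsolationDemo {
    public static void main(String[] args) {
        System.out.println(new SearchService().search("vendor independence"));
    }
}
```

The point is not the wrapper itself but its scope: you were going to write a utility class anyway, so the isolation comes nearly for free.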
Ultimately, a good solution depends on both the right approach and a faithful implementation of the approach.  But beware of integrators selling vendor independence through isolation layers…they are likely just moving the dependency, not eliminating it.

LoadRunner and Use of WinInet Option

By default, LoadRunner uses its own sockets implementation, which is very robust and allows use of hundreds of virtual users per load generator during a test.  However, under certain conditions the sockets implementation stops working.  Or so it seems.

During one of our client engagements, recording of a script using the Web (HTTP/HTML) protocol proceeded smoothly.  However, its replay failed.  The failing URL used the https protocol.  The error recorded in VuGen was a step-download timeout exceeded, with the timeout set to the default of 120 seconds.  This is a common error that typically means something went wrong and the server did not return a response to the client’s request within the timeout value.  The URL in question simply fetched a WSDL.  All major browsers (IE, Firefox, and Chrome) on that machine were able to retrieve the WSDL without any issues.  Turning on the Advanced trace and Print SSL options in VuGen did not provide any clues.  It seemed like LoadRunner was making the request but nothing much was being returned from the server.

Turning on the WinInet option in the run-time settings of VuGen immediately made the output window of VuGen a beehive of activity.  The server responded immediately.  This option essentially allows LoadRunner to use Internet Explorer’s HTTP implementation under the covers, so everything that IE can do becomes available to LoadRunner.  Unfortunately, the WinInet option comes with a penalty.  According to the LoadRunner documentation (v11.51), it is not as scalable as the default sockets implementation.  HP product support indicated that if the WinInet option allows script replay, then that is our only option.  Luckily, we found a workaround by staying patient and using Wireshark to look under the covers.  Incidentally, Wireshark is a fantastic tool to add to your performance testing arsenal if you haven’t already.

Making a recording of the traffic in Wireshark while the URL was accessed in Internet Explorer, and comparing it to a recording made while the LoadRunner script was replayed in VuGen, showed a problem during the SSL handshake.  The difference was immediately apparent just watching the traffic in Wireshark (best to use a display filter if your network is as chatty as our customer’s).  IE used the TLSv1 protocol to make the request to the server, while LoadRunner tried the dated SSLv1 and SSLv2 protocols.  It is not clear if the server itself was configured to respond only to TLSv1, but LoadRunner never tried TLSv1 during the handshake with the server.  A quick read of VuGen’s documentation clarified that LoadRunner uses only SSLv1 and SSLv2 by default.  Using the web_set_sockets_option("SSL_VERSION", "TLS") function and turning off the WinInet option allowed replay to work without a hitch.  As a result, we now have a scalable scenario to test our web application.
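In VuGen script terms, the fix amounts to one line placed before the failing step.  The step name and URL below are placeholders, not the customer’s actual values:

```c
// Force TLS during the SSL handshake instead of the SSLv1/SSLv2 default
// noted above; WinInet stays OFF in the run-time settings.
web_set_sockets_option("SSL_VERSION", "TLS");

// Placeholder step -- substitute the real WSDL URL from your script.
web_url("get_wsdl",
    "URL=https://example.com/service?wsdl",
    "Resource=0",
    LAST);
```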

Incidentally, you can also verify which version of SSL is being used for client-server communication by using the openssl binaries that ship with LoadRunner.  For example, to verify which version of SSL is used to connect to google.com, try the following:
  • Navigate to the folder containing openssl.exe (typically <LoadRunner installation folder>/bin on Windows).
  • Enter openssl s_client -connect google.com:443 
For your environment, replace google.com with your server name and 443 with the port number on which https traffic is handled.
You will see something like the following output (only the relevant lines are shown):
SSL handshake has read 1752 bytes and written 316 bytes
New, TLSv1/SSLv3, Cipher is RC4-SHA
Server public key is 1024 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
    Protocol  : TLSv1
    Cipher    : RC4-SHA
    Session-ID: F02931AD5F99B4FE65B52A8ACAEFB8378E6C4B0F89A0A71BC28A030236B3F8AA
    Master-Key: E1A9015993DD7D7EBDE13313AAA0DB768EA6644944FAE7F4AFE6B730061D4E0FB9F5A511616ACCBF073BCBDF90505FF2
    Key-Arg   : None
    Start Time: 1360550907
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
The Protocol line shows the protocol in use between the client and the server.

XForms – Lagging Implementations Point to Bigger Issue

XForms seem like such a good idea.  With declarative binding of form fields directly to XML document elements, you can implement an XML-centric XForms/REST/XQuery (XRX) architecture and avoid a bunch of complex server-side code: no JavaBeans needed to map HTML form fields to Java objects, and no object-relational mappings to persist the data.  Use some XSLT to create XForms from a schema, and you could eliminate most of the custom programming needed to collect, store, and display business information.
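As a simplified illustration of that declarative binding, an XForms model ties controls straight to instance XML; the element names here are invented for the example:

```xml
<xf:model xmlns:xf="http://www.w3.org/2002/xforms">
  <xf:instance>
    <person xmlns="">
      <name/>
      <email/>
    </person>
  </xf:instance>
</xf:model>

<!-- Controls reference instance nodes directly; no server-side mapping code. -->
<xf:input ref="name"><xf:label>Name</xf:label></xf:input>
<xf:input ref="email"><xf:label>Email</xf:label></xf:input>
```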
So why, nearly 10 years after the initial XForms recommendation, aren’t XForms being used everywhere?  Our recent experience on a research project provides a clue.
First, XForms implementations are lagging.  There are arguably only a couple of robust implementations on the market; the rest are little more than tinker toys, extremely limited and fragile.  In addition, the implementations have enough extensions and differing interpretations of the standard that code portability is non-existent, so you’re locked into a product.  This is a far cry from JSF, which has more than a dozen implementations that are more or less interchangeable.  The paucity of good implementations is an immediate problem for a development project, but it is really just a symptom of a deeper issue.
The real issue may be that the things that make XForms so great in theory make it hard to adopt in practice.  XForms provides relatively complete support for the model-view-controller (MVC) pattern, including a robust event model.  Cool, now you can do highly interactive AJAXy things on the client.  But now the developer is working in a strange world of XForms event programming rather than Java or JavaScript.  Whoa…we’re going to discard all that experience and infrastructure?  And do all our control scripting in an XML dialect?  And what about a debugging environment?  The idea of client-side support for the MVC model sounds great, but when you get into it you find it was the “MV” part that was exciting, and the “C” comes along like unwanted baggage.  Even the “M” part can be an issue.  Say you want to let users check off a bunch of items on a list for processing.  In a traditional form, it’s easy to provide a combined view of a persistent object with transient (checkbox) data.  But doing that in XForms either mucks up the business object with control data or requires a second correlated object, which isn’t really supported by the standard XForms controls.  And what about paging a large list?  I could go on, but the point is that developers will have to find new ways and awkward workarounds for everyday features of all but the simplest forms.
The problem with this deeper issue is that it will be very slow to be resolved, if it is resolved at all.  If the only issue with XForms was in weak implementations, a new development project might be willing to bet on rapid improvements driven by high demand.  With deeper issues at play, I wouldn’t bet on it.

XML Schema Design: Form Follows Function

A disagreement on a recent project got me thinking about some of the more subtle aspects of schema design.  In this particular case, there were two different opinions on the best way to represent supertypes and subtypes.  Since the schema was based on an abstract UML model, supertypes and subtypes were everywhere.  I suspect that, with the trend towards model-driven development, others will encounter this same issue more and more.  So some perspective based on experience might be helpful.
The options for representing supertypes/subtypes essentially boiled down to these:
  •  Inheritance via xsd:extension.  Elements that are superclasses are defined in the regular way, then subclasses extend these, inheriting from the superclass.  The advantages of this approach include 1) it follows normal object-oriented principles, which have proven to be one of the most important advances in software development and fit naturally into the way people deal with complexity (“cars and trucks are both vehicles”), 2) it results in unambiguous representations in Java when JAXB is used, 3) it has an unambiguous SOAP representation, which is becoming more important as web services and SOA become more prevalent, 4) schema understanding is facilitated since subclasses explicitly declare their parent, 5) instance processing is facilitated since subclasses are forced to declare their type using the xsi:type attribute, 6) extensibility is facilitated since subclasses can be added without editing the superclass, and 7) it is the default for many UML-to-XSD generators such as the one provided in IBM’s Rational suite.
  • Composition via xsd:choice.  Elements that are subclasses are defined in the regular way, then superclasses are composed with a choice of elements for various subclasses.  The advantages of this approach are 1) it builds naturally upon the xsd:choice construct that is common in manually-constructed schemas so a lot of people are familiar with it, 2) it naturally gives you a wrapper element around subclass attributes that makes schema-unaware document processing with XSLT simpler (though wrappers can be added in the inheritance approach too), 3) it facilitates schema understanding since superclasses explicitly declare their children, and 4) subclass instances aren’t required to have an xsi:type declaration.  Some might claim that this approach avoids some schema understandability issues that arise if elements are declared with a superclass type and then the instance overridden with a subclass type (see Obasanjo, “Is Complex Type Derivation Unnecessary?”), but I prefer to follow the simple rule, “If it hurts, don’t do it.”
  • Aspects via xsd:group.  Superclass elements are defined as xsd:groups, and subclasses are created by including these groups.  For straightforward superclass/subclass implementation, it’s not clear that there are any advantages to this approach: it’s somewhat akin to manually implementing inheritance rather than using the XML Schema features designed for that purpose.
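A minimal sketch of the first two options, using an invented Vehicle/Car example rather than the project’s actual model:

```xml
<!-- Option 1: inheritance via xsd:extension.
     Instances of Car declare xsi:type="Car" where a Vehicle is expected. -->
<xsd:complexType name="Vehicle">
  <xsd:sequence>
    <xsd:element name="make" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:complexType name="Car">
  <xsd:complexContent>
    <xsd:extension base="Vehicle">
      <xsd:sequence>
        <xsd:element name="doors" type="xsd:int"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>

<!-- Option 2: composition via xsd:choice.
     The superclass is a wrapper offering one element per subclass. -->
<xsd:complexType name="VehicleChoice">
  <xsd:choice>
    <xsd:element name="car" type="CarType"/>
    <xsd:element name="truck" type="TruckType"/>
  </xsd:choice>
</xsd:complexType>
```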
In the end, the best approach probably depends on what you intend to do with the XML documents.  From the document processor’s perspective, composition via xsd:choice is the natural solution for marked-up text that will be processed as a document, while the software developer will tend to prefer inheritance via xsd:extension for XML that will be manipulated by software as data.  Using xsd:group may be a compromise that appeases both camps but doesn’t make either happy.  That last option is perhaps more interesting in another context given the rise of aspect-oriented programming, but that’s a topic for another blog.

Is Oracle XML DB Ready for Prime Time?

Recent experience suggests not.  At least not if you want to mix XML and relational data.

Lately we’ve tried to generate XML records from about 10 relational tables using Oracle’s XQuery with the ora:view() function.  That attempt failed with an indefinite hang when processing more than a few hundred records (Oracle bug 8944761).  Using an explicit cursor in PL/SQL got us around that one, albeit slowly: 20 records/second is not my idea of fast when you have 400,000 records to process.  Next we tried to use XQuery and ora:view() to merge about 60 code table lookups into XML records (kinda nice to let users see both codes and descriptions, right?).  The performance of roughly three lookups per second was very disappointing.  
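The problematic pattern looked roughly like the query below (table and column names invented); the pure-relational and pure-XML variants of the same work performed fine:

```sql
-- Mixing relational rows into XQuery via ora:view() -- the slow path.
SELECT XMLQuery(
         'for $e in ora:view("EMPLOYEES")/ROW
          return <employee>{$e/EMPLOYEE_ID, $e/LAST_NAME}</employee>'
         RETURNING CONTENT) AS emp_xml
FROM dual;
```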
While the performance of pure relational/SQL and pure XML/XQuery operations appears good, mixing relational data into an XQuery with ora:view() appears to be a recipe for performance problems.

Achieving High Availability with Oracle

Applications that require high availability need some form of redundancy to protect against failures.  For those with an Oracle database, the first thing that comes to mind is often Oracle Real Application Clusters (RAC).  But is this the best solution?

Oracle RAC allows multiple active DBMS servers to access a single database.  If one server goes down, traffic is automatically routed to the remaining servers within a matter of seconds.  Although Oracle has made deploying RAC easier in recent releases (10g and 11g), it still adds a significant amount of complexity and cost.  More importantly, it does not protect against storage system failures: if you lose your SAN, NAS, or JBOD storage, you lose your database.
Oracle DataGuard allows an active database to be replicated to another hot-standby or read-only copy.  If the active server or its storage goes down, the standby server is made active and traffic is routed to it within a few minutes.  DataGuard is included in the Oracle Enterprise license at no extra cost, and can replicate to either a local instance or to a remote instance (for full disaster recovery).  More importantly, it protects against both server and storage system failures.  It also can play a useful role in migrating to new server hardware or software.
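The role change itself can be driven from a Data Guard Broker (DGMGRL) session; the database name below is a placeholder for your standby:

```
DGMGRL> SWITCHOVER TO chicago;    (planned role change, e.g. hardware migration)
DGMGRL> FAILOVER TO chicago;      (unplanned, after losing the primary)
```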
If your business can tolerate a few minutes of outage, DataGuard is probably a good place to start.  If you need to further reduce the impact of a DBMS server outage from minutes to seconds, you should consider adding RAC.