How to maintain performance troubleshooting capabilities in heterogeneous and highly distributed environments?
One of our customers recently asked for our help with a slowdown affecting an application that users reached through different paths depending on their location: a thick client, Citrix XenApp, or a web UI.
The users, spread over dozens of sites on five continents, experienced performance problems whatever access mode they used.
The company already owned two PerformanceVision physical appliances, and the first option they considered was to place one of them in the datacenter in Europe that hosts the application. This would have been simple, but the datacenter is outsourced, and deploying a physical appliance on site was unfortunately not an option!
Meanwhile, the network team was being blamed, despite several facts pointing elsewhere:
- Only this specific application was experiencing a performance degradation
- It was a persisting problem
- Several user sites were being impacted
This situation was becoming a true nightmare for their IT Operations team, as it had been going on for several months by the time we started to work on it.
How to take troubleshooting to a global scale
How could they find the root cause of this degradation without installing any equipment in the datacenter or at any of their numerous remote sites?
When we stepped in, these were some of the difficulties our team had to overcome on the way to a solution.
The client had two physical appliances available, and their first questions were fairly technical:
- “How can I analyze all of my sites with just two probes? I cannot ship these 1U appliances to every site each time there is a degradation. It is too slow and too complicated. For some sites, customs procedures alone can stretch delivery to a month.”
- “How can I make sure that the physical appliances I already own are put to use and deliver a good ROI?”
- How can I analyze both:
  - the local LAN traffic at each remote site,
  - and the end-user experience on all sites (as seen from the datacenter’s traffic)?
The solution needed to meet a complex set of requirements:
- Silos: the need for visibility across the board!
  - As in every large organization, each team has a well-defined scope of responsibility. Although the teams cooperate, each one focuses primarily on its own perimeter.
  - Let’s take an example: to analyze its WAN, the organization relies on a WAN prioritization solution that provides an overview of network performance. However, that solution is complex to use and provides no application performance visibility (it stops at the TCP layer). It is therefore barely enough to get the network team out of trouble, and not enough to find the root cause of the degradation. To find out more about the differences between traditional NPM solutions and WireData Stream Analysis solutions, you might want to read our paper on 6 reasons to change your approach to network analysis.
  - It was necessary to find a solution that could provide visibility beyond the limits of the silos!
- Limited troubleshooting skills at the remote sites
  - The client’s objective was to let local teams look into problems themselves, so the solution had to be simple enough for them to adopt.
- Random and intermittent degradations
  - Performance issues were intermittent and impossible to reproduce: performance data had to be captured 24x7, with the ability to go back in time as required.
- Outsourced datacenter: no physical probe!
- Visibility into all three ways of accessing the application: HTTP/HTTPS, Citrix XenApp, and the thick client
- Visibility into the remote LANs, the WAN traffic, and the application tiers in the datacenter
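Capturing 24x7 and being able to go back in time essentially means keeping time-bucketed metrics over a bounded retention window. The minimal Python sketch below illustrates that idea only; it is not how PerformanceVision stores its data, and the class names, the per-minute granularity and the `retention_minutes` parameter are hypothetical. It also assumes records arrive in timestamp order, as they would from a live capture.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class MinuteBucket:
    """Aggregated response times observed during one minute (hypothetical layout)."""
    minute: int              # minutes since the epoch
    count: int = 0
    total_ms: float = 0.0
    max_ms: float = 0.0

    def mean_ms(self) -> float:
        return self.total_ms / self.count if self.count else 0.0


class RollingMetricStore:
    """Keeps one bucket per minute and discards buckets older than the retention window.

    Assumes records arrive in timestamp order.
    """

    def __init__(self, retention_minutes: int) -> None:
        self.retention_minutes = retention_minutes
        self.buckets: deque = deque()

    def record(self, ts_seconds: float, response_ms: float) -> None:
        minute = int(ts_seconds // 60)
        # Open a new bucket when the minute changes.
        if not self.buckets or self.buckets[-1].minute != minute:
            self.buckets.append(MinuteBucket(minute))
        bucket = self.buckets[-1]
        bucket.count += 1
        bucket.total_ms += response_ms
        bucket.max_ms = max(bucket.max_ms, response_ms)
        # Evict buckets that have fallen out of the retention window.
        while self.buckets and self.buckets[0].minute <= minute - self.retention_minutes:
            self.buckets.popleft()

    def query(self, start_s: float, end_s: float) -> list[MinuteBucket]:
        """Go back in time: return the buckets covering [start_s, end_s]."""
        first, last = int(start_s // 60), int(end_s // 60)
        return [b for b in self.buckets if first <= b.minute <= last]


# One week of per-minute history: an intermittent spike remains visible afterwards.
store = RollingMetricStore(retention_minutes=7 * 24 * 60)
store.record(0, 120.0)
store.record(30, 80.0)
store.record(90, 2000.0)
for bucket in store.query(0, 119):
    print(bucket.minute, bucket.count, bucket.mean_ms(), bucket.max_ms)
```

The design choice here is to trade raw detail for a fixed memory budget: an intermittent 2-second spike at minute 1 can still be found days later, even though the individual packets are long gone.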
Complex problems require a simple solution!
The bottom line: we were able to respond efficiently to this complex set of challenges with a simple solution deployed in just a few days.
Here is the configuration we used:
- Since it was impossible to deploy any physical device in the datacenter, we simply deployed three virtual capture devices (standard virtual appliances) to capture the key application flows on the different clusters within the datacenter itself.
- The 10 remote sites reporting performance degradations were equipped with additional virtual capture devices, installed on existing VMware hosts.
- All of the analytics were centralized on an existing physical PerformanceVision appliance to provide a single pane of glass.
  Because each local capture device computes its analytics in real time, centralizing the data consumes little bandwidth.
- Deploying on the 10 remote sites took only two days of work, and the entire integration was performed remotely, with no travel or shipping.
- Setting up capture within the datacenter itself (three probes) took less than a day.
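Why does centralization stay light on bandwidth? Because each capture device can forward summaries rather than raw traffic. As a rough illustration only (not the product's actual data format), the hypothetical `summarize()` function below collapses individual transaction records into one summary per interval, site, server and application; the `Transaction` fields, the 60-second interval and the JSON payload are all assumptions made for the example.

```python
from __future__ import annotations

import json
import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One request/response pair decoded by a local capture device (hypothetical fields)."""
    ts: float           # seconds since the epoch
    site: str
    server: str
    app: str
    response_ms: float


def summarize(transactions: list[Transaction], interval_s: int = 60) -> list[dict]:
    """Collapse raw transactions into one summary per (interval, site, server, app)."""
    groups: dict[tuple, list[float]] = defaultdict(list)
    for t in transactions:
        interval_start = int(t.ts // interval_s) * interval_s
        groups[(interval_start, t.site, t.server, t.app)].append(t.response_ms)

    summaries = []
    for (interval_start, site, server, app), values in sorted(groups.items()):
        summaries.append({
            "interval_start": interval_start,
            "site": site,
            "server": server,
            "app": app,
            "count": len(values),
            "mean_ms": statistics.fmean(values),
            "max_ms": max(values),
        })
    return summaries


def to_payload(summaries: list[dict]) -> bytes:
    """Serialize the summaries that would be forwarded to the central appliance."""
    return json.dumps(summaries).encode("utf-8")


# 1,000 transactions from one site to one server within the same minute.
raw = [Transaction(i * 0.05, "site-A", "db-1", "erp", float(i % 10)) for i in range(1000)]
summaries = summarize(raw)
print(len(raw), "transactions ->", len(summaries), "summary")
print(summaries[0]["count"], summaries[0]["mean_ms"], summaries[0]["max_ms"])
```

Only the summary list crosses the WAN in this sketch, so its size grows with the number of distinct site/server/application combinations per interval rather than with the traffic volume.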
In the end, it took the IT team less than three days to start monitoring the performance of 10 sites spread over five continents, as well as the different tiers of the application chain.
This allowed them to reach the root cause very quickly, showing that the degradation came from the database tier. They were then able to diagnose the application issue and fix it promptly:
- Users were entering data, but at certain times the database servers were unavailable, and the application gave the user no feedback or error message.
- Production sites in India, the United States, and Brazil saved many days of rework and reduced their production losses.
- The local IT teams now have access to PerformanceVision and actively use it for troubleshooting and monitoring.
- Based on this data, they got a fast resolution from their application vendor by precisely pointing out the flaws that needed to be fixed.
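Pinning a slowdown on one tier of a chain requires separating each tier's own processing time from the time it spends waiting on the tiers behind it. The sketch below shows one simple way to do this; it assumes a strictly sequential chain in which each tier makes a single downstream call per request (which real applications rarely guarantee), and the function names and figures are purely illustrative, not data from this case.

```python
from __future__ import annotations


def exclusive_times(chain: list[tuple[str, float]]) -> dict[str, float]:
    """Each tier's own time: its response time minus the next tier's.

    `chain` is ordered front to back (e.g. web -> app -> database), and each
    observed response time is assumed to include all downstream time.
    """
    own = {}
    for i, (tier, response_ms) in enumerate(chain):
        downstream_ms = chain[i + 1][1] if i + 1 < len(chain) else 0.0
        own[tier] = response_ms - downstream_ms
    return own


def slowest_tier(baseline: list[tuple[str, float]],
                 current: list[tuple[str, float]]) -> tuple[str, float]:
    """Return the tier whose own time grew the most compared with the baseline."""
    base = exclusive_times(baseline)
    now = exclusive_times(current)
    deltas = {tier: now[tier] - base.get(tier, 0.0) for tier in now}
    tier = max(deltas, key=deltas.get)
    return tier, deltas[tier]


baseline = [("web", 200.0), ("app", 150.0), ("database", 50.0)]
current = [("web", 1150.0), ("app", 1100.0), ("database", 1000.0)]
print(slowest_tier(baseline, current))  # ('database', 950.0)
```

Comparing raw response times alone would be misleading here: the web tier's response time also jumped by 950 ms, but only because it was waiting on the database.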
Agentless analysis provides 360° visibility into application delivery
Even if you do not have direct control over your entire IT environment, agentless performance management solutions can leverage physical and virtual network traffic to give you in-depth, extensive insight into the existence and origin of slowdowns at every level:
- Application tiers;
- Thin Client Architecture;
- Front Servers (Web or other);
- Back-end (database and files);
- On any remote site;
- In any datacenter.