I'm now 70+ days behind on my project to deploy our mission-critical application to my users around the globe via RemoteApp on RDS 2012. Actually, check that: I abandoned RDS 2012 and installed RDS 2012 R2 the moment it hit my VLSC, so I'm now 70+ days behind, two operating systems deep, with one ticket opened with MS and so many published, unpublished, rapid-release, and other hotfixes applied that I'm seeing hotfix numbers everywhere, like the numbers in LOST, and still no resolution in sight.
Problem Description: During a RemoteApp session served from a datacenter in Colorado, users in the UK, New Zealand, Australia and other distant areas report that our in-house .NET 4 application (call it SalesApp!) "flickers" rapidly.
Behavior origin: This behavior was not observed when deploying the same application via RemoteApp on RDS 2008 R2, from the same datacenter, on the same hardware, to the same remote offices, from May 2011 to the present day. The flickering is peculiar to the new stack.
Server Description: 6x Dell R810 physical hosts running Windows Server 2012 Datacenter, patched as of mid-October, with Hyper-V 2012 on the hosts and a non-virtualized but VMM-managed fabric consisting of an 8x1GbE LACP team + 1x iSCSI, 1x iSCSI, 1x auxiliary NICs on each server. Each host has 4 sockets with Xeon 7500 (I think?) Nehalem CPUs with six cores each, for a total of 24 non-hyperthreaded cores.
VM: 75 virtual machines in the cluster. Workloads 100% MS. SalesApp! is written in C# in Visual Studio 2010, with SQL 2008 R2 Enterprise for back-end SQL crunching and Crystal Reports 32-bit and 64-bit plugins for reporting. SalesApp! is published via ClickOnce. (Interesting factoid: ClickOnce applications were not supported by Microsoft for RemoteApp publishing in 2008 R2 or 2012, but are now with 2012 R2.) The 2008 R2 RDS farm contains 4 Session Hosts, plus a Connection Broker & License server on two clustered VMs.
WAN: Global MPLS network on leased lines. Big sites like the UK office, HQ in California, and the datacenter in Colorado have 100 Mb Ethernet circuits. 37ms latency between HQ & datacenter. 160ms latency between UK and datacenter. 180ms latency between NZ & datacenter.
PCs: Remote office PCs that have observed this behavior are Dell Optiplex GX 720 or above, with video card updates, driver updates, the latest Remote Desktop application, and all that stuff. The same Optiplex in the UK, NZ or Australia does not exhibit RemoteApp flicker in 2008 R2 RemoteApp, only in 2012/2012 R2.
Oddities: The UK IT Manager can't replicate the flickering behavior on a Windows 8 (not 8.1) PC.
Have you found any articles mentioning this: Why yes, I have! But to save you all from the link farm I've summarized my thoughts after months of troubleshooting in the spider chart below showing likely cause based on my testing:
[Image: spider chart of likely causes]
As I've tested each idea, each test result has changed the spider chart a bit.
For instance, yesterday, I wanted to test the latency theory of screen flicker in RemoteApp on both Windows 8 and Windows 7.
From HQ, I pointed my workstation & Win 7 laptop at a WANem VM, added 160ms to my connection, and ran some typical actions in SalesApp! to reproduce the flicker.
While Windows 8 was in general faster, in some areas Windows 7 was. Since each session ran on the same VM (the single 2012 R2 RDS server I'm trying to get into production), CPU differences between my Win8 desktop & the laptop shouldn't make much of a difference.
So, disappointingly, I could not reproduce the screen flicker issue even at +200ms, which massively discounts the likelihood that this is related to latency.
However, at +500ms latency with 10% packet loss tossed in for good measure, I could reproduce behavior resembling the Flicker behavior, but I'd say it was 90% application performance problems over a crappy connection and 10% screen flicker.
It did not resemble, in total, the videos I've seen from my remote users.
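For anyone who wants to repeat the latency test, here is a minimal Python sketch for confirming that the delay added by an emulator like WANem is actually in effect on the path to the RDS server. It times TCP connects to the RDP port, which take roughly one round trip each. The function names are my own, and port 3389 is just the standard RDP default, assumed rather than confirmed for this environment.

```python
import socket
import statistics
import time


def measure_connect_ms(host, port=3389, samples=5, timeout=5.0):
    """Return TCP connect times in milliseconds, one per attempt."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # A TCP connect completes after about one round trip,
        # so the elapsed time approximates the path RTT.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        results.append(elapsed * 1000.0)
    return results


def summarize(times_ms):
    """Reduce the samples to min / median / max in milliseconds."""
    return {
        "min": min(times_ms),
        "median": statistics.median(times_ms),
        "max": max(times_ms),
    }
```

Calling `summarize(measure_connect_ms("rds-host.example.com"))` (a placeholder host name) before and after enabling the emulated delay should show the median move by about the amount configured. This only checks the network path; it says nothing about how RDP itself behaves under that delay.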
As for what I call the "Metro Hypothesis," that refers to a theory I found from other engineers discussing this problem on a different forum. The Metro Hypothesis comes from a Windows developer who states: "The issue lies with W8 and W2012 graphical API management system. Microsoft, in order to be able to support tactile tablet user interface has rewrite its graphical handler. The new Metro display method is not backward compatible with Windows standards."
RemoteFX: In 2008 R2 Hyper-V, RemoteFX was a niche product, to be used sparingly for high-end demands like CAD. In 2012 and 2012 R2, RemoteFX is not just a GPU virtualization technology but a bandwidth manipulator, able to throttle up or down based upon Group Policies, able to leverage UDP or TCP or both in an on-demand fashion, and it offers you discrete instrumentation to monitor its cost, impact and weight on your system.
Unfortunately for me, we never bought RemoteFX-capable graphics cards, so all RemoteFX work is being done by the host/VM CPU. I have gone from the extreme of "disable all RemoteFX," to leveraging all performance settings of RemoteFX as best I could, to a more middle-of-the-road route. None of the RemoteFX policy and setting changes made any difference to the flicker. I have watched perfmon counters on most RemoteFX instruments and never once have they shown a lack of resources (there are three counters alone for that).
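If the counters are logged rather than watched live, the check can be automated. Below is a minimal Python sketch, assuming the log is a typeperf/perfmon CSV export (timestamp in the first column, one counter path per remaining column) and that the three resource counters are the "Frames Skipped/Second - Insufficient ... Resources" ones I recall from the RemoteFX Graphics set. Both the counter naming and the function name are assumptions to adjust against your own export.

```python
import csv
import io


def skipped_frame_totals(csv_text, marker="Frames Skipped/Second"):
    """Sum every counter column whose header contains `marker`.

    Assumes a PDH-CSV style export: column 0 is the timestamp,
    and blank cells mean no sample was taken for that interval.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    totals = {}
    for idx, name in enumerate(header):
        if idx == 0 or marker not in name:
            continue  # skip the timestamp and unrelated counters
        total = 0.0
        for row in data:
            cell = row[idx].strip() if idx < len(row) else ""
            if cell:
                total += float(cell)
        totals[name] = total
    return totals
```

Any column with a non-zero total would point at a client, network, or server resource shortfall during the capture. All zeros would match what I saw by eye.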
I have not run client-side traces because it is enormously difficult to get a user on the other side of the world from you to take a break from their job (making money for the business) so you can run procmon or whatnot.
I even bumped up the new RDS Session Host server to Highest priority on the hosts, with 16 vCPU and 24GB RAM, and still no difference.
What's Left now? I'm out of time, the business is frustrated with me, and our process is such that business users can veto and halt IT projects even if the screen flicker issue doesn't affect all or even a majority of users.
I can't buy Citrix or Quest or put it in my budget. Staying on 2008 R2 is a possibility but is not ideal at all, and I have no idea what to do or test next. The dev team knows of no API changes that could produce this behavior, and a sample application they wrote for me to test modal vs non-modal windowing produced the same flicker behavior in UK & NZ.
The R810 stack doesn't support the latest PCIe, so investing in RemoteFX/DX 11 cards isn't a good spend. I feel I've exhausted everything and I'm throwing out a lifeline to anyone who will listen. Thanks.
I have three videos of this behavior and can email them to anyone, but I don't feel comfortable showing them online.
Robert