Windows is now built on Git, but Microsoft has found some bottlenecks

300 GB repo handles 8,421 pulls and 1,760 official builds a day, more once GVFS fix is in

Microsoft has adopted Git to manage the vast collection of code that is Windows' source, and has shared performance issues it's had to fix along the way.

The state-of-the-nation report for what Microsoft calls the “largest Git repo on the planet” follows on from its launch of the “fat Git repo” handler, the Git Virtual File System, as the foundation of its planned shift.

Redmond's certainly feeling pleased with itself about the move, in particular stroking itself about being able to move the whole 2,000-strong Windows OneCore team from the Source Depot internal tool to Git over a weekend.

Redmondista Brian Harry blogs that the 300 GB Windows repository now catches 8,421 pull requests and 1,760 official builds a day.

Even so, he notes, more than 28 percent of the 251 staff that responded to an internal survey aren't happy, and understanding why is as important to keeping the move smooth as waving around scalability numbers.

Reasons reported by Harry include tools that don't support Git, having to learn the new process, and performance falling short of demand.

Performance, he writes, is mostly down to Microsoft's Git Virtual File System, a layer designed to present a Git repository to the user as an “ordinary” file system.

Microsoft pulled telemetry for around 3,500 engineers using GVFS, shown below. “P80” is the result for the 80th percentile of users over the last seven days; the figures in red are the change between the beginning and end of the week.

Microsoft's Git performance measurements

While GVFS is faster than “vanilla Git”, he writes, the system is getting slower over time (as repos and user numbers grow).

“[O]ver time, engineers crawl across the code base and touch more and more stuff … you end up with a bunch of files that were touched at some point but aren’t really used any longer and certainly never modified. This leads to a gradual degradation in performance.”

Compared to Git, he notes, GVFS changes “many operations from being proportional to the number of files in the repo to instead be proportional to the number of files 'read'.”

That growing pile of once-read files is what's slowing things down, so Redmond is refining GVFS with something it calls “O(modified)”: instead of scaling with the number of files read, key commands scale with the number of files on which a user has current, uncommitted edits.
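The three scaling regimes Harry describes can be pictured with a toy model. This is only a sketch: the function name, the file names and the set sizes are invented for illustration, and it says nothing about how GVFS is actually implemented.

```python
# Toy model only: names and counts here are invented for illustration,
# not taken from GVFS or from Microsoft's telemetry.

def files_examined(strategy, repo_files, read_files, modified_files):
    """Return the set of files a status-like command would have to examine."""
    if strategy == "vanilla":    # proportional to every file in the repo
        return set(repo_files)
    if strategy == "read":       # proportional to files ever read
        return set(read_files)
    if strategy == "modified":   # proportional to files with uncommitted edits
        return set(modified_files)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical working state: a big repo, a slice of it read over time,
# and a handful of files currently being edited.
repo = {f"src/file_{i}.c" for i in range(100_000)}
read = {f"src/file_{i}.c" for i in range(5_000)}
modified = {f"src/file_{i}.c" for i in range(12)}

for strategy in ("vanilla", "read", "modified"):
    n = len(files_examined(strategy, repo, read, modified))
    print(f"{strategy:>8}: {n} files examined")
```

In this model the “read” set only ever grows as an engineer wanders the code base, while the “modified” set shrinks back with every commit, which is why the second regime degrades over time and the third does not.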

He's measured O(modified) against four commands, and claims a speed-up of 2.3 times for Status, 3.5 times for Add, 6.2 times for Commit, and 29 times for Checkout.

The other “must try harder” turned out to be in remote offices. Because the current iteration of GVFS is over-centralised, an operation like Clone, which takes only 127 seconds in Redmond, is taking nearly 25 minutes in Microsoft's North Carolina offices.

To get around that, Harry writes, GVFS's maintainers added a proxy capability: “With a proxy configured and up to date, it took 70 seconds (faster than Redmond because the Redmond team doesn’t use a proxy and they have to go hundreds of miles over the internet to the Azure data center)”.
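For a sense of scale, the clone timings quoted above work out roughly as follows. This is back-of-the-envelope arithmetic only: “nearly 25 minutes” is treated as a full 25 minutes, so the ratios involving North Carolina are upper bounds, and the variable names are ours.

```python
# Clone timings as quoted in the article, in seconds.
redmond_s = 127
north_carolina_s = 25 * 60   # "nearly 25 minutes", rounded up: an assumption
proxy_s = 70


def ratio(slow, fast):
    """How many times longer the slow case takes than the fast one."""
    return slow / fast


print(f"North Carolina vs Redmond:   {ratio(north_carolina_s, redmond_s):.1f}x slower")
print(f"North Carolina, proxy gain:  {ratio(north_carolina_s, proxy_s):.1f}x faster")
print(f"Redmond vs proxied clone:    {ratio(redmond_s, proxy_s):.1f}x slower")
```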

Third parties seem pleased with GVFS as well, with support arriving in Atlassian's SourceTree and in Tower, as well as in Visual Studio and Git for Windows. ®


Biting the hand that feeds IT © 1998–2017