Workstations for point cloud processing (Leica Cyclone)


Armed with a bucketful of CPUs, GPUs, memory and storage, Greg Corke goes in search of the ultimate workstation for registering / importing point clouds into Leica Cyclone Register 360

As far as AEC workflows go, point cloud processing is one of the most computationally intensive out there. Registration — the process of taking individual laser scans and combining them in one unified coordinate system — can take many hours. And any reduction in this time can pay huge dividends, especially in the fast-paced world of construction verification.

The Leica Cyclone product line is one of the leading applications for point cloud processing and there’s a special version of the software, Leica Cyclone Register 360, that is designed to take much of the pain out of registration. As one might expect, certain parts of the software are multi-threaded, so it can take advantage of multiple CPU cores. However, unlike ray trace rendering, which can get the absolute most out of every single core, registration with Cyclone Register 360 is more nuanced.

In this article we set out to find the best workstation for point cloud processing and also how individual components influence performance. Most firms have tight hardware budgets and it’s easy to spend money in the wrong places for very little gain. In some cases, investing in more expensive equipment can actually slow you down.

Now, it’s important to state here that while I know a thing or two about workstations, I am no expert in point cloud processing and, before this article, I had never used Leica Cyclone Register 360. Leica Geosystems provided full support and feedback throughout this process, but the testing process and conclusions made in this article were entirely editorially driven.

User control

Before we get into testing, it’s important to understand how Cyclone Register 360 utilises workstation hardware. Compared to other AEC-focused applications, such as ray trace rendering or simulation, which dynamically allocate workstation resources, Cyclone has to predict in advance what resources it will need for any given dataset. This is largely down to the fact that Cyclone Register 360 supports so many third-party point cloud formats, fifteen in total. As such, the software has to rely on several different readers and third-party libraries to support those formats, each developed by the respective manufacturer (Autodesk for RCP, Faro for FLS and FPJ, Zoller + Fröhlich for ZFS etc.).

Resource allocation centres on memory (RAM) usage and how the demand for RAM increases with the number of threads used in parallel. For any new job, the system looks at the format type, size and resolution of the dataset, then intelligently estimates the amount of RAM that will be used per thread and allocates the number of threads accordingly.


If too many threads were assigned to a registration during import, the workstation could quickly run out of memory, which could mean the software crashes hours into a job. As you might expect, Leica Geosystems has designed this process such that stability takes precedence, which means the software can be quite conservative in the way it allocates resources.

Cyclone Register 360 provides some control over this process, offering three settings for ‘import performance’. ‘Safe Mode’ forces the job to run on a single thread to help ensure the import doesn’t fail, even with terabyte datasets on workstations with relatively low specifications. ‘Balanced’ (the default setting) uses between two and four threads and ‘Fast’ will allow the software to access more threads but the user must assess the benefits of this speed increase against the possibility of overtaxing the system and the registration process failing.
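To make the trade-off concrete, here is a minimal sketch of a memory-first thread planner. It is not Leica's algorithm, which we have not seen. The function name, the memory reserve and the cap for ‘Fast’ are all assumptions made for illustration; only the single thread for ‘Safe Mode’ and the upper limit of four for ‘Balanced’ come from the settings described above.

```python
import math

# Maximum threads per import mode. "safe" and "balanced" follow the article;
# the "fast" cap of 16 is an assumption made purely for illustration.
MODE_CAPS = {"safe": 1, "balanced": 4, "fast": 16}


def plan_threads(free_ram_gb, est_ram_per_thread_gb, mode="balanced", reserve_gb=4.0):
    """Return how many import threads fit in memory for the chosen mode.

    est_ram_per_thread_gb is a hypothetical estimate that real software would
    derive from the format, size and resolution of the dataset.
    reserve_gb is memory held back for the OS and other applications (assumed).
    """
    if est_ram_per_thread_gb <= 0:
        raise ValueError("est_ram_per_thread_gb must be positive")
    usable_gb = max(free_ram_gb - reserve_gb, 0.0)
    fits_in_memory = math.floor(usable_gb / est_ram_per_thread_gb)
    # Stability first: never exceed what fits in memory or the mode's cap,
    # but always run at least one thread so the job can still proceed.
    return max(1, min(MODE_CAPS[mode], fits_in_memory))


# Same machine, same dataset estimate, three different modes.
for mode in ("safe", "balanced", "fast"):
    print(mode, plan_threads(64, 10, mode))
```

The point of the sketch is that the mode only sets a ceiling. Memory decides how many threads actually run, which is why a more aggressive setting cannot help a workstation that is short of RAM.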

Test setup

For testing, Leica Geosystems supplied a dataset of the Italian Renaissance-style ‘Breakers’ mansion in Newport, Rhode Island, USA. The project was originally used as a proof of concept to verify the Visual Inertial System (VIS) and automatic registration capabilities of the Leica RTC360 laser scanner, which are designed to automatically place project data on real-world coordinates so a registration can be completed with little to no manual intervention. The dataset is 99GB in size and includes a total of 39 setups from an RTC360 scanner, around 500 million points and 5k panos.

For our testing we focused on import / registration and the time it takes to complete the process. We used a variety of different workstations, but most of our testing centred on the Scan 3XS GWP-CAD Q116C, which features an 8-core Intel Core i9-9900K CPU.

All machines were installed with Windows 10 Professional and the latest graphics drivers, as recommended by Leica Geosystems. HyperThreading (on Intel) and Simultaneous Multithreading (SMT) (on AMD) were set to on, although this doesn’t really benefit the end user. We used Cyclone Register 360 Version 1.6.2. Import performance was set to ‘Fast’.

Memory

As mentioned previously, the amount of system memory (RAM) dictates the number of threads that are allocated on import. With the Breakers dataset we found a workstation with 16GB used one thread, one with 32GB used two threads, one with 64GB used five and one with 128GB used six.

Using Windows Performance Monitor we tracked memory usage and found it changed significantly during import, forming peaks and troughs as each new setup started and finished. At no point did the system ever come close to running out of memory, leaving plenty of room for error, or capacity for the user to multitask and run other software on the same workstation.

With 128GB, however, memory appeared to be underutilised, with an observed peak of 50GB, leaving more than half of the installed memory free. We asked Leica Geosystems why the software didn’t use more than six threads when there was obviously memory available to do so. The response was that the software had been optimised for the most common configurations owned by customers, and in use by its development team, but with the increased prevalence of workstations that support 128GB, it is now working to improve the utilisation of memory on these higher end machines. To this end, it will test various datasets from various scanners and profile them accordingly. This should lead to future versions of the software being able to use more threads and more memory on 128GB workstations.
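If you log memory usage on your own workstation during an import, working out the headroom is simple arithmetic. The short sketch below assumes you already have a list of readings in GB. The function name and the sample values are ours and purely illustrative; they are not measurements from our tests.

```python
def memory_headroom(samples_gb, installed_gb):
    """Return (peak, headroom) in GB for a series of memory-usage samples."""
    if not samples_gb:
        raise ValueError("need at least one sample")
    peak = max(samples_gb)
    return peak, installed_gb - peak


# Illustrative readings only, mimicking the peaks and troughs seen as
# each setup starts and finishes.
samples = [12.0, 31.5, 18.2, 44.0, 27.3]
peak, free = memory_headroom(samples, installed_gb=64)
print(f"peak {peak:.1f}GB, headroom {free:.1f}GB")
```

A large and consistent headroom suggests the machine could multitask comfortably during import; a small one suggests more RAM would be money well spent.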

Based on our observations, for best price / performance we’d probably recommend a workstation with 64GB. 128GB will give you more scope for multitasking and help future proof your workstation when Cyclone Register 360 is able to better exploit the additional memory. However, it’s important to note that not all workstations (especially mobile workstations) can support 128GB. 32GB is still usable, but 16GB looks to be too little, having a significant negative impact on import time. If your machine only has 16GB we’d strongly recommend you upgrade straight away. See chart 1 below for more details.

CPU

It will come as no surprise that as more CPU cores are used, import time goes down. But before you rush out to buy a 64-core AMD Threadripper workstation it’s important to note there are diminishing returns the higher you go. What’s more, having those cores run at a very high frequency is equally important and GHz tends to go down as core count increases.

With this in mind, Cyclone Register 360 is very well suited to overclocked workstations where the frequency of every CPU core is permanently boosted. With CPUs that run at standard clock speeds, only one or two cores go into Turbo.

Even though Cyclone Register 360 can use multiple cores, they don’t all run at 100% as they do for a highly-threaded process like ray trace rendering. When importing a point cloud in Cyclone Register 360, utilisation on each core goes up and down throughout the import process with noticeable peaks and troughs.
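One common way to reason about diminishing returns is Amdahl's law, which caps the speed-up from extra cores according to how much of a job can run in parallel. The sketch below is a rough model, not a measurement: the parallel fraction of 0.8 and the clock speeds are assumptions chosen for illustration, and we have no figure for how parallel Cyclone Register 360 actually is.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speed-up over a single core when only part of the job runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def relative_throughput(parallel_fraction, cores, all_core_ghz):
    """Crude score: Amdahl speed-up scaled by the all-core clock speed."""
    return amdahl_speedup(parallel_fraction, cores) * all_core_ghz


# Hypothetical comparison: few fast cores versus many slower cores.
few_fast = relative_throughput(0.8, cores=8, all_core_ghz=5.0)
many_slow = relative_throughput(0.8, cores=64, all_core_ghz=3.0)
print(f"8 cores @ 5.0GHz: {few_fast:.1f}")
print(f"64 cores @ 3.0GHz: {many_slow:.1f}")
```

With these assumed numbers the eight fast cores score 16.7 against 14.1 for the 64 slower ones. The model ignores memory bandwidth, storage and GPU effects, so treat it as a way to frame the question rather than a prediction.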


Out of all the CPUs we tested for this article, the ten core Intel Core i9-10900K (Q2 2020) performed best. In the Scan 3XS GWP-ME Q120C workstation all cores were overclocked to 5.0GHz. This was closely followed by the eight core Intel Core i9-9900K, which was at a slight disadvantage as it ran at its stock 3.6GHz up to 5.0GHz Turbo in the Scan 3XS GWP-CAD Q116C workstation.

Both CPUs are significantly faster than the quad core CPUs typically found in CAD or BIM workstations that are two or more years old. The importance of the additional cores is illustrated in two tests: 1) when disabling some cores in the Intel Core i9-9900K (see chart 3 below) and 2) when comparing both CPUs to the older Intel Xeon W-2125 (Q3 2017) and Intel Xeon E3-1245 v5 (Q4 2015) CPUs (see chart 2 below).

For testing we didn’t have access to an Intel machine with more than ten cores, such as the Intel Core X-series with 10, 12, 14 or 18 cores. However, even when overclocked, we wouldn’t expect these CPUs to offer any real performance benefit and they may actually be slower. They also cost significantly more.

But choosing a CPU for point cloud registration is not just about getting the best performance in Cyclone Register 360. There is a potential benefit to having more than 8 or 10 cores in that the CPU will enable better multitasking. Not many people want to sit at their desks twiddling their thumbs while waiting for the registration to finish. So investing in a machine with 12, 16 or 18 cores and using processor affinity in Windows to pin Cyclone Register 360 to 8 or 10 of those cores would leave the remaining CPU cores free to work with other applications.
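Processor affinity in Windows is expressed as a bitmask with one bit per logical processor. The sketch below builds that mask. The helper name is ours, and it assumes HyperThreading or SMT is on and that the two threads of each core appear as adjacent logical processors, which is typical but worth checking on your own machine.

```python
def affinity_mask(logical_cpus, first_cpu=0):
    """Return a bitmask with `logical_cpus` consecutive bits set.

    Bit n corresponds to logical processor n. With two threads per core,
    pinning to 8 cores means selecting 16 logical processors.
    """
    if logical_cpus < 1 or first_cpu < 0:
        raise ValueError("logical_cpus must be >= 1 and first_cpu >= 0")
    return ((1 << logical_cpus) - 1) << first_cpu


mask = affinity_mask(16)  # 8 cores x 2 threads each
print(format(mask, "X"))  # prints FFFF
```

The hexadecimal value can be passed to the Windows `start /affinity` command when launching a program, or affinity can be set per process from the Details tab in Task Manager. We have not verified how Cyclone Register 360 behaves when restricted in this way, so test on a small job first.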

Workstations used to be Intel all the way, but in the last couple of years AMD has started to offer some serious competition. This is especially true with 3rd Gen AMD Ryzen (available with 8, 12 or 16 cores) and 3rd Gen AMD Ryzen Threadripper (available with 24, 32 or 64 cores).

With the 32-core 3rd Gen Threadripper 3970X we saw a small benefit when using all 32 cores, compared to 16 or 8 (with all other cores disabled). This is also testament to the excellent cooling in the Armari Magnetar X64T-G3 FWL which uses AMD Precision Boost Overdrive to full effect, allowing all cores to run at very high frequencies (see here for a full review).

The 64-core Threadripper 3990X is certainly overkill and while we’d expect the 8 core AMD Ryzen 7 3800X or 12 core AMD Ryzen 9 3900X to be better fits for Cyclone Register 360 we think the equivalent Intel CPUs will still have a slight edge.

As an aside, it’s worth pointing out our experiences with AMD’s 2nd Gen Ryzen Threadripper CPUs. When testing the 32-core AMD Ryzen Threadripper 2990WX we found performance went up considerably when some cores were disabled (see chart 4 below). This is partly down to the CPU being able to maintain higher clock speeds when fewer cores are active, but probably more to do with the memory architecture.

In 2nd Gen Threadripper, as not all cores have direct access to memory, they sometimes have to ask other cores for data and then wait for it to arrive, which can really slow things down. Deactivating cores that don’t have direct access to memory appears to have a huge positive impact on performance. As you might imagine, there is no such architectural limitation in 3rd Gen AMD Ryzen Threadripper or 3rd Gen AMD Ryzen.

Storage

Despite working with huge datasets, storage isn’t as critical as it can be with other compute intensive software. Because Cyclone Register 360 limits the number of threads so that the workstation never runs out of memory, it never needs to page data into swap space (a dedicated portion of the disk that is commonly used to extend the amount of available memory). That said, having fast storage is still very important and a Solid State Drive (SSD) is essential. We’d recommend PCIe NVMe SSDs where possible, but our tests show that SATA SSDs should do an equally good job, which might come as a surprise to some.

While PCIe NVMe SSDs boast significantly higher sequential read / write speeds than SATA SSDs, this only really benefits workflows that use large continuous datasets. This is not the case for RAW point cloud data from the RTC360, nor the project data created by Cyclone Register 360.

Our 99GB dataset for example, comprises 7,400 files (0.2MB to 226MB in size) and the registered dataset 320 files (1MB to 700MB in size), with many other smaller files created in the temp folder along the way. And when it comes to reading and writing these types of files, which often happens concurrently throughout the import process, the SATA SSD appears to do an equally good job. More importantly perhaps, considering the amount of processing that is needed for registration, one can presume that storage is not really a bottleneck.
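To see whether your own projects have a similar make-up of many small and medium-sized files, you can profile a dataset folder before deciding on storage. This is a minimal sketch using only the Python standard library; the function name and the returned fields are our own choices.

```python
from pathlib import Path


def profile_dataset(root):
    """Summarise file count and size range for every file under `root`."""
    sizes = [p.stat().st_size for p in Path(root).rglob("*") if p.is_file()]
    if not sizes:
        return {"files": 0, "total_bytes": 0, "min_bytes": 0, "max_bytes": 0}
    return {
        "files": len(sizes),
        "total_bytes": sum(sizes),
        "min_bytes": min(sizes),
        "max_bytes": max(sizes),
    }
```

Calling `profile_dataset` on a project folder returns the number of files plus the smallest, largest and total size in bytes. A dataset dominated by a handful of very large files is more likely to benefit from the high sequential speeds of NVMe than one made up of thousands of smaller files.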

With this in mind we don’t think there would be any notable performance benefit to having two SSDs, even if they were configured in a RAID 0 array. While we didn’t test this in Scan’s i9-9900K workstation, the 32-core Threadripper Armari Magnetar X64T-G3 FWL workstation did actually feature a RAID 0 array (a pair of superfast 1TB Corsair MP600 PCIe 4.0 NVMe SSDs delivering up to 9GB/sec of sequential read / write performance) but we saw nothing to suggest that it contributed to a faster import time.

With HDDs things are different. Because they are mechanical, they are very poor at reading and writing data at the same time, so they should not be considered on their own. But that doesn’t mean you should reject them altogether. HDDs offer a much better price per GB, so they are very good for storing large project datasets.

In addition, our tests show that it is possible to use a combination of SSD and HDD without negatively affecting performance. Putting all the RAW point cloud data on an HDD, and the main storage and archive folder on an NVMe SSD, had virtually no impact on import time (see chart 5 below).

From a performance perspective, you can quite happily use a single SSD for OS, applications and point cloud processing. But if you process huge amounts of point cloud data on a daily basis or you want to run other disk intensive applications at the same time, you may want to consider a dedicated SSD, and one with high endurance as well.

SSDs are typically rated by terabytes written (TBW), the amount of data that can be written to a drive over its lifetime or warranty period. Consumer-focused SSDs tend to have lower endurance ratings and, on paper, will wear out sooner than professional SSDs. With this in mind, we’d always recommend Samsung Pro over Samsung EVO.
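A TBW rating becomes more meaningful once it is converted into years at your own workload. The sketch below does that arithmetic. The ratings and the daily write volume are illustrative assumptions, not manufacturer figures, and the write amplification parameter is a simplification that defaults to 1.0.

```python
def endurance_years(tbw_rating_tb, written_per_day_gb, write_amplification=1.0):
    """Estimate how many years it takes to reach a drive's TBW rating."""
    if written_per_day_gb <= 0:
        raise ValueError("written_per_day_gb must be positive")
    tb_per_day = written_per_day_gb * write_amplification / 1000.0
    return tbw_rating_tb / tb_per_day / 365.0


# Hypothetical drives and workload: 200GB written every day.
for label, tbw in (("lower endurance", 600), ("higher endurance", 1200)):
    print(f"{label}: {endurance_years(tbw, 200):.1f} years")
```

Doubling the rating doubles the estimated life, so the gap matters most to firms that write hundreds of gigabytes of project and temp data every working day.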

Finally, it is worth pointing out that the official recommendation from Leica Geosystems is to use separate drives for read and write.

GPUs

Cyclone Register 360 is an interesting application insofar as it can also use a workstation’s GPU to accelerate import. The role it plays is not huge, but you still need to choose carefully. Leica Geosystems has a preference for Nvidia GPUs with 4GB or more of memory. It does not recommend integrated Intel graphics as these have been known to be less stable on import.

We tested a variety of pro GPUs, and while import was faster with the top-end Nvidia Quadro RTX 4000 (8GB) it didn’t make that much difference to the overall import time, especially compared to the Quadro M2000 (4GB), which is a few years older and costs significantly less (see chart 6 below).

The most notable observation was just how big a negative impact the AMD FirePro W2100 had on import time. This is a very entry-level GPU that is several years old and only has 2GB memory, so we didn’t expect great things. We did wonder if the lack of memory might be the limiting factor here, especially considering Leica’s recommendation of 4GB. However, we never observed memory usage going above 1GB.

Of course, the GPU is not just for registration; its primary role is actually 3D graphics. We didn’t test any of these GPUs in this way — that’s the basis for a whole other article — but performance in 3D applications should arguably be the driving force in any purchasing decision.

The other important consideration is the ability for the GPU to multitask. If you do intend to use your workstation for other things, perhaps working with registered point clouds in a different application, then the GPU has to be able to handle graphics tasks and calculations in Cyclone Register 360 at the same time.

The Quadro RTX 4000 and AMD Radeon Pro W5500 are very adept at doing this, whereas the Quadro P2200, Quadro M2000 and Intel integrated graphics are much less so. With these less capable GPUs the end user may experience an unresponsive UI or significantly reduced levels of 3D performance.

Conclusion

Hopefully this article will have given you some food for thought when choosing a workstation for Leica Cyclone Register 360 — or potentially upgrading one you already have. Simply buying some more memory, for example, could have a massive impact on import time, and it’s very easy to install.

You don’t have to spend a huge amount on a new machine, but it’s important to apply your budget in the right places. Scan’s 3XS GWP-ME Q120C with an overclocked ten core Intel Core i9-10900K CPU looks to be a good starting point for mainstream users of Cyclone Register 360. Upgrading to 128GB RAM would give you an additional benefit and, depending on your datasets, it could probably do with a larger SSD, but this won’t increase price dramatically. You could also save money by downgrading to a Quadro P2200 or AMD Radeon Pro W5500 GPU.

If you intend to use your workstation for multitasking you may benefit from a CPU with more cores, but make sure you have enough RAM and keep an eye on clock speeds.

Finally, it’s important to note that all of our observations / recommendations are based on our experiences with this one representative RTC360 dataset. With other datasets, particularly those acquired from other laser scanners, the results might be different depending on size, the availability of imagery and other variables.


Article updated on 11/6/20 – to include the new 10-core Intel Core i9-10900K CPU.

leica-geosystems.com

Scan 3XS GWP-CAD Q116C

■ Intel Core i9 9900K CPU (3.6GHz, 5.0GHz Turbo) (8C, 16T)
■ 128GB Corsair Vengeance DDR4 2,666MHz memory
■ Nvidia Quadro P2200 GPU
■ 1TB Samsung 970 Evo Plus M.2 NVMe SSD + 2TB HDD
■ Asus Z390-A motherboard
scan.co.uk/3xs
■ £2,125 (Ex VAT)

Read AEC Magazine’s review here.

Scan 3XS GWP-ME Q120C

■ Intel Core i9 10900K CPU (5.0GHz overclock) (10C, 20T)
■ 64GB Corsair Vengeance DDR4 3,000MHz memory
■ Nvidia Quadro RTX 4000 GPU
■ 1TB Samsung 970 Evo Plus M.2 NVMe SSD + 2TB HDD
■ Asus ProArt Z490 Creator 10G motherboard
scan.co.uk/3xs
■ £2,500 (Ex VAT)

Full review coming soon.

