Microsoft Research has given a glimpse of some of its prototype data centre robotics efforts.
In a paper for HotNets, the team said that “this marks the beginning of a fundamental shift in how we conceive and design data centre hardware and the software services,” but noted that the effort was “at the very start of this journey.”
Key to the robotics project, Microsoft said, was not to try to recreate humans in robotic form. “We simply do not believe that the humanoid form factor or a hand-inspired gripper is suitable for most tasks in the data centre,” the paper states.
Instead, the company believes it makes sense to build multiple “advanced modular” robots, each designed for a highly specific task. Together, those robots should ultimately allow the data centre to clean and repair itself.
“We propose the concept of self-maintaining systems,” the paper states. “A self-maintaining system is one that can manage and control its own hardware repair and maintenance. This is enabled through advanced robotics and automation. It offers the potential for fine-grained control of repairs, not only reducing the time window for a repair, but also helping manage the impact of cascading failures and false positives on repairs.
“An additional advantage is that currently very little data centre hardware is proactively serviced, it is usually accessed only when it fails. This is due to scale (and therefore costs) and the issue of cascading failures. We believe dextrous advanced robotics designed specifically to operate in the data centre can also make proactive maintenance feasible, and thereby reduce the number of hardware failures.”
Microsoft Research detailed two such maintenance robots, both of which are still prototypes.
The first is a transceiver manipulation robot that includes a manipulator arm and a gripper able to hold and manipulate a single transceiver “while minimising accidental interaction with physically close cables.”
The gripper can be inserted between optical cables and then used to move them gently apart, “while still being able to grip the transceiver pull tab.” The robot uses a vision system to interpret its complex surroundings, which the team said should “enable it to autonomously navigate through cluttered cabling to the target port to reseat, plug or unplug the transceiver.”
Humans, on the other hand, can cause transient packet loss when they accidentally touch cables. “We refer to this phenomenon as simply cascading failures. Cascading failures occur when physical motion near or with hardware creates vibrations and other physical effects on the co-located hardware, which leads to additional transient (or permanent!) failures,” the paper states.
The second is a fibre and transceiver cleaning robot. When a transceiver with an attached fibre cable is plugged into the unit by a technician or by the transceiver manipulation robot, this robot “automatically detaches the cable from the transceiver, visually inspects the fibre end-face cores and the transceiver and then cleans any parts needed to pass inspection, before reassembling.”
The device features many actuators and is “complex and dextrous,” Microsoft said. The company admitted that the diversity of transceivers and cables found at a large-scale global cloud provider is a challenge, so the unit uses cameras and recognition models to identify the type and size of each transceiver and cable.
A display mounted on the robot lets a human operator monitor progress and view the inspection images. The cleaning robot is modular and can either be integrated with the transceiver manipulation robot or be used as a standalone system.
With both robots working in tandem, the entire operation currently takes a few minutes, though Microsoft said this could be optimised. “Already, the end-face inspection for eight cores takes less than 30 seconds which is less time than a well-trained human.”
The company is now “focusing on developing a set of small-scale robotic units that minimise the variety of robot form factors needed while supporting a diverse range of operations, and this set of robotic units includes mobility units” so that the robots can operate at a larger scale, beyond a single rack.