Is it possible to process highly sensitive data (say, a person’s genome or their tax returns) on an untrusted, remote system without ever exposing that data to that system? Some of the biggest names in tech are now working together to figure out a way to do just that.
Last month, the Linux Foundation announced the Community Computing Consortium (CCC)—a cross-industry effort to develop and standardize the safeguarding of data even when it’s in use by a potentially untrusted system. Alibaba, Arm, Baidu, Google Cloud, IBM, Intel, Microsoft, Red Hat, Swisscom, and Tencent are all CCC founding members.
Encrypting data at rest and in transit, the Foundation said in a press statement, are familiar challenges for cloud computing providers and users today. But the new challenge the Consortium is taking up concerns allowing sensitive, encrypted data to be processed by a system that otherwise would have no access to that data.
They may want to take a look an an open-source project first launched in 2016 called Ryoan.
“In our model, not only is the code untrusted, the platform is also untrusted,” says Emmett Witchel, professor of computer science at the University of Texas at Austin.
Witchel and co-authors developed Ryoan as a piece of code that would use security features in Intel CPUs to effectively sandbox encrypted data—and allow computation on that data without requiring that either the software or the hardware (other than the CPU) be secure or trusted.
Witchel says a key inspiration for his group was a 1973 paper by Butler Lampson of Xerox’s Palo Alto Research Center. “He talked about how difficult it is to confine untrusted code,” Witchel says. “That means if you have code that wasn’t written by you, you don’t know the motivations of the people who wrote it, and you want to make sure that code isn’t stealing any secrets [from your data], it’s very, very difficult.”
Witchel says one of his graduate students at the time argued that confining data completely within a sandbox—in the face of potentially adversarial code and hardware—was practically impossible.
Witchel adds that he agreed with his student, up to a point. “But it’s not 1973. And people are doing different things with computers now,” he says. “They’re recognizing images. They’re processing genome data. These are very specific tasks that have properties that we can take advantage of.”
According to Witchel, Ryoan uses what’s called a “one-shot data model.” That means the program looks at a user’s sensitive data only once in passing—a scheme that might be applicable to rapid-fire video or image streams that run image recognition software.
It was this transitory, one-time nature of the data processing that simplified Lampson’s Confinement Problem and made it solvable, Witchel says.
On the other hand, you couldn’t use Ryoan if you were processing a sensitive image on a remote, untrusted system that was storing and processing—and then storing and further processing—one’s data. Such a process may be necessary if, say, you were running Photoshop remotely on a sensitive image.
Ryoan relies on an Intel hardware security feature called Software Guard Extensions (SGX), which allowed its creators to begin addressing the problem. However, SGX is only a first step, says Tyler Hunt, a University of Texas, Austin computer science graduate student and co-developer of Ryoan.
“SGX only allows you to have 128 megabytes of physical memory, so there’s some challenge in finding applications that are small enough,” Hunt says. “And genomes are very large, so if you actually wanted to process an entire genome, you’d need some larger hardware to do it.”
Witchel and Hunt say there are other efforts now afoot to make “enclaves” like Ryoan (in other words—isolated, trusted computing environments) in larger and more diverse systems than just a CPU.
Microsoft researchers last year proposed a software environment called Graviton, which would run trusted computing enclaves on GPUs. RISC-V architecture hardware systems may soon be adopting a trusted computing enclave standard called Keystone. And ARM—a CCC member—offers its own trusted computing environment called TrustZone.
Witchel says trusted enclaves like Ryoan and its descendants could be important as users become more savvy about the permissions they give for use of their personal data.
People today could be better educated, Witchel says, so they can understand and assert their rights. They need to know “that their data is out there and is being used and monetized. And that they should have some control and some assurance that their data is only being used for the purposes that they released it for. I want my genome data to tell me about my ancestry. I don’t want you to keep that genome data to use it to develop a drug that you then sell to me at an inflated price. That seems unfair.”