
  • STYX: Stream Processing with Trustworthy Cloud-based Execution

    Julian James Stephen, Savvas Savvides, Vinaitheerthan Sundaram, Masoud Saeida Ardekani, Patrick Eugster

    Purdue University, SensorHound Inc, Samsung Research America, TU Darmstadt

    Abstract

    With the advent of the Internet of Things (IoT), many billions of devices are expected to continuously collect and compute on sensitive data (e.g., location, personal health). Due to the limited computation capacity available on IoT devices, the current de facto model for building IoT applications is to send the gathered data to the cloud for computation. Since building private cloud infrastructures to handle such a large volume of data streams is very expensive, low-cost public (untrusted) cloud infrastructures such as Amazon EC2 are used. However, using public clouds to process continuous queries, including queries over sensitive data, is leading to increasing concerns over data confidentiality, and is a major factor preventing more widespread adoption of IoT solutions.

    This paper presents STYX, a novel programming abstraction and managed runtime system that ensures confidentiality of IoT applications whilst leveraging the public cloud for continuous query processing. The key idea is to intelligently utilize partially homomorphic encryption to perform as many computationally intensive operations as possible in the untrusted cloud. STYX provides a simple abstraction to the IoT developer to hide the complexities of (1) applying complex cryptographic primitives, (2) reasoning about performance of such primitives, (3) deciding which computations can be executed in an untrusted tier, and (4) optimizing cloud resource usage. An empirical evaluation with benchmarks and case studies shows the feasibility of our approach.

    1 Introduction

    The ubiquity of computing devices is driving a massive increase in the amount of data generated by humans and machines. With the advent of the Internet of Things (IoT), many more billions of devices are expected to continuously collect sensitive data (e.g., location data, personal health data) and compute on it. Due to the limited storage and computation capacity available on IoT devices, the current de facto model for building IoT applications is to send the data gathered from physical devices to the cloud for both computation and storage (e.g., SmartThings [5], Nest [3]). Many IoT applications thus leverage the cloud to compute on data streams from a large number of devices. For example, to compute variable tolls or to identify highway accidents or traffic conditions, a smart city application may collect vehicle license plate numbers, speed, and location information at the cloud.

    Supported by NSF under grant "Practical Assured Big Data Analysis in the Cloud" and Northrop Grumman under grant "Secure Cloud-based Multi-party Computing", and by Amazon through AWS credits.

    Work done when employed by Purdue University.

    Due to the sheer amount of streaming data, building private cloud infrastructure is very expensive compared to low-cost public (untrusted) cloud infrastructure such as Amazon EC2 or Microsoft Azure. Therefore, public clouds are typically used for processing continuous queries, including queries over sensitive data. However, this trend is leading to increasing concerns over data confidentiality, and is becoming one of the major factors preventing more widespread adoption of IoT solutions. For instance, a recent study of 2,062 American consumers shows that the top concern among consumers is "Who is seeing my data" [23].

    One way to mitigate these concerns is to encrypt data at the source (i.e., the IoT device), and to use cloud infrastructures solely for storage purposes (e.g., Bolt [24]). Thus, as long as encryption keys are maintained securely by consumers, their data remains secure. While this approach addresses the aforementioned confidentiality concerns, all computations need to be performed in trusted environments. This severely limits the computational capabilities available to IoT solutions.

  • Figure 1: STYX overview

    A promising approach to tackle these issues is to use partially homomorphic encryption (PHE) techniques, and execute certain operations over encrypted data. Yet, existing solutions use a storage system to this end. For instance, CryptDB [29] was implemented on top of MySQL, while MONOMI [36] and Talos [31] were implemented on top of Postgres. These database-centric solutions are not a good fit to build many IoT applications because IoT applications, like many other sensor network applications, are typically implemented as continuous queries in a stream processing system.
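
    To make the idea of computing over encrypted data concrete, the following minimal sketch implements a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme. It is an illustration only: this section does not state which PHE schemes STYX uses, the function names (keygen, encrypt, decrypt, add_encrypted) are hypothetical, and the tiny primes are far too small to be secure.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation. The small primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                 # standard simple choice of generator
    mu = pow(lam, -1, n)      # modular inverse of lam; valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Recover the plaintext; only the trusted side holds priv."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Runs in the untrusted tier: multiplying ciphertexts adds the plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen()
# Two sensors encrypt their readings at the source.
c_total = add_encrypted(pub, encrypt(pub, 60), encrypt(pub, 72))
print(decrypt(pub, priv, c_total))  # 132
```

    The party calling add_encrypted never sees 60, 72, or 132. It only needs the public key, which is why such an operation can be delegated to an untrusted machine.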

    Applying PHE to stream processing applications, however, leads to a number of challenges that prevent programmers from doing so explicitly. In short: (G1) a number of PHE schemes exist, varying by operations supported, efficiency, etc. The overhead incurred by individual crypto systems is of particular concern with resource-constrained IoT devices. Application developers do not necessarily have the domain knowledge for judicious choices. In addition, secret keys need to be managed appropriately. (G2) Processing continuous queries typically involves a pipeline of computing tasks, and each task may have one or more instances running concurrently. The deployment profile, which maps task instances to VMs in the cloud, should make balanced use of resources to avoid bottlenecks. Yet, encryption shifts bottlenecks, thus invalidating known optimization heuristics. Finally, (G3) as hinted at by their name, PHE schemes do not support arbitrary operations. Unsupported operations have to be performed on the trusted client side in plaintext form. Subsequently, either the query processing can continue on the trusted client side, or the result of the operation can be re-encrypted in the scheme required by the subsequent operations and continued in the cloud. Deployment profiles must be cognizant of such re-encryptions.
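
    As a rough illustration of what a deployment profile in (G2) looks like, the sketch below maps task instances to VMs with a generic greedy load-balancing rule (heaviest instance first, always onto the least-loaded VM). This is not the heuristic STYX uses; the function name greedy_profile, the instance names, and the cost figures are all hypothetical.

```python
import heapq


def greedy_profile(instance_costs, num_vms):
    """Map each task instance to a VM, placing heavier instances first.

    instance_costs: dict of instance name -> estimated cost (arbitrary units).
    Returns (profile, loads): instance -> VM index, and the total load per VM.
    """
    heap = [(0.0, vm) for vm in range(num_vms)]  # (current load, VM index)
    heapq.heapify(heap)
    profile = {}
    for name, cost in sorted(instance_costs.items(), key=lambda kv: -kv[1]):
        load, vm = heapq.heappop(heap)           # least-loaded VM so far
        profile[name] = vm
        heapq.heappush(heap, (load + cost, vm))
    loads = [0.0] * num_vms
    for name, vm in profile.items():
        loads[vm] += instance_costs[name]
    return profile, loads


# Hypothetical costs: operations on ciphertexts dominate once data is encrypted.
costs = {"parse-0": 2.0, "parse-1": 2.0,
         "enc-sum-0": 8.0, "enc-sum-1": 8.0, "enc-cmp-0": 4.0}
profile, loads = greedy_profile(costs, num_vms=2)
print(loads)  # [12.0, 12.0]
```

    Because the cost of each task changes once it operates on ciphertexts, a profile tuned for plaintext costs can become unbalanced, which is the sense in which encryption shifts bottlenecks.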

    This paper presents STYX, a novel programming abstraction and managed runtime system that leverages PHE to provide confidentiality for IoT applications delegating online streaming jobs to the public cloud. STYX operates on streaming data without revealing any plaintext information to the untrusted cloud. Figure 1 gives a high-level overview of STYX. The user designs, implements, and initiates the stream analysis programs that run in the untrusted cloud. IoT sensors are programmed to encrypt generated data before emitting them in the stream for analysis. The user may also include additional streams of encrypted private data that are required for a specific analysis.

    To perform analysis in the untrusted cloud over encrypted data (whilst addressing G1-G3), STYX (C1) utilizes PHE techniques and (C2) provides efficient implementations of these techniques so they can run on IoT devices. If a specific sequence of computations cannot be performed because of limitations of PHE schemes, STYX is capable of (C3) executing the remainder of the computation in the trusted tier or (C4) re-encrypting a data stream (or specific fields in a stream) to enable further computation in the public cloud. STYX also provides an analytical modeling module which (C5) deduces the best deployment profile for the application. Finally, (C6) programmers can develop applications using the STYX API without having to know the details of the underlying crypto system used to implement the operations that preserve confidentiality.
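
    The choice between (C3) and (C4) can be pictured with a small planning sketch for a linear pipeline of operations on one field. Everything here is an assumption made for illustration: the plan function, the operation names, and the scheme table are hypothetical and do not reproduce the STYX API or its analysis. The table only reflects well-known properties of these scheme families.

```python
# Hypothetical table: which kind of scheme lets the cloud run each operation
# directly on ciphertexts. Operations missing from the table are unsupported.
SCHEME_FOR_OP = {
    "add": "paillier",         # additively homomorphic
    "mul": "elgamal",          # multiplicatively homomorphic
    "eq": "deterministic",     # equality checks on ciphertexts
    "lt": "order_preserving",  # comparisons on ciphertexts
}


def plan(ops, reencrypt=True):
    """Return a list of (tier, step) pairs for a linear pipeline of operations.

    reencrypt=True mirrors (C4): hop to the trusted tier, then return to the cloud.
    reencrypt=False mirrors (C3): finish the remainder on the trusted tier.
    """
    steps = []
    current = None   # scheme the data is encrypted under; None means plaintext
    in_cloud = True
    for i, op in enumerate(ops):
        needed = SCHEME_FOR_OP.get(op)
        if not in_cloud:
            steps.append(("trusted", op))
            continue
        if i == 0 and needed is not None:
            current = needed  # sensors encrypt under the first scheme needed
        if needed is not None and needed == current:
            steps.append(("cloud", f"{op}[{needed}]"))
        elif not reencrypt:
            in_cloud = False
            steps.append(("trusted", op))
        elif needed is None:
            steps.append(("trusted", op))  # unsupported: run in plaintext
            current = None
        else:
            steps.append(("trusted", f"reencrypt->{needed}"))
            steps.append(("cloud", f"{op}[{needed}]"))
            current = needed
    return steps


for step in plan(["add", "sqrt", "lt"]):
    print(step)
```

    With reencrypt=True the pipeline returns to the cloud for the final comparison after a short visit to the trusted tier. With reencrypt=False everything from sqrt onward stays on the trusted tier. Which option is cheaper depends on the cost of re-encryption relative to the remaining work.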

    More specifically, in this paper we make the following contributions.

    We introduce a secure stream abstraction that exposes a high-level API through which programmers can express programs that can be executed in the public cloud in a way that preserves confidentiality, without having to know the details of the underlying crypto systems.

    We describe how STYX analyzes programs written using the STYX API and identifies the computations that can be executed purely on encrypted data and those that cannot, due to the limitations of PHE. STYX tries to maximize the amount of computation performed in the cloud by splitting computation between the untrusted cloud and a small number of trusted nodes while automatically performing required re-encryptions. Fast serialization and encryption pre-computation are two key techniques used to ensure the efficiency of STYX.

    We propose a heuristic that analyzes resource availability and requirements and generates a deployment profile that optimizes cloud usage.

    We evaluate the implementation of STYX on multiple benchmarks and case studies. Our results indicate that STYX can be used to express many real-world IoT applications, including a variable toll application for smart cities, while ensuring confidentiality without requiring any knowledge of the underlying crypto system and while keeping overhead low.

    The remainder of this paper is organized as follows. Section 2 presents an overview of our solution. Sections 3 and 4 present details of the design of STYX and its managed runtime system. Section 5 presents the implementation of STYX. Section 6 presents an empirical evaluation. Section 7 contrasts STYX with related work. Section 8 concludes with final remarks.


  • Figure 2: STYX graph and tasks

    2 STYX Overview

    STYX provides strong confidentiality guarantees against a powerful adversary with full access to servers in the cloud. We assume that the adversary can have root access to cloud servers, or even have access to the RAM of physical machines. We note that STYX's goal is to preserve confidentiality, but not integrity or availability, and hence the adversarial model we consider does not allow for arbitrary changes in the program, analysis results or data stored within the cloud. We also assume that STYX has access to a set of limited but trusted resources that is with the u