
STYX: Stream Processing with Trustworthy Cloud-based Execution∗

Julian James Stephen α, Savvas Savvides α, Vinaitheerthan Sundaram α,β, Masoud Saeida Ardekani γ †, Patrick Eugster α,β,δ

α Purdue University, β SensorHound Inc, γ Samsung Research America, δ TU Darmstadt

Abstract

With the advent of the Internet of Things (IoT), many billions of devices are expected to continuously collect and compute on sensitive data (e.g., location, personal health). Due to the limited computation capacity available on IoT devices, the current de facto model for building IoT applications is to send the gathered data to the cloud for computation. Since building private cloud infrastructures to handle such large volumes of streaming data is very expensive, low-cost public (untrusted) cloud infrastructures such as Amazon EC2 are used. However, using public clouds to process continuous queries, including queries on sensitive data, is leading to increasing concerns over data confidentiality, and is a major factor preventing more widespread adoption of IoT solutions.

This paper presents STYX, a novel programming abstraction and managed runtime system that ensures confidentiality of IoT applications whilst leveraging the public cloud for continuous query processing. The key idea is to intelligently utilize partially homomorphic encryption to perform as many computationally intensive operations as possible in the untrusted cloud. STYX provides a simple abstraction to the IoT developer to hide the complexities of (1) applying complex cryptographic primitives, (2) reasoning about the performance of such primitives, (3) deciding which computations can be executed in an untrusted tier, and (4) optimizing cloud resource usage. An empirical evaluation with benchmarks and case studies shows the feasibility of our approach.

∗ Supported by NSF under grant “Practical Assured Big Data Analysis in the Cloud” and Northrop Grumman under grant “Secure Cloud-based Multi-party Computing”, and by Amazon through AWS credits.

† Work done when employed by Purdue University.

1 Introduction

The ubiquity of computing devices is driving a massive increase in the amount of data generated by humans and machines. With the advent of the Internet of Things (IoT), many more billions of devices are expected to continuously collect sensitive data (e.g., location data, personal health data) and compute on it. Due to the limited storage and computation capacity available on IoT devices, the current de facto model for building IoT applications is to send the data gathered from physical devices to the cloud for both computation and storage (e.g., SmartThings [5], Nest [3]). Many IoT applications thus leverage the cloud to compute on data streams from a large number of devices. For example, to compute variable tolls or to identify highway accidents or traffic conditions, a smart city application may collect vehicle license plate numbers, speed, and location information in the cloud.

Due to the sheer amount of streaming data, building private cloud infrastructure is very expensive compared to low-cost public (untrusted) cloud infrastructure such as Amazon EC2 or Microsoft Azure. Therefore, public clouds are typically used for processing continuous queries, including queries on sensitive data. However, this trend is leading to increasing concerns over data confidentiality, and is becoming one of the major factors preventing more widespread adoption of IoT solutions. For instance, a recent study among 2,062 American consumers shows that the top concern among consumers is “Who is seeing my data” [23].

One way to mitigate these concerns is to encrypt data at the source (i.e., the IoT device), and to use cloud infrastructures solely for storage purposes (e.g., Bolt [24]). Thus, as long as encryption keys are maintained securely by consumers, their data remains secure. While this approach addresses the aforementioned confidentiality concerns, all computations need to be performed in trusted environments. This severely limits the computational capabilities available for IoT solutions.

A promising approach to tackle these issues is to use partially homomorphic encryption (PHE) techniques, and execute certain operations over encrypted data. Yet, existing solutions use a storage system to this end. For instance, CryptDB [29] was implemented on top of MySQL, while MONOMI [36] and Talos [31] were implemented on top of Postgres. These database-centric solutions are not a good fit for building many IoT applications because IoT applications, like many other sensor network applications, are typically implemented as continuous queries in a stream processing system.

[Figure 1 labels: Untrusted cloud; Stream processing applications; Encrypted data; Trusted Tier; IoT encrypting hardware]

Figure 1: STYX overview

Applying PHE to stream processing applications, however, leads to a number of challenges that prevent programmers from doing so explicitly. In short: (G1) a number of PHE schemes exist, varying by operations supported, efficiency, etc. The overhead incurred by individual crypto systems is of particular concern with resource-constrained IoT devices. Application developers do not necessarily have the domain knowledge for judicious choices. In addition, secret keys need to be managed appropriately. (G2) Processing continuous queries typically involves a pipeline of computing tasks, and each task may have one or more instances running concurrently. The deployment profile, which maps task instances to VMs in the cloud, should make balanced use of resources to avoid bottlenecks. Yet, encryption shifts bottlenecks, thus invalidating known optimization heuristics. Finally, (G3) as hinted at by their name, PHE schemes do not support arbitrary operations. Unsupported operations have to be performed on the trusted client side in plaintext form. Subsequently, either the query processing can continue on the trusted client side, or the result of the operation can be re-encrypted in the scheme required by the subsequent operation and processing can continue in the cloud. Deployment profiles must be cognizant of such re-encryptions.

This paper presents STYX, a novel programming abstraction and managed runtime system that leverages PHE to provide confidentiality for IoT applications delegating online streaming jobs to the public cloud. STYX operates on streaming data without revealing any plaintext information to the untrusted cloud. Figure 1 gives a high level overview of STYX. The user designs, implements, and initiates the stream analysis programs that run in the untrusted cloud. IoT sensors are programmed to encrypt generated data before emitting them in the stream for analysis. The user may also include additional streams of encrypted private data that are required for a specific analysis.

To perform analysis in the untrusted cloud over encrypted data (whilst addressing G1-G3), STYX (C1) utilizes PHE techniques and (C2) provides efficient implementations of these techniques so they can run on IoT devices. If a specific sequence of computations cannot be performed because of limitations of PHE schemes, STYX is capable of (C3) executing the remainder of the computation in the trusted tier or (C4) re-encrypting a data stream (or specific fields in a stream) to enable further computation in the public cloud. STYX also provides an analytical modeling module which (C5) deduces the best deployment profile for the application. Finally, (C6) programmers can develop applications using the STYX API without having to know about the details of the underlying crypto system used to implement the operations that preserve confidentiality.

More specifically, in this paper we make the following contributions.

• We introduce a secure stream abstraction that exposes a high level API through which programmers can express programs that can be executed in the public cloud in a confidentiality-preserving way, without having to know the details of the underlying crypto systems.

• We describe how STYX analyzes programs written using the STYX API and identifies the computations that can be executed purely on encrypted data and those that cannot, due to the limitations of PHE. STYX tries to maximize the amount of computation performed in the cloud by splitting computation between the untrusted cloud and a small number of trusted nodes while automatically performing required re-encryptions. Fast serialization and encryption pre-computation are two key techniques used to ensure the efficiency of STYX.

• We propose a heuristic that analyzes resource availability and requirements and generates a deployment profile that optimizes cloud usage.

• We evaluate the implementation of STYX on multiple benchmarks and case studies. Our results indicate that STYX can be used to express many real-world IoT applications, including a variable toll application for smart cities, ensuring confidentiality without requiring any knowledge of crypto systems while keeping overhead low.

The remainder of this paper is organized as follows. Section 2 presents an overview of our solution. Sections 3 and 4 present details of the design of STYX and its managed runtime system. Section 5 presents the implementation of STYX. Section 6 presents an empirical evaluation. Section 7 contrasts STYX with related work. Section 8 concludes with final remarks.

[Figure 2: a graph whose vertexes (v) are connected by sec-streams (s), with the tasks of each vertex placed on nodes. Legend: source vertex; Sec graph vertex; Task; Compute Node in public cloud; Compute Node in trusted domain]

Figure 2: STYX graph and tasks

2 STYX Overview

STYX provides strong confidentiality guarantees against a powerful adversary with full access to servers in the cloud. We assume that the adversary can have root access to cloud servers, or even have access to the RAM of physical machines. We note that STYX's goal is to preserve confidentiality, but not integrity or availability, and hence the adversarial model we consider does not allow for arbitrary changes in the program, analysis results, or data stored within the cloud. We also assume that STYX has access to a limited set of trusted resources that remain under the user's control. As we shall see in this paper, this environment is leveraged to perform certain computations.

2.1 Data Flow

STYX provides the same core abstractions as many stream processing systems: streams, tuples, and fields. Briefly, a single logical data unit is a field, one or more fields make up a tuple, and an unbounded sequence of tuples is simply denoted as a stream. A stream in STYX is defined by the programmer declaring a schema which names the fields of the tuples in the stream. The tuples in the stream are processed in a distributed fashion. Application logic is arranged as a directed graph where vertexes of the graph are computation components and edges are streams that represent the data flow between components. Application programmers write application logic for the vertexes of the graph. A subset of these vertexes are also designated as source vertexes. These source vertexes act as entry points for data into the graph. Source vertexes typically read data from a queue, log file, or external subscriptions. As data is generated in real time and added to the queue, it is picked up by the source vertexes and forwarded down the graph for processing. In order to allow secure processing, STYX allows programmers to specify that a field represents sensitive data by marking it as a sec-field (for secure field) when the schema for the field is defined. If at least one field in a tuple is a sec-field, then STYX marks the stream of such tuples as a sec-stream (for secure stream). Figure 2 shows a graph with four vertexes with v1 designated as the source vertex. The figure also shows sec-streams s1, s2, s3 and s4. Each vertex of the graph may have multiple runtime instantiations called tasks. In Figure 2, vertex v2 has two tasks running in Node1 and v3 has three tasks running in Node2. We refer to this assignment (i.e., a specific number of tasks for each vertex) as the deployment profile of the graph. A good deployment profile is critical for scaling and good performance. A grouping clause defines how tuples in a stream are partitioned among the tasks of a vertex that receives the stream. This grouping clause is also part of the definition of the graph and is provided by the programmer. Common groupings used in STYX are as follows: (a) Shuffle grouping: tuples are distributed randomly across tasks in such a way that each task gets an equal number of tuples. (b) Field grouping: tuples are partitioned according to the field specified in the grouping and distributed among tasks. (c) All grouping: the stream is replicated across all the tasks.

[Figure 3 labels: Program (STYX API, Annotations); Homomorphism Analysis; Topology scheduling; Topology execution; Analytical Model; Trusted Tier; Untrusted Cloud]

Figure 3: STYX Execution Flow
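The three grouping clauses described in Section 2.1 can be illustrated with a minimal sketch. The class and method names below are hypothetical and not part of the STYX API, and shuffle grouping is simplified to round-robin (rather than random) assignment so that the equal-share property is easy to see. For a sec-field, the value passed to the field grouping would presumably be a ciphertext under a scheme that preserves equality, such as DET in Table 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the STYX API. */
final class GroupingSketch {

    /** Shuffle grouping, simplified to round-robin so that every task
     *  receives an equal share of tuples (assumes tupleIndex >= 0). */
    static int shuffle(long tupleIndex, int numTasks) {
        return (int) (tupleIndex % numTasks);
    }

    /** Field grouping: tuples with equal field values reach the same task. */
    static int byField(Object fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    /** All grouping: the tuple is replicated to every task. */
    static List<Integer> all(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            targets.add(task);
        }
        return targets;
    }
}
```

Each method returns the index (or indices) of the receiving task, so a vertex with three tasks would see `shuffle` cycle through 0, 1, 2 while `byField` keeps all tuples of one group on a single task.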

2.2 Execution Flow

We now give an overview of the STYX execution flow. Figure 3 outlines the steps followed by STYX from the time a STYX graph is designed. We outline these steps here and describe the detailed working of each step in subsequent sections. STYX leverages the idea that the user often has some amount of computing power, though limited, which can be trusted. We refer to these resources as the trusted tier. The compute resources in the cloud, though potentially unlimited for practical purposes, are untrusted. STYX utilizes the trusted tier for application development and compilation and uses the cloud from the deployment phase onward. If any runtime computing resources are required from the trusted tier, STYX tries to minimize their usage. Application programmers use the STYX API to design a graph which contains the application logic. Details of this are outlined in Section 3.1. STYX then performs homomorphism analysis on the graph to identify the crypto systems required to execute the graph. The details of this analysis are communicated to the IoT devices, which generate key pairs corresponding to the crypto systems. More details of this analysis are given in Section 3.2. Next, STYX analytically identifies the number of tasks required for the different vertexes, as detailed in Section 4.1. Finally, the graph is scheduled for execution as explained in Section 4.3.

Table 1: STYX crypto system

| Crypto system | Property | Operation |
|---|---|---|
| Random AES | Random encryption (RND) | - |
| Deterministic AES | Deterministic encryption (DET) | x = y ⇐⇒ Enc(x) = Enc(y) |
| Boldyreva et al. [14] | Order-preserving encryption (OPE) | x > y ⇐⇒ Enc(x) > Enc(y) |
| Song et al. [33] | Search | x contains y ⇐⇒ Enc(x) contains Enc(y) |
| Paillier [28] | Additive homomorphic encryption (AHE) | x + y = Dec(Enc(x) × Enc(y)) |
| ElGamal [20] | Multiplicative homomorphic encryption (MHE) | x × y = Dec(Enc(x) × Enc(y)) |

 1 /** Code executed when a vertex receives a tuple */
 2 @EncOperations(operations = { "{eq}", "{sum}" })
 3 public class StyxSumVertex extends StyxVertex {
 4   SlotBasedSum<SecField> slidingWindowGroupSums = new SlotBasedSum<SecField>(60);
 5   public void execute(Tuple tuple) {
 6     //if timing tuple ...
 7     emitCurrentWindowCounts(slidingWindowGroupSums);
 8     //else ...
 9     SecField group = FieldOper.getField(tuple, 0);
10     SecField sum = FieldOper.getField(tuple, 1);
11     slidingWindowGroupSums.updateSum(group, getCurTimeSec(), sum);
12     //end if..else
13   }
14 }
15 /** Class that keeps track of sum of values per group in each time slot */
16 public class SlotBasedSum<T> {
17   ...
18   public void updateSum(T group, int slot, SecField val) {
19     SecField[] sums = objGroupSum.get(group);
20     if (sums == null) { sums = new SecField[this.numSlots]; init(sums, val);
21       objGroupSum.put(group, sums); }
22     sums[slot] = SecureOper.add(sums[slot], val);
23   }
24 }

Listing 1: STYX code for finding sum of each group in a sliding window

3 STYX Secure Streams

In this section we describe the programming abstraction that STYX exposes to express secure streams and the homomorphism analysis STYX performs so that IoT devices can encrypt their output streams appropriately.

3.1 Expressing Secure Streams

Application programmers use the abstractions provided by our STYX Java API to describe a graph. We describe the abstractions that are new to STYX over typical stream processing systems. These are also summarized in Table 2.

SecField. Programmers use this class to realize the sec-field abstraction that refers to a confidential input field. For example, programmers can get a reference to the first value in a tuple as shown in Listing 1, Line 9. sec-fields can also be initialized by reading an encrypted value directly from an input stream.

SecOper. Secure operations (sec-opers) are operations provided by STYX that allow programmers to implement standard operations like multiply, add, compare and equal. Programmers use the SecOper class to perform these operations. sec-opers take sec-fields as input and return a sec-field or Boolean value as result. For example, to add two sec-fields, operand1 and operand2, programmers may write code similar to SecField result = SecOper.add(operand1, operand2).

StyxVertex. This base class is extended by programmers to express the computation in a vertex of the graph. This class provides the execute() method, which is invoked by STYX when a tuple arrives at the vertex for execution.

Table 2: STYX Abstractions.

| STYX API | Function |
|---|---|
| SecField | Confidential field |
| SecOper.add(f1, f2) ¹ | f1 × f2 mod n² |
| SecOper.multiply(f1, f2) ² | (f1.a × f2.a, f1.b × f2.b) |
| SecOper.compare(f1, f2) ³ | f1 ≤ f2 |

¹ n represents the public key used to encrypt f1 and f2. ² An MHE scheme ciphertext contains two components, denoted here by a and b. ³ The OPE scheme requires comparisons over integers of 128 bits and higher.
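The component-wise product listed for SecOper.multiply in Table 2 can be checked with a toy ElGamal instance. This is a minimal sketch under stated assumptions: the class `ElGamalSketch`, its method names, the fixed base g = 2, and the small key size are all illustrative choices, not STYX code, and the parameters are not chosen for security.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

/** Toy ElGamal for illustration only; parameters are not chosen for security. */
final class ElGamalSketch {
    private final SecureRandom rnd = new SecureRandom();
    final BigInteger p;          // public prime modulus
    final BigInteger g;          // public base (assumed fixed to 2 here)
    final BigInteger h;          // public key h = g^x mod p
    private final BigInteger x;  // private key, held by the trusted tier

    ElGamalSketch(int bits) {
        p = BigInteger.probablePrime(bits, rnd);
        g = BigInteger.valueOf(2);
        x = new BigInteger(bits - 2, rnd).add(BigInteger.ONE);
        h = g.modPow(x, p);
    }

    /** Enc(m) = (a, b) = (g^r mod p, m * h^r mod p) for a fresh random r. */
    BigInteger[] encrypt(BigInteger m) {
        BigInteger r = new BigInteger(p.bitLength() - 2, rnd).add(BigInteger.ONE);
        return new BigInteger[] { g.modPow(r, p), m.multiply(h.modPow(r, p)).mod(p) };
    }

    /** Homomorphic multiply: (f1.a * f2.a, f1.b * f2.b), needs only public p. */
    static BigInteger[] multiply(BigInteger[] f1, BigInteger[] f2, BigInteger p) {
        return new BigInteger[] {
            f1[0].multiply(f2[0]).mod(p),
            f1[1].multiply(f2[1]).mod(p)
        };
    }

    /** Dec(a, b) = b * (a^x)^-1 mod p. */
    BigInteger decrypt(BigInteger[] c) {
        return c[1].multiply(c[0].modPow(x, p).modInverse(p)).mod(p);
    }
}
```

Multiplying the encryptions of 6 and 7 with `multiply` and then calling `decrypt` recovers their product, provided the product stays below p.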

Listing 1 shows code snippets used in a STYX vertex class. The code is from a graph that keeps track of the sum of values in different groups within a sliding window (the last minute in this example). The input tuple contains two fields: the group name and the value for that group (Lines 9 and 10). The code shown retrieves the group and value fields from the input tuple (Lines 9, 10) and updates the sum for that group's current time slot (Line 11) with the value. Every time the vertex receives a timing tuple, which signifies that a minute has elapsed, it emits the sum of all groups in the current sliding window (Line 7). The object maintaining the sliding window internally contains a map and updates the group's sum every time the updateSum() method is called, using STYX's SecureOper.add() method (Line 22).

Listing 2 shows just the function updateSum() from Listing 1 written without using STYX abstractions. As can be seen, this requires the programmer to explicitly read the public key (see Line 2), and perform the exact AHE evaluation operation for addition: with Paillier, a multiplication followed by a modulus using the square of the public key, as in Line 7.
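To see why a ciphertext product modulo the square of the public key corresponds to a plaintext sum, the following toy Paillier sketch may help. It is an illustration under stated assumptions, not the STYX implementation: the class `PaillierSketch` and its methods are hypothetical, the generator is fixed to g = n + 1 (a common simplification), and the key size is far too small for real use.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

/** Toy Paillier with g = n + 1, for illustration only. */
final class PaillierSketch {
    private final SecureRandom rnd = new SecureRandom();
    final BigInteger n;         // public key
    final BigInteger nSquared;  // modulus for ciphertexts
    private final BigInteger lambda, mu; // private key material

    PaillierSketch(int primeBits) {
        BigInteger p = BigInteger.probablePrime(primeBits, rnd);
        BigInteger q;
        do { q = BigInteger.probablePrime(primeBits, rnd); } while (q.equals(p));
        n = p.multiply(q);
        nSquared = n.multiply(n);
        BigInteger p1 = p.subtract(BigInteger.ONE);
        BigInteger q1 = q.subtract(BigInteger.ONE);
        lambda = p1.multiply(q1).divide(p1.gcd(q1)); // lcm(p-1, q-1)
        mu = lambda.modInverse(n);                   // valid because g = n + 1
    }

    /** Enc(m) = (1 + m*n) * r^n mod n^2, using (n+1)^m = 1 + m*n mod n^2. */
    BigInteger encrypt(BigInteger m) {
        BigInteger r;
        do { r = new BigInteger(n.bitLength(), rnd); }
        while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        BigInteger gm = BigInteger.ONE.add(m.multiply(n)).mod(nSquared);
        return gm.multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    /** Dec(c) = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) / n. */
    BigInteger decrypt(BigInteger c) {
        BigInteger u = c.modPow(lambda, nSquared);
        return u.subtract(BigInteger.ONE).divide(n).multiply(mu).mod(n);
    }

    /** Homomorphic addition: product of ciphertexts mod n^2 (public data only). */
    static BigInteger add(BigInteger c1, BigInteger c2, BigInteger nSquared) {
        return c1.multiply(c2).mod(nSquared);
    }
}
```

The `add` method uses only public values, which is what allows an untrusted worker to accumulate sums, while `decrypt` requires the private `lambda` and `mu` held in the trusted tier.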

In STYX's Java implementation, each vertex is typically designed as a separate Java class. The stream emitted by each vertex is declared explicitly in the vertex itself. Once all vertexes of the graph are designed, the graph itself is put together by defining the input stream of each vertex and specifying the grouping clauses.

3.2 Processing Secure Streams

In order to process secure streams, STYX first performs a homomorphism analysis on the user program. The primary responsibility of this homomorphism analysis is to enable a seamless transition from the program logic written for plaintext to the program logic required for executing on encrypted streams. The analysis consists of the functions described in the remainder of this section. Once the analyzer determines the crypto systems required for each stream, it may turn out that some operations cannot be performed over the available homomorphic crypto system. To get around this issue, STYX may decide to either perform the operation in the trusted tier or re-encrypt the stream in the trusted tier. For re-encryption, STYX inserts special vertexes into the graph and marks them such that the tasks executing those vertexes get scheduled on the trusted tier. After inserting re-encryption vertexes, the analyzer annotates the graph, specifying for each vertex whether it must be executed in the trusted tier or the untrusted cloud.

Identifying encryption schemes. This step identifies the schemes that are required for the various fields based on the operations that the application wishes to perform on those fields. To apply these inferences, STYX first has to identify the different streams and their grouping clauses in the application logic. This can be derived from the graph declaration as explained in Section 3.1. Secondly, we need to identify the sequence of operations performed on each stream. STYX derives this from program annotations in each vertex class in the graph. Line 2 in Listing 1 shows a vertex class which receives a stream with two fields and declares that the first field is used for equality comparison and the second field is used for performing addition. Once STYX derives the distinct streams and the operations to be performed on those streams, we can build on prior work [34] to infer the crypto systems required to execute the graph. We briefly summarize how these techniques work. Considering each stream as an operand and using the operations derived from the annotations, an expression tree is constructed, similar to a parse tree, where the leaf nodes are the fields in the stream and non-leaf nodes are operators. Using a lookup table which identifies the crypto system required for each operation, leaf nodes are assigned crypto systems that match the parent operator. The lookup table also identifies the crypto system of the result (which will always be the same as the crypto system of the operands). Non-leaf operator nodes are assigned this resultant crypto system. There can be a mismatch between parent and child operator nodes if they do not belong to the same crypto system. In this case (unlike in [34]) a re-encryption operator is inserted in between, which converts the stream from one crypto system to the other. These re-encryption nodes are also marked to be executed on the trusted tier so that the scheduler can place the tasks correctly.
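The lookup-and-insert step described above can be sketched in a few lines. This is a simplification under stated assumptions: it handles a linear chain of operations on one field rather than a full expression tree, the class name and output format are hypothetical, and only the operation names "eq" and "sum" come from the annotation in Listing 1 ("compare" and "multiply" are assumed names).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of crypto-system inference for a chain of operations. */
final class HomomorphismAnalysisSketch {
    enum Crypto { DET, OPE, AHE, MHE }

    /** Lookup table: operation name -> crypto system it requires (see Table 1). */
    static final Map<String, Crypto> LOOKUP = Map.of(
        "eq", Crypto.DET,
        "compare", Crypto.OPE,
        "sum", Crypto.AHE,
        "multiply", Crypto.MHE);

    /** Annotates each operation with its crypto system and inserts a
     *  re-encryption step, pinned to the trusted tier, on every mismatch. */
    static List<String> plan(List<String> operations) {
        List<String> steps = new ArrayList<>();
        Crypto current = null;
        for (String op : operations) {
            Crypto needed = LOOKUP.get(op);
            if (needed == null) {
                throw new IllegalArgumentException("unsupported operation: " + op);
            }
            if (current != null && current != needed) {
                steps.add("REENC(" + current + "->" + needed + ")@trusted");
            }
            steps.add(op + "[" + needed + "]");
            current = needed;
        }
        return steps;
    }
}
```

For the chain sum, sum, multiply, the plan keeps both additions under AHE and inserts a single trusted-tier re-encryption before the multiplication; the first step's crypto system is the one the IoT device would need to encrypt the field under.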

Encrypting streams. Though the IoT devices handle most of the encryption, sometimes the programmer may have plaintext data from public streams, like stock quotes, which are part of the application logic. Though these data are public, we may need to encrypt them for two reasons. 1. If these public data need to be combined or compared with private data already encrypted for computing in the cloud, the public data should also be encrypted under the same key for such operations to succeed. 2. Despite the data being public, if it is seen in plaintext by an attacker, she can deduce more about the analysis being performed. To make it easy to integrate public data into the program, STYX provides special vertex classes that encrypt plaintext streams.

1 public class SlotBasedSum<T> {
2   BigInteger publicKey = readPubKey();
3   public void updateSum(T group, int slot, BigInteger val) {
4     BigInteger[] sums = objGroupSum.get(group);
5     if (sums == null) { sums = new BigInteger[this.numSlots]; init(sums, val);
6       objGroupSum.put(group, sums); }
7     sums[slot] = sums[slot].multiply(val).mod(publicKey.multiply(publicKey));
8   }
9 }

Listing 2: Code for finding sum of each group in a sliding window without STYX abstractions

Initialization and constants. One of the challenges in enabling programming with encrypted fields is handling initializations. For example, a standard operation like sum = sum + val usually requires sum to be initialized to 0 before being executed. When the same operation is performed over encrypted data, sum must be initialized to the encrypted value of 0, and further, encrypted using the same crypto system and key as val. STYX handles this by allowing the runtime to request constants from the trusted tier. For example, STYX code like SecField sum = SecureOper.getConstant(0, val) is used to initialize a variable to 0 (used within the init(sums) method in Listing 1, Line 20). STYX uses its internal metadata to identify that the field val is derived from the second field in the input stream and requests the trusted tier to return 0 encrypted using the same crypto system and key as the variable val. Note that the number of potential encryptions of a constant is bounded by the number of unique fields across all streams in a graph. This allows the trusted tier to pre-compute and cache commonly used values and reduce runtime latency.
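The caching idea can be sketched as a map keyed by field and constant. This is a hedged illustration: `ConstantCacheSketch`, the string-typed ciphertexts, and the `trustedEncrypt` callback are all hypothetical stand-ins for whatever the trusted tier actually exposes, and the field identifiers are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical cache of encrypted constants, one entry per (field, constant). */
final class ConstantCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    /** Stand-in for a request to the trusted tier: (fieldId, constant) -> ciphertext. */
    private final BiFunction<String, Long, String> trustedEncrypt;
    /** Counts round trips to the trusted tier, to show the effect of caching. */
    int trustedCalls = 0;

    ConstantCacheSketch(BiFunction<String, Long, String> trustedEncrypt) {
        this.trustedEncrypt = trustedEncrypt;
    }

    /** Returns the constant encrypted under the crypto system and key of fieldId. */
    String getConstant(long value, String fieldId) {
        return cache.computeIfAbsent(fieldId + "#" + value, key -> {
            trustedCalls++;
            return trustedEncrypt.apply(fieldId, value);
        });
    }
}
```

Because the key space is bounded by the number of unique fields times the number of distinct constants, the same map could also be filled ahead of time for commonly used values such as 0 and 1.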

Key management. One of the challenges of using homomorphic encryption for IoT based streams is that the streams need to be encrypted using the same key for long durations. If a continuous query aggregates data over a sliding window, it is not possible to change the encryption key of the stream without disrupting the output. If we decide to switch over to a new key at time t, aggregations in the sliding window that spans both sides of time t will fail. We work around this issue by emitting tuples encrypted under the old and new key for the time span of the sliding window in the continuous query.
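One possible reading of this workaround is sketched below: for one window length after the switch time t, each tuple is emitted twice, once under each key. The class, the key labels, and the exact boundaries of the overlap period are assumptions made for illustration; the text does not specify them.

```java
import java.util.List;

/** Hypothetical sketch of dual-key emission around a key switch. */
final class KeyRotationSketch {

    /** Returns the keys under which a tuple produced at time `now` is emitted,
     *  given a switch at `switchTime` and a sliding window of `windowSec`. */
    static List<String> keysFor(long now, long switchTime, long windowSec) {
        if (now < switchTime) {
            return List.of("old");            // before the switch
        }
        if (now < switchTime + windowSec) {
            return List.of("old", "new");     // overlap: windows span both keys
        }
        return List.of("new");                // every window is now new-key only
    }
}
```

Under this reading, a window that started before t can still be completed with old-key tuples, while windows starting at or after t see only new-key tuples.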

4 STYX Deployment

In this section we describe what happens when a graph is deployed into the STYX runtime.

4.1 Deployment Profile Generation

Obtaining good resource utilization and scaling well for streaming systems like STYX requires specifying the number of runtime instantiations (tasks) for each vertex. To reason about this, we use a utilization metric. Utilization of a vertex for a time interval is defined as the fraction of that time interval the vertex spends processing. For example, if a vertex spends 5 minutes of a 10 minute interval actually processing tuples and the rest of the time waiting for tuples to arrive, then it has a utilization of 0.5. As the utilization of a vertex approaches 1, we can assume the vertex is starting to become a bottleneck. Good scaling is usually achieved by the programmer explicitly specifying the number of tasks for each computation vertex. The programmer is well suited to do this job as she understands, based on application logic, which vertexes handle more data or computation, and can correspondingly allocate more tasks to them. In STYX, when the computation graph is transformed and operations are converted to their cryptographic equivalents (e.g., an addition operation becomes a product modulo the square of the public key with Paillier), the utilization of a vertex changes substantially. This means the programmer would need a deep understanding of the overheads of each crypto system, which is contrary to STYX's design goals.

4.2 Heuristic

In order to relieve the programmer from having to know the details of the transformed graph, we propose a linear programming based heuristic which automatically converts the instance values the programmer specifies for the plaintext graph into instance values for a STYX graph. Figure 4 shows the formal representation of the heuristics we use. S represents the slots available for instances to use and V represents the vertexes in the graph that need to be allocated. A slot is typically a Java virtual machine (JVM) or an executor thread within a JVM. We assume all slots have the same processing capacity. The matrix A represents how much each vertex amplifies its input. In our implementation, each element of the matrix is a function f(x) that returns the number of tuples a vertex outputs if it receives x tuples. For vertexes that emit tuples at fixed time intervals (e.g., the top ten values every minute), f(x) returns a constant. A is derived by executing the plaintext graph on sample data. To compute the amount of data arriving at a vertex, we consider all paths of varying lengths that end up at that vertex from the source. The A matrix gives the amount of data at vertexes which are one edge away from the source (for unit input). To find the data arriving at vertex i (represented by $d_i$) through paths of length 2 and higher, we compute the powers of A, represented in Figure 4 as $A^2$, $A^3$, etc.

The vector C represents the load on each vertex relative to one another. C is derived by first inverting the programmer-specified values for the number of instances for each vertex (from the deployment profile) in the plaintext version of the graph and then scaling it with respect to the crypto operations performed by the vertex. For example, in Figure 2, assume the programmer specified the number of instances for each vertex as v1 : 1, v2 : 3, v3 : 2, v4 : 6 for the plaintext graph; then we start with the vector {1, 1/3, 1/2, 1/6}, or simply {6, 2, 3, 1}. The intuition here is that for vertexes that come under heavy load, the programmer will allocate a higher number of instances in the deployment profile to accommodate the load. As the next step we scale down the value of each element in the above vector based on a reduction factor. This reduction factor is derived empirically based on our observations. We use a reduction factor of 3 for AHE schemes, 2 for DET and 6 for re-encryption nodes. For example, based on Figure 2, if v3 receives a stream (s2) with a field encrypted with AHE, we scale down the value corresponding to v3 to 1, after which we arrive at C ← {1, 2, 1, 1}. After scaling down each element we get the C vector which can be used in the linear program.

Further, in Figure 4, T represents the deployment profile so that $t_i$ represents the number of slots allocated to execute vertex $v_i$. Our target now is to derive each $t_i$. For this we define two sets of constraints. The first set of constraints ensures each vertex is allocated to at least one slot. The second set of constraints ensures that for all vertexes, the load (described by c) is less than the capacity of the nodes to process it. Under these constraints, we try to maximize the amount of data that can be consumed at the source vertex.
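The derivation of C can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the STYX implementation: the function names and the `REDUCTION` table keys are hypothetical, and the assignment of schemes to v1, v2 and v4 is an assumption (the text only states that v3 receives an AHE-encrypted field). Treating v1 as a re-encryption node is chosen here because it reproduces the final vector quoted in the text.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Empirical reduction factors quoted in the text; the key names are hypothetical.
REDUCTION = {"AHE": 3, "DET": 2, "REENC": 6, None: 1}

def relative_load(instances):
    """Invert the per-vertex instance counts and rescale them to integers."""
    inverted = [Fraction(1, n) for n in instances]
    # Multiply by the least common multiple of the denominators.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (f.denominator for f in inverted), 1)
    return [int(f * scale) for f in inverted]

def scale_down(load, schemes):
    """Divide each entry by the reduction factor of the scheme its vertex handles."""
    return [Fraction(c, REDUCTION[s]) for c, s in zip(load, schemes)]

# Plaintext deployment profile from the example: v1:1, v2:3, v3:2, v4:6.
start = relative_load([1, 3, 2, 6])
# Assumed scheme per vertex (only v3 = AHE is stated in the text).
c_vector = scale_down(start, ["REENC", None, "AHE", None])
print(start, [int(c) for c in c_vector])
```

Exact fractions are used so that the inversion step does not introduce rounding before the reduction factors are applied.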

Given
  $S : [s_1, s_2 \ldots s_n]$  Available slots
  $V : [v_1, v_2 \ldots v_m]$  Vertexes
  $A : \{[a_{11} \ldots a_{1m}], \ldots, [a_{m1} \ldots a_{mm}]\}$ where
      $a_{ij}$ : Output from $v_i$ that goes to $v_j$ for unit input
  $C : [c_1, c_2 \ldots c_m]$ where
      $c_i$ : Relative load on any $s_j$ processing $v_i$

Unknowns
  $d_i$ : Input consumed by single instance of $i$
  $D_i$ : Input consumed by all instances of $i$
  $T : \{t_1, t_2 \ldots t_m\}$ where
      $t_i$ : Number of slots for $v_i$

From definitions
  $D_1 \leftarrow d_1 \times t_1$
  $\forall\, 1 < i \le m,\; D_i \leftarrow d_1 \times t_1 \times \{A_{1i} + A^2_{1i} + \ldots + A^l_{1i}\}$ where
      $l$ is the length of longest path between $v_1$ and $v_i$

Linear program constraints
  $\forall\, 0 < i < m,\; t_i \ge 1$  (All $v_i$s are allocated)
  $\forall\, 0 < i < m,\; D_i < c_i \times t_i$

Linear program objective function
  $\max_t (D_1)$

Figure 4: STYX heuristics
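The definitions in Figure 4 can be exercised on a toy graph. The sketch below is a simplification under stated assumptions: each entry of A is a constant amplification factor rather than a function f(x), the graph is acyclic, and the three-vertex chain with its numbers is made up for illustration. All function names are hypothetical. It does not solve the linear program; it only computes the $D_i$ values and checks the two constraint sets for a candidate profile T, applied here to every vertex.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def source_amplification(a, longest_path):
    """First row of A + A^2 + ... + A^l: tuples reaching each vertex per unit
    of source input. For an acyclic graph, powers beyond the longest path to a
    vertex contribute zero, so one graph-wide l is enough."""
    power = a
    total = list(a[0])
    for _ in range(longest_path - 1):
        power = mat_mul(power, a)
        total = [t + p for t, p in zip(total, power[0])]
    return total

def total_input(a, d1, t1, longest_path):
    """D_1 = d1 * t1, and D_i = d1 * t1 * (A_1i + A^2_1i + ...) for i > 1."""
    reach = source_amplification(a, longest_path)
    return [d1 * t1] + [d1 * t1 * r for r in reach[1:]]

def satisfies_constraints(big_d, c, t):
    """Check t_i >= 1 and D_i < c_i * t_i for every vertex."""
    return (all(ti >= 1 for ti in t)
            and all(di < ci * ti for di, ci, ti in zip(big_d, c, t)))

# Hypothetical chain v1 -> v2 -> v3 where v2 halves its input.
A = [[0, 1, 0],
     [0, 0, 0.5],
     [0, 0, 0]]
D = total_input(A, d1=10, t1=2, longest_path=2)
print(D, satisfies_constraints(D, c=[30, 15, 20], t=[2, 2, 1]))
```

A linear programming solver would search over T for the largest feasible $D_1$; the check above is the feasibility test such a search would rely on.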

4.3 STYX Scheduler

The primary responsibility of the STYX scheduler is to decide on which host machines vertexes of the graph will be executed. The STYX scheduler is provided with two lists of hostnames, one that lists hosts in the untrusted cloud, and another that lists hosts in the trusted client environment. The scheduler reads the graph annotation to identify where each vertex must be executed. For components that need execution in a trusted environment, the scheduler sends the appropriate class files to the worker instances running on the trusted side. The trusted side workers have access to private keys required for encryption or cryptosystem transformation. The workers in the untrusted cloud only see the encrypted data and have access only to the public keys required to perform the homomorphic operations. The scheduler itself can run in the untrusted cloud. An attacker can try to manipulate the scheduler in two ways: (i) by trying to execute trusted vertexes in the untrusted cloud and (ii) by trying to execute untrusted code in the trusted tier. Item (i) does not compromise confidentiality because the untrusted cloud does not possess the private keys required to reveal the plaintext data. This may result in availability issues, but confidentiality is retained. Item (ii) can compromise confidentiality if the attacker is successful in executing malicious code that retrieves private keys or reads data while it is in plaintext during re-encryption. To avoid (ii), a hash of the vertexes to be executed in the trusted tier is generated before deployment. When tasks are delivered to the trusted tier for execution, the trusted tier first computes a hash of the task class and compares it with the hash generated before deployment. Execution proceeds only if there is a match.
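The hash check that guards the trusted tier can be sketched as follows. This is a minimal illustration and not the STYX code: the text does not name the hash function, so SHA-256 is an assumption, and the function names and the `ReEncryptBolt` class name are hypothetical.

```python
import hashlib
import hmac

def digest(class_bytes: bytes) -> str:
    """Hash the bytes of a task class (SHA-256 is an assumed choice)."""
    return hashlib.sha256(class_bytes).hexdigest()

def build_manifest(trusted_classes: dict) -> dict:
    """Before deployment: record one hash per vertex meant for the trusted tier."""
    return {name: digest(code) for name, code in trusted_classes.items()}

def may_execute(name: str, class_bytes: bytes, manifest: dict) -> bool:
    """On the trusted tier: run a delivered task only if its hash matches."""
    expected = manifest.get(name)
    if expected is None:
        return False  # unknown task, never registered before deployment
    return hmac.compare_digest(expected, digest(class_bytes))

manifest = build_manifest({"ReEncryptBolt": b"original class bytes"})
print(may_execute("ReEncryptBolt", b"original class bytes", manifest))
print(may_execute("ReEncryptBolt", b"tampered class bytes", manifest))
```

The manifest has to be produced and stored on the trusted side, since a scheduler running in the untrusted cloud could otherwise replace both the task and its expected hash.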

5 Implementation

In this section we lay out some of the implementation details of STYX. STYX’s processes in the cloud are implemented by modifying Apache Storm [1]. Storm is an online, distributed computation system. The core abstraction that Storm provides is called a stream. Application logic is packaged into directed graphs called topologies. Vertexes of the topologies are computation components and edges represent data flows between components. There are two types of components in Storm: (i) spouts that act as event generators, and (ii) bolts that capture the program logic. In other words, spouts produce the data stream which the bolts operate upon. Modifications to Storm are limited to implementing a new scheduler (Section 4.3) (by overriding the IScheduler interface) and changes to the way a Storm topology is submitted (StormSubmitter and related classes). These changes add an additional 1031 lines of code to Storm. The programming interfaces and runtime cryptographic classes that allow computations over encrypted data are packaged as a separate jar library, implemented in 3633 lines of Java code. The cryptographic classes make use of the GNU multiple precision arithmetic library GMP [6], invoked using JNI, to perform fast arbitrary precision arithmetic operations.

Encryption schemes. We implement randomized encryption (RAN) using AES [17] and use CBC mode with a random initialization vector. We construct deterministic encryption (DET) using an AES pseudo-random permutation block cipher following the approach used in CryptDB [29] and use a variant of CMC mode [25] with a zero initialization vector. We use Boldyreva et al. [14] to implement order-preserving encryption (OPE), the Paillier [28] crypto system to implement additive homomorphic encryption (AHE), and the ElGamal [20] crypto system to implement multiplicative homomorphic encryption (MHE).
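To illustrate what additive homomorphism means for a scheme such as Paillier, the following toy sketch encrypts two values and adds them by multiplying their ciphertexts. It is a textbook construction with the common choice g = n + 1, written for illustration only: the primes are far too small to be secure, the function names are hypothetical, and it says nothing about how the STYX library (which relies on GMP through JNI) is structured.

```python
import math
import secrets

def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+), valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt 0 <= m < n under public key n using fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    """Recover m using L(x) = (x - 1) // n."""
    lam, mu = private_key
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts mod n."""
    return (c1 * c2) % (n * n)

n, priv = keygen(1009, 1013)  # toy primes, insecure by design
total = add_encrypted(n, encrypt(n, 17), encrypt(n, 25))
print(decrypt(n, priv, total))
```

Only `add_encrypted` and the public value n are needed on the untrusted side, which matches the division of keys described for the scheduler: decryption requires the private pair held in the trusted tier.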

6 Evaluation

In this section we evaluate STYX using standard benchmarks and use cases. Our goal is to understand the overhead and thus feasibility of the system, and to estimate the efficacy of the heuristics presented in Section 4.1.

6.1 Linear Road Benchmark

We use the Linear Road Benchmark (LRB) [9], which models variable toll calculation for a city or county and is quite widespread in the evaluation of stream processing systems. LRB simulates vehicles traveling through an expressway with vehicles generating position reports at fixed time intervals. Position reports contain information like expressway identifier, direction of travel, lane of travel, mile marker, offset within the mile, etc. These position reports are processed by a toll levying agency (e.g., city, county) to dynamically (a) calculate the amount of toll to be levied on the vehicle, (b) identify accident locations in order to alert vehicles upstream of the accident, etc. LRB also specifies latency invariants like the time within which a toll must be calculated and the time within which an accident has to be identified. The upper limit within which the system needs to report tolls and accidents is 5 seconds. The benchmark rates the system by the highest number of expressways (L) the system can support while maintaining these invariants. The rate at which position reports are emitted for one single expressway is shown in Figure 6. As can be seen from the figure, the experiment runs for three hours and the rate of input steadily increases up to 1811 tuples per second. Figure 5(a) shows the Storm topology which implements the standard linear road and Figure 5(b) shows the transformed STYX topology. Note that Figure 5(b) contains two new vertexes v6, v7 which are re-encryption vertexes that are executed within the trusted tier.
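The way LRB turns the latency invariant into a rating can be sketched as a simple search. This is a hedged illustration, not the benchmark driver: `run_benchmark` is a hypothetical callable that returns the response times (in ms) observed for a given number of expressways, and the sketch assumes that once the invariant fails for some L it also fails for every larger L.

```python
LATENCY_LIMIT_MS = 5000  # tolls and accidents must be reported within 5 seconds

def meets_invariant(response_times_ms, limit_ms=LATENCY_LIMIT_MS):
    """True if every notification was issued within the latency limit."""
    return all(t <= limit_ms for t in response_times_ms)

def l_rating(run_benchmark, max_expressways):
    """Largest L in 1..max_expressways whose run keeps the invariant (0 if none)."""
    best = 0
    for expressways in range(1, max_expressways + 1):
        if not meets_invariant(run_benchmark(expressways)):
            break  # assumed monotonic: larger L will not recover
        best = expressways
    return best

# Stand-in for a real run: latency grows by one second per expressway.
def fake_run(expressways):
    return [1000 * expressways]

print(l_rating(fake_run, 10))
```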

Figure 5: LRB graph. (a) Storm, with vertexes v1 to v5; (b) STYX, with vertexes v1 to v7.

LRB baseline and hypothesis validation. We first run a baseline deployment of LRB by assigning each vertex a single task. This allows us to observe each individual vertex to see how it consumes resources and to verify the hypothesis made in Section 4.1 that bottlenecks change when running on encrypted data streams. We plot utilization (cf. Section 4.1) against time for the duration of an LRB run (10784 seconds) on Storm with plaintext data (see Figure 7) and on STYX with encrypted data (Figure 8). We can observe that in Figure 7 vertexes v4 and v2 have the highest utilization values until around the 8000s mark, and after that vertex v1 becomes the node with the highest load. This increase is because the number of tuples that require a toll notification increases substantially after 8000s. In the transformed STYX graph, which ran on encrypted data streams, v5 and v1 come under high load until 8000s, and after that v1 becomes the primary bottleneck. This validates our hypothesis that primary bottlenecks differ between graphs running on plaintext vs encrypted streams.

Figure 6: LRB data profile (tuples per second against time in seconds)

Figure 7: Storm LRB baseline (utilization against time in seconds for v1 to v5)

Figure 8: STYX LRB baseline (utilization against time in seconds for v1 to v7)

Performance of STYX deployment profile. Now we benchmark both the Storm topology graph and the transformed STYX graph using LRB. For this, both graphs are deployed in the best possible configuration so that the maximum number of expressways supported can be identified. Table 3 shows the results. For plaintext streams Storm supports 20 expressways, while using STYX with encrypted streams we are able to support 15 expressways.

Table 3: LRB comparison

System   L    Time (ms)¹   Profile²
Storm    20   2694.44      5,4,1,3,2
STYX     15   2672.97      5,2,1,2,3,1,1

¹ Average response time.
² Deployment profile for vertexes in order v1,v2,v3,v4,v5 for Storm and v1,v2,v3,v4,v5,v6,v7 for STYX.

We also plot the response times for all notification triggering tuples, which is the time taken for the notification to be issued from the time the tuple enters the system. The response times for STYX are shown in Figure 10 and the response times for Storm are shown in Figure 9. Response times while running STYX peak faster than with Storm, but for 15 expressways STYX is able to maintain the response time below the threshold allowed by the benchmark.

Effectiveness of analytical model. The effectiveness of the model can be evaluated by looking at how well the model converts the deployment profile for the plaintext streams to the deployment profile for the encrypted streams in STYX. Referring back to Figures 7 and 8, these graphs can also be used as a baseline to understand the model presented in Section 4.1. Vertexes with a higher utilization value should get more instances to execute them. Table 4 shows the response times obtained for the STYX graph under several deployment profiles, including the one generated by STYX and the Storm profile from Table 3 extended with one instance each for v6 and v7. As can be seen, the deployment profile generated by STYX results in the lowest response time. This profile is also in accordance with Figure 8 which shows vertex v1 and v4 should get the highest number of instances.

Table 4: LRB deployment profile response time

STYX deployment profile   Response time (ms)
5,2,1,2,3,1,1 ¹           2672.97
4,2,1,4,2,1,1             2714.3
5,4,1,3,2,1,1             2781.4

¹ Deployment profile generated by STYX.

Encryption overhead of IoT device. We evaluate the overhead of encryption on IoT devices. Raspberry Pi [7] is a widely popular IoT device and has capabilities similar to devices found in highway cameras and vehicle tracking and telemetry devices. We implemented three different crypto systems, namely AHE, MHE, and DET, for Raspberry Pi version B. We measured the latency to encrypt 16 bytes, a typical message size, and the results are shown in Figure 11. The encryption latency is acceptable, as it is less than 150ms even for complex schemes like AHE and MHE.

Figure 9: Response time for LRB on Storm (response time in ms against time in seconds)

Figure 10: Response time for LRB on STYX (response time in ms against time in seconds)

Figure 11: Encryption overhead for an IoT device (encryption latency in ms for AHE, MHE and DET)

6.2 Application Case Studies

New York taxi statistics. This application finds the top 10 most frequent routes during the last 30 minutes of taxi servicing. A route is represented by a starting grid cell and an ending grid cell. The data for this application is based on a data set released under FOIL (Freedom of Information Law) and available at [?]. The input data contains the locations (latitude and longitude) and times of passenger pick ups, the MD5 digest of the medallion of the taxi that picked up the passenger, trip times and drop off locations (latitude and longitude). We use a simplified version of this data which contains passenger pick up time, drop off time, and route id. The dataset contains records that span a one year time frame. Whenever the top 10 change, the output is appended. We use this dataset to study the response time of STYX compared to plaintext streams and also to identify how key changes affect the response time. Response time is defined as the time between the moment an input tuple that triggers a change in the top 10 enters the system and the moment the top 10 corresponding to that tuple is printed. In order to evaluate how response time is affected when a key change as outlined in Section 3.2 is in progress, we initiate a key change at the beginning of every month. This means any data that is emitted with a time stamp within the first 30 minutes of every month will be encrypted under both the old and the new key. Table 5 summarizes the results of these runs. We can see that STYX completes processing the data with only an additional 25% time compared to Storm running on the plaintext stream. Furthermore, the increase in completion time or response time caused by effecting a key change every month is minimal (less than 1%). We also show the response time for the full run with key changes every month in Figure 12. In this graph we can see intermittent spikes (a total of 12) in response time for some tuples around the time a key change is in progress, but the majority of the tuples (90th percentile within 31ms and 99th percentile within 818ms) respond with the same response time as when no change was in effect.

Figure 12: Response time of top-10 taxi route query with key change at the start of every month (response time in ms against time in seconds)
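The query logic of this case study, a top-10 over a 30 minute sliding window that emits output only when the ranking changes, can be sketched on plaintext as follows. This is an illustration of the windowing logic only: the class and method names are hypothetical, tuples are assumed to arrive in time order, ties are broken by route id as an arbitrary choice, and the sketch does not show how the computation is mapped onto encrypted fields in STYX.

```python
from collections import Counter, deque

class TopRoutes:
    """Track the k most frequent routes over a sliding time window."""

    def __init__(self, k=10, window_s=30 * 60):
        self.k = k
        self.window_s = window_s
        self.trips = deque()      # (drop_off_time, route_id) in arrival order
        self.counts = Counter()   # route_id -> trips currently in the window
        self.last_top = []

    def add(self, drop_off_time, route_id):
        """Add one trip; return the new top-k list if it changed, else None."""
        self.trips.append((drop_off_time, route_id))
        self.counts[route_id] += 1
        # Evict trips that have fallen out of the window.
        while self.trips and self.trips[0][0] <= drop_off_time - self.window_s:
            _, old_route = self.trips.popleft()
            self.counts[old_route] -= 1
            if self.counts[old_route] == 0:
                del self.counts[old_route]
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        top = [route for route, _ in ranked[: self.k]]
        if top != self.last_top:
            self.last_top = top
            return top
        return None

tracker = TopRoutes(k=2, window_s=10)
for when, route in [(0, "a"), (1, "a"), (2, "b"), (20, "c")]:
    changed = tracker.add(when, route)
    if changed is not None:
        print(when, changed)
```

Response time as defined in the text would be measured from the arrival of a tuple for which `add` returns a list to the moment that list is printed.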

Table 5: Top-10 taxi routes

System   Completion time (s)   Average response time (ms)
Storm    8106                  36.05
STYX¹    10039                 45.1
STYX²    10140                 46.61

¹ Entire stream emitted under same key.
² Stream emitted with a new key every month.

Heartbeat analysis. Now we study how STYX can be used for a realistic, online application like a heartbeat monitor. The end user application runs on specialized hardware (the monitoring device) that is capable of fast encryption and decryption. The monitoring device counts the number of heartbeats per minute, encrypts this value and sends it to the cloud for processing and storing. The graph running in the cloud keeps track of daily, weekly, monthly and yearly statistics. The end user may request to see these statistics on their device, in which case the data is retrieved from the topology and shown to the end user. The statistics are maintained by two vertexes, a “per user” vertex (v1) and an “all users” vertex (v2). User statistics are distributed across the multiple instances of v1. v1 also emits a summary of its per user statistics every minute, which is grouped by week, month, or year by v2 to find the average value across all users. The client’s device emits a message every time the client requests to see a specific data point. For this application, the most critical metric is the response time, that is, the time a user has to wait after requesting to see a metric until the metric is actually displayed. The results of this evaluation are presented in Figure 13. As can be seen, the response times for STYX are very close to the plaintext version for up to 1000 client devices, after which STYX’s response time degrades compared to the plaintext stream running on Storm. This demonstrates that for applications where the input rate is not very high, we are able to get response times similar to plaintext streams.

Figure 13: Heartbeat analysis response time (latency in ms against number of client devices, for STYX and Storm)

7 Related Work

To the best of our knowledge, STYX is the first system to address the issue of data confidentiality in the context of continuous query execution using partially homomorphic encryption. In what follows, we discuss prior work in the areas of stream processing and cloud security.

A large body of work on stream and data flow processing from about a decade ago addressed many challenges related to stream processing such as fault tolerance, continuous query processing and analytics (e.g., the Aurora project [16, 26, 11] and the Telegraph project [15, 32]). With the advent of cloud computing, the need for highly scalable stream processing systems has resulted in the next generation of stream processing systems such as Storm [1], Heron [27], Spark streaming [37], and Samza [4]. The STYX design is based on the stream processing design of Storm, noting that our contributions can also be implemented on top of other stream processing systems.

Gentry introduced an implementable fully homomorphic encryption scheme [21] which has become more practical since [22], but is still not suited for encrypted query processing due to its prohibitive cost. Instead, a lot of research work is focused on using partially homomorphic encryption schemes to perform computations over encrypted data. Most prominently, CryptDB [29] is a database system focusing on executing SQL queries on encrypted data. Monomi [36] extends CryptDB to handle analytical queries and introduces techniques to improve performance. Both of these systems follow a centralized database design and do not consider streaming workloads as supported by STYX. As such it is unclear whether CryptDB or Monomi can effectively scale up to typical big data workloads. MrCrypt [35] consists of a program analysis for MapReduce jobs that tracks operations and their requirements in terms of PHE. When sequences of operations are applied to the same field, the analysis defaults to FHE, noting that the system does not currently execute such jobs at all due to the lack of available FHE crypto systems.

Differential privacy [19, 18, 13] is another approach that aims to improve the accuracy of statistical queries without revealing information about individual records. The server performing the query in this case is trusted with access to plaintext data. This is in contrast to STYX, which assumes data and computation reside in an untrusted server.

Another way of offering data confidentiality is through the use of specialized hardware that provides a trusted computing base [12, 10, 8]. These approaches are orthogonal to the approach of STYX and could be used to extend the design of STYX by allowing secure computations to be performed on trusted hardware in an untrusted cloud, thereby reducing the amount of computation that needs to be performed in the trusted tier.

8 Conclusion

In this paper, we presented STYX, a practical distributed system for evaluating continuous queries over encrypted data streams. STYX makes use of a set of partially homomorphic encryption schemes that allow computation on encrypted data. This allows STYX to leverage untrusted public cloud resources as its compute platform without sacrificing confidentiality. STYX exposes an API which allows programmers to develop secure applications with little or no knowledge of the underlying crypto systems, and STYX heuristics ensure a deployment that optimizes cloud usage. We evaluated our approach using standard benchmarks and applications, demonstrating its applicability and performance. Our evaluations show that we can meet the latency requirements even when the volume of encrypted traffic is high.


References

[1] Apache Storm. https://storm.apache.org/.

[2] Intel SGX. https://software.intel.com/en-us/isa-extensions/intel-sgx#.

[3] Nest. https://nest.com/.

[4] Samza. http://samza.apache.org/.

[5] SmartThings. http://www.smartthings.com/.

[6] The GNU Multiple Precision Arithmetic Library. https://gmplib.org/.

[7] Raspberry Pi, 2012. https://www.raspberrypi.org.

[8] ARASU, A., BLANAS, S., EGURO, K., KAUSHIK, R., KOSSMANN, D., RAMAMURTHY, R., AND VENKATESAN, R. Orthogonal security with cipherbase. In Biennial Conf. on Innovative DataSystems Research (CIDR) (2013).

[9] ARASU, A., CHERNIACK, M., GALVEZ, E. F., MAIER, D., MASKEY, A., RYVKINA, E., STONEBRAKER, M., AND TIBBETTS, R. Linear road: A stream data management benchmark. In Int. Conf. on Very Large Data Bases (VLDB) (2004), pp. 480–491.

[10] BAJAJ, S., AND SION, R. Trusteddb: A trusted hardware based database with privacy and data confidentiality. In Int. Conf. on the Mgt. of Data (SIGMOD) (New York, NY, USA, 2011), SIGMOD ’11, ACM, pp. 205–216.

[11] BALAZINSKA, M., BALAKRISHNAN, H., MADDEN, S. R., AND STONEBRAKER, M. Fault-tolerance in the borealis distributed stream processing system. In Int. Conf. on the Mgt. of Data (SIGMOD) (2008), pp. 13–24.

[12] BAUMANN, A., PEINADO, M., AND HUNT, G. C. Shielding applications from an untrusted cloud with haven. In Symp. on Op. Sys. Design and Implementation (OSDI) (2014), pp. 267–283.

[13] BLUM, A., DWORK, C., MCSHERRY, F., AND NISSIM, K. Practical Privacy: The SuLQ Framework. In Symp. on Principles of Database Sys. (PODS) (2005), pp. 128–138.

[14] BOLDYREVA, A., CHENETTE, N., LEE, Y., AND O’NEILL, A. Order-preserving symmetric encryption. In Int. Conf. on The Theory and Applications of Cryptographic Techniques (EUROCRYPT) (2009), pp. 224–241.

[15] CHANDRASEKARAN, S., COOPER, O., DESHPANDE, A., FRANKLIN, M. J., HELLERSTEIN, J. M., HONG, W., KRISHNAMURTHY, S., MADDEN, S., RAMAN, V., REISS, F., AND SHAH, M. Telegraphcq: Continuous dataflow processing for an uncertain world. In Biennial Conf. on Innovative DataSystems Research (CIDR) (2003), vol. 20, p. 668.

[16] CHERNIACK, M., BALAKRISHNAN, H., BALAZINSKA, M., CARNEY, D., CETINTEMEL, U., XING, Y., AND ZDONIK, S. Scalable distributed stream processing. In Biennial Conf. on Innovative DataSystems Research (CIDR) (2003), vol. 3, pp. 257–268.

[17] DAEMEN, J., AND RIJMEN, V. The Design of Rijndael: AES - The Advanced Encryption Standard. Springer Verlag, Berlin, Heidelberg, New York, 2002.

[18] DINUR, I., AND NISSIM, K. Revealing Information While Preserving Privacy. In Symp. on Principles of Database Sys. (PODS) (2003), pp. 202–210.

[19] DWORK, C., AND NISSIM, K. Privacy-Preserving Datamining on Vertically Partitioned Databases. In Annual Int. Cryptology Conf. (CRYPTO) (2004), pp. 528–544.

[20] ELGAMAL, T. A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms. IEEE Transactions on Information Theory 31, 4 (1985), 469–472.

[21] GENTRY, C. A Fully Homomorphic Encryption Scheme. PhD thesis, Stanford, CA, USA, 2009. AAI3382729.

[22] GENTRY, C., HALEVI, S., AND SMART, N. P. Homomorphic evaluation of the aes circuit. IACR Cryptology ePrint Archive 2012 (2012). informal publication.

[23] GROOPMAN, J., AND ETLINGER, S. Consumer perceptions of privacy in the internet of things. Tech. rep., 2015.

[24] GUPTA, T., SINGH, R. P., PHANISHAYEE, A., JUNG, J., AND MAHAJAN, R. Bolt: Data management for connected homes. In Networked Sys. Design and Implem. (NSDI) (2014), pp. 1–14.


[25] HALEVI, S., AND ROGAWAY, P. A tweakable enciphering mode. In Annual Int. Cryptology Conf. (CRYPTO) (2003), pp. 482–499.

[26] HWANG, J. H., BALAZINSKA, M., RASIN, A., CETINTEMEL, U., STONEBRAKER, M., AND ZDONIK, S. High-availability algorithms for distributed stream processing. In Int. Conf. on Data Engineering (ICDE) (2005), pp. 779–790.

[27] KULKARNI, S., BHAGAT, N., FU, M., KEDIGEHALLI, V., KELLOGG, C., MITTAL, S., PATEL, J. M., RAMASAMY, K., AND TANEJA, S. Twitter heron: Stream processing at scale. In Int. Conf. on the Mgt. of Data (SIGMOD) (2015), pp. 239–250.

[28] PAILLIER, P. Public-key cryptosystems based on composite degree residuosity classes. In Int. Conf. on Theory and Application of Cryptographic Techniques (EUROCRYPT) (1999), pp. 223–238.

[29] POPA, R. A., REDFIELD, C. M. S., ZELDOVICH, N., AND BALAKRISHNAN, H. Cryptdb: protecting confidentiality with encrypted query processing. In Symp. on Op. Sys. Principles (SOSP) (2011), pp. 85–100.

[30] SCHNEIER, B. Description of a new variable-length key, 64-bit block cipher (blowfish). In Fast Software Encryption (1994), Springer-Verlag, pp. 191–204.

[31] SHAFAGH, H., HITHNAWI, A., DROESCHER, A., DUQUENNOY, S., AND HU, W. Talos: Encrypted query processing for the internet of things. In Conf. on Embedded Networked Sensor Sys. (SenSys) (2015), pp. 197–210.

[32] SHAH, M. A., HELLERSTEIN, J. M., AND BREWER, E. Highly available, fault-tolerant, parallel dataflows. In Int. Conf. on the Mgt. of Data (SIGMOD) (2004), pp. 827–838.

[33] SONG, D. X., WAGNER, D., AND PERRIG, A. Practical techniques for searches on encrypted data. In Symp. on Security and Privacy (Washington, DC, USA, 2000), SP ’00, IEEE Computer Society, pp. 44–55.

[34] STEPHEN, J. J., SAVVIDES, S., SEIDEL, R., AND EUGSTER, P. T. Program analysis for secure big data processing. In Int. Conf. on Automated Software Engineering (ASE) (2014), pp. 277–288.

[35] TETALI, S. D., LESANI, M., MAJUMDAR, R., AND MILLSTEIN, T. D. Mrcrypt: static analysis for secure cloud computations. In Conf. on Object-Oriented Prog. Sys., Lang. and Applications (OOPSLA) (2013), pp. 271–286.

[36] TU, S., KAASHOEK, M. F., MADDEN, S., AND ZELDOVICH, N. Processing analytical queries over encrypted data. Proc. VLDB Endow. 6, 5 (2013), 289–300.

[37] ZAHARIA, M., DAS, T., LI, H., SHENKER, S., AND STOICA, I. Discretized streams: An efficient and fault-tolerant model for stream processing on large clusters. In W. on Hot Topics in Cloud Computing (HotCloud) (Berkeley, CA, USA, 2012), HotCloud’12, USENIX Association, pp. 10–10.
