The objective of the project was to design a system that would allow users to store data securely on a public cloud provider such as Amazon Web Services, specifically Amazon's Simple Storage Service. We also wanted to add the ability to search through encrypted data so that only the files relevant to the user's needs are returned. The motivation is the lack of trust in the security of cloud providers, as stated by Armbrust et al. [1].
8.1 Summary of Approach
We started out by examining distributed storage and the techniques used to secure it when the storage itself is not trusted, since these techniques could be relevant to our project. We then studied the research and techniques that allow for encryption with keyword search and found that Symmetric Encryption with Keyword Search was most relevant to our needs.
The next step was to list the requirements needed to make a storage system secure. We found a survey that gave a concise list of requirements, which we adapted to our setting. The requirements to be fulfilled were data Confidentiality, data Integrity, File Sharing, Key Revocation for removing a user's read access, the ability to recover from a Compromised Key-pair and, lastly, the ability to search through encrypted data. There was also a need for an authentication and access control protocol which ensured that only legitimate users were able to access the system.
University of Cape Town
We were able to fulfill these requirements with a modified version of the data structure that Goh et al. [12] developed for their secure distributed storage system, Sirius. We called our version the Secure File Object. This modified data structure fulfilled all the requirements except for Searchability. To fulfill the searchability requirement we adapted the Symmetric Encryption with Keyword Search techniques developed by Waters et al. [30]. The data structure used for searchability in our design is called SFO Keywords. At a high level, each Secure File Object has an SFO Keywords file object attached to it, which stores the keywords associated with the data. When an encrypted keyword is submitted, the server performs the cryptographic computations on the SFO Keywords files only, returning the results where there is a match.
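The search flow described above can be illustrated with a minimal sketch. It is not the scheme of Waters et al. [30]: as a simplifying assumption, each keyword is turned into a deterministic HMAC-SHA256 tag under a client-held secret, and the server compares the submitted tag against the tags stored for each file. The class name `KeywordSearchSketch`, its methods and the map standing in for the SFO Keywords files are all hypothetical.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the keyword search flow; NOT the scheme from [30]. */
final class KeywordSearchSketch {

    /** Client side: derive a deterministic tag for a keyword from a secret key (assumed HMAC-SHA256). */
    static byte[] trapdoor(byte[] secretKey, String keyword) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
            return mac.doFinal(keyword.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    /**
     * Server side: scan the keyword tags of every file and return the names
     * of the files that contain the queried tag. The server never sees the
     * plaintext keyword, only the tag.
     */
    static List<String> search(Map<String, List<byte[]>> keywordFiles, byte[] queryTag) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, List<byte[]>> entry : keywordFiles.entrySet()) {
            for (byte[] tag : entry.getValue()) {
                if (MessageDigest.isEqual(tag, queryTag)) {
                    matches.add(entry.getKey());
                    break; // one matching keyword per file is enough
                }
            }
        }
        return matches;
    }
}
```

In this sketch the client would call `trapdoor` once per keyword when storing a file and once per query, while `search` would run on the server. Only the keyword tags are touched during a search, mirroring the separation between the Secure File Object and its SFO Keywords object.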
We then implemented a prototype in Java to run on a Linux system, using a 1024-bit RSA key pair per user. A client handles all cryptographic operations on the user's computer, and a server runs on a compute instance. The server accepts requests from the client and performs the computations needed by the search algorithm. Data encryption was done using the DES encryption algorithm, and we generated MD5 hashes of the encrypted data, both for the individual data fields and for the Secure File Object as a whole.
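As a rough illustration of the client-side steps named above, the following sketch encrypts data with DES and then hashes the ciphertext with MD5 using the standard Java cryptography APIs. The cipher mode and padding (`DES/CBC/PKCS5Padding`), the random IV, and the class and method names are assumptions made for the example; the prototype's actual settings are not stated here. DES and MD5 appear only because the prototype used them.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;

/** Hypothetical sketch of "encrypt, then hash the ciphertext". */
final class EncryptThenHashSketch {

    /** Generate a fresh DES key for one file. */
    static SecretKey newDesKey() {
        try {
            return KeyGenerator.getInstance("DES").generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Generate a random IV; the DES block size is 8 bytes. */
    static byte[] newIv() {
        byte[] iv = new byte[8];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plaintext) {
        return run(Cipher.ENCRYPT_MODE, key, iv, plaintext);
    }

    static byte[] decrypt(SecretKey key, byte[] iv, byte[] ciphertext) {
        return run(Cipher.DECRYPT_MODE, key, iv, ciphertext);
    }

    /** MD5 digest of the given bytes, intended to be applied to the ciphertext. */
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Mode and padding are assumptions, not taken from the prototype.
    private static byte[] run(int mode, SecretKey key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("DES/CBC/PKCS5Padding");
            cipher.init(mode, key, new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A reader of the file would recompute `md5` over the downloaded ciphertext and compare it with the stored hash before decrypting, which is how the hash supports the integrity requirement.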
This prototype was then used for testing, the results of which we used to evaluate the success of the project. We found that the requirements were successfully fulfilled and that the performance impact of fulfilling them was acceptable. We tested the performance impact of these requirements in terms of Data Encryption Time, File Size Overhead, Secure vs Unsecure Puts, the Client Interface and the Search Algorithm. The Search Algorithm was tested at both the client and the server. The client search testing was done to examine the effect this scheme would have for users. The server search testing was done to remove network latencies and transfer speeds from the measurements, so that the theoretical analysis could be compared with the actual results.
We found that our scheme has a very small, roughly constant space overhead of about 910 bytes for our test cases. Since this overhead is so small, the difference between uploading unsecured files and uploading files with our scheme is also insignificant. The data encryption time increased linearly with the size of the input data, as expected, and the time taken to respond to search queries increased with the number of search operations needed, also as expected. We found that our measurements needed to be performed with nanosecond precision. Initially we used twenty search iterations, which gave limited accuracy, but as we increased the iterations to one hundred and beyond, the actual results started to converge with our theoretical analysis, to within 4% in some cases. We attribute the deviation in the low iteration range to Java and operating system internals, whose effect is disproportionately large when there are few iterations.
The results of the study are significant since they highlight the fact that secure cloud storage with search functionality can be achieved. The searchable encryption algorithms can be applied in a cloud context, and the performance testing and analysis can prove beneficial to future research.
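The measurement approach described above, nanosecond timing averaged over many iterations and compared against a predicted value, can be sketched as follows. The class `SearchTimingSketch` and its methods are hypothetical and are not the test harness used in the study.

```java
/** Hypothetical timing helper; not the harness used for the reported results. */
final class SearchTimingSketch {

    /**
     * Run the task the given number of times and return the mean time per
     * run in nanoseconds. More iterations dilute one-off costs such as
     * class loading and JIT compilation.
     */
    static double meanNanos(Runnable task, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    /** Relative deviation of a measured value from a predicted one (0.04 means 4%). */
    static double relativeDeviation(double measured, double predicted) {
        return Math.abs(measured - predicted) / predicted;
    }
}
```

Calling `meanNanos` with twenty and then with one hundred or more iterations, and passing each result to `relativeDeviation` together with the theoretical estimate, reproduces the kind of comparison described in this section.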
8.2 Future Work
There is further work to be done with this project. A study could be performed on the searching aspect of the system, more specifically its querying functionality. At the moment the design performs a brute force scan across all the keywords within all the files. One could look at various indexing techniques that could be used in an encrypted setting to perform more efficient lookups. This could be extended to allow for range scans across the encrypted keywords using indexing techniques.
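One possible direction is sketched below: an inverted index that maps each keyword tag to the files containing it, so that a query becomes a single hash lookup rather than a scan of every keyword in every file. The sketch assumes that keyword tags are deterministic byte strings, which may not hold for the scheme adapted from [30], and it does not address range scans. The class `TagIndexSketch` is hypothetical.

```java
import java.util.Base64;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical inverted index over deterministic keyword tags (an assumption). */
final class TagIndexSketch {

    // Key: Base64 text of the keyword tag. Value: names of the files carrying that tag.
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Register that the given file carries the given keyword tag. */
    void add(String fileName, byte[] keywordTag) {
        index.computeIfAbsent(encode(keywordTag), k -> new TreeSet<>()).add(fileName);
    }

    /** Return the files carrying the queried tag, or an empty set if none do. */
    Set<String> lookup(byte[] queryTag) {
        return index.getOrDefault(encode(queryTag), Collections.emptySet());
    }

    private static String encode(byte[] tag) {
        return Base64.getEncoder().encodeToString(tag);
    }
}
```

The trade-off is that such an index lets the server see which files share a keyword tag even before any query is made, so its security implications would need to be studied alongside its performance.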
Another study could be performed to compare the performance of RSA and Elliptic Curve Cryptography to determine the speed and storage overheads of each in a cloud setting.
Another drawback of the system is that it is currently designed to encrypt and decrypt whole files. There is no ability to perform random access on a file. If a user wishes to modify a certain block of the file, the entire file must be downloaded, decrypted, modified, encrypted and sent back to the cloud. Techniques from secure distributed storage systems could be used to overcome this issue.
8.3 Meeting the Objectives
We stated that the aim of the study was to design a solution that would allow users to securely store data on an untrusted public cloud provider, whilst allowing for encrypted keyword search. We set out to evaluate this by ensuring that the secure storage requirements were met, and by examining the overheads of securing data on a cloud provider as well as the performance overheads of adding encrypted searchability.
We performed the testing to evaluate the implementation of the requirements and their impact. Based on these tasks, we found that all the requirements could be met and that our design is efficient in both the storage overhead and the processing time added when searching through a small number of files.
The study has shown that secure, searchable storage can be added to a cloud storage service.