A Machine Learning oriented introduction to PALISADE, CKKS and pTensor.
Spoiler: You can do math on encrypted numbers
Note: “we” means “I”
Overview
1) We introduce the PALISADE library and the cryptographic parameters that we need to specify. We then explain what the cryptographic parameters mean for our application.
2) We use the pTensor library and train a housing price predictor on the Ames dataset, a modern house price dataset.
3) We set up the discussion for the next post in the series.
Note: check the link at the very bottom for the complete source code. Sections have been omitted in this page to reduce clutter.
1) PALISADE
Instructions to install PALISADE can be found here: PALISADE-Dev build instructions. For users new to PALISADE and C++, we highly recommend bookmarking the PALISADE Doxygen page containing the library’s documentation.
i) What is PALISADE?
From the README.md on the PALISADE page:
PALISADE is a general lattice cryptography library that currently includes efficient implementations of the following lattice cryptography capabilities:
- Fully Homomorphic Encryption (FHE)
- Brakerski/Fan-Vercauteren (BFV) scheme for integer arithmetic
- Brakerski-Gentry-Vaikuntanathan (BGV) scheme for integer arithmetic
- Cheon-Kim-Kim-Song (CKKS) scheme for real-number arithmetic
- Ducas-Micciancio (FHEW) and Chillotti-Gama-Georgieva-Izabachene (TFHE) schemes for Boolean circuit evaluation
- Multi-Party Extensions of FHE (to support multi-key FHE)
- Threshold FHE for BGV, BFV, and CKKS schemes
- Proxy Re-Encryption for BGV, BFV, and CKKS schemes
ii) Machine Learning Application
The takeaway for us machine learning practitioners is that we can train machine learning models on encrypted data and have them produce encrypted predictions.
2) PALISADE’s Cryptographic Parameters
We as machine learners(?) need to have a rough idea of the following parameters:
i) multDepth
This describes the depth of multiplication supported. Informally, when we encrypt data, we add some noise to increase the scheme’s security. When doing mathematical operations on these data, the noise grows (linearly under addition and subtraction, but roughly quadratically under multiplication).
There is no single “best” value to set the multDepth to; it is highly dependent on your problem. The following are some example equations and their corresponding multiplication depths:
\((a * b) + (c * d)\) has a multiplication depth of 1
\(a * b * c\) has a multiplication depth of 2
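To make the depth accounting concrete, here is a minimal, self-contained PALISADE sketch (our own example, not from the post) that evaluates both expressions above with a multDepth of 2; the parameter values are placeholders.
#include "palisade.h"
#include <iostream>
#include <vector>
using namespace lbcrypto;

int main() {
    uint32_t multDepth = 2;          // enough for a * b * c
    uint32_t scalingFactorBits = 40;
    uint32_t batchSize = 8;
    auto cc = CryptoContextFactory<DCRTPoly>::genCryptoContextCKKS(
        multDepth, scalingFactorBits, batchSize);
    cc->Enable(ENCRYPTION);
    cc->Enable(SHE);
    cc->Enable(LEVELEDSHE);
    auto keys = cc->KeyGen();
    cc->EvalMultKeyGen(keys.secretKey);

    auto enc = [&](double v) {
        return cc->Encrypt(keys.publicKey, cc->MakeCKKSPackedPlaintext(std::vector<double>{v}));
    };
    auto a = enc(2.0), b = enc(3.0), c = enc(4.0), d = enc(5.0);

    // (a * b) + (c * d): each product consumes one level, the addition none -> depth 1
    auto sumOfProducts = cc->EvalAdd(cc->EvalMult(a, b), cc->EvalMult(c, d));
    // a * b * c: two sequential multiplications -> depth 2
    auto chained = cc->EvalMult(cc->EvalMult(a, b), c);

    Plaintext result;
    cc->Decrypt(keys.secretKey, sumOfProducts, &result);
    result->SetLength(1);
    std::cout << "(a*b)+(c*d) ~= " << result << std::endl; // ~26
    cc->Decrypt(keys.secretKey, chained, &result);
    result->SetLength(1);
    std::cout << "a*b*c ~= " << result << std::endl;       // ~24
    return 0;
}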
ii) scalingFactorBits
In the original CKKS paper, the authors discuss a scaling factor they multiply values by. The scaling factor prevents rounding errors from destroying the significant figures during encoding. Unfortunately, it is difficult to discuss this parameter without discussing the paper’s core ideas, so we leave this for the next post. Thankfully, PALISADE is reliable in informing us if the scalingFactorBits parameter is set too low.
We tend to use values between 30 and 50 for most applications.
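As a rough intuition (our own illustration; the precise mechanics come in the next post): with scalingFactorBits set to 40, a value such as \(3.14159\) is encoded roughly as the integer \(\lfloor 3.14159 \cdot 2^{40} \rceil\), so about 40 bits of precision survive the integer arithmetic happening underneath; setting the parameter too low leaves too few bits for the significant figures.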
iii) batchSize
The batchSize is a tricky parameter to set correctly. The issue is that the batch size must be equal to
\[\frac{\text{Ring size}}{2}\]
Unfortunately, one needs to set multDepth, then look at the resulting ring size, before doing it all over again with batchSize set to half the ring size. It’s a little hairy, yes, but this is the price we pay for privacy.
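As a sketch of that two-pass workflow (our own illustration; the parameter values are placeholders): generate a throwaway context just to learn the ring dimension PALISADE picks for your multDepth and scalingFactorBits, then regenerate the context with batchSize set to half of it.
uint32_t multDepth = 5;
uint32_t scalingFactorBits = 40;
// First pass: a throwaway context with a provisional batch size.
auto probe = lbcrypto::CryptoContextFactory<lbcrypto::DCRTPoly>::genCryptoContextCKKS(
    multDepth, scalingFactorBits, /*batchSize=*/8);
uint32_t ringDim = probe->GetRingDimension();
// Second pass: the context we actually use, packing ringDim / 2 slots per ciphertext.
auto cc = lbcrypto::CryptoContextFactory<lbcrypto::DCRTPoly>::genCryptoContextCKKS(
    multDepth, scalingFactorBits, /*batchSize=*/ringDim / 2);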
3) pTensor library
For this discussion we encourage readers to refer to linear_regression_ames.cpp, but we also highlight the critical sections in our discussion.
i) pTensor
The pTensor library’s motivation is to provide those with a machine learning or data science background the ability to train encrypted machine learning models in a framework that looks and feels familiar. Where possible, we aimed to mimic the numpy library’s behavior (e.g., broadcasting is allowed, * corresponds to the Hadamard product, etc.).
In line with the library’s motivation, there are many aspects hidden from the user, but we briefly discuss important concepts that the inquisitive user may stumble upon while perusing the source code.
ii) Complex numbers
CKKS operates on complex numbers for various reasons that we will discuss in the follow-up; for now, know that we only use the real part of these complex numbers.
iii) Packing
To pack the data essentially means that we encode multiple data points into a single ciphertext. Homomorphic encryption is a slow process, but by leveraging SIMD, we can carry out our operations faster. An analogy would be a for-loop vs. a vectorized operation in numpy. Because our ciphertexts are already very large, it is advantageous to store the data in transpose form to reduce the number of encryptions we need to do and to allow for faster element-wise operations.
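A minimal sketch of what packing buys us (our own example, assuming a CKKS cryptocontext cc and key pair keys like the ones created in section 4 below):
std::vector<double> row1 = {1.0, 2.0, 3.0, 4.0};
std::vector<double> row2 = {0.5, 0.5, 0.5, 0.5};
// Each vector becomes ONE ciphertext, not four.
auto ct1 = cc->Encrypt(keys.publicKey, cc->MakeCKKSPackedPlaintext(row1));
auto ct2 = cc->Encrypt(keys.publicKey, cc->MakeCKKSPackedPlaintext(row2));
// A single EvalAdd adds every packed slot element-wise (SIMD), like a
// vectorized numpy addition rather than a Python for-loop.
auto ctSum = cc->EvalAdd(ct1, ct2);
lbcrypto::Plaintext result;
cc->Decrypt(keys.secretKey, ctSum, &result);
result->SetLength(4); // ~{1.5, 2.5, 3.5, 4.5}, up to CKKS noise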
iv) pTensor::m_cc
The m_cc object is the cryptographic context which we use to carry out PALISADE’s operations.
4) Using pTensor on the Ames dataset
i) Setting up the cryptographic contexts
We show
- how to create a cryptocontext, which configures PALISADE to perform encrypted computation within a specific encryption scheme
- code for training on the Ames dataset
Should one attempt to follow the process in Numpy or in Eigen, know that because of the noise and the way our encryption scheme operates, one may achieve slightly different results between those plaintext versions and this encrypted version.
We briefly introduce the parameters used below but defer further discussion to later.
auto cc = lbcrypto::CryptoContextFactory<lbcrypto::DCRTPoly>::genCryptoContextCKKS(
multDepth, scalingFactorBits, batchSize
);
cc->Enable(ENCRYPTION);
cc->Enable(SHE);
cc->Enable(LEVELEDSHE); // @NOTE: we discuss SHE and LeveledSHE in the follow up
auto keys = cc->KeyGen();
cc->EvalMultKeyGen(keys.secretKey);
cc->EvalSumKeyGen(keys.secretKey);
int ringDim = cc->GetRingDimension();
int rot = int(-ringDim / 4) + 1;
// @NOTE: we discuss EvalAtIndex in the followup
cc->EvalAtIndexKeyGen(keys.secretKey, {-1, 1, rot});
We create a cryptocontext object which takes our chosen parameters:
multDepth - the maximum number of sequential multiplications we can do before our data becomes too noisy and the decryption becomes meaningless.
scalingFactorBits - the scaling factor mentioned above, to be discussed later.
batchSize - how many data points (think vector of data) we pack into a ciphertext. Homomorphic encryption is slow but can be sped up by conducting operations over batches of data (via SIMD).
ii) Training setup
iii) constructDataset
Notice how the parameters that the function takes in are the plaintext X and y. The reason for passing in plaintext X’s and y’s is to allow for easy indexing into the data for shuffling. It would be possible to shuffle the data in encrypted form, but it is prohibitively slow and an easier alternative already exists. Thus, to simulate shuffling the data every epoch, we allow the user to specify some number of shuffles, and the data owner creates n shuffled copies of the data, which are then encrypted.
While training, we can simulate this randomness by randomly indexing into any of the shuffles.
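A rough sketch of the idea (the helper below is our own, not pTensor’s actual constructDataset): build the shuffled copies in plaintext, then encrypt each copy once.
#include <algorithm>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Build numShuffles independently reshuffled plaintext copies of (X, y).
// Each copy is encrypted once afterwards; training then indexes into a
// random copy per epoch to simulate per-epoch shuffling.
std::vector<std::pair<Matrix, Vector>> buildShuffledCopies(
    const Matrix &X, const Vector &y, unsigned int numShuffles) {
  std::vector<std::pair<Matrix, Vector>> copies;
  std::mt19937 rng(std::random_device{}());
  for (unsigned int s = 0; s < numShuffles; ++s) {
    std::vector<size_t> idx(y.size());
    std::iota(idx.begin(), idx.end(), 0);       // 0, 1, ..., n-1
    std::shuffle(idx.begin(), idx.end(), rng);  // cheap, because it is plaintext
    Matrix Xs;
    Vector ys;
    for (size_t i : idx) {
      Xs.push_back(X[i]);
      ys.push_back(y[i]);
    }
    copies.emplace_back(std::move(Xs), std::move(ys));
  }
  return copies;
}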
iv) Training
The following should look familiar to anyone with a machine learning background:
for (unsigned int epoch = 0; epoch < epochs; ++epoch) {
    auto index = distr(generator);
    auto curr_dataset = dataset[index];
    auto X = std::get<0>(curr_dataset);
    auto y = std::get<1>(curr_dataset);

    auto prediction = X.encryptedDot(w);
    auto residual = prediction - y;  // Remember, our X is already a transpose
    auto _gradient = X.encryptedDot(residual);

    pTensor gradient;
    gradient = _gradient;

    auto scaledGradient = gradient * alpha * scaleByNumSamples;
    w = pTensor::applyGradient(w, scaledGradient);
    w = w.decrypt().encrypt();
}
However, there are a few things to note:
1) encryptedDot instead of dot (which is also supported)
In the first encryptedDot, in the matrix-matrix case, we do a Hadamard product before doing a summation along the 0th axis. Again, our X is encrypted in transpose form, of shape (#features, #observations). Thus, our weight matrix is also of shape (#features, #observations). We leave it to the reader to work out the details of why this works (the plaintext sketch below illustrates the arithmetic).
In the other case (not matrix-matrix), we default to the standard dot product.
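Here is the promised plaintext sketch (our own, not pTensor code): with X stored as its transpose Xt of shape (#features, #observations) and one weight per feature repeated across observations, a Hadamard product followed by a sum over the feature axis reproduces the ordinary per-observation dot product.
#include <iostream>
#include <vector>

int main() {
    // Two features, three observations, stored transposed: Xt[f][o]
    std::vector<std::vector<double>> Xt = {{1, 2, 3},   // feature 0
                                           {4, 5, 6}};  // feature 1
    std::vector<double> weights = {0.5, 2.0};           // one weight per feature

    // Hadamard product with the broadcast weights, then sum along axis 0 (features):
    std::vector<double> predictions(3, 0.0);
    for (size_t f = 0; f < Xt.size(); ++f)
        for (size_t o = 0; o < Xt[f].size(); ++o)
            predictions[o] += Xt[f][o] * weights[f];

    // predictions[o] = sum_f X[o][f] * w[f], i.e. the usual dot product X . w
    for (double p : predictions) std::cout << p << " ";  // 8.5 11 13.5
    std::cout << std::endl;
    return 0;
}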
2) applyGradient
To understand the motivation here, we must first discuss the shapes of the incoming values:
w: (#features, #observations)
scaledGradient: (1, #features)
So, we must reshape the scaledGradient into a repeated matrix form before applying it to the weights.
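A plaintext sketch of that reshaping (our own illustration, not pTensor’s applyGradient):
std::vector<double> scaledGradient = {0.1, -0.2}; // shape (1, #features)
size_t numObservations = 3;
// Repeat the gradient row across the observation axis so it matches
// w's (#features, #observations) layout.
std::vector<std::vector<double>> gradientMatrix; // shape (#features, #observations)
for (double g : scaledGradient)
    gradientMatrix.push_back(std::vector<double>(numObservations, g));
// The weight update w[f][o] -= gradientMatrix[f][o] is then element-wise.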
3) w.decrypt().encrypt()
The reason for our decrypt-encrypt has to do with the multDepth parameter that we briefly discussed earlier. As mentioned, as we do operations on our ciphertexts, we accumulate noise. If this noise gets too large, our decryption will begin to fail. This failure results in random bits being interpreted as (usually huge) random numbers. By decrypting and then encrypting our results again, we refresh this noise (reduce it back to zero).
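In raw PALISADE terms, the refresh boils down to something like the following (a sketch assuming the cryptocontext cc, key pair keys, and some ciphertext ct from section 4):
lbcrypto::Plaintext tmp;
cc->Decrypt(keys.secretKey, ct, &tmp);             // the noise is stripped away here
auto refreshed = cc->Encrypt(keys.publicKey, tmp); // fresh ciphertext, full multDepth again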
However, there is a caveat here: only the party holding the secret key can perform this refresh, since the decryption step requires it. Consider a scenario where we have a data enclave-client setup in which the client does all the computations. There is a limit to the maximum multDepth one can set before CKKS becomes too unwieldy. Computations that exceed that multDepth need either server re-encryption (as shown here) or bootstrapping (which we will address in the next post) to securely re-encrypt the data. Bootstrapping resets the noise and thus the multiplicative depth. However, bootstrapping for CKKS is not yet available in PALISADE as of Feb 2021. This server re-encryption process is considered less secure than a fully homomorphic setup, but we defer further discussion to the next post.
5) Closing Thoughts
P.S.: visit PALISADE - PKE for further examples of how to use PALISADE (one of which I contributed to!).