ML Practitioners Introduction to CKKS.

This is part 1.5 in a series entitled “A Machine Learning Based Introduction to PALISADE and CKKS”. I highly recommend viewing the first post before continuing here.

Special thanks to Dr David Cousins for his role in this including, but not limited to, reviewing the content, offering feedback on the code, and suggesting changes to make the material more digestible.

Tutorial 1.5: CKKS for the curious

This tutorial is meant for ML practitioners who are interested in CKKS, and Homomorphic Encryption in general. We attempt to make this section digestible but will also provide links for those who want in-depth reading material.

We use the following acronyms:

  • Partially homomorphic encryption (PHE)

  • Somewhat homomorphic encryption (SHE)

  • Levelled Somewhat homomorphic encryption (LeveledSHE)

  • Fully homomorphic encryption (FHE)

  • Cheon-Kim-Kim-Song (CKKS) which was the new name after HEAAN.


Homomorphic Encryption

There are several different levels of homomorphic encryption, a few of which we discuss briefly, but we focus on two primary algorithm styles: the SHE and the LeveledSHE because the PALISADE library supports them.

Note: When we say “allow”, we mean that these algorithms support a meaningful operation between two ciphertexts. One can always multiply or add ciphertexts but the decrypted results are not always meaningful.

Partially Homomorphic Encryption

The PHE encryption style-algorithm allows either addition or multiplication operations on elements within its group. One unique property is that these algorithms are unbounded in that we can carry out any number of either additions or multiplications. Many popular encryption schemes fall into this category: for example, RSA and ElGamal, both of which support multiplications on their ciphertexts. Another PHE scheme is the Paillier encryption scheme which supports addition on its ciphertexts.

Somewhat Homomorphic Encryption

Somewhat homomorphic encryption supports both addition and multiplication logic gates instead of one-or-the-other as in the case of PHE. However, it still only supports a subset of circuits as it is now bounded in the number of operations it can do. As we carry out these operations, there is a non-zero amount of noise intentionally added for security. As we carry out more and more operations we may eventually exceed out threshold in which case the ciphertext will no longer exhibit the homomorphic (or approximately homomorphic) property. Note that homomorphic addition and homomorphic multiplication add different levels of noise: addition scales the noise linearly while multiplication squares the noise.

One method would be to “refresh” the ciphertext, but this was not possible until the work of Gentry which allowed for FHE. One example of a somewhat homomorphic encryption scheme is the BGV Scheme which was produced after Gentry’s thesis.

We refer the interested reader to Acar et al. A Survey on Homomorphic Encryption Schemes: Theory and Implementation for more information about SHE and HE in general. P.s click on the “ereader” button, then on the download link at the top of the next page.

Fully Homomorphic Encryption

Fully homomorphic encryption was not possible until the work of Craig Gentry in his thesis. In the thesis Gentry showed that it was possible to evaluate circuits of arbitrary depth via a process he termed bootstrapping. The fundamental idea of bootstrapping is that it is possible to “refresh” the ciphertext (thus reducing the noise). Gentry showed that by taking the decryption algorithm and decrypting both the ciphertext and the encrypted secret key ( encrypted using the public key ), it is possible to obtain a new ciphertext which has the same underlying plaintext but has the noise removed.

This process is equivalent to the following equation:

The equation describes the ciphertext at time i+1 and how it is derived from the original ciphertext.

Nevertheless, the process of bootstrapping was (and still is) an expensive operation which can significantly slow down any application that requires a large number of additions or multiplications (such as in machine learning). Work following Gentry’s thesis focused on reducing the need for bootstrapping and one such example of this is the BGV scheme that was mentioned above.


What is CKKS?

To be pedantic, CKKS is not a homomorphic encryption scheme but is instead an approximate homomorphic encryption scheme.

CKKS is an encryption scheme that allows for operations on real numbers as opposed to just integers as is the case for BGV, and BFV which are other H.E. schemes (that we do not discuss in this ML practitioners guide).

Machine learning with homomorphic encryption was not fully supported until the CKKS scheme. Although various homomorphic encryption schemes emerged before CKKS, and it was possible to do some forms of machine learning, all of them dealt with integers. However, work by Cheon et al. in their paper entitled “Homomorphic Encryption for Arithmetic of Approximate Numbers” (also known as the CKKS scheme after the authors) general forms of machine learning became viable. CKKS operates on complex numbers, but via the use of an embedding, we can work on real numbers.

There are two sources of noise in CKKS. The first source of noise is in the encoding process that we discuss below. The second source is in the “relinearization” process, which we touch on briefly after.

The Encoding Process

Earlier, we had mentioned that CKKS operates on the complex numbers and that these values can be converted to the real numbers via an encoding. We refer to the left half of the following image, the encoding-decoding step:

CKKS Encoding

*image taken from Simons Institute Talk

The definition of tau has been left out, but tau is essentially

which is a mapping from the rational polynomials of at most degree to the complex numbers.

Encoding:

​ we first take a message vector, m, and transform it from the complex numbers to the rationals via . Then, we scale by some delta which allows us to control for precision loss further down the line. Finally, we round to the nearest integer which takes us from the rationals to the integers (this is part of the reason why the scaling was necessary).

Decoding:

​ When decoding we have a plaintext that is in the integer polynomials. We transform this via which takes us to the complex numbers where we then scale down to get us back to the same scale that we were originally in.

The Relinearization Process

To discuss the intuition behind this process, we first need to discuss how numbers are stored in computers. For every floating number, there is a significand and a scaling factor (comprised of a base and exponent)

For example, in a (truncated) representation of , we have:

. The significand is 1414, and the scaling factor is $$10^{-3}$ where 10 is the base and -3 is the exponent.

The critical insight is the following: if we are doing approximate arithmetic, for example, multiplying we have:

. This rounding off causes the stochasticity mentioned above.

As we saw in our example involving $\sqrt{2}$, when we multiply the floating-point representation of these numbers, the significand is squared. So, when we wanted to get the number back into the same “scale”, we truncated the number and rounded.

A similar method is used here, in CKKS. In CKKS, the plaintext is stored in the least significant bits of the ciphertext. As we multiply our ciphertexts together, the significands are squared, and for us to return to the same “scale”, we need to truncate. This truncation adds to the stochasticity.

We encourage users to watch the video at this link.

Closing Words

Thank you for taking the time to read this! I hope that you leave this post with a better understanding of what HE is as well as the CKKS encryption scheme. As a side note, we encourage users to experiment with different parameters at Tutorial 1.5 code which is similar to the the 1.0 code but with an emphasis on experimenting with the parameters as opposed to showcasing the efficacy of HE + ML. Then, users should toy with the example from Tutorial 1 and experiment with the different Cryptographic parameters to see the impact of different cryptographic parameters on a “real world” example. We feel that this will help the user gain a better working understanding of the code.

I have a full project that is slated to be released before the end of the year where I use Python bindings and train a linear regression on the Ames dataset comparing a plaintext numpy implementation and the encrypted linear regression. In that project, I provide a more in-depth discussion of the details of CKKS and Linear Regression while keeping the focus on showing the efficacy of homomorphic encryption and machine learning. Follow or add me on LinkedIn - ianquahtc where I will share the project once it is ready.

In the following months, I hope to share more work that I have been doing including but limited to creating a matrix interface (likely unoptimized) that will allow users to work in a format akin to numpy.

Further reading:

Craig Gentry simplified paper

  • We highly recommend this paper because it is a simplified version of some of the papers you will need to read to gain a deep appreciation of what is happening.

HEAAN Demystified

  • We recommend reading this before the actual HEAAN paper, specifically Section 3, which summarizes the HEAAN paper for non-cryptographers.

The actual CKKS/HEAAN paper

  • This paper is a doozy and probably out of scope for someone who cares solely about the ML application of the work.
Written on December 2, 2020