Aug 26, 2021

Their threat model[1] states:

> This feature runs exclusively as part of the cloud storage pipeline for images being uploaded to iCloud Photos and cannot act on any other image content on the device. Accordingly, on devices and accounts where iCloud Photos is disabled, absolutely no images are perceptually hashed. There is therefore no comparison against the CSAM perceptual hash database, and no safety vouchers are generated, stored, or sent anywhere.

and

> Apple’s CSAM detection is a hybrid on-device/server pipeline. While the first phase of the NeuralHash matching process runs on device, its output – a set of safety vouchers – can only be interpreted by the second phase running on Apple’s iCloud Photos servers, and only if a given account exceeds the threshold of matches.

We should also take into account how the blinding of the hash works, from the CSAM paper[2]:

> However, the blinding step using the server-side secret is not possible on device because it is unknown to the device. The goal is to run the final step on the server and finish the process on server. This ensures the device doesn’t know the result of the match, but it can encode the result of the on-device match process before uploading to the server.

What this means is that the whole process is tied strictly to a specific endpoint on the server. To match any other files from the device, those files would also have to be uploaded to the server (the PSI implementation forces this). Based on the pipeline description, uploading other files should not be possible. But even if it were, and they suddenly changed policy to scan all files on your device, those files would end up in the same iCloud as everything else, you would notice them, and you couldn't opt out of that under the current protocol. So they would have to modify the whole protocol so that only the images actually meant to be synced are uploaded, while scanning all files (which would then be impossible to match on the server side, because of how the PSI protocol works). If they created some other endpoint for files that are not supposed to end up in iCloud, they would still need to store them in the cloud anyway, because of the PSI protocol; otherwise, they have no way to detect matches.

It sounds like this is pretty far from being just a policy change away.

Many people have succumbed to populism because it benefits them, and it takes some knowledge and time to really understand the whole system, so I am not surprised that many keep saying it is just a policy change away. Either way, we must either trust what they say, or we can't trust a single feature they put on these devices.

[1]: https://www.apple.com/child-safety/pdf/Security_Threat_Model...

[2]: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 25, 2021

From the spec: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 25, 2021

Clearly this is not a cryptographic hash, and hence its hashes are not uniformly distributed.

Apple explained in their technical summary [0] that they'll only consider this an offence if a certain number of hashes match. They estimated that the likelihood of false positives there (they don't explain which dataset was used, but it was naturally non-CSAM) is 1 in a trillion [1].

In the very unlikely event that such a 1-in-a-trillion occurrence happens, they have manual operators check each of these photos. They also have a private model (unavailable to the public) to double-check these perceptual hashes, which is also used before alerting authorities.
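
As a rough back-of-the-envelope illustration of how a per-image false-positive rate and a match threshold combine into an account-level rate (the rate, library size and threshold below are made-up numbers for illustration, not Apple's figures, and real photo libraries violate the independence assumption):

  from math import lgamma, log, exp

  def account_flag_probability(p_image: float, n_images: int, threshold: int) -> float:
      """P[at least `threshold` false positives among `n_images` images], i.e. the
      binomial tail, computed in log space to avoid overflow."""
      def log_binom(n: int, k: int) -> float:
          return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
      return sum(
          exp(log_binom(n_images, k) + k * log(p_image) + (n_images - k) * log(1 - p_image))
          for k in range(threshold, n_images + 1)
      )

  # Hypothetical numbers for illustration only; not Apple's published figures.
  print(account_flag_probability(p_image=1e-6, n_images=20_000, threshold=30))

With the made-up numbers above, the account-level probability comes out around 1e-84; the point is just that the threshold, not the per-image rate alone, drives the headline number.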

[0] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

[1] https://www.zdnet.com/article/apple-to-tune-csam-system-to-k...

Aug 22, 2021

Requiring a second key to unlock a lock does not invalidate the fact that the first key can be picked (which the question was about).

I had read through the technical whitepaper [1], which does not include this information. Thank you for sharing. Since the second hash only works on pictures that Apple can decrypt within this system ("for an account that exceeded the match threshold"), this merely saves the human reviewers at Apple time.

[1] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 21, 2021

Lie? I don't take kindly to such words, because you're ascribing malicious intent where there is none. Please check your tone... HN comments are about assuming the best in everyone.

This only applies to photos uploaded to iCloud. Every single document talks about exactly that, including the technical details: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

The hash matching is occurring on device, but only for iCloud photo images:

> Before an image is stored in iCloud Photos, an on-device matching process is performed for that image against the database of known CSAM hashes. This matching process is powered by a cryptographic technology called private set intersection, which determines whether there is a match without revealing the result. The device creates a cryptographic safety voucher that encodes the match result. It also encrypts the image’s NeuralHash and a visual derivative. This voucher is uploaded to iCloud Photos along with the image.

Read that PDF. You'll see everything in it is designed for iCloud photos only.

Aug 20, 2021

True.

But I'll point out their technical summary[1] explicitly talks about attaching decryption keys to positive matches, which you don't need to do if there is no end-to-end encryption.

[1] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 20, 2021

I didn't realise that there's no requirement to scan. So I'll yield on that.

But it's still possible that Apple are preparing for a scenario where it does become a requirement in the future.

As for the E2EE part, I can't imagine that this wouldn't be launched with E2EE alongside; otherwise there's literally no point whatsoever, they could have just done the scan on iCloud.

As for why this combats 'your data being whisked away', check the technical documentation. What they're doing with Private Set Intersection and Threshold Secret Sharing are clear steps to make this system unexploitable, anonymous, and so that it doesn't leak any metadata whatsoever.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 20, 2021

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Read the technical documentation.

They're combining Private Set Intersection and Threshold Secret Sharing in a way that means one hit isn't enough. They can't even tell how many red flags you have.
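
For intuition on the threshold part, here is a minimal Shamir-style secret-sharing sketch (toy field, toy threshold, and a random stand-in for the per-account decryption key; Apple's actual construction is more involved): with fewer than the threshold number of shares, the shares carry no information about the key, so a single hit reveals nothing.

  import random

  P = 2**127 - 1        # a prime; toy field, not Apple's parameters
  T = 5                 # threshold: number of shares needed to reconstruct

  def make_shares(secret: int, n: int, t: int = T):
      """Split `secret` into n points on a random degree-(t-1) polynomial."""
      coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
      def f(x):
          return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
      return [(x, f(x)) for x in range(1, n + 1)]

  def reconstruct(shares):
      """Lagrange interpolation at x = 0; only meaningful with at least T shares."""
      secret = 0
      for i, (xi, yi) in enumerate(shares):
          num, den = 1, 1
          for j, (xj, _) in enumerate(shares):
              if i != j:
                  num = num * -xj % P
                  den = den * (xi - xj) % P
          secret = (secret + yi * num * pow(den, -1, P)) % P
      return secret

  account_key = random.randrange(P)          # stand-in for the per-account decryption key
  shares = make_shares(account_key, n=10)    # imagine one share per matching image
  assert reconstruct(shares[:T]) == account_key        # threshold reached: key recovered
  assert reconstruct(shares[:T - 1]) != account_key    # below threshold: a wrong value
                                                       # (any key is equally consistent)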

Aug 19, 2021

Read this: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

and tell me what's not about privacy with this? The alternative is to upload everything unencrypted to providers who then scan it server-side, which is how everything else works.

Aug 19, 2021

I would just read the document explaining how this works (see "Matching-Database Setup"): https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

At no point is anyone besides Apple able to view any NeuralHash hashes from the CSAM database. You can verify the database is the same on all iPhones, but you are not able to look at any of the hashes.

Aug 19, 2021

There are a lot of really valid criticisms of Apple's plan here, but Apple has gone out of their way to prevent that exact case. Apple is using secret splitting to make sure they cannot decode the CSAM voucher until the threshold is reached. Devices also produce some synthetic matches to prevent Apple (or anyone else) from inferring a pre-threshold count based on the number of vouchers.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

See the sections "Threshold Secret Sharing" and "Synthetic Match Vouchers".

Aug 19, 2021

FAQ Part 2/2

Q: If the second, secret hash algorithm is based on a neural network, can we think of its weights (coefficients) as some kind of secret key in the cryptographical sense?

A: Absolutely not. If (as many suspect) the second hash algorithm is also based on some feature-identifying neural network, then we can't think of the weights as a key that (when kept secret) protects the confidentiality and integrity of the system.

Due to the way perceptual hashing algorithms work, having access to the outputs of the algorithm is sufficient to train a high-fidelity "clone" that allows you to generate perfect adversarial examples, even if the weights of the clone are completely different from the secret weights of the original network.
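
A sketch of the kind of black-box distillation being described. Everything here is illustrative: a small random CNN stands in for the target network (we only query its outputs), random tensors stand in for whatever images the attacker feeds it, and how faithful the clone ends up being depends on data and training budget.

  import torch
  import torch.nn as nn

  def make_net(out_dim: int = 128) -> nn.Sequential:
      # Toy CNN producing a float descriptor; a stand-in, not NeuralHash's architecture.
      return nn.Sequential(
          nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
          nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
          nn.AdaptiveAvgPool2d(1), nn.Flatten(),
          nn.Linear(64, out_dim),
      )

  teacher = make_net().eval()     # the "secret" network: we never look at its weights
  for p in teacher.parameters():
      p.requires_grad_(False)

  student = make_net()            # the clone; its weights end up unrelated to the teacher's
  opt = torch.optim.Adam(student.parameters(), lr=1e-3)

  for step in range(1_000):
      images = torch.rand(32, 3, 64, 64)      # any image source the attacker controls
      with torch.no_grad():
          targets = teacher(images)           # supervision comes purely from observed outputs
      loss = nn.functional.mse_loss(student(images), targets)
      opt.zero_grad()
      loss.backward()
      opt.step()

The point is only that no access to the original weights is needed anywhere in this loop.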

If you have access to both the inputs and the outputs, you can do much more: by choosing them carefully [4], you can eventually leak the actual secret weights of the network. Any of these attacks can be executed by an Apple employee, even one who has no privileged access to the actual secret weights.

Even if you have proof positive that nobody could have accessed the secret weights directly, the entire key might have been leaked anyway! Thus, keeping the weights secret from unauthorized parties does not suffice to protect the confidentiality and integrity of the system, which means that we cannot think of the weights as a kind of secret key in the cryptographical sense.

Q: I heard that it's impossible to determine Apple's CSAM image hashes from the database on the device. Doesn't this make a hash attack impossible?

A: No. The scheme used by Apple (sketched in the technical summary [6]) ensures that the device doesn't _learn_ the result of the match purely from the interaction with server, and that the server doesn't learn information about images whose hash the server doesn't know. The claim that it's "impossible to determine Apple's CSAM image hashes from the database on the device" is a very misleading rephrasing of this, and not true.

Q: Doesn't Apple claim that there is only a one in one trillion chance per year of incorrectly flagging a given account?

A: Apple does claim this, but experts on photo analysis technologies have been calling bullshit [8] on their claim since day one.

Moreover, even if the claimed rate was reasonable (which it isn't), it was derived without adversarial assumptions, and using it is incredibly misleading in an adversarial context.

Let me explain through an example. Imagine that you play a game of craps against an online casino. The casino will throw a virtual six-sided die, secretly generated using Microsoft Excel's random number generator. Your job is to predict the result. If you manage to predict the result 100 times in a row, you win and the casino will pay you $1000000000000 (one trillion dollars). If you fail to predict the result of a throw, you lose and pay the casino $1 (one dollar).

In an ordinary, non-adversarial context, the probability that you win the game is much less than one in one trillion, so this game is very safe for the casino. But this number, one in one trillion, is based on naive assumptions that are completely meaningless in an adversarial context. If your adversary has a decent knowledge of mathematics at the high school level, the serial correlation in Excel's generator comes into play, and the relevant probability is no longer one in one trillion. It's 1 in 216 instead! When faced with a class of sophomore math majors, the casino will promptly go bankrupt.
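
The arithmetic behind the example, for reference (the 1-in-216 figure is the one quoted above; 216 = 6^3, presumably as if only three throws remained genuinely unpredictable):

  naive = (1 / 6) ** 100     # fair die, 100 independent correct guesses in a row
  print(f"{naive:.2e}")      # ~1.53e-78, vastly below the 1e-12 ("one in a trillion") mark
  print(6 ** 3)              # 216: the adversarial figure is roughly 10^75 times larger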

Q: Aren't these attacks ultimately detectable? Wouldn't I be exonerated by the exculpatory evidence?

A: Maybe. IANAL. I wouldn't want to take that risk. Matching hashes are probably not sufficient to convict you, and possibly not sufficient to take you into custody, but they are more than sufficient to make you a suspect. Reasonable suspicion is enough to get a warrant, which means that your property may be searched, your computer equipment may be hauled away and subjected to forensic analysis, etc. It may be sufficient cause to separate you from your children. If you work with children, you'll be fired for sure. It'll take years to clear your name.

And if they do charge you, it will be in Apple's best interest not to admit to any faults in their algorithm, and to make it as opaque to the court as possible. The same goes for NCMEC.

Q: Why should I trust you? Where can I find out more?

A: You should not trust me. You definitely shouldn't trust the people defending Apple using the claims above. Read the EFF article [7] to learn more about the social dangers of this technology. Consult Apple's Threat Model Summary [5], and the CSAM Detection Technical Summary [6]: these are biased sources, but they provide sketches of the algorithms and the key factors that influenced the current implementation. Read HackerFactor [8] for an independent expert perspective about the credibility of Apple's claims. Judge for yourself.

[1] https://imgur.com/a/j40fMex

[2] https://graphicdesign.stackexchange.com/questions/106260/ima...

[3] https://arxiv.org/abs/1809.02861

[4] https://en.wikipedia.org/wiki/Chosen-plaintext_attack

[5] https://www.apple.com/child-safety/pdf/Security_Threat_Model...

[6] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

[7] https://www.eff.org/deeplinks/2021/08/apples-plan-think-diff...

[8] https://www.hackerfactor.com/blog/index.php?/archives/929-On...

Aug 19, 2021

Happy to provide a citation for how the Apple scheme works [1].

You should also point out that NCMEC themselves are not law enforcement.

[1] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 19, 2021

NeuralHash is a hashing algorithm made by Apple to create hashes from images. Where other hashing algorithms would look at the pixel values, NeuralHash creates hashes based on the visual features of an image.

You can read more about it here: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

NeuralHash collisions are interesting, but the way Apple is implementing their scanner, it's impossible to extract the banned hashes directly from the local database.

There are other ways to guess what the hashes are, but I can't think of legal ones.

> Matching-Database Setup. The system begins by setting up the matching database using the known CSAM image hashes provided by NCMEC and other child-safety organizations. First, Apple receives the NeuralHashes corresponding to known CSAM from the above child-safety organizations. Next, these NeuralHashes go through a series of transformations that includes a final blinding step, powered by elliptic curve cryptography. The blinding is done using a server-side blinding secret, known only to Apple. The blinded CSAM hashes are placed in a hash table, where the position in the hash table is purely a function of the NeuralHash of the CSAM image. This blinded database is securely stored on users’ devices. The properties of elliptic curve cryptography ensure that no device can infer anything about the underlying CSAM image hashes from the blinded database.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...
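
A toy sketch of the shape of that setup, with ordinary modular arithmetic standing in for the elliptic curve, SHA-256 standing in for NeuralHash, and all parameters invented. It only illustrates the two properties the paragraph above describes: table positions depend on the NeuralHash, while the stored entries are blinded by a secret only the server knows.

  import hashlib
  import secrets

  P = 2**127 - 1       # toy prime modulus (the real system uses an elliptic-curve group)
  G = 5                # toy generator
  server_secret = secrets.randbelow(P - 1) + 1    # known only to the server

  def neural_hash(image_bytes: bytes) -> int:
      # Placeholder: a real NeuralHash is perceptual, not a cryptographic digest.
      return int.from_bytes(hashlib.sha256(image_bytes).digest(), "big") % (P - 1)

  def blind(nh: int) -> int:
      # "Blinding": exponentiation by the server secret (scalar multiplication on the real curve).
      return pow(pow(G, nh, P), server_secret, P)

  def table_position(nh: int, table_size: int) -> int:
      # The position depends only on the (unblinded) NeuralHash.
      return nh % table_size

  # Build the on-device database (dummy hashes; position collisions ignored in this toy).
  known_hashes = [neural_hash(secrets.token_bytes(32)) for _ in range(1000)]
  TABLE_SIZE = 4096
  blinded_table = {table_position(nh, TABLE_SIZE): blind(nh) for nh in known_hashes}

The device holds blinded_table but not server_secret, so recovering the underlying hashes from it would require breaking the discrete-log-style problem the blinding is built on.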

Aug 18, 2021

That’s an oversimplified and misleading description of how the system works, but ok. I recommend reading the technical description, or even the paper linked from that: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

The neural hash itself isn’t cryptographic, but there’s cryptography involved in the process.

They use “private set intersection” (https://en.wikipedia.org/wiki/Private_set_intersection) to compute a value that itself doesn’t say whether an image is in the forbidden list, yet when combined with sufficiently many other such values can be used to do that.

They also encrypt the “NeuralHash and a visual derivative” on iCloud in such a way that Apple can only decrypt that if they got sufficiently many matching images (using https://en.wikipedia.org/wiki/Secret_sharing)

(For details and, possibly, corrections on my interpretation, see Apple’s technical summary at https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni... and https://www.apple.com/child-safety/pdf/Apple_PSI_System_Secu...)

Aug 18, 2021

Edit: "The main purpose of the hash is to ensure that identical and visually similar images result in the same hash, and images that are different from one another result in different hashes."[1]

Apple isn't using a "similar image, similar hash" system. They're using a "similar image, same hash" system.

[1]: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

>Apple computes a hash of each image you upload to iCloud then check it against a list of CP hashes.

I don't think it computes a hash of the image; it's a tad more involved than that.

Simple hashing is easily evaded. They must be computing an identifier from the contents of the images in the CSAM database. This requires computational analysis on the handset or computer. If that's all that were happening, that would be no problem, but of course there are management interfaces to the classifier/analyzer, catalog, backend, &c.

The contents of the identifiers are purposefully opaque to prevent spoofing of the identifier database. I don't know what is included in the images; what if I take a picture at Disneyland with a trafficked person in the frame? Will that make it into the qualifier database? What is added to the CSAM signature database and why? What is the pipeline of hashes from NCMEC and other child-safety organizations -> Apple's CSAM image classifier alarm?

>I get it, the mechanism they're using has apparent flaws, and maybe some whacko could somehow get access to your phone and start uploading things that trick the algorithm into thinking you have CP.

The CSAM analyzer could be subverted in any number of ways. I question how the CSAM identifiers are monitored for QA (I actually shudder thinking there are already humans doing this :( how unpleasant.) and the potential for harmful adversaries to repurpose this tool for other means. One contrived counterfactual: Locating pictures of Jamal Khashoggi in people's computer systems by 0-day malware. Another: Locating images of Edward Snowden. A more easily conceived notion: Locating amber alert subjects in people's phones, geofenced or not.

To my eyes, it appears we will soon have increased analysis challenges. Self-analysis of device activity and functions for image-scanning malware (for example) becomes slightly harder: we have added a blessed scanner with unknown characteristics running on these systems. Does this pose a challenge to system profiling? How, if at all, does this interact with battery management? Is only iCloud content scanned, or is everything scanned and then only checked before being sent to iCloud? (This appears to be the case.[X])

There should be user notification too. If some sicko sends me something crazy somehow, I would surely want to know so I can call the cops!!

All in all this makes me feel bad. There is not a lot of silver lining from my perspective. While the epidemic of unconscionable child abuse continues, I question the effectiveness of this approach.

I would not consider jailbreaking my iPhone but for this kind of stuff. I would like to install network and permissions monitoring software on my iPhone such as Bouncer[0], Little Snitch[1], although these are helpfully not available for iOS.

I feel grateful that I am unlikely to be affected by this image-scanning software; I plan to continue my personal policy of never storing any pictures of any people whatsoever. I don't even store family photos this way. My life is not units in a data warehouse.

[0] - https://play.google.com/store/apps/details?id=com.samruston....

[1] - https://www.obdev.at/products/littlesnitch/index.html

[X] - Apple's Whitepaper: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

> If hashes are uploaded to devices, they can be extracted and images that clash against it can be created.

Many organizations have the hashes, so they could leak nonetheless. Either way, I don't think that's a major problem. If the system interprets a picture of a pineapple as CSAM, you only need to produce the picture of a pineapple to defend yourself against any accusations. If clashes are too commonplace, the entire system would become unreliable and would have to be scrapped.

In any case, I have looked it up. The database is indeed on the device, but it's encrypted:

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

> Instead of scanning images in the cloud, the system performs on-device matching using a database of known CSAM image hashes provided by NCMEC and other child-safety organizations. Apple further transforms this database into an unreadable set of hashes, which is securely stored on users’ devices.

Overall, after reading the PDF, here is my understanding of the process (a rough structural sketch in code follows the list):

1. Apple gathers a set of "bad hashes"

2. They upload to each device a map from a hashed bad hash to an encrypted bad hash

3. The device runs an algorithm that determines whether there are matches with hashed bad hashes

4. For each match, the device uploads a payload encrypted using a secret on-device key, and a second payload that contains a "share" of the secret key, encrypted using the neural hash and encrypted bad hash.

5. The device also periodically uploads fake shares with dummy data to obfuscate the number of matches that actually occurred. Apple can't tell fake shares from real ones unless they have enough real shares.

6. Once Apple has enough real shares, they can figure out the secret key and know which hashes caused a match.
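
Laid out as code, that understanding looks roughly like the outline below. This is a structural sketch only: every name here is mine, the bodies are stubs, and none of the actual cryptography (blinding, PSI key derivation, threshold secret sharing) is reproduced.

  from dataclasses import dataclass
  from typing import Dict, List, Optional

  @dataclass
  class Voucher:
      outer_payload: bytes   # the payload encrypted with the secret on-device key (step 4)
      key_share: bytes       # a share of that key, wrapped via the hashed/encrypted bad-hash pair (step 4)

  def build_device_database(bad_hashes: List[bytes]) -> Dict[bytes, bytes]:
      """Steps 1-2 (server side): map each hashed bad hash to an encrypted bad hash."""
      ...

  def find_matches(image_hashes: List[bytes], database: Dict[bytes, bytes]) -> List[bytes]:
      """Step 3 (on device): look up hashed bad hashes; the result stays opaque to the device."""
      ...

  def make_voucher(image_hash: bytes, database: Dict[bytes, bytes], device_key: bytes) -> Voucher:
      """Step 4 (on device): encrypt the payload and wrap one share of the device key."""
      ...

  def make_synthetic_voucher() -> Voucher:
      """Step 5 (on device): a dummy voucher, indistinguishable from a real one below the threshold."""
      ...

  def server_side(vouchers: List[Voucher], threshold: int) -> Optional[List[bytes]]:
      """Step 6 (server): with enough real shares, recover the device key and the matched hashes."""
      ...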

The main concern I have, as a non-expert, is step 2: it requires Apple to provide their key to an auditor who can cross-check with child protection agencies that everything checks out and no suspect hashes are included in the payload. In theory, that needs to be done every time a new on-device database is uploaded, but if it is done, or if child protection agencies are given the secret so that they can check it themselves, I think this is a fairly solid system (notwithstanding the specifics of the encryption scheme, which I don't have the competence to evaluate).

The thresholding is also a reassuring aspect of the system, because (if it works as stated) the device can guarantee that Apple can't see anything at all until a certain number of images match, not even the count of matching images. The threshold could only be changed with an OS update.

There's certainly a lot of things to discuss and criticize about their system, but it's going to be difficult to do so if nearly no one even bothers reading about how it works. It's frustrating.

Aug 18, 2021

That synopsis disagrees with Apple's own descriptions - or rather it goes into the secondary checks, which confuses the issue that the initial hash checks are indeed performed on-device:

> Apple’s method of detecting known CSAM is designed with user privacy in mind. Instead of scanning images in the cloud, the system performs on-device matching using a database of known CSAM image hashes provided by NCMEC and other child-safety organizations. Apple further transforms this database into an unreadable set of hashes, which is securely stored on users’ devices.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

To quote a sibling comment, who looked into the horse's mouth:

> Apple’s method of detecting known CSAM is designed with user privacy in mind. Instead of scanning images in the cloud, the system performs on-device matching using a database of known CSAM image hashes provided by NCMEC and other child-safety organizations. Apple further transforms this database into an unreadable set of hashes, which is securely stored on users’ devices.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

> Apple’s method of detecting known CSAM is designed with user privacy in mind. Instead of scanning images in the cloud, the system performs on-device matching using a database of known CSAM image hashes provided by NCMEC and other child-safety organizations. Apple further transforms this database into an unreadable set of hashes, which is securely stored on users’ devices.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

Ah, never mind, you're right:

> Apple’s method of detecting known CSAM is designed with user privacy in mind. Instead of scanning images in the cloud, the system performs on-device matching using a database of known CSAM image hashes provided by NCMEC and other child-safety organizations. Apple further transforms this database into an unreadable set of hashes, which is securely stored on users’ devices.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

This seems like an over-reaction. I'm highly opposed to Apple's CSAM move, but they are still much better and more transparent than Google, Amazon, and most other services.

Many of these already do something like this but they just don't actively tell you or document it.

Also, and please correct me if I am mistaken, Apple's CSAM detection is limited to iCloud Photos. It does not work against just your local photos.

  CSAM Detection enables Apple to accurately 
  identify and report iCloud users who store
  known Child Sexual Abuse Material (CSAM) 
  in their iCloud Photos accounts
It seems like a needless waste of time to do all this as opposed to disabling iCloud for Photos...

Source: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

I think they clearly didn't anticipate that people would perceive it as anything but a breach of trust, as their device working against them (even for a good cause, against the worst people).

And because of this they calibrated their communication completely wrong, focusing on the on device part as being more private. Using the same line of thinking they use for putting Siri on device.

And the follow-up was an uncoordinated mess that didn't help either (as you rightly pointed out with Craig's interview). In the Neuenschwander interview [1], he stated this:

> The hash list is built into the operating system, we have one global operating system and don’t have the ability to target updates to individual users and so hash lists will be shared by all users when the system is enabled.

This still has me confused. Here's my understanding so far (please feel free to correct me):

- Apple is shipping a neural network trained on the dataset that generates NeuralHashes

- Apple also ships (where?) a "blinded" (by an elliptic-curve algorithm) table lookup that matches (all possible?!) NeuralHashes to a key

- This key is used to encrypt the NeuralHash and the derivative image (that would be used by the manual review) and this bundle is called the voucher

- A final check is done on server using the secret used to generate the elliptic curve to reverse the NeuralHash and check it server side against the known database

- If 30 or more are detected, decrypt all vouchers and send the derivative images to manual review.

I think I'm missing something regarding the blinded table, as I don't see what it brings to the table in that scenario, apart from adding a complex key generation for the vouchers. If that table only contained the NeuralHashes of known CSAM images as keys, that would be as good as giving out the list, given that the model is easily extracted. And if it's not a table lookup but just a cryptographic function, I don't see where the blinded table is coming from in Apple's documentation [2].

Assuming the above is correct, I'm paradoxically feeling a tiny bit better about that system on a technical level (I still think doing anything client-side is a very bad precedent), but what a mess they have put themselves into.

Had they done this purely server-side (and to be frank there's not much difference; the significant part seems to be done server-side), this would have been a complete non-event.

[1] : https://daringfireball.net/linked/2021/08/11/panzarino-neuen...

[2] This is my understanding based on the repository and what's written on pages 6-7: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

I'm not in favor of assuming that everyone's guilty until proven innocent.

But, as a side note...

I get the feeling that a lot of people assume that the CSAM hashes are going to be stored directly on everyone's phone so it's easy to get a hold of them and create images that match those hashes.

That does not seem to be the case. The actual CSAM hashes go through a "blinding" server-side step.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

You have misunderstood. NeuralHash is the visual derivative. Read [1] carefully; it's a very confusing document even for experts. Nowhere is there a second step to this process where some second type of "visual derivative" is matched.

The NeuralHash is what matters, solely.

[1] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 18, 2021

They look at the contents of the "safety voucher", which contains the neural hash and a "visual derivative" of the original image (but not the original image itself).

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 14, 2021

Regarding 3: it's very easy to make a mistake in the protocol that would allow Apple to detect hashes outside the CSAM list. Without knowing exactly how their protocol works, it's difficult to know whether it is correct.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

For example, here is a PSI protocol that is broken in terms of point 3. I don't think this would normally be considered broken in PSI, because the server knows the value, so it is part of its private set.

Server computes M_s = g . H(m) . S_s

where g is a generator of an elliptic curve, H(m) is the neural hash of the image and S_s is the server blinding secret.

The client computes M_sc = M_s . S_c where S_c is the client ephemeral secret. This M_sc value is the shared key.

The client also computes M_c = g . H(m) . S_c

and sends the M_c value to the server.

The server can now compute M_cs = M_c . S_s = M_sc since they both used the same H(m) values. This allows the server and client to share a key based on the shared image.

However, what happens if the client does its step using the ‘wrong’ image? If 3) is to hold, it should not be possible for the server to compute the key.

Client computes:

  M_sc = M_s . S_c

  M_c = g . H(m’) . S_c
The client’s final key share is: M_sc = g . H(m) . S_c . S_s

Now the server computes: M_cs = M_c . S_s = g . H(m’) . S_c . S_s

The secret shares don’t match. But if the server knows H(m’) it can compute:

M_cs’ = M_cs . inv(H(m’)) . H(m)

and this secret share will match
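
To sanity-check that algebra numerically, here is a toy run of the exact equations above, with scalars modulo a prime standing in for curve points (so the group hides nothing; the only point is to confirm that the server's fix-up lands on the client's key). This is the hypothetical broken protocol sketched above, not Apple's published construction.

  import hashlib
  import secrets

  Q = 2**61 - 1                               # prime standing in for the group order

  def H(image: bytes) -> int:                 # stand-in for the neural hash, as a scalar
      return int.from_bytes(hashlib.sha256(image).digest(), "big") % Q or 1

  S_s = secrets.randbelow(Q - 1) + 1          # server blinding secret
  S_c = secrets.randbelow(Q - 1) + 1          # client ephemeral secret
  m, m_prime = b"image in the hash list", b"some other image on the device"

  # Honest run: both sides use the same image m.
  M_s  = H(m) * S_s % Q                       # server's blinded value
  M_sc = M_s * S_c % Q                        # client's key
  M_c  = H(m) * S_c % Q                       # sent to the server
  assert M_c * S_s % Q == M_sc                # server derives the same key

  # "Wrong image" run: the client's outgoing message uses m' instead.
  M_sc = M_s * S_c % Q                        # = H(m) . S_s . S_c
  M_c  = H(m_prime) * S_c % Q
  M_cs = M_c * S_s % Q                        # = H(m') . S_c . S_s, which doesn't match
  assert M_cs != M_sc
  # ...unless the server knows H(m'), in which case it can fix up its share:
  M_cs_fixed = M_cs * pow(H(m_prime), -1, Q) * H(m) % Q
  assert M_cs_fixed == M_sc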

Normally this client side list in PSI is just used to speed up the protocol so the server does not have to do a crypto operation for every element in its set. It is not a pre-commitment from the server.

Also, maybe the way I’m doing it here is just normally broken because it is not robust against low entropy inputs to the hash function.

I've also reversed some of Apple's non-public crypto that is used in some of its services, and they have made dubious design decisions in the past that have created weird weaknesses. Without knowing exactly what they are doing, I would not try to infer properties that might not exist, or trust their implementation.

Aug 13, 2021

It's all on pages 4 and 5 of https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

> The main purpose of the hash is to ensure that identical and visually similar images result in the same hash, and images that are different from one another result in different hashes. For example, an image that has been slightly cropped or resized should be considered identical to its original and have the same hash. The system generates NeuralHash in two steps. First, an image is passed into a convolutional neural network to generate an N-dimensional, floating-point descriptor. Second, the descriptor is passed through a hashing scheme to convert the N floating-point numbers to M bits. Here, M is much smaller than the number of bits needed to represent the N floating-point numbers. NeuralHash achieves this level of compression and preserves sufficient information about the image so that matches and lookups on image sets are still successful, and the compression meets the storage and transmission requirements
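
Apple doesn't say much about the second step beyond this. One standard way to turn an N-dimensional float descriptor into M bits is random-hyperplane locality-sensitive hashing, sketched below with invented dimensions; whether NeuralHash uses exactly this construction is not specified, so treat it as an illustration of the descriptor-to-bits idea only.

  import numpy as np

  rng = np.random.default_rng(0)
  N, M = 128, 96                                # invented sizes, not NeuralHash's real N and M
  hyperplanes = rng.standard_normal((M, N))     # fixed once, shared by every device

  def descriptor_to_bits(descriptor: np.ndarray) -> np.ndarray:
      """Each bit records which side of one hyperplane the descriptor lies on, so nearby
      descriptors (visually similar images) tend to produce identical bit strings."""
      return (hyperplanes @ descriptor > 0).astype(np.uint8)

  d = rng.standard_normal(N)                     # stand-in for the CNN's output descriptor
  d_similar = d + 0.01 * rng.standard_normal(N)  # e.g. the same image slightly resized
  print(int((descriptor_to_bits(d) != descriptor_to_bits(d_similar)).sum()))  # usually 0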

Just like a human fingerprint is a lower-dimensional representation of all the atoms in your body that's invariant to how old you are or the exact stance you're in when you're fingerprinted... technically Federighi is being accurate about the "exact fingerprint" part. The thing that has me and others concerned isn't necessarily the hash algorithm per se, but rather: how can Apple promise to the world that the data source for "specific known child sexual abuse images" will actually be just that over time?

There are two attacks of note:

(1) a sophisticated actor compromising the hash list handoff from NCMEC to Apple to insert hashes of non-CSAM material, which is something Apple cannot independently verify as it does not have access to the raw images, which at minimum could be a denial-of-service attack causing e.g. journalists' or dissidents' accounts to be frozen temporarily by Apple's systems pending appeal

(2) Apple no longer being able to have a "we don't think we can do this technically due to our encryption" leg to stand on when asked by foreign governments "hey we have a list of hashes, just create a CSAM-like system for us"

That Apple must have considered these possibilities and built this system anyways is a tremendously significant breach of trust.

Aug 13, 2021

Did you go through the technical summary? They explain it nicely:

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 12, 2021

Just read the technical paper. [1]

[1] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 12, 2021

The CSAM detection technical summary [1] only mentions iOS and iPadOS.

If it does come to macOS, it will be part of Photos.app, as that's the only way to interact with iCloud Photos. I would recommend avoiding that app, and the cloud in general, if you care about privacy.

[1] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 12, 2021

> Where do you see the bit about visual derivatives?

In Apple's white paper about the proposed feature:

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

See the links at the bottom here for more:

https://www.apple.com/child-safety/

> Also what is this threshold?

The "perceptual hash" is supposed to match a specific image (though possibly cropped, or otherwise altered a bit, such as through a filter), not "toddlers" per se.

> Google doesn't use perceptual hashing, or at least haven't said they do.

I don't know what the other cloud providers are doing, but I'd be very surprised if they use (trivially circumventable) cryptographic hashes.

Aug 12, 2021

We’ll have to wait and see how good their neural hashing is, but just to clarify, the 1-in-a-trillion number is the “probability of incorrectly flagging a given account” according to Apple’s white paper.

I think some people think that’s the probability of a picture being incorrectly flagged, which would be more concerning given the 1.5 trillion images created in the US.

Source: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 12, 2021

Lots of people responding to this seem to not understand how perceptual hashing / PhotoDNA works. It's true that they're not cryptographic hashes, but the false positive rate is vanishingly small. Apple claims it's 1 in a trillion [1], but suppose that you don't believe them. Google and Facebook and Microsoft are all using PhotoDNA (or equivalent perceptual hashing schemes) right now. Have you heard of some massive issue with false positives?

The fact of the matter is that unless you possess a photo that exists in the NCMEC database, your photos simply will not be flagged to Apple. Photos of your own kids won't trigger it, nude photos of adults won't trigger it; only photos of already known CSAM content will trigger (and that too, Apple requires a specific threshold of matches before a report is triggered).

[1] "The threshold is selected to provide an extremely low (1 in 1 trillion) probability of incorrectly flagging a given account." Page 4 of https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 12, 2021

No, they are different things. CSAM detection looks for specific images from a database before upload to the iCloud Photo Library: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

The child protection part can be enabled for under-13s if they’re in a family account. If enabled, the Messages app will try to detect adult images being sent and received and give a warning to the child; it can also let the parents know about it.

Aug 12, 2021

No, they were only scanning iCloud email. They were lagging well behind other services in identifying this content, finding only a few hundred cases compared to millions by Facebook, for example.

The system only works with iCloud Photo Library; it needs a server-side component to continue the process.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

This is a very interesting read, and personally I think they’ve gone to extreme lengths to make this system as private as it could be.

Aug 12, 2021

Apple talks about this in their technical documentation: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

page 9: Synthetic Match Vouchers

They generate false positives themselves to hide their knowledge of the true number of collisions.
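
A toy simulation of the idea, with made-up rates and simplified mechanics (the real criteria for emitting a synthetic voucher are in the document above): because the server can't tell real and synthetic vouchers apart below the threshold, the number of vouchers it observes is a poor estimator of the true match count.

  import random

  def observed_voucher_count(true_matches: int, uploads: int, synthetic_rate: float) -> int:
      # One voucher per real match, plus synthetic vouchers for a random fraction of the rest.
      synthetic = sum(random.random() < synthetic_rate for _ in range(uploads - true_matches))
      return true_matches + synthetic

  random.seed(0)
  for true_matches in (0, 1, 5):
      counts = [observed_voucher_count(true_matches, uploads=2_000, synthetic_rate=0.01)
                for _ in range(5)]
      print(true_matches, counts)   # the observed counts overlap heavily across the three cases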

Aug 11, 2021

It's true that Apple never _sees_ a hash that doesn't match, but the encrypted hash is included in the safety voucher. That is to say, all hashes are uploaded, but only the matches can ever be decrypted, and that's only if there are enough matches.

From the technical summary [0]:

> The device creates a cryptographic safety voucher that encodes the match result. It also encrypts the image’s NeuralHash and a visual derivative. This voucher is uploaded to iCloud Photos along with the image.

[0]: https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...

Aug 11, 2021

I quickly skimmed [0] and my reading is that scanning is predicated on pictures being uploaded to iCloud. It says things like:

> CSAM Detection enables Apple to accurately identify and report iCloud users who store known Child Sexual Abuse Material (CSAM) in their iCloud Photos accounts.

[0] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...