Your use of the word "partial" could lead to confusion. The Diagnosis Keys are the subset of the Daily Tracing Keys covering the days you're contagious; you then upload these Daily Tracing Keys along with their associated day numbers. You're also incorrect about the involvement of a timestamp: the protocol uses DayNumbers to track the specific day a Daily Tracing Key was used.
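To make the "day number, not timestamp" point concrete, here's a minimal sketch of how the spec derives its coarse time values from Unix epoch time (constants per the cryptography spec; function names are mine):

```python
SECONDS_PER_DAY = 60 * 60 * 24
SECONDS_PER_INTERVAL = 60 * 10  # identifiers rotate every 10 minutes

def day_number(unix_ts: int) -> int:
    # DayNumber: count of 24-hour periods since the Unix epoch.
    # This, not a fine-grained timestamp, is what gets uploaded
    # alongside each Diagnosis Key.
    return unix_ts // SECONDS_PER_DAY

def time_interval_number(unix_ts: int) -> int:
    # TimeIntervalNumber: which 10-minute slot of the day we're in,
    # ranging 0..143 (hence 144 Rolling Proximity Identifiers per day).
    return (unix_ts % SECONDS_PER_DAY) // SECONDS_PER_INTERVAL
```

Note that both values are derived from the same epoch clock for every device, so matching doesn't depend on local time zones.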
In terms of privacy, small-scale adversaries can deanonymize infected users who have uploaded their keys by keeping logs of, and limiting, who they have come in close contact with.
On a large scale, adversaries in control of large Bluetooth receiver networks (such as cities performing traffic analysis) can now track the movements of individual infected users over the course of a day. One could argue that this is already being done to track anyone with Bluetooth enabled.
In addition, the process of uploading to the backend server could alert adversaries monitoring your network that you (the device using your IP address, uploading to the server IP address) have tested positive for the virus.
I recommend that you look at other contact tracing protocols that circumvent some of these issues by decreasing or eliminating the linkability of identifiers, allowing users to censor records before upload, and encouraging the use of network anonymization.
*Edit Spelling - Source: https://covid19-static.cdn-apple.com/applications/covid19/cu...
Is there an official document somewhere?
Edit: Apple's preliminary specification was linked in another HN comment. (https://covid19-static.cdn-apple.com/applications/covid19/cu...)
The main issue with bloom filters is this:
> only needs to download the actual keys if there's a potential match.
One of the design constraints of the service was that it should not know your (suspected) infection status unless you give consent that it should be shared.
> Matches must stay local to the device and not be revealed to the Diagnosis Server.
The better the bloom filter is (i.e., the lower its false-positive rate), the more likely it is that you have actually been in contact with an infected person whenever the filter reports a match, and the act of downloading the actual keys then reveals that likely match to the server.
Furthermore, the bloom filter has to deal with a lot more keys. In fact, in your example of 1000 positives per day uploading 14 days of keys, each infected user only needs to upload 14 keys, as they rotate only once per day. At 16 bytes per key (as the link above specifies), you'd have to download 14 * 1000 * 16 = 224,000 bytes, or about 224 KB, much less than the bloom filter would need. And this scheme can tell you with 100% certainty whether there has been a match or not, so at least in your example it's much better than bloom filters.
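That arithmetic as a small sketch (constants taken from the comment above, not authoritative; actual deployments may add padding or metadata):

```python
KEY_BYTES = 16   # Daily Tracing Key size per the linked spec
DAYS = 14        # days of keys each newly reported user uploads

def daily_download_bytes(new_positives_per_day: int) -> int:
    # Each newly reported user contributes DAYS keys of KEY_BYTES each,
    # and every client has to download all of them.
    return new_positives_per_day * DAYS * KEY_BYTES

print(daily_download_bytes(1_000))    # 224000 bytes = 224 KB
print(daily_download_bytes(100_000))  # 22400000 bytes = 22.4 MB
```

The second line shows why the scheme only starts to strain at upper tens to hundreds of thousands of new positives per day.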
The scalability issues only manifest themselves at much larger numbers than 1000 infections per day, say upper tens to lower hundreds of thousands, where it starts to become a problem.
So yes, rough location as Moxie suggests is the best method to improve the scheme. Instead of checking the IDs of people hundreds or thousands of km away from you, you could just check the IDs of people in your US state or county. But it has to be smart enough to recognize movement: you need to upload/download all areas you've been in, and people living at the borders automatically stand out because they download two or three areas.
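A sketch of what that area-based download could look like (the region codes, data layout, and function are invented for illustration; nothing like this is in the spec):

```python
from typing import Dict, List, Set

# Hypothetical: the server shards Diagnosis Keys by a coarse region code.
REGION_KEYS: Dict[str, List[bytes]] = {
    "US-CA": [b"key-ca-1", b"key-ca-2"],
    "US-NV": [b"key-nv-1"],
}

def keys_for_client(visited_regions: Set[str]) -> List[bytes]:
    # A client downloads the union of every region it visited.
    # Note the leakage the parent describes: someone living near a
    # border routinely requests two or more regions, which stands out.
    out: List[bytes] = []
    for region in sorted(visited_regions):
        out.extend(REGION_KEYS.get(region, []))
    return out
```

The privacy cost is visible right in the request shape: the set of regions a client asks for is itself a coarse movement profile.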
In a widely distributed and important spec like this, it may be useful to look for what is conspicuously absent or unstated, rather than simply reading the precise positive language.
To my mind this phrase under 'Privacy Considerations' in the Cryptography Specification stands out:
"A server operator implementing this protocol does not learn who users have been in proximity with or users’ location unless it also has the unlikely capability to scan advertisements from users who recently reported Diagnosis Keys."
That phrase explicitly mentions that server operators cannot learn about user proximities.
What I reckon may be unstated there is that it could be possible for adversaries with sidechannel / network monitoring capability to learn those kinds of details about users (i.e. internet, cell data, and other data network operators).
If such a side door did exist, it would seem in the public interest to be aware of the scope of the availability of that data, especially given the potential (physical, social) vulnerability and risk of those users.
I'd also like to be proven wrong about the possibility of such sidechannel attacks by anyone who understands the spec in more detail.
Have you read the spec (or even just the crypto sub-spec) before making your comment?
I mistakenly gave you the impression that I was linking to the spec. I was in fact linking to the infomercial that had a summary of the privacy considerations. The actual spec can be found here:
2. Cryptography: https://covid19-static.cdn-apple.com/applications/covid19/cu...
A technical outline is here: https://covid19-static.cdn-apple.com/applications/covid19/cu... also linked elsewhere in this thread.
> Upon a positive test of a user for COVID-19, their Diagnosis Keys and associated DayNumbers are uploaded to the Diagnosis Server. A Diagnosis Server is a server that aggregates the Diagnosis Keys from the users who tested positive and distributes them to all the user clients who are using contact tracing.
Is this scalable? Earlier in the document they mention that the tracing keys are 16 bytes long. Let's assume that there are 3 million patients in a country. That'd be 48 megabytes each user has to download and process per day to check whether they've been in contact with an infected person (processing involves calculating 144 HMACs per tracing key). I don't think this is feasible at scale, and one can't avoid thinking about area-aware diagnosis servers.
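The cost estimate above as a sketch (the 144 figure assumes the spec's 10-minute Rolling Proximity Identifier rotation, i.e. 144 intervals per day; the patient count is this comment's hypothetical):

```python
KEY_BYTES = 16       # Daily Tracing Key size per the spec
HMACS_PER_KEY = 144  # one Rolling Proximity Identifier per 10-min interval

def client_cost_per_day(total_patients: int) -> tuple:
    # Bytes every client downloads, and HMACs it must compute, to
    # check all published keys against locally observed identifiers.
    download_bytes = total_patients * KEY_BYTES
    hmac_ops = total_patients * HMACS_PER_KEY
    return download_bytes, hmac_ops

b, h = client_cost_per_day(3_000_000)
print(b)  # 48000000 bytes = 48 MB
print(h)  # 432000000 HMAC computations
```

So beyond the 48 MB download, each client would also grind through roughly 432 million HMACs per day, which is the part that really hurts on a phone.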
E.g. smartphones of patients would upload not just the diagnosis keys, but also the areas (county, district, something like that) they've been inside during that day. Then smartphones querying the diagnosis servers would have to send the areas they are interested in. But it's easy to see that this approach is quite privacy-invasive. On the bright side, this info is already available to carriers, so it's already a sunk cost, so to speak.
The whole point of this is to not enable these abuses, see https://covid19-static.cdn-apple.com/applications/covid19/cu...
The nefarious ad actor can do far more with the existing stack.
The part about how it works. Once you've read that (here's a link: https://covid19-static.cdn-apple.com/applications/covid19/cu...), can you describe to me how you'd track individuals with it?
PDF of spec draft: https://covid19-static.cdn-apple.com/applications/covid19/cu...
The relevant privacy details:
• The key schedule is fixed and defined by operating system components, preventing applications from including static or predictable information that could be used for tracking.
• A user’s Rolling Proximity Identifiers cannot be correlated without having the Daily Tracing Key. This reduces the risk of privacy loss from advertising them.
• A server operator implementing this protocol does not learn who users have been in proximity with or users’ location unless it also has the unlikely capability to scan advertisements from users who recently reported Diagnosis Keys.
• Without the release of the Daily Tracing Keys, it is not computationally feasible for an attacker to find a collision on a Rolling Proximity Identifier. This prevents a wide-range of replay and impersonation attacks.
• When reporting Diagnosis Keys, the correlation of Rolling Proximity Identifiers by others is limited to 24h periods due to the use of Daily Tracing Keys. The server must not retain metadata from clients uploading Diagnosis Keys after including them into the aggregated list of Diagnosis Keys per day.
It doesn't look bad, at least at first sight.
A detail: I hope the day boundary for the "Daily Tracing Key" is the same for all users? I.e. not a local day but e.g. a GMT+0 day or something.
Further deep links to the technical side:
https://covid19-static.cdn-apple.com/applications/covid19/cu... Cryptographic Specification