Allow users to search for an email address by hash rather than sending the email to the API in cleartext.
Under the suspicion that submitted email addresses are being harvested, a privacy-conscious user could feel safer checking for the presence of their email in the database by submitting a hash of it rather than the email address itself. I, for instance, have two email addresses: one which everyone knows, and one which very few people know. I'm very curious about the latter, but there's no way I'd enter it into any web form.
I’m closing this out as “declined” for several reasons:
1. Now with almost 5B records, there’s a very high chance I have the hash being searched already and if I have that, I know the plain text.
2. It would lead to massive redundancy in the system, literally doubling the volume of data I store.
3. It would be very rarely used; the vast majority of requests come via the web app from consumers browsing to the site. Yes, I could hash on the client, but then you have to trust that HIBP is reliably doing that, which brings me to the final point…
4. …I would advise against sending an address to any service you don’t trust, regardless of the lengths I go to in ensuring searches aren’t recorded.
So in summary, a combination of high effort and low reward.
-
Hi Eddie, I talk about that in this blog post: https://www.troyhunt.com/were-baking-have-i-been-pwned-into-firefox-and-1password/
In short, I'm a step closer to being able to do it but it's still non-trivial and still won't provide much protection from me being able to work out the address.
-
Eddie J Carswell II commented
Now that you do have a separate table of email hashes for Firefox and 1Password, do you plan to revisit this request?
-
Coincidentally, I was thinking about this just today. There are two ways I look at this:
1) The hash would always be resolvable to the plain text email anyway. Obviously at present I store email in the clear. If someone searched for a hash and I really wanted to know what the underlying email was, I'd simply enumerate through the clear text versions and hash each one until I got a match.
2) Providing hashes would only allow me to resolve clear text versions of emails which have already appeared in breaches or pastes. If I don't already have the data, then by providing a hash nobody would be seeding HIBP with new material. It might sufficiently raise the privacy bar to the point where people who wouldn't otherwise use the service now trust it.
One thing I keep coming back to though (and I really must write something on this), is that email addresses are not a class of data in the same privacy realm as, say, names, addresses or credentials. 99.x% of systems store them in the clear, they fly around the web without encryption and the vast majority of people provide them to most people who ask without reservation. There are edge cases (such as you've mentioned), but for the most part, it's a data class that's meant to be shared (albeit not recklessly disclosed en masse).
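To make the two points above concrete, here is a minimal Python sketch of the enumeration idea. It is illustrative only and not how HIBP is implemented: the choice of SHA-1, the trim-and-lowercase normalisation, and the names `email_hash`, `build_reverse_index` and `resolve` are all assumptions, since the thread does not name an algorithm or a schema.

```python
import hashlib


def email_hash(email: str) -> str:
    # Assumption: SHA-1 over the trimmed, lowercased address.
    # The thread does not specify which hash would be used.
    normalised = email.strip().lower()
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def build_reverse_index(known_emails):
    # Point 1: an operator holding clear text addresses can hash each one
    # once and keep a hash -> address lookup.
    return {email_hash(e): e.strip().lower() for e in known_emails}


def resolve(submitted_hash: str, reverse_index):
    # Returns the clear text address if it is already known, otherwise None.
    return reverse_index.get(submitted_hash.lower())


# Hypothetical data standing in for addresses already seen in breaches.
known = ["alice@example.com", "bob@example.com"]
index = build_reverse_index(known)

# An address already in the data set resolves back to clear text.
found = resolve(email_hash("Alice@Example.com"), index)

# Point 2: an address not in the data set reveals nothing new.
missing = resolve(email_hash("secret@example.org"), index)
```

In this sketch `found` is the stored clear text address and `missing` is `None`, which mirrors the trade-off described above: hashing hides nothing for addresses already held, but it does avoid handing over an address that is not.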
Having said all that, there may be a use case for a feature I presently have in private beta. Let's keep this suggestion open and we'll see if it gets any traction either from individuals or to support the aforementioned feature.