top 1 million by prevalence
I was thinking that you could help us self-host the database by offering a download of the top X passwords by prevalence. One could then host that subset within the enterprise. My simple test produced the following file sizes:
470K Mar 4 04:32 10K.txt
45M Mar 4 04:31 1M.txt
9.0M Mar 4 04:46 200K.txt
16M Mar 4 04:48 345K.txt
18G Jan 21 05:42 pwned-passwords-sha1-ordered-by-count-v8.7z
The interesting part is that the prevalence dropped below 500 at 1M records. SQLite was able to load the 1M-record file into a 155M database, which we can easily host ourselves.
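For anyone who wants to reproduce the load step, here is a minimal sketch rather than the exact script I ran. It assumes the extract (e.g. 1M.txt) keeps the `HASH:COUNT` line format of the ordered-by-count download; the table name `pwned` and the helper names `load_top_n` and `prevalence` are made up for illustration.

```python
import sqlite3
from itertools import islice


def load_top_n(lines, db_path, limit=1_000_000):
    """Load the first `limit` 'HASH:COUNT' lines into a SQLite table.

    `lines` is any iterable of text lines (an open file works).
    Assumes the input is already sorted by count, descending.
    """
    conn = sqlite3.connect(db_path)
    # WITHOUT ROWID stores rows directly in the primary-key index,
    # which keeps a pure lookup table compact.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pwned ("
        " hash TEXT PRIMARY KEY,"
        " count INTEGER NOT NULL"
        ") WITHOUT ROWID"
    )
    # Split each line on the first ':'; blank or malformed lines are skipped.
    parts = (line.strip().partition(":") for line in islice(lines, limit))
    rows = ((h.upper(), int(c)) for h, _, c in parts if h and c)
    with conn:  # commit everything as a single transaction
        conn.executemany("INSERT OR REPLACE INTO pwned VALUES (?, ?)", rows)
    return conn


def prevalence(conn, sha1_hex):
    """Return the stored count for a SHA-1 hex digest, or 0 if absent."""
    row = conn.execute(
        "SELECT count FROM pwned WHERE hash = ?", (sha1_hex.upper(),)
    ).fetchone()
    return row[0] if row else 0
```

Calling `load_top_n(open("1M.txt"), "pwned.db")` would build the database, and `prevalence(conn, digest)` then answers a lookup without leaving the enterprise network.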
As you've subsequently said, this isn't going to happen. The API is there to solve precisely the sorts of problems you're trying to address by requesting everything offline.
-
Baa commented
Oops.
I just found out that it's not going to happen, according to https://www.troyhunt.com/open-source-pwned-passwords-with-fbi-feed-and-225m-new-nca-passwords-is-now-live/.
I still think it is a good idea. It would save my small group the cost of the API, since our server runs on the Google free tier.