Preventing weak passwords at registration with a privacy-preserving API
Elevator Pitch
Builds a serverless API using mmap and Bloom filters to check passwords against leaked dictionaries without exposing user data, enhancing security at registration.
Full Description
Hey everyone,
I was recently evaluating some Identity Threat Protection tools for my org and realized something frustrating: users are still creating new accounts with passwords like password123 right now, in 2026. Instead of waiting for these accounts to get breached, I wanted to stop them at the registration page.
So, I built an open-source API that checks passwords against CrackStation’s 64-million human-only leaked password dictionary and others.
The catch? You can't just send plain text passwords to an API.
To solve this, I used k-anonymity (similar to how HaveIBeenPwned handles it):
- The client SDK (browser/app) computes a SHA-256 hash locally.
- It sends only the first 5 hex characters (the prefix) to the API.
- The API looks up all hashes starting with that prefix and returns their suffixes (~60 candidates).
- The client compares its suffix locally.
The API, the logs, and the network never see the password.
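A minimal sketch of the flow above, assuming an in-memory list in place of the real API. The names `split_hash`, `fake_range_lookup`, and `is_leaked` are hypothetical and are not taken from the repo's SDK. It also shows where the "~60 candidates" figure comes from: 64 million hashes spread over 16^5 possible prefixes.

```python
import hashlib

PREFIX_LEN = 5  # hex characters sent to the API, as described in the post


def split_hash(password: str) -> tuple[str, str]:
    """Hash locally and split into (prefix, suffix); only the prefix leaves the client."""
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return digest[:PREFIX_LEN], digest[PREFIX_LEN:]


def fake_range_lookup(prefix: str, leaked_hashes: list[str]) -> list[str]:
    """Stand-in for the API call: suffixes of every leaked hash sharing this prefix."""
    return [h[PREFIX_LEN:] for h in leaked_hashes if h.startswith(prefix)]


def is_leaked(password: str, leaked_hashes: list[str]) -> bool:
    prefix, suffix = split_hash(password)
    candidates = fake_range_lookup(prefix, leaked_hashes)  # server side sees only `prefix`
    return suffix in candidates  # comparison happens on the client


# Tiny demo "dictionary"; the real one holds roughly 64 million entries.
LEAKED = [hashlib.sha256(p.encode()).hexdigest() for p in ("password123", "qwerty", "letmein")]

# Average bucket size: 64 million hashes over 16**5 = 1,048,576 prefixes (about 61).
AVG_BUCKET = 64_000_000 / 16**5
```

Calling `is_leaked("password123", LEAKED)` returns `True`, while a password outside the list returns `False`. In both cases only the 5-character prefix would have crossed the network.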
The Engineering / Infrastructure
I'm a DevOps engineer by trade, so I wanted to make the architecture serverless, ridiculously cheap, and secure by design:
- Compute: AWS Lambda (Docker, arm64) + FastAPI behind an Edge-optimized API Gateway + CloudFront (Strict TLS 1.3 & SNI enforcement).
- The Dictionary Problem: You can't load 64 million strings into a Python dict in Lambda. I solved this by building a pipeline that creates a 1.95 GB memory-mapped binary index, an 8 MB offset table, and a 73 MB Bloom filter. Sub-millisecond lookups without blowing up Lambda memory.
- IaC: The whole stack is provisioned via Terraform with S3 native state locking.
- AI Metadata: Optionally, it extracts structural metadata locally (length, char classes, entropy) and sends only the metadata to OpenAI for nuanced contextual analysis (e.g., "high entropy, but uses common patterns").
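For readers unfamiliar with the mmap + offset table + Bloom filter combination, here is a rough sketch of one way it could fit together. Everything in it is an assumption rather than the repo's actual layout: records are sorted raw 32-byte SHA-256 digests, the offset table holds one entry per 5-hex-character prefix, and the Bloom filter is a tiny demo size. One observation: 16^5 buckets × 8 bytes comes to 8 MiB, which is at least consistent with the offset table size quoted above. How the Bloom filter is wired into the real request path isn't stated in the post, so here it is only a pre-check for a full-digest membership test.

```python
import hashlib
import mmap
import tempfile
from array import array

DIGEST_LEN = 32        # raw SHA-256 bytes per record (assumed layout)
PREFIX_HEX = 5         # hex characters per bucket, matching the API prefix
BLOOM_BITS = 1 << 16   # tiny demo filter; the post's filter is 73 MB
BLOOM_HASHES = 4


def digest(password: str) -> bytes:
    return hashlib.sha256(password.encode("utf-8")).digest()


def bloom_positions(d: bytes):
    # Slice the already-uniform digest into several bit positions.
    for i in range(BLOOM_HASHES):
        yield int.from_bytes(d[4 * i:4 * i + 4], "big") % BLOOM_BITS


def build(passwords, path):
    """Write sorted fixed-width digests to `path`; return (offsets, bloom)."""
    digests = sorted({digest(p) for p in passwords})
    bloom = bytearray(BLOOM_BITS // 8)
    offsets = array("Q", [0]) * (16 ** PREFIX_HEX + 1)
    with open(path, "wb") as f:
        for d in digests:
            f.write(d)
            offsets[int(d.hex()[:PREFIX_HEX], 16) + 1] += 1  # count per bucket
            for pos in bloom_positions(d):
                bloom[pos // 8] |= 1 << (pos % 8)
    for i in range(1, len(offsets)):  # prefix sums: offsets[b] = first record of bucket b
        offsets[i] += offsets[i - 1]
    return offsets, bloom


def suffixes_for_prefix(prefix: str, index, offsets) -> list[str]:
    """Read only the records in one bucket straight from the memory-mapped file."""
    bucket = int(prefix, 16)
    start, end = offsets[bucket], offsets[bucket + 1]
    return [
        index[i * DIGEST_LEN:(i + 1) * DIGEST_LEN].hex()[PREFIX_HEX:]
        for i in range(start, end)
    ]


def maybe_present(d: bytes, bloom: bytearray) -> bool:
    """False means definitely absent; True means 'check the index'."""
    return all((bloom[pos // 8] >> (pos % 8)) & 1 for pos in bloom_positions(d))


# Demo with three passwords and a temporary index file.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    INDEX_PATH = tmp.name
OFFSETS, BLOOM = build(["password123", "qwerty", "letmein"], INDEX_PATH)
with open(INDEX_PATH, "rb") as f:
    INDEX = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

target = digest("password123").hex()
FOUND = target[PREFIX_HEX:] in suffixes_for_prefix(target[:PREFIX_HEX], INDEX, OFFSETS)
```

The point of the design is that a lookup touches two offset-table entries and one small slice of the mapped file, so the operating system pages in only the bytes actually read instead of loading the whole dictionary into process memory.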
I'd love your feedback / code roasts:
While I can absolutely vouch for the AWS architecture, IAM least-privilege, and Terraform configs, the Python application code and Bloom filter implementation were heavily AI-assisted ("vibe-coded").
If there are any AppSec engineers or Python backend devs here, I’d genuinely welcome code reviews, PRs, or pointers to edge cases I missed.
- GitHub Repo (Code, SDKs, & local Docker setup): https://github.com/dcgmechanics/is-your-password-weak
- Architecture Deep Dive: https://medium.com/@dcgmechanics/your-users-are-still-using-password123-in-2026-here-s-how-i-built-an-api-to-stop-them-d98c2a13c716
Happy to answer any questions about the infrastructure or the k-anonymity flow!
From the Reddit thread (10 top comments)
- 115·Reddit commenter·1mo ago
1. Implement password complexity rules. 2. Remove passwords that don't match password complexity rules from the password list. 3. Profit. You have just reduced the password list significantly, therefore are able to run cheaper. Because you're just using this internally, it doesn't matter to you whether non-complex passwords can't be used through this API because you have control of the applications and can set password complexity rules.
- 86·Reddit commenter·1mo ago·reply
Bro vibe-coded a super complex solution to something that could be done with a regex.
- 63·Reddit commenter·1mo ago
Looks overengineered. Couldn't you just enforce password policies and maybe MFA?
- 43·Reddit commenter·1mo ago
so your user has to a) choose a password that fits the complexity and then b) hope it isn't in the leak list? am i reading this right?
- 27·Reddit commenter·1mo ago
You used a modern LLM to generate a string of words barely better than [2010s-era technobabble generators.](https://web.archive.org/web/20130812033256/http://shinytoylabs.com/jargon/#) Impressive, really.
- 24·Reddit commenter·1mo ago
I mean that's a cool project, but is it more than a tinkering project? 64 million passwords are basically irrelevant; sure it catches really bad passwords, but most of them could just be filtered out by having sensible password requirements. Also has this not been solved already by haveibeenpwned? They have an API and if I'm reading their docs correctly they basically have this functionality with billions of leaked passwords instead of 64 million.
- 17·Reddit commenter·1mo ago·reply
Why use 'x' when you can spend 'y' amount of time to develop a completely original solution that someone will fail to understand and break. Job security!
- 11·Reddit commenter·1mo ago
Too busy focused on if they could and never stopped to ask if they should
- 11·Reddit commenter·1mo ago·reply
And when they inevitably pick "hunter2" I hope it fails silently.
- 7·Reddit commenter·1mo ago
This can't be real. Claude will output a one line regex to replace all of that :D