Pakistan Notice Helper: Small AI for local alerts | Keryc
You get an SMS that says: your account will be suspended, click this link. What do you do? Pakistan Notice Helper is an AI tool built for that exact moment: it doesn't declare a message authentic or fake, but it helps you pause, spot risk signals, and follow safer steps before you click, call or share an OTP.
What is Pakistan Notice Helper
It's a small safety app that accepts text or a screenshot and returns:
a risk label,
a brief explanation,
visible red flags,
and safe next steps to follow.
It's not a definitive verifier. The goal is triage: help you decide whether to investigate further or stop before acting.
Why build something small and local
Why not a giant assistant that does everything? Because the problem is local and specific: suspicious messages in Pakistan written in English, Urdu, Roman Urdu or mixes. A huge model gives quality, but also brings cost, latency and deployment headaches.
The choice here was to prioritize practical experience: speed, reasonable cost, and predictable behavior. That focus let the team build a useful tool in a three-day hackathon, iterating on real test cases.
How it works technically
The tech stack is compact but powerful:
Frontend: Space on Hugging Face with a custom frontend using Gradio.
Backend: Gradio queued server that talks to an endpoint on Modal.
Inference: llama.cpp accelerated with CUDA to serve models in GGUF format.
Selected model: Qwen3.5 4B Q8 MTP GGUF with a vision projector to handle screenshots.
The app handles text and screenshots, applies prompts and a strict "output contract" to prevent the model from making up URLs, numbers or organizations.
Quality vs. cost: choosing the model
The author tested several models:
Qwen3.6 27B: best raw quality (around 95/100 in their tests), but expensive, slow on cold starts and heavy for a demo.
MiniCPM-V 4.6 Q8: promising by size, but unstable and slow to deploy with ZeroGPU.
Qwen3.5 4B: the ideal balance: enough capacity (≈80/100 for the task), cheap, fast and practical for Modal and llama.cpp.
That balance made an agile service possible: typical responses in ~5 seconds and ~9 seconds on cold start.
Product design and output safety
The team was very clear about product limits. Instead of claiming something is 100% real or fake, the app identifies observable signals, for example:
language that threatens or says your account will be suspended urgently;
requests for OTP, PIN, passwords, CVV, CNIC or card data;
suspicious payment links or personal mobile numbers;
impersonation of banks, telcos, delivery, tax authorities or police;
prizes, refunds or job offers that ask for upfront payment.
And it offers practical steps: verify through official channels you find yourself (don’t use links or numbers from the message), contact official support, or ignore and report as appropriate.
They also implemented product-mechanical protections:
system-level ban on prompts inventing URLs or numbers,
token budget tuning for vision-enabled responses,
disabling the 'thinking mode' that consumed the budget before returning structured JSON.
Support for Urdu and UX
Supporting Urdu was not just translating labels. It changed the interface:
RTL layout (right-to-left),
translation of headers and controls,
instructions for the model to generate the evaluation in clear Urdu script.
Details like fonts (Nastaliq vs. the system Arabic stack), line heights and ordering of mixed elements (Urdu + Latin names) were necessary tweaks so the app feels trustworthy and readable.
Testing, traceability and privacy
The project included a small regression suite for quality testing. Key results from the author:
Measure
Result
Initial strict steps
9 out of 10
Initial average score
89.5/100
Final regression steps
10 out of 10
Final average score
100/100
High risk cases
All passed
Cases with screenshots
Both passed
They added an optional public trace that publishes only limited metadata (counters, booleans, fixed summaries). Important: the trace DOES NOT store screenshots or full texts; inputs are redacted and capped. The inference endpoint remains private and runs on Modal, so the app warns users not to send sensitive data.
Practical lessons and next technical steps
What did the project teach about building with small models?
When the problem is well-scoped, a 4B can be surprisingly effective.
Safety comes more from a clear output contract, validated prompts and UX that forces doubt than from just increasing parameters.
Cost, latency and cold starts are decisive factors in real experience.
For the next version, the author proposes an agentic verification flow: a controlled agent (planned with Olostep and OpenAI Agents SDK) that searches public warnings, scrapes independent sources and shows evidence separate from the inference. That flow must keep strict limits (never trust message data) and balance the extra delay (up to ~30 seconds) against the benefit of deep verification.
Final reflection
Pakistan Notice Helper doesn't pretend to be the final word against fraud. It's a small, practical light: it helps people think twice, see clear signals and act more safely. For local, concrete problems, small, well-designed AI with honest limits can make a real difference.