San Francisco's municipal agencies are sitting on millions of duplicate digital images — redundant files that slow database queries, inflate storage costs, and increasingly trip up the AI-assisted tools the city has been deploying since late 2024. The problem isn't new, but pressure to fix it is mounting as departments push deeper into automated permitting, housing inspection tracking, and transit monitoring.
The timing matters. Mayor Daniel Lurie's administration has made housing production a top priority, and the Planning Department's permit-processing pipeline depends heavily on document management systems that store site photographs, architectural drawings, and inspection records. When the same image appears under four different file names, a common occurrence after years of ad hoc scanning practices, automated workflows flag conflicts, reviews slow down, and in some cases staff must intervene manually.
Where the Bottlenecks Show Up
Two agencies come up repeatedly in conversations with city technology staff. One is the San Francisco Municipal Transportation Agency, which manages tens of thousands of images tied to parking enforcement, street-condition reports, and construction-zone permits. The other is the Department of Building Inspection, headquartered at 49 South Van Ness Avenue, whose Accela permitting platform has accumulated years of duplicated photo attachments linked to projects in neighborhoods from the Tenderloin to the Outer Sunset.
The San Francisco Digital Services office, a unit within the City Administrator's office that was restructured in 2023, has been piloting deduplication software on a subset of Planning Department records held at 49 South Van Ness Avenue. The pilot, which began running on roughly 400,000 image files in the spring of 2025, is evaluating whether perceptual hashing tools — algorithms that compare images by visual content rather than file name — can flag redundant records without human review of each one.
Archivists and records managers watching the pilot say the core challenge is that city departments built their digital storage systems independently, using different naming conventions, different scanner settings, and different metadata standards. An image of a Mission District façade might exist as a TIFF in one system and a compressed JPEG in another, and a simple file-name match won't catch that duplication. Perceptual hashing can, but it generates its own false positives — flagging similar-but-distinct images of, say, two storefronts on Valencia Street as duplicates when they are not.
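For readers curious how that kind of comparison works, here is a minimal sketch of one common perceptual hashing approach, the "average hash." It is an illustration only and is not drawn from the city's pilot, whose software has not been described. The function names, the 8-by-8 grid, and the cutoff of 5 differing bits are all assumptions chosen for the example, and images are represented as plain grids of grayscale values (at least 8 pixels on each side) so the code runs without any imaging library.

```python
def average_hash(pixels, hash_size=8):
    """Return a 64-bit fingerprint of a grayscale image (hypothetical helper).

    `pixels` is a list of rows, each a list of brightness values.
    Assumes the image is at least hash_size pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Shrink the image to a hash_size x hash_size grid by averaging blocks.
    for r in range(hash_size):
        r0, r1 = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            c0, c1 = c * width // hash_size, (c + 1) * width // hash_size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two fingerprints differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_probable_duplicate(pixels_a, pixels_b, max_distance=5):
    """Flag two images as likely duplicates (the threshold is an assumed value)."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance
```

The cutoff is where the false-positive problem lives. A looser threshold catches more recompressed or rescanned copies, but it also starts to sweep in distinct photographs that happen to share a layout, such as two similar storefronts. A tighter one misses real duplicates. That tradeoff is why flagged pairs are often routed to a human reviewer rather than deleted automatically.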
The cost argument for cleaning up the archives is straightforward. Cloud storage for large unstructured file collections is not cheap, and city contracts for enterprise document management have grown. According to the City Controller's Office FY2025 budget summary, the city's information technology expenditures across all departments exceeded $300 million, a figure that includes storage infrastructure. Records managers argue that eliminating verified duplicates could reduce storage overhead meaningfully — though without completed audits, precise savings projections remain speculative.
What Comes Next for Departments
Digital Services staff have indicated the pilot results are expected to inform a broader policy recommendation sometime before the end of calendar year 2026. If the pilot performs well, the recommendation would likely call for a citywide metadata standard for image attachments — essentially a common set of rules for how departments name, tag, and store photographs when they enter any city system.
For residents and contractors dealing with the city's permitting systems, the practical effect of a successful deduplication push would be faster document retrieval. Builders applying for permits in high-density corridors like Geary Boulevard or in the Dogpatch redevelopment zone have complained for years about delays traced partly to database lookup times inflated by redundant records.
Technology consultants familiar with municipal records work caution that software alone won't solve the problem. Agencies need updated intake protocols so that new duplicates don't accumulate as fast as old ones are removed. Training for the staff who scan and upload documents — often entry-level administrative workers across dozens of departments — is a necessary complement to any automated deduplication tool.
The Fourth of July holiday gives city offices a one-day pause. When they reopen Monday, the Digital Services pilot will still be running, the Accela backlog will still be there, and the debate over who owns the fix, individual department IT units or a centralized city authority, will pick up exactly where it left off.