San Francisco's public records systems are carrying thousands of duplicate images — redundant scans, re-uploaded permit photos, and mirrored document attachments — and city technology officials are facing growing pressure to do something about it. The problem spans at least a half-dozen municipal databases, according to city technology staff briefings reviewed this spring, and it is slowing down public-records requests, inflating storage costs, and complicating the kind of AI-assisted document processing that agencies are now trying to roll out.
The issue has sharpened in 2026 because several departments, including the San Francisco Planning Department on Mission Street and the Department of Building Inspection on Duboce Avenue, are mid-migration to cloud-based records systems. Duplicate files don't just waste space — they create version-control problems that can surface in legal proceedings, slow down permit approvals, and muddy the public-facing portals that residents use to track neighborhood projects.
What Officials and Technologists Are Saying
Staff at the San Francisco Department of Technology, which oversees the city's DataSF platform, have flagged the duplication problem as something that must be fixed before any serious AI deployment in city workflows. Without deduplication, automated classification tools produce unreliable results: a photograph of a Mission District building facade uploaded six times under six slightly different file names is counted as six separate records, which can skew a machine-learning model trying to categorize permit files. City technologists have been pushing for a standardized image-hashing protocol, a technical process that computes a fingerprint from each file's contents so duplicates can be identified and removed without manually reviewing every record.
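For readers curious what file fingerprinting looks like in practice, the following is a minimal sketch, not the city's actual tooling. It assumes exact-match hashing with SHA-256 from Python's standard library, and the function names `file_fingerprint` and `find_duplicates` are hypothetical. It groups files whose bytes are identical regardless of file name, which is the renamed-upload case described above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large scans are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by fingerprint; keep groups with 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_fingerprint(path)].append(path)
    # A fingerprint shared by more than one path marks byte-identical copies.
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

A hash of this kind only matches byte-for-byte copies. A re-scan or re-compressed version of the same photograph would produce a different fingerprint, so catching those would require a perceptual hashing approach, which this sketch does not attempt.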
Open-government advocates at the San Francisco chapter of the OpenGov Foundation have argued the stakes extend beyond internal efficiency. Residents filing California Public Records Act requests through the city's NextRequest portal sometimes receive document packages bloated with redundant attachments, making it harder to find the original filing. That is not a minor inconvenience. Under state law, agencies are supposed to provide records in a reasonably usable form, and advocates contend that duplicate-laden exports may not meet that bar.
The San Francisco Municipal Transportation Agency, which manages billions of dollars in infrastructure and produces high volumes of photographic evidence for insurance claims and construction inspections, is among the departments most directly affected. SFMTA's internal document management system, which staff moved to a new platform in late 2024, is estimated to hold a significant number of redundant image files created during that migration. The pattern is common in large system changeovers, where files are bulk-uploaded without deduplication checks.
What Needs to Happen — and When
The city's Digital Services team set an internal target of the third quarter of 2026 to complete a full audit of image records across the five largest municipal databases. That timeline is ambitious. DataSF's public asset inventory currently lists more than 140 active city datasets, and the image-heavy ones — building permits, traffic incident reports, public-works inspections — are among the most complex to clean.
Storage costs are a real factor. Cloud storage for government data isn't free, and city budget analysts have noted that uncontrolled file growth adds incremental but compounding expense to IT operating budgets that are already under scrutiny during the current fiscal cycle. The Board of Supervisors' Budget and Finance Committee reviewed the Department of Technology's fiscal year 2026-27 allocation this spring as part of the broader city budget process, and storage optimization was listed among expected efficiency measures.
Practically speaking, the deduplication push matters most to two groups: residents trying to track development projects in neighborhoods like Hayes Valley or the Outer Sunset, and contractors who submit permit applications through the city's online Permit Center on Polk Street. Both groups have complained, in public comment sessions and in feedback logged through the city's 311 system, that the document retrieval process is slower and more confusing than it should be.
City technologists say a phased rollout of automated deduplication tools — starting with the Planning Department's image archive — could be completed by the end of 2026 if budget allocations hold. The practical test will come with the next major wave of permit applications tied to the city's housing production emergency, where clean, fast-moving records systems are not a luxury but a requirement.