San Francisco's city agencies collectively maintain millions of digital images across dozens of databases, from building permit photos filed with the Department of Building Inspection on Van Ness Avenue to surveillance stills archived by the Municipal Transportation Agency. And according to technologists and government records managers who work with those systems, a stubborn, unglamorous problem keeps compounding: duplicate images — sometimes hundreds of copies of the same file — are eating storage, slowing retrieval, and making public records searches harder than they need to be.
The issue has gained quiet urgency in 2026, as San Francisco's AI-driven tech resurgence has raised expectations for city digital infrastructure to keep pace. The Planning Department alone processed more than 14,000 permit applications in 2025, each potentially carrying multiple attached photographs. When project files migrate between legacy systems, duplicates multiply. Records managers say the scale has grown faster than existing tools can address.
What Officials and Experts Are Saying
City officials have been careful in their public characterizations. The San Francisco Office of Digital Services, based in City Hall's Civic Center complex, has acknowledged in public budget presentations that storage redundancy is a recognized cost driver, though the office has not released a specific dollar figure tied to duplicate imagery alone. The office's fiscal year 2026 technology modernization budget, approved by the Board of Supervisors in June, includes a line item for database deduplication tools, though the precise allocation has not been made public.
Archivists and information-science professionals affiliated with the San Francisco Public Library's San Francisco History Center on Larkin Street have described the problem in technical terms that civic planners would recognize: when image files lack standardized metadata, automated systems cannot reliably detect that two JPEG files are the same photograph taken at the same Mission District intersection. Without reliable detection, deletion becomes a manual and error-prone task.
The private sector counterpart to this conversation is playing out in SoMa's dense corridor of AI and data-infrastructure startups. Several companies working on document-management contracts with California municipalities have described duplicate image replacement — the process of identifying a master copy and systematically removing or redirecting all redundant versions — as one of the more technically demanding problems in government data hygiene. The challenge is not just detection but replacement: ensuring that every internal link pointing to a duplicate now resolves correctly to the authoritative file.
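To make the detect-then-replace distinction concrete, here is a minimal sketch in Python. It is not drawn from any city system or vendor product. The file paths, the record identifiers, and the rule for choosing a master copy (the alphabetically first path in each group) are all assumptions made for illustration, and the sketch only catches byte-for-byte identical files.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_redirects(files: dict[str, bytes]) -> dict[str, str]:
    """Map every duplicate path to the master path holding the same bytes.

    Assumption: the master is the alphabetically first path in each group.
    A real records system would apply its own retention rule instead.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for path, data in files.items():
        groups[content_digest(data)].append(path)

    redirects: dict[str, str] = {}
    for paths in groups.values():
        master = min(paths)
        for path in paths:
            if path != master:
                redirects[path] = master
    return redirects


def repoint_links(links: dict[str, str], redirects: dict[str, str]) -> dict[str, str]:
    """Return a copy of the record-to-image links with duplicates swapped for masters."""
    return {record: redirects.get(path, path) for record, path in links.items()}


# Hypothetical in-memory archive: path -> file bytes
files = {
    "archive/permit-0001/front.jpg": b"image-bytes-A",
    "migrated/permit-0001/front.jpg": b"image-bytes-A",  # same bytes, second copy
    "archive/permit-0002/rear.jpg": b"image-bytes-B",
}

# Hypothetical internal links: record id -> image path
links = {
    "permit-0001": "migrated/permit-0001/front.jpg",
    "permit-0002": "archive/permit-0002/rear.jpg",
}

redirects = build_redirects(files)
updated_links = repoint_links(links, redirects)
```

In this sketch the link for the hypothetical record permit-0001 ends up pointing at the archive copy, and only then would the migrated copy be safe to delete. A photograph that has been resized or re-saved would produce a different digest and slip through, which is one reason practitioners describe detection as only half the problem.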
Why the September 30 Deadline Matters
The timing is not coincidental. A California state mandate, stemming from the Government Code updates passed in Sacramento in early 2025, requires all county and municipal agencies to certify digital records compliance by September 30, 2026. San Francisco city departments received formal guidance from the California State Archives in March advising that duplicate and orphaned digital assets must be inventoried before that certification can be filed. For a city already managing a projected $800 million structural deficit over the next two fiscal years, the administrative cost of non-compliance — which could include withholding of certain state technology grants — is a pressure officials would rather avoid.
BART's records management team at the Lake Merritt administrative offices has separately flagged that its inspection photography archive, which dates to the early 2000s and includes tunnel and station condition images, contains an estimated 30 to 40 percent redundancy based on an internal 2024 audit. The agency has not publicly committed to a deduplication deadline tied to the state mandate, since BART operates as a regional district rather than a city department, but its technology staff have described the audit findings in presentations to the BART Board of Directors.
For residents and advocates who use public records to track city infrastructure — from Tenderloin sidewalk repair backlogs to building inspections in the Outer Sunset — the practical advice from information professionals is consistent: when filing a California Public Records Act request with a city department, specify that you want the authoritative or master version of any image file, and ask whether the department has a deduplication policy in place. That single question, records managers say, often prompts agencies to confirm whether their systems can actually deliver a clean, non-redundant response — and increasingly, that answer is complicated.