The Daily San Francisco

How San Francisco's Digital Archives Ended Up Full of Duplicate Images — And Why That's Finally Changing

Decades of siloed city databases, departmental turf wars, and rushed digitisation projects left San Francisco's public records systems bloated with redundant files, costing taxpayers money and slowing access to critical information.

By San Francisco News Desk · Published 4 July 2026, 12:06 pm

Photo: GuiGo Lopes on Pexels

San Francisco's municipal digital infrastructure is carrying a hidden weight. Across city departments, from the Planning Department's permit-image database on South Van Ness Avenue to the Recreation and Parks Department's asset-management system maintained out of McLaren Lodge in Golden Gate Park, tens of thousands of duplicate image files have accumulated over more than two decades of piecemeal digitisation. The result is higher storage costs, failed searches, and administrative bottlenecks that still slow city workers down today.

The problem didn't arrive overnight. It is the direct product of how the city chose — and repeatedly failed — to modernise its record-keeping infrastructure between roughly 2002 and 2022, a period defined by departmental autonomy, incompatible software procurement, and a chronic absence of citywide data governance standards.

The Digitisation Rush That Left a Mess Behind

The pressure to digitise municipal records accelerated after 2003, when a California state mandate pushed local governments to make permitting and inspection records more accessible to the public. San Francisco responded, but each department largely built its own solution. The Department of Building Inspection adopted one content-management platform; the Office of the Assessor-Recorder purchased a separate system; the San Francisco Public Library's digital collections team used a third. None of these systems spoke to one another in any standardised way.

When departments merged functions or staff transferred between offices, image files — building permit photos, property survey scans, event documentation from Civic Center and Yerba Buena Gardens — were routinely re-uploaded rather than linked or referenced. Naming conventions varied wildly. A single photograph of a Mission District storefront facade might exist under four different filenames across three different departmental servers, each copy taking up storage and each appearing as a distinct record in public-facing search tools.

The situation compounded after 2016, when San Francisco launched DataSF, the city's open-data initiative managed through the City Administrator's Office. DataSF improved public-facing transparency substantially, but it inherited the underlying mess. Analysts working with the platform flagged the duplicate-image problem in internal reviews, though those findings did not produce a coordinated remediation effort at the time.

AI Enters a Crowded, Cluttered Room

AI-powered image-recognition tools, piloted in limited form by the San Francisco Department of Technology beginning around 2023, gave administrators the first practical means of detecting and flagging duplicate files at scale. Earlier deduplication software relied on exact file-hash matching, so a file resaved at a slightly different resolution or with altered metadata produced a different hash and evaded detection entirely. The newer tools can identify visually identical or near-identical images regardless of filename or format, which makes it possible to catch the renamed and re-uploaded copies that exact matching missed.
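
The gap between the two approaches can be shown with a short Python sketch. The article does not say which technique the city's tools use, so this example substitutes a simple and widely known stand-in, a "difference hash" (dHash), and runs it on tiny synthetic grayscale grids rather than real image files. The function names and the sample images are assumptions made for illustration only.

```python
import hashlib

def exact_hash(data: bytes) -> str:
    """Exact file-hash matching: any change to the bytes changes the digest."""
    return hashlib.sha256(data).hexdigest()

def to_bytes(pixels) -> bytes:
    """Flatten a grayscale grid (rows of 0-255 ints) into raw bytes."""
    return bytes(p for row in pixels for p in row)

def shrink(pixels, width=9, height=8):
    """Nearest-neighbour downsample to a small fixed-size grid."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]

def dhash(pixels) -> int:
    """64-bit difference hash: one bit per adjacent pixel pair,
    set when the left pixel is brighter than the right one."""
    bits = 0
    for row in shrink(pixels):
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")

# Hypothetical sample data: a synthetic 72x64 image, a copy "resaved"
# one shade brighter, and a negative standing in for a different picture.
original = [[(x * 5 + y * 3) % 200 for x in range(72)] for y in range(64)]
brightened = [[p + 1 for p in row] for row in original]
negative = [[255 - p for p in row] for row in original]

same_bytes = exact_hash(to_bytes(original)) == exact_hash(to_bytes(brightened))
near_distance = hamming(dhash(original), dhash(brightened))
far_distance = hamming(dhash(original), dhash(negative))

print("exact hashes match:", same_bytes)
print("dHash distance, brightened copy:", near_distance)
print("dHash distance, negative image:", far_distance)
```

The brightened copy no longer matches on its exact hash, yet its difference hash is unchanged because the relative brightness of neighbouring pixels is the same. A real system would decode actual image files and pick a distance threshold empirically; both steps are left out of this sketch.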

City technology procurement data shows San Francisco spent approximately $4.2 million on cloud storage contracts for departmental document systems in fiscal year 2024-25, a figure that budget analysts in the Controller's Office have noted could be meaningfully reduced through active deduplication. For context, the city's overall Department of Technology operating budget for that same period ran to roughly $120 million. Storage contracts therefore amount to about 3.5 percent of that budget, making storage optimisation a relatively modest but achievable savings target.
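
The scale of those figures can be checked with a few lines of Python. The two dollar amounts come from the paragraph above. The duplicate fractions are purely hypothetical, since the article reports no measured duplicate rate, and the sketch assumes storage cost falls in direct proportion to the volume removed, which real contracts may not honour.

```python
# Figures reported in the article (approximate).
STORAGE_SPEND = 4_200_000      # cloud storage contracts, FY 2024-25
TECH_BUDGET = 120_000_000      # Department of Technology operating budget

# Hypothetical duplicate fractions for illustration only; not reported data.
HYPOTHETICAL_DUPLICATE_FRACTIONS = (0.05, 0.10, 0.20)

def share(part: float, whole: float) -> float:
    """Return part as a fraction of whole."""
    return part / whole

def projected_saving(spend: float, duplicate_fraction: float) -> float:
    """Saving if cost scales linearly with stored volume (an assumption)."""
    return spend * duplicate_fraction

storage_share = share(STORAGE_SPEND, TECH_BUDGET)
print(f"Storage share of budget: {storage_share:.1%}")

for fraction in HYPOTHETICAL_DUPLICATE_FRACTIONS:
    saving = projected_saving(STORAGE_SPEND, fraction)
    print(f"If {fraction:.0%} of stored data were duplicates: ${saving:,.0f} saved")
```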

The timing also matters for a city still under political pressure over its handling of basic services. Mayor Daniel Lurie, who took office in January 2025 after defeating London Breed, has made operational efficiency a stated priority, and the Department of Technology has been tasked with producing a citywide data governance framework by the end of calendar year 2026. Duplicate-image remediation is expected to be one component of that framework.

For San Franciscans trying to pull property records through the online portal of the Assessor-Recorder's office, which sits on City Hall's ground floor, or research building permits for a renovation in the Outer Sunset, progress on this issue should bring practical gains: cleaner search results, faster retrieval, and fewer instances of conflicting records appearing for the same address. The Department of Technology has not announced a public-facing rollout timeline, but the framework due date at the end of 2026 means the architecture decisions shaping that experience are being made right now.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.