The Daily San Francisco

How San Francisco's Public Records Became a Graveyard of Duplicate Images — and Why City Hall Is Finally Paying to Fix It

Years of siloed departmental databases, rushed digitization contracts, and deferred IT maintenance left the city's document systems riddled with redundant files that cost money and erode public trust in open-data promises.

By San Francisco News Desk · Published 4 July 2026, 11:45 am

San Francisco's Department of Technology is sitting on a problem that took roughly a decade to build: tens of thousands of duplicate images embedded inside public-facing databases, permit portals, and archival record systems that city agencies rely on every day. The cleanup bill, according to a budget line item reviewed during the Board of Supervisors' fiscal year 2025–26 hearings, runs into the low seven figures when staff time, cloud storage overruns, and third-party audit fees are bundled together.

The issue matters right now because the city is in the middle of its most aggressive push toward digital permitting since SF Digital Services was launched in the late 2010s. Housing production is the political priority of the moment: Mayor Daniel Lurie's office has pledged to streamline the Department of Building Inspection's approval pipeline after years of complaints from developers trying to build on sites from the Tenderloin to Bayview-Hunters Point. Duplicate images clogging the permit record system slow that pipeline. Every redundant scan of a site plan or inspection photo that a staffer has to manually reconcile is time not spent processing a new application.

A Problem Born in the Digitization Rush

The roots of the duplication mess stretch back to roughly 2014 and 2015, when multiple San Francisco agencies began independently scanning legacy paper records without a shared file-management standard. The Department of Public Works ran its own digitization contract. The Planning Department ran another. The San Francisco Public Library's San Francisco History Center, located on Larkin Street, digitized thousands of archival photographs under a separate grant-funded initiative. None of the resulting repositories were built to talk to each other.

When the city later tried to consolidate assets onto a unified cloud platform — a project managed out of the Department of Technology's offices on Seventh Street — automated ingestion tools pulled image files from each legacy system. Identical scans migrated multiple times under different metadata tags. A single street-level photograph of a Mission District building could appear four or five times across different departmental buckets, each instance eating storage and each instance requiring a separate retrieval path for a public records request.

The San Francisco Controller's Office noted in its fiscal year 2023–24 City Services Auditor report that data management inefficiencies across multiple departments were contributing to above-benchmark costs in cloud infrastructure. The report did not assign a single dollar figure specifically to duplicate images, but storage expenditures for city document systems had grown substantially over the prior three fiscal years, a pattern the auditors flagged for follow-up.

What the Cleanup Actually Involves

Fixing the problem is less romantic than the politics around it. The Department of Technology has been piloting a deduplication protocol since early 2026 using hash-matching software, a standard type of commercial tool that computes a digest of each image file's bytes and flags files whose digests match, regardless of what filename or catalog metadata tag they carry. The pilot ran first against the Bureau of Urban Forestry's tree-permit image library, a contained dataset small enough to validate the approach without risking disruption to higher-stakes systems.
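
For readers curious what hash matching looks like in practice, the sketch below shows the general idea in Python. It is an illustration only: the city has not named its tool, and the function names (`file_digest`, `find_duplicates`), the choice of SHA-256, and the folder layout are assumptions made for this example rather than details of the Department of Technology's pilot.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes.

    The file is read in chunks so large scans do not need to fit in memory.
    """
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical.

    Filenames and folder locations are ignored; only file contents matter.
    Returns one list of paths per group that has at least two members.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("some_folder"))` on a hypothetical export folder would return groups of paths that hold the same bytes under different names. One limitation of this kind of exact matching is that it only catches identical files: a photograph that was rescanned, resized, or saved in a different format produces a different digest and would not be flagged.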

The SF Digital Services team, which operates out of City Hall and coordinates with the Mayor's Office of Civic Innovation, is expected to expand the protocol to the Department of Building Inspection's permit image repository before the end of calendar year 2026. That repository is the one most directly tied to the housing-production bottleneck Lurie has made a signature issue.

For residents and developers, the practical payoff would be faster document retrieval on SF Planning's public-facing Permit Tracking portal and reduced error rates when contractors pull historical inspection images to prepare renovation filings. For the city budget, successfully deduplicated storage could trim recurring cloud costs — though any savings would need to be weighed against the one-time audit and remediation expense.

The Board of Supervisors' Government Audit and Oversight Committee is scheduled to receive a progress briefing on the Department of Technology's data infrastructure modernization effort later this summer. That hearing will be the first formal public accounting of how far the deduplication work has advanced and what, concretely, it has cost to unwind years of fragmented record-keeping.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
