San Francisco's Department of Technology is sitting on a problem that took roughly a decade to create: tens of thousands of duplicate digital images scattered across city databases, the residue of overlapping digitization drives that nobody coordinated. The effort to identify and eliminate those duplicates is now entering its most expensive phase, with contracts under review and department heads being asked to account for storage costs that have ballooned well beyond original projections.
The stakes are higher than they might seem over a holiday weekend. City agencies, from the Planning Department on South Van Ness Avenue to the Office of the City Assessor-Recorder, rely on image archives to process permits, verify property records, and document code enforcement actions. A duplicate image filed under the wrong record doesn't just waste server space; it can stall a housing permit application or muddle a chain of title at a moment when San Francisco's housing production emergency leaves no room for bureaucratic delay.
How the Duplication Crisis Accumulated
The roots of the problem stretch back to at least 2014, when multiple city departments launched separate digitization initiatives without a shared file-naming standard or a central deduplication protocol. The San Francisco Public Library's digitization program, based at the main branch on Larkin Street, ran independently of records projects at the Department of Building Inspection on Duboce Avenue. The City Attorney's Office maintained its own document imaging system. Each used different vendor software, different metadata schemas, and — critically — different rules about what counted as an authoritative copy of a scanned file.
When the city later attempted to consolidate those siloed repositories into the DataSF open-data platform and related internal systems, automated ingestion scripts pulled files from multiple sources without checking whether a given image already existed. The result was a cascade of redundancies. A single photograph of a Tenderloin building facade, for instance, might appear under three different file identifiers across two separate department databases, each tagged with slightly different metadata and none flagged as a copy.
Shifts in the tech sector compounded the administrative inertia. Between 2022 and 2024, a wave of layoffs across companies headquartered in SoMa and the Financial District shrank the pool of contract data engineers the city had drawn on for short-term cleanup projects. By the time the AI hiring surge began reversing some of those workforce losses in 2025, the city's IT backlog had grown substantially.
The Cleanup, and What Comes Next
The Department of Technology issued a request for proposals in March 2026 seeking vendors capable of running automated duplicate-detection algorithms across an estimated 4.2 million image files stored in city systems, a figure the department disclosed in internal budget documents reviewed as part of the city's standard procurement process. Storage costs for redundant files have reportedly become a measurable line item in departmental IT budgets, though the Department of Technology has not yet published a finalized cost figure for the current fiscal year.
The San Francisco Planning Department, which processes thousands of permit applications annually for projects from the Sunset District to Dogpatch, has been identified internally as one of the agencies most affected by the duplicate-image backlog. In some cases, Planning staff must manually reconcile image records when automated systems surface conflicting file versions, adding time to reviews at exactly the moment the city is trying to accelerate housing approvals under state mandates tied to its Housing Element update.
For residents and developers dealing with city permitting, the practical advice is straightforward: when submitting digital documentation to any San Francisco city agency, use descriptive, unique file names that include the parcel address and submission date, and confirm with the receiving department which file format and resolution standards it prefers. Files that don't match a department's ingestion requirements are more likely to be re-scanned or re-uploaded by staff, increasing the chance of duplication downstream.
The Department of Technology expects to award a deduplication contract before the end of the third quarter of fiscal year 2026-27, which closes at the end of March 2027 under the city's July-to-June fiscal calendar. Whether that timeline holds depends partly on how quickly the Board of Supervisors' budget committee moves on related IT appropriations when it reconvenes after the Independence Day recess.