San Francisco's public agencies and cultural institutions are staring down a backlog of duplicate digital imagery that has quietly ballooned over the past decade, and the decisions made in the next 12 months will determine how the city's visual record gets preserved, pruned, and made accessible to the public.
The issue has moved from a back-office data management headache to a genuine policy question. As the city's Department of Technology accelerates its cloud migration — moving legacy servers housed at 1 South Van Ness Avenue to off-site infrastructure — administrators are being forced to confront storage systems riddled with redundant image files. The same photograph, in some cases, exists in dozens of slightly different versions: different resolutions, different metadata tags, different file formats accumulated across years of departmental uploads with no unified standard.
Why now? The city's ongoing push to digitize public records, combined with a 2025 Board of Supervisors directive requiring greater open-data compliance from all city departments by the end of fiscal year 2026, has put deduplication squarely on the agenda. Storing redundant files isn't just untidy housekeeping. It's a recurring cost, and one that falls on taxpayers.
What's at Stake for SF's Institutions
The San Francisco Public Library's digital collections program, based at the main branch on Larkin Street in Civic Center, is among the institutions wrestling most visibly with the duplicate problem. The library has spent years building out its digitized photograph archive: tens of thousands of images covering everything from the 1906 earthquake recovery to the Fillmore District before redevelopment. Duplicate image replacement in that context isn't simply a technical exercise. Before the lower-quality duplicates are retired, archivists must decide which version of a scanned photograph carries the highest resolution, the most accurate color profile, and the correct provenance metadata.
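To make that selection step concrete, here is a minimal sketch of how a "keep one master" rule might be expressed. It is illustrative only and not the library's actual workflow: the `ImageVersion` fields, the `pick_master` function, and the ranking order (provenance first, then color profile, then pixel count) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageVersion:
    """One stored copy of a photograph (hypothetical record layout)."""
    path: str
    width: int
    height: int
    has_color_profile: bool
    has_provenance: bool


def pick_master(versions):
    """Return (master, retire_candidates) for one group of duplicates.

    Assumed ranking: a copy with provenance metadata beats one without,
    then an embedded color profile, then the larger pixel count.
    """
    if not versions:
        raise ValueError("need at least one version to choose from")

    def rank(v):
        # Tuples compare left to right, so earlier criteria dominate.
        return (v.has_provenance, v.has_color_profile, v.width * v.height)

    master = max(versions, key=rank)
    retire_candidates = [v for v in versions if v is not master]
    return master, retire_candidates
```

Under this ordering, a large scan with no provenance record loses to a smaller copy that has one, which is exactly the kind of case an archivist would want to look at before anything is deleted. The ranking is a policy choice rather than a technical fact, so a different institution could reasonably reorder it.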
The San Francisco Arts Commission, which manages the city's Civic Art Collection from its offices on Van Ness Avenue, faces a parallel challenge with documentation images of public murals and sculptures. When a mural gets restored — as several in the Mission District have in recent years — multiple photographic records accumulate: pre-restoration surveys, in-progress shots, and final documentation images, often uploaded separately by different contractors with overlapping file names.
The financial stakes are real. Cloud storage costs for municipal governments have risen sharply since 2022, and duplicate files directly inflate those bills. Industry estimates commonly cited in technology procurement discussions put the share of redundant data in unmanaged digital archives at 20 to 40 percent, though the Department of Technology has not publicly released a figure for San Francisco's systems.
The Decisions That Can't Wait
Three choices are coming to a head. First, city departments must agree on a metadata standard before replacement workflows can be automated; otherwise, deleting a duplicate risks removing the only copy that carries critical provenance information. Second, the city must settle who owns the replacement process. An IT-driven purge and an archivist-led curation are fundamentally different operations, and turf questions between the Department of Technology and bodies like the Public Library Commission remain unresolved. Third, the city must decide how replacement affects public access. When the San Francisco History Center replaces a lower-quality duplicate with a higher-resolution master file, does that improved image automatically become available through the city's open-data portal at datasf.org, or does it sit behind an internal system?
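The first risk, losing the only copy that carries provenance, can be sketched as a pre-deletion check. This is a hedged illustration, not a description of any city system: the field names in `PROVENANCE_FIELDS`, the dictionary record layout, and both function names are hypothetical, since the city has not yet agreed on a metadata standard.

```python
# Hypothetical provenance fields; the real list would come from whatever
# metadata standard the departments eventually agree on.
PROVENANCE_FIELDS = ("source", "photographer", "date_taken", "rights")


def fields_at_risk(master, duplicate, fields=PROVENANCE_FIELDS):
    """List fields where the duplicate holds a value the master lacks
    or contradicts, so deleting the duplicate would lose information."""
    at_risk = []
    for name in fields:
        dup_value = duplicate.get(name)
        if dup_value and dup_value != master.get(name):
            at_risk.append(name)
    return at_risk


def plan_retirement(master, duplicates):
    """Split duplicates into those safe to retire automatically and
    those that should go to an archivist for review first."""
    retire, review = [], []
    for dup in duplicates:
        target = review if fields_at_risk(master, dup) else retire
        target.append(dup["path"])
    return retire, review
```

A check like this also bears on the ownership question: the automated pass handles only the clear-cut cases, while anything with unique or conflicting metadata is routed to a person rather than purged.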
The Board of Supervisors' Government Audit and Oversight Committee is scheduled to review digital records management practices in fall 2026. That hearing will likely be the first formal public forum where these competing priorities get aired together.
For residents and researchers who rely on digitized city records, from planning documents to historical photographs of neighborhoods like the Tenderloin and Bayview, the practical advice is to download and locally archive anything they currently use. Files that exist today as duplicates may be retired without public notice during any automated deduplication sweep. The city has not published a formal retention policy governing which image version survives when duplicates are consolidated. Closing that gap, more than solving any technical limitation, is the decision that most urgently needs to be made.