San Francisco's push to modernize its public digital archives has stalled at a critical juncture, with city departments holding tens of thousands of duplicate image files across incompatible platforms and no unified policy yet in place to resolve them. The problem, long acknowledged inside City Hall but rarely discussed publicly, is now forcing the city to answer three questions: who owns the cleanup process, who pays for it, and what standard will govern what gets kept.
The timing matters. The city's Department of Technology is midway through a broader IT consolidation effort, begun in fiscal year 2024-25, that is folding scattered departmental servers into centralized cloud infrastructure. That migration has exposed the full scale of the duplicate-image problem. Agencies ranging from the San Francisco History Center, housed in the San Francisco Public Library's Main Library on Larkin Street, to the Planning Department's environmental review division have discovered overlapping image catalogues. Some files are duplicated dozens of times across shared drives, city portals, and legacy content management systems that predate the current mayoral administration.
The Stakes for Cultural Memory and City Services
For the History Center, which holds one of the most significant municipal photography collections on the West Coast, the issue is more than a storage bill. Archivists must determine whether apparent duplicates are genuinely identical files or slightly different scans of the same physical document, a distinction that matters enormously for preservation. A flawed automated deduplication run in 2023 reportedly flagged thousands of irreplaceable photographs of the Tenderloin, the Fillmore District, and pre-earthquake structures along Market Street for deletion before staff intervened.
The San Francisco Arts Commission, which manages the city's Civic Art Collection database, faces a parallel challenge. Its image repository, used by curators, researchers, and the public through the city's online portal, holds multiple photographs of the same artworks, taken under different lighting conditions or by different contractors over the years. Deciding which version is canonical requires both curatorial judgment and technical standardization, and as of this Fourth of July weekend the Commission had not published a deduplication protocol.
City budget documents for fiscal year 2025-26 allocated roughly $4.2 million to the Department of Technology's cloud migration program overall, but funding for archival deduplication and metadata remediation was not broken out as a separate line item in the publicly available budget summaries reviewed by The Daily San Francisco. That ambiguity is itself part of the problem: without a dedicated budget and a named project owner, the work keeps getting deferred.
The Decisions That Will Define the Outcome
Three choices are now converging on a tight timeline. First, the city must decide by the end of the fiscal year's first quarter in September whether to procure a dedicated digital asset management platform or to extend existing contracts for the current patchwork of systems. Vendors have been in talks with the Department of Technology since at least March. A procurement decision that slips past September risks squandering the momentum the migration has built over the past 18 months.
Second, the Planning Department's upcoming rollout of its updated Central SoMa environmental review portal, expected in late 2026, depends on a clean image library. Duplicate or mislabeled photographs of project sites along Folsom and Brannan streets have already introduced inconsistencies into at least two recent environmental impact reports, according to city planning meeting minutes published online.
Third, and most consequential for the public, is whether San Francisco follows the lead of cities like New York, which published a formal open-data image deduplication standard in 2024, or continues to treat the issue as a back-office IT problem invisible to residents. Advocates at the Internet Archive, one of the city's most prominent digital preservation institutions and headquartered on Funston Avenue in the Richmond District, have argued publicly that municipal image data deserves the same transparency standards applied to other public records.
The Department of Technology is expected to present recommendations to the city's Committee on Information Technology this month. Whatever framework emerges will set the template not just for existing duplicates but for every photograph, scan, and rendering the city produces going forward. It is a quiet but consequential piece of infrastructure, touching everything from housing permit applications in the Sunset District to the preservation of San Francisco's visual history.