San Francisco's public digital infrastructure has a clutter problem. Across city-managed repositories, libraries, and civic tech platforms, duplicate image files have accumulated for years—redundant photographs, scanned documents, and archival visuals that eat up server space, slow down search functions, and complicate public access to records. The question now is who decides what gets deleted, what gets kept, and who pays for the cleanup.
The issue has sharpened this summer as the city's Department of Technology enters a budget cycle under pressure. Housing agencies digitizing Mission District planning records, Muni uploading surveillance and infrastructure photos, and the San Francisco Public Library's San Francisco History Center on Larkin Street all feed into shared or parallel storage systems where duplicate image management has largely been handled ad hoc, if at all.
Why This Moment Matters
The timing is pointed. San Francisco's broader push to modernize its civic tech stack has accelerated since 2024, driven partly by federal infrastructure grants and partly by pressure from the Controller's Office to cut operational costs. Cloud storage is not free. Enterprise-grade storage for large image libraries can run well above $50,000 annually for a mid-sized city agency, depending on volume and redundancy protocols, and duplicates compound those costs directly.
At SF Digital Services, the team that manages the city's resident-facing web infrastructure at City Hall and beyond, engineers have flagged duplicate image handling as a structural issue in internal working documents reviewed by this reporter. The San Francisco Recreation and Parks Department, which maintains image libraries for more than 220 parks and facilities, has separately acknowledged a backlog in its digital asset management system, though the department has not issued a public timeline for resolving it.
The San Francisco Public Library's History Center holds one of the most consequential collections at stake. Digitized photographs dating to the Gold Rush era live alongside more recent scans, and volunteers and staff have flagged duplicate entries that create confusion in the online catalog. Librarians there have been working with the Internet Archive, based in the Richmond District on Funston Avenue, on protocols for deduplication—but a finalized policy has not been adopted as of this week.
The Decisions Ahead
Three questions will define the outcome over the next six months. First: who has deletion authority? In most city agencies, no single office holds clear jurisdiction over purging image files from shared systems. The City Administrator's Office and the Department of Technology have overlapping roles, and without explicit policy, individual departments default to keeping everything—which is how duplicates accumulate in the first place.
Second: will the city invest in automated deduplication software, or rely on manual review? Commercial tools from vendors who work with municipal governments can identify exact duplicates through hash matching and near-duplicates through perceptual algorithms that compare how images look rather than how they are stored, but licensing costs vary widely. A pilot program at the Planning Department, which processes thousands of permit-related property photographs annually at its offices on Mission Street, could serve as a test case for citywide rollout.
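For readers curious how the two approaches differ, the following is a minimal sketch and not a description of any vendor's product or any city system. It assumes file contents are already in memory and that images have been reduced to small grayscale grids. The function names (group_exact_duplicates, average_hash, hamming_distance) are illustrative, and the "average hash" shown is one simple perceptual technique among several.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose bytes are identical, using SHA-256 digests."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Only groups with more than one member are duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels: list[list[int]]) -> int:
    """Simple perceptual hash: one bit per pixel, set when the pixel is
    brighter than the image's mean brightness.

    `pixels` is assumed to be a small grayscale grid (values 0-255) that
    has already been downscaled; real tools handle that resizing step.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Hypothetical file names and contents, for illustration only.
    files = {"park_001.jpg": b"same bytes", "park_001_copy.jpg": b"same bytes",
             "park_002.jpg": b"other bytes"}
    print(group_exact_duplicates(files))

    original = [[10, 200], [200, 10]]
    rescanned = [[12, 205], [198, 9]]   # slightly different pixel values
    print(hamming_distance(average_hash(original), average_hash(rescanned)))
```

A cryptographic hash flags only byte-for-byte copies, so a re-scan or a re-compressed version of the same photograph slips past it. A perceptual hash tolerates small pixel differences, with a small Hamming distance suggesting a likely near-duplicate. The threshold for "small" is a policy choice, not something the algorithm settles.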
Third: what counts as a true duplicate versus a meaningful variant? A photograph taken from the same angle on different dates may look identical but carry distinct evidentiary value—particularly for infrastructure documentation, police records, or environmental assessments. Cultural institutions like the History Center are especially cautious here. Deleting the wrong file in an archival context is not a recoverable error.
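One way to picture that distinction is a rule that never deletes on visual similarity alone. The sketch below is hypothetical and does not reflect any adopted city policy. It assumes each image record already carries a visual fingerprint (for example, a perceptual hash computed elsewhere) and a capture date. The ImageRecord type, its field names, and the two category labels are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ImageRecord:
    name: str
    fingerprint: str      # assumed: a precomputed visual fingerprint
    captured_on: date     # assumed: capture date from catalog metadata


def classify_groups(records: list[ImageRecord]) -> dict[str, list[list[str]]]:
    """Split look-alike groups into merge candidates and human-review cases."""
    by_fingerprint = defaultdict(list)
    for record in records:
        by_fingerprint[record.fingerprint].append(record)

    result = {"merge_candidates": [], "needs_review": []}
    for group in by_fingerprint.values():
        if len(group) < 2:
            continue  # a unique image is not a duplicate of anything
        names = sorted(r.name for r in group)
        dates = {r.captured_on for r in group}
        # Same look, same date: likely redundant copies.
        # Same look, different dates: possibly distinct evidence, so hold it.
        key = "merge_candidates" if len(dates) == 1 else "needs_review"
        result[key].append(names)
    return result


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    records = [
        ImageRecord("bridge_a.tif", "f1", date(2023, 5, 1)),
        ImageRecord("bridge_a_copy.tif", "f1", date(2023, 5, 1)),
        ImageRecord("culvert_jan.tif", "f2", date(2024, 1, 10)),
        ImageRecord("culvert_jun.tif", "f2", date(2024, 6, 10)),
    ]
    print(classify_groups(records))
```

Under this rule, even the merge candidates are only candidates. An archive could route both lists to staff and use the software solely to shrink the pile that needs a human look.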
The Board of Supervisors' Government Audit and Oversight Committee is scheduled to hear a broader digital infrastructure update later this summer, which city technology staff say could include discussion of storage efficiency. Advocacy groups focused on open government, including the San Francisco chapter of civic tech organization Code for America, have pushed for transparent retention policies that give the public a voice before mass deletions occur. The Fourth of July holiday gives agencies a brief pause before the real work resumes Tuesday morning—and the decisions made in the weeks that follow will be difficult to undo.