San Francisco city officials moved this week to accelerate a long-delayed cleanup of duplicate images clogging public-facing databases, with at least three departments reporting active deduplication efforts underway as of July 3. The push affects everything from permit photographs stored at the Department of Building Inspection's Civic Center offices to archival images held by the San Francisco Public Library's History Center on Larkin Street.
The timing matters. The city's digital infrastructure office has been under pressure since early 2026 to cut storage costs as part of a broader budget consolidation. Redundant image files — the same photograph stored under multiple file names or in multiple systems — quietly consume server capacity and slow retrieval times for staff and the public alike. With AI-assisted cataloguing tools now available at lower price points, several departments decided this was the week to act rather than wait for a formal city-wide mandate.
What the Cleanup Looks Like on the Ground
At the San Francisco Planning Department, staff have been running automated scripts against the Parcel Information database, which contains hundreds of thousands of site photographs taken since the early 2000s. The problem is structural: when the city migrated legacy systems to a cloud platform in 2023, files were often copied rather than moved, leaving duplicate copies of the same photograph under different file names or in different systems. Those copies now account for a substantial but still unquantified share of total stored data. The department has not released official figures yet, but similar migrations in comparable municipal systems have produced duplication rates of 20 to 35 percent of total image libraries, according to industry benchmarks published by the Urban Institute in March 2025.
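The department has not described what its scripts actually do. As an illustration only, one common way to find files that were copied rather than moved is to group them by a hash of their contents, so that identical bytes are matched regardless of file name or folder. The sketch below assumes the images are reachable as ordinary files under a single directory; the function names `sha256_of_file` and `find_exact_duplicates` are hypothetical and not taken from any city system.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical.

    Hypothetical helper: only exact copies are detected. Re-scans or
    re-encoded versions of the same photograph will hash differently.
    """
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of_file(path)].append(path)
    # Keep only the groups that contain more than one file.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

A check like this would only catch byte-identical copies of the kind a copy-instead-of-move migration produces. Separate scans of the same physical print, as in the library's case, differ at the byte level and would need a different technique, such as perceptual image comparison, plus human review.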
The San Francisco Public Library's digitisation program, headquartered at the Main Branch on Larkin Street, faces a different version of the same problem. The History Center has been digitising photographs from the James R. Tait collection and other 20th-century San Francisco holdings. Volunteers and contractors scanning physical prints have sometimes produced multiple scans of identical images, and different batches of scans were uploaded to separate folders without cross-referencing. Librarians began a manual-and-automated review process on June 30, targeting completion before the end of July.
Meanwhile, the SF Department of Elections, which stores ballot-related imagery and scanned documentation at its City Hall suite, began its own deduplication review on July 2. The department confirmed the review is routine ahead of the November 2026 election cycle, during which document imaging volumes are expected to spike significantly.
Why Duplicate Images Are a Bigger Problem Than They Sound
Storage costs are one issue. But duplicate images in public records systems also create legal and transparency headaches. When a journalist or attorney files a public records request, duplicate files can appear as separate responsive documents, inflating production costs and creating confusion about what the canonical record actually is. The City Attorney's Office has flagged this as an area of risk in at least two internal guidance memos circulated in 2025, though those memos have not been made public.
The practical cost is real. Commercial cloud storage for municipal governments typically runs between $0.02 and $0.05 per gigabyte per month, and large city image repositories can run into tens of terabytes. Trimming even 25 percent of redundant data from a 50-terabyte archive would free about 12.5 terabytes, or 12,500 gigabytes. At those rates, that works out to $250 to $625 per month, or roughly $3,000 to $7,500 per year for storage alone, not counting staff time saved in search and retrieval.
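The rough math above can be reproduced in a few lines. This is a back-of-the-envelope sketch, not a budget model: it assumes 1 terabyte equals 1,000 gigabytes and a flat per-gigabyte price, and the function name `annual_storage_savings` is made up for illustration.

```python
def annual_storage_savings(
    archive_tb: float,
    redundant_fraction: float,
    price_per_gb_month: float,
    gb_per_tb: int = 1000,  # assumption: decimal terabytes
) -> float:
    """Estimate yearly savings in dollars from deleting redundant data."""
    redundant_gb = archive_tb * gb_per_tb * redundant_fraction
    return redundant_gb * price_per_gb_month * 12


# The article's example: 25 percent of a 50-terabyte archive.
low = annual_storage_savings(50, 0.25, 0.02)
high = annual_storage_savings(50, 0.25, 0.05)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Changing `redundant_fraction` to 0.20 or 0.35 gives the savings at either end of the duplication range cited for comparable municipal systems.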
The Mission District-based nonprofit Gray Area Foundation for the Arts, which has partnered with city cultural programs on digital preservation projects in the past, has publicly advocated for better metadata standards that would prevent duplicates from forming in the first place, a preventive approach rather than a recurring cleanup.
For San Francisco residents who use public portals to access permit records, property photographs, or archival imagery, the practical advice right now is straightforward: if you encounter broken image links or unexpected gaps in a city database over the next two to three weeks, it may reflect the deduplication process in progress. The Planning Department's online permit portal and the library's digital collections portal on SFPL.org are both expected to experience brief intermittent slowdowns through mid-July. Bookmark your searches and check back after July 18 for a cleaner, faster experience.