SF City Agencies Push to Fix Digital Archive Mess as Duplicate Image Problem Hits Records Systems
A quiet but costly data headache has forced several San Francisco departments to accelerate cleanup of bloated digital archives this week.

San Francisco's Department of Technology confirmed this week that at least three city agencies are actively running duplicate-image-replacement projects after audits revealed their shared document management systems had become choked with redundant files, slowing records retrieval and inflating cloud storage costs. The problem, long treated as routine IT housekeeping, has gained new urgency as agencies try to comply with updated digital-records transparency rules that took effect July 1.
The timing matters. The city moved aggressively toward cloud-based document storage between 2022 and 2024, pushing departments off aging on-premises servers. That migration, done quickly, carried over years of duplicate scans, re-uploaded permit images, and redundant photo attachments. What looked like a solved problem turned into an expanding one as staff kept adding files to systems that had never been properly deduplicated.
The San Francisco Planning Department, which holds tens of thousands of permit-application image files tied to properties across neighborhoods from the Sunset to SoMa, is among those running active cleanup operations this week. Staff have been working through a backlog of duplicate property photographs attached to permit records in the city's Accela permitting platform — a system that planning staff and contractors both upload to, often creating two or three copies of the same site photo. The department did not provide a total file count, but the project has been assigned to an internal data-management team working out of the planning offices at 49 South Van Ness Avenue.
The San Francisco Public Library's digital collections unit, based at the main branch on Larkin Street in the Civic Center, is running a parallel effort on its historical photograph archive. Librarians identified the duplication problem after migrating to a new content-management system in early 2025. The archive holds digitized images stretching back to the 19th century, and the migration had created multiple versions of the same scanned photographs with slightly different file names, making search results confusing for researchers and public users alike. The library's digital-services team began systematic deduplication work in May and expects to complete the initial pass by the end of July.
Cloud storage isn't free. San Francisco's citywide IT budget allocated roughly $18 million for cloud infrastructure in fiscal year 2025-26, according to the Mayor's budget documents published in June. Duplicate files don't just clutter search results — they eat into that allocation. Industry benchmarks from organizations like AIIM, the information-management trade association, suggest enterprise document systems can carry duplication rates between 20 and 40 percent after large migrations, meaning a significant share of stored images may be redundant copies.
The July 1 effective date for updated city digital-records standards — part of a package approved by the San Francisco Board of Supervisors earlier this year — requires departments to certify that public-facing document portals return accurate, non-duplicative results when residents search for records. That requirement has given what was previously a back-office IT task a public-accountability dimension. Residents searching the Planning Department's online portal for images of a building permit at, say, a Mission District address should now receive a clean, deduplicated result set rather than three near-identical scans.
The push also intersects with the city's broader AI ambitions. The Department of Technology has been piloting machine-learning tools for several city functions, and accurate, clean image libraries are a prerequisite for any AI-assisted document review. A bloated archive full of duplicate images degrades the training data those tools rely on, according to standard data-quality guidance from the National Institute of Standards and Technology.
For San Francisco residents, the practical upshot is straightforward: if you've tried to look up permit photos or historical images through city portals recently and gotten confusing duplicated results, that experience should improve over the coming weeks. The Planning Department's cleanup is targeted for completion before the end of August. The Public Library's digital team has set up a feedback form on its website so researchers can flag specific duplicated entries they encounter while the project is still active. City IT staff say the deduplication protocols being developed now are also meant to prevent the same problem from recurring as more agencies migrate to cloud platforms over the next 18 months.
Published by The Daily San Francisco