The Daily San Francisco

San Francisco news, every day


SF City Agencies Push to Fix Digital Archive Mess as Duplicate Image Problem Hits Records Systems

A quiet but costly data headache has forced several San Francisco departments to accelerate cleanup of bloated digital archives this week.

By San Francisco News Desk · Published 4 July 2026, 12:06 pm


Photo: Belle Co on Pexels

San Francisco's Department of Technology confirmed this week that at least three city agencies are actively running duplicate-image cleanup projects after audits revealed their shared document management systems had become choked with redundant files, slowing records retrieval and inflating cloud storage costs. The problem, long treated as routine IT housekeeping, has gained new urgency as agencies try to comply with updated digital-records transparency rules that took effect July 1.

The timing matters. The city moved aggressively toward cloud-based document storage between 2022 and 2024, pushing departments off aging on-premises servers. That migration, done quickly, carried over years of duplicate scans, re-uploaded permit images, and redundant photo attachments. What looked like a solved problem became a growing one as staff kept adding files to systems that had never been properly deduplicated.

Which Departments Are Affected — and What They're Doing About It

The San Francisco Planning Department, which holds tens of thousands of permit-application image files tied to properties across neighborhoods from the Sunset to SoMa, is among those running active cleanup operations this week. Staff have been working through a backlog of duplicate property photographs attached to permit records in the city's Accela permitting platform — a system that planning staff and contractors both upload to, often creating two or three copies of the same site photo. The department did not provide a total file count, but the project has been assigned to an internal data-management team working out of the planning offices at 49 South Van Ness Avenue.

The San Francisco Public Library's digital collections unit, based at the main branch on Larkin Street in the Civic Center, is running a parallel effort on its historical photograph archive. Librarians identified the duplication problem after migrating to a new content-management system in early 2025. The archive holds digitized images stretching back to the 19th century, and the migration had created multiple versions of the same scanned photographs with slightly different file names, making search results confusing for researchers and public users alike. The library's digital-services team began systematic deduplication work in May and expects to complete the initial pass by the end of July.

Cloud storage isn't free. San Francisco's citywide IT budget allocated roughly $18 million for cloud infrastructure in fiscal year 2025-26, according to the Mayor's budget documents published in June. Duplicate files don't just clutter search results — they eat into that allocation. Industry benchmarks from organizations like AIIM, the information-management trade association, suggest enterprise document systems can carry duplication rates between 20 and 40 percent after large migrations, meaning a significant share of stored images may be redundant copies.

The Broader Push: Transparency Rules Add Pressure

The July 1 effective date for updated city digital-records standards — part of a package approved by the San Francisco Board of Supervisors earlier this year — requires departments to certify that public-facing document portals return accurate, non-duplicative results when residents search for records. That requirement has given what was previously a back-office IT task a public-accountability dimension. Residents searching the Planning Department's online portal for images of a building permit at, say, a Mission District address should now receive a clean, deduplicated result set rather than three near-identical scans.

The push also intersects with the city's broader AI ambitions. The Department of Technology has been piloting machine-learning tools for several city functions, and accurate, clean image libraries are a prerequisite for any AI-assisted document review. A bloated archive full of duplicate images degrades the training data those tools rely on, according to standard data-quality guidance from the National Institute of Standards and Technology.

For San Francisco residents, the practical upshot is straightforward: if you've tried to look up permit photos or historical images through city portals recently and gotten confusing duplicated results, that experience should improve over the coming weeks. The Planning Department's cleanup is targeted for completion before the end of August. The Public Library's digital team has set up a feedback form on its website so researchers can flag specific duplicated entries they encounter while the project is still active. City IT staff say the deduplication protocols being developed now are also meant to prevent the same problem from recurring as more agencies migrate to cloud platforms over the next 18 months.


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
