The Daily San Francisco

SF City Agencies Push to Purge Duplicate Images From Public Records Systems This Week

A coordinated cleanup effort targeting redundant digital assets across multiple municipal databases is saving storage costs and untangling years of messy file management.

By San Francisco News Desk · Published 4 July 2026, 11:36 am

Photo: Brett Sayles on Pexels

San Francisco's Department of Technology quietly hit a milestone this week, completing the first phase of a duplicate-image replacement initiative that has been scrubbing redundant photographs, scanned documents, and digital assets from the city's sprawling network of public-records databases. The effort, which spans systems used by multiple departments at City Hall and the Civic Center complex, is the most aggressive digital-records cleanup the city has attempted since migrating legacy files to cloud infrastructure beginning in 2022.

The timing matters. San Francisco spends roughly $14 million annually on data storage and management contracts across its major municipal departments, according to budget documents the city controller's office publishes each fiscal year. With that figure under pressure — the mayor's office has been pushing for administrative savings to offset deficits projected to extend through fiscal year 2027-28 — information technology managers have identified redundant digital assets as a concrete, low-friction place to cut costs without reducing services that residents actually see.

What the Cleanup Actually Involves

Duplicate-image replacement is less glamorous than it sounds. City IT staff and contracted vendors are using automated hash-matching tools to identify files that are byte-for-byte identical or near-identical copies sitting in separate folders, sometimes across different departmental servers. When a match is confirmed, the redundant copy is replaced with a pointer to the canonical file — a process that consolidates storage without deleting records. The San Francisco Public Library's digital collections unit, headquartered at the main branch on Larkin Street at Fulton, has been running a version of this process on its digitised historical photograph archive since March 2026. The library's effort alone surfaced more than 22,000 duplicate image files in its California history collection.
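The exact-match half of that process can be illustrated with a short Python sketch. It is not the city's actual tooling, which the article does not describe in detail. The function name, the sample paths, and the rule for picking the canonical copy (the alphabetically first path) are all assumptions made for illustration. A cryptographic hash such as SHA-256 only catches byte-for-byte copies; near-identical images would need a different technique, such as perceptual hashing, which is not shown here.

```python
import hashlib
from collections import defaultdict


def plan_deduplication(files):
    """Map each redundant path to the canonical path holding identical bytes.

    `files` is a dict of path -> file contents (bytes). The canonical copy is
    the alphabetically first path in each group, an arbitrary choice made
    for this illustration only.
    """
    # Group paths by the SHA-256 digest of their contents.
    by_digest = defaultdict(list)
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(path)

    # For every group with more than one member, keep one canonical copy
    # and record a pointer from each redundant path to it.
    pointers = {}
    for paths in by_digest.values():
        canonical, *duplicates = sorted(paths)
        for dup in duplicates:
            pointers[dup] = canonical
    return pointers


# Hypothetical paths and contents, invented for the example.
sample = {
    "dpw/inspections/a.jpg": b"\xff\xd8pothole",
    "dpw/archive/a_copy.jpg": b"\xff\xd8pothole",
    "library/history/b.jpg": b"\xff\xd8ferry",
}
pointers = plan_deduplication(sample)
print(pointers)
```

Because the function returns a plan of pointers and does not delete anything, every record stays reachable through its original path, which matches the article's description of consolidating storage without deleting records.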

The Department of Public Works, which manages thousands of street-condition photographs generated every week by inspection crews across neighbourhoods from the Tenderloin to the Excelsior, is further along. Public Works crews upload images to a centralised asset-management platform, and until this year, the system had no automated deduplication layer. Project managers working out of the department's operations centre on Cesar Chavez Street say the first audit identified redundant files consuming an estimated 4 terabytes of active storage, capacity that was being paid for at commercial cloud rates.

Broader Implications for City Data Infrastructure

This week's progress connects to a wider push. San Francisco's DataSF program — the city's open-data initiative operating under the City Administrator's Office — has been advocating internally for uniform file-naming standards and metadata tagging since at least 2024, partly to make exactly this kind of cleanup feasible at scale. Without consistent metadata, automated tools struggle to distinguish a genuine duplicate from two photographs of the same pothole taken minutes apart that carry legitimate evidentiary value for separate inspection reports.
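The pothole example can be made concrete with a minimal Python sketch of a metadata check. It assumes some separate tool has already flagged two images as visually similar. The record fields (`captured_at`, `report_id`) and the function name are hypothetical, chosen for illustration, and do not reflect any schema the city has published.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ImageRecord:
    path: str
    captured_at: datetime  # hypothetical metadata field
    report_id: str         # hypothetical metadata field


def is_safe_to_merge(a, b, visually_similar):
    """Treat two images as duplicates only when their metadata also agrees."""
    return (
        visually_similar
        and a.captured_at == b.captured_at
        and a.report_id == b.report_id
    )


# Two photographs of the same pothole taken four minutes apart for
# separate inspection reports, plus a true copy of the first one.
first = ImageRecord("a.jpg", datetime(2026, 7, 1, 9, 0), "RPT-1")
second = ImageRecord("b.jpg", datetime(2026, 7, 1, 9, 4), "RPT-2")
copy_of_first = ImageRecord("a_copy.jpg", datetime(2026, 7, 1, 9, 0), "RPT-1")

print(is_safe_to_merge(first, second, visually_similar=True))
print(is_safe_to_merge(first, copy_of_first, visually_similar=True))
```

In this sketch the two look-alike photographs are kept separate because their timestamps and report identifiers differ, while the true copy is eligible for consolidation. A check like this only works when the metadata is filled in consistently, which is the case the DataSF program has been making.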

The San Francisco Planning Department faces a version of this problem with its environmental review archives. Thousands of pages of scanned documents, many generated during the development-application surge in SoMa and the Mission District between 2015 and 2020, exist in multiple copies across project folders. The department has been piloting an optical-character-recognition-assisted deduplication tool since January 2026, with results expected to be reported to the planning commission before the end of the third quarter.

For residents, the practical upshot is modest but real: faster search results on public-records portals, lower risk of outdated images surfacing in planning documents, and, over time, city dollars that don't go to storing the same photograph of a cracked sidewalk on Mission Street six times over. The Department of Technology is scheduled to present a public progress report to the Board of Supervisors' Government Audit and Oversight Committee in September 2026, at which point administrators are expected to release consolidated figures on storage recovered and estimated annual savings. Until then, the work continues floor by floor through the city's digital back catalogue.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
