The Daily San Francisco

San Francisco's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Damaging Story

From City Hall's permit database to the SF Public Library's photo collection, redundant image files are costing city agencies real money and slowing down the systems residents depend on.

By San Francisco News Desk · Published 4 July 2026, 12:16 pm

Photo: Brett Sayles on Pexels

San Francisco's municipal digital infrastructure is carrying a hidden weight. Across city agencies, duplicate image files (photographs, permit scans and attachments, GIS map exports) are consuming server space at a scale that IT administrators say has grown unmanageable since the post-pandemic push to digitize paper records. The Department of Technology, which oversees the city's shared cloud and on-premises storage contracts, is reviewing policies that would require agencies to run automated deduplication software on their image repositories before the next fiscal year begins on July 1, 2027.

The timing matters. San Francisco's tech-sector volatility over the past 18 months — mass layoffs at companies based in SoMa and the Financial District followed by aggressive AI hiring — has produced a secondary effect: a glut of municipal vendors pitching AI-powered deduplication tools to city procurement officers. The city spent roughly $4.2 billion on technology contracts across all departments in the 2024–25 fiscal year, according to the Controller's Office budget summary, and storage costs are a growing line item inside that number.

Where the Redundancy Lives

The problem shows up most visibly in two places. The SF Planning Department's permit portal, accessible through the city's online planning system at 1650 Mission Street, holds digitized records going back to a 2017 scanning initiative. According to a 2025 internal audit summary published by the Budget and Legislative Analyst's Office, the Planning Department's document repository had grown to more than 11 terabytes, with an estimated 23 percent of stored image files identified as exact or near-exact duplicates. If those duplicates are sized like the rest of the repository, that is roughly 2.5 terabytes of redundant data, a recurring storage cost that buys the department nothing.
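The scale of that redundancy can be checked with simple arithmetic. The sketch below is illustrative only: it applies the audit's 23 percent share to the 11-terabyte floor and assumes duplicates are about the same size as other files. The function names and the `usd_per_tb_month` parameter are hypothetical, and because the city's actual storage rate is not stated here, no dollar figure is computed.

```python
def redundant_terabytes(total_tb: float, duplicate_share: float) -> float:
    """Storage taken up by duplicates, assuming they are sized like the average file."""
    return total_tb * duplicate_share


def annual_cost(redundant_tb: float, usd_per_tb_month: float) -> float:
    """Yearly cost of keeping the redundant data at a given monthly per-terabyte rate.

    usd_per_tb_month is a placeholder; substitute the rate from the relevant contract.
    """
    return redundant_tb * usd_per_tb_month * 12


# Figures from the audit summary: more than 11 TB stored, 23 percent duplicated.
redundant = redundant_terabytes(11.0, 0.23)
print(f"{redundant:.2f} TB of redundant image data")  # 2.53 TB
```

Because the repository is described as "more than" 11 terabytes, the result is a lower bound rather than an exact figure.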

The San Francisco Public Library's San Francisco History Center, housed at the main branch on Larkin Street in Civic Center, faces a parallel challenge with its digitized photograph collection. The History Center has been digitizing images from the Bancroft Library transfers and private donations since the early 2000s, and the collection now exceeds 200,000 individual image files. Librarians have flagged that duplicate scans — often created when multiple staff members digitized the same physical photograph without checking the catalog — inflate storage needs and complicate public search results on the Online Archive of California platform.
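Two scans of the same photograph rarely match byte for byte, so tools that catch this kind of duplicate usually compare what the images look like rather than the files themselves. One common technique is an average hash: each pixel becomes a 1 or 0 depending on whether it is brighter than the image's mean, and two images are treated as likely matches when few bits differ. The sketch below assumes each scan has already been decoded and shrunk to a small grayscale grid of equal size (a real tool would use an imaging library for that step). The function names and the default threshold are illustrative, not anything the History Center is known to use.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255, all grids the same size


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def likely_same_photo(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Flag two grids as probable scans of one photo.

    max_distance is a hypothetical threshold suited to 64-bit hashes (8x8 grids);
    it should be tuned against a sample that staff have already reviewed.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Toy 2x2 grids: the second is the first with slightly different exposure.
scan_a = [[10, 200], [220, 30]]
scan_b = [[14, 205], [215, 28]]
other = [[200, 10], [30, 220]]
print(likely_same_photo(scan_a, scan_b, max_distance=1))  # True
print(likely_same_photo(scan_a, other, max_distance=1))   # False
```

A match from a hash like this is a candidate for human review, not proof that two catalog records describe the same physical photograph.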

The Cost of Inaction

Storage is not free, and in San Francisco's budget environment it is becoming conspicuously expensive. The city migrated a portion of its data infrastructure to a hybrid cloud model under a contract signed in January 2024, and per-terabyte costs for cold storage on that contract run above the national municipal average because of California data residency requirements tied to state law. Deduplication vendors who have pitched the Department of Technology argue that a city-wide image cleanup could trim active storage by 15 to 30 percent, based on benchmarks from comparable municipal deployments in Chicago and New York City.

The SFMTA, which manages both Muni and the city's parking control systems, is separately wrestling with dashcam and traffic-camera footage that generates image frames stored as individual files. The agency's Potrero Division facility on Cesar Chavez Street handles archiving for the fleet, and IT staff there have described manually sorting through duplicate export batches as a routine but time-consuming task — time that comes at the cost of other infrastructure work.
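When the same frames are exported more than once, the copies are usually byte-identical, which makes them the easiest kind of duplicate to find automatically. A minimal sketch, assuming the exported frames sit as ordinary files on disk: hash each file's contents with SHA-256 and group files that share a digest. The function names are hypothetical and this is not a description of SFMTA's tooling.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large frames are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(paths: Iterable[Path]) -> List[List[Path]]:
    """Return groups of two or more files whose contents are byte-identical."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        if path.is_file():
            by_digest[sha256_of_file(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```

Calling `find_exact_duplicates(Path("exports").rglob("*.jpg"))`, where `exports` is a placeholder directory name, would list each set of identical frames. Deciding which copy to keep is a separate policy question.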

For residents, the practical consequence is slower search results on city permit portals and library databases, and occasional retrieval errors when duplicate records conflict with each other in the database. For city administrators, it is budget pressure on an unglamorous line item that rarely wins the Board of Supervisors' attention against homelessness spending or transit capital projects.

The Department of Technology is expected to issue a request for proposals for a city-wide deduplication platform by September 2026. Agencies that want to participate in a pilot program are being asked to submit data inventories by August 15. The SF Public Library and the Planning Department are both understood to be among the early candidates. If a vendor is selected before the end of calendar year 2026, a phased rollout could begin in spring 2027, meaning the libraries and permit offices that residents use every day might finally start returning cleaner, faster results by mid-2027.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
