SF Wastes Millions on Duplicate Images in City Databases
Redundant files clog municipal systems from City Hall to the Public Library, slowing services residents depend on daily.

San Francisco's city government is sitting on a digital storage crisis hiding in plain sight. Across municipal databases, public library systems, and planning department servers, duplicate image files have accumulated into a problem that IT administrators estimate costs the city between $2 million and $4 million annually in unnecessary cloud and on-premises storage expenditures — a figure that has climbed steadily as agencies digitized their backlogs through pandemic-era remote-work transitions.
The issue matters now because San Francisco is midway through a $47 million digital infrastructure overhaul approved by the Board of Supervisors in March 2025. That initiative was designed to modernize how departments like the Department of Building Inspection and the Office of the Assessor-Recorder store and retrieve records. If the underlying data is bloated with redundant image files (duplicate permit photos, scanned documents saved in triplicate, identical JPEGs filed under different case numbers), the investment risks being built on a foundation of junk data.
The San Francisco Public Library's San Francisco History Center, housed at the Civic Center branch on Larkin Street, holds more than 200,000 digitized photographs. Librarians familiar with the collection have flagged that identical scans routinely exist in multiple catalog entries, a byproduct of digitization batches run by different vendors between 2018 and 2023 with no single deduplication pass across the full archive. Similarly, the SF Planning Department's environmental review files, accessible through its online portal at sfplanning.org, contain image attachments for projects in neighborhoods like the Mission District and SoMa where developers submitted the same site photographs across multiple application phases.
The Department of Technology, which operates San Francisco's citywide network from its offices at 1 South Van Ness Avenue, tracks storage consumption across roughly 50 city agencies. According to budget documents submitted to the Mayor's Office of Technology in fiscal year 2025-26, raw storage consumption across city systems grew by 34 percent over the prior three years, while the number of unique records grew by only 11 percent during the same period. That gap — 23 percentage points of storage growth unaccounted for by new unique content — is widely attributed by IT planners to file duplication, with image files representing the largest single category of redundant data.
Enterprise deduplication software licenses for a system the size of San Francisco's — covering roughly 18 petabytes of managed data as of the most recent Department of Technology inventory — run between $300,000 and $900,000 for implementation and first-year licensing, depending on vendor. Cloud storage costs in the San Francisco Bay Area market currently average around $23 per terabyte per month for municipal-grade contracts, according to publicly available General Services Administration pricing schedules. A reduction of even 15 percent in total stored image volume across city systems could translate to roughly $1.5 million in annual savings.
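For readers who want to check the storage arithmetic, here is a minimal sketch in Python of how the quoted $23 per terabyte per month converts a data volume into an annual cost. The article does not say how much of the 18 petabytes is image data or how much is billed at this cloud rate, so the function names and the decimal petabyte conversion below are illustrative assumptions, not city figures.

```python
# Rate quoted in the article for municipal-grade cloud contracts.
RATE_PER_TB_MONTH = 23.0
# Assumption: decimal units, 1 petabyte = 1,000 terabytes.
TB_PER_PB = 1000


def annual_cost(terabytes: float, rate: float = RATE_PER_TB_MONTH) -> float:
    """Yearly cost in dollars of keeping `terabytes` stored at `rate` per TB per month."""
    return terabytes * rate * 12


def terabytes_for_savings(target_dollars: float, rate: float = RATE_PER_TB_MONTH) -> float:
    """Volume in terabytes that must be removed to save `target_dollars` per year."""
    return target_dollars / (rate * 12)


if __name__ == "__main__":
    # Hypothetical upper bound: all 18 PB billed at the quoted cloud rate.
    print(annual_cost(18 * TB_PER_PB))
    # Volume whose removal would save $1.5 million per year at that rate.
    print(terabytes_for_savings(1_500_000))
```

At the quoted rate a single terabyte costs $276 a year, so the savings from any deduplication effort scale directly with the number of terabytes removed from billed storage.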
The SFPL's digital services team has already piloted a deduplication review on one collection segment (the Bancroft photographic series covering Market Street from the 1940s through the 1970s) and found that nearly 28 percent of image records were exact or near-exact duplicates. Applied across the full 200,000-image archive, that rate would represent roughly 56,000 redundant files consuming storage and degrading search accuracy for researchers and the general public.
The practical path forward for city agencies involves three steps that IT governance experts consistently recommend: running perceptual hashing tools across image libraries to flag visually identical files regardless of filename or metadata, establishing a single canonical file per record before migrating to the new infrastructure, and setting ingest rules that reject duplicate uploads at the point of entry. The Department of Technology's current contract with its infrastructure modernization vendor expires in December 2026, giving the city a natural deadline to complete a deduplication audit before the next contract cycle locks in storage tiers. Agencies that complete that work before the contract renewal stand to negotiate meaningfully lower storage allotments — and lower costs — going into fiscal year 2027-28.
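The three steps above can be sketched in a few lines of Python. This is an illustration only, not the city's or any vendor's implementation: it uses average hash, one simple form of perceptual hashing, and assumes images have already been decoded into grids of grayscale values. The names `average_hash`, `IngestIndex`, and the `max_distance` threshold of 5 bits are hypothetical choices made for the example.

```python
from typing import Dict, List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Shrink the image to size x size by block averaging, then set one
    bit for each cell that is brighter than the overall mean."""
    h, w = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


class IngestIndex:
    """Keeps one canonical record per distinct image and flags duplicates."""

    def __init__(self, max_distance: int = 5) -> None:
        self.max_distance = max_distance   # assumed near-duplicate threshold
        self.canonical: Dict[int, str] = {}  # hash -> canonical record id

    def ingest(self, record_id: str, pixels: Gray) -> str:
        """Return the canonical record id for this image. If the result
        differs from record_id, the upload was rejected as a duplicate."""
        new_hash = average_hash(pixels)
        for known_hash, known_id in self.canonical.items():
            if hamming(new_hash, known_hash) <= self.max_distance:
                return known_id
        self.canonical[new_hash] = record_id
        return record_id
```

In use, each upload would pass through `ingest`, and a returned id that differs from the submitted one marks the file as a duplicate of an existing canonical record. A production system would also need an image decoding library, a tuned threshold, and a faster lookup than the linear scan shown here.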
About this article
Published by The Daily San Francisco