The Daily San Francisco

San Francisco news, every day


SF's Aging Digital Archive Has a Duplicate Image Problem — Here's What Happens Next and the Key Decisions Ahead

City agencies and cultural institutions are staring down a costly, time-sensitive reckoning over redundant digital assets clogging storage systems and distorting public records.

By San Francisco News Desk · Published 4 July 2026, 12:16 pm


Photo: Renato Nascimento on Pexels

San Francisco's public digital infrastructure is carrying a quiet but growing burden: thousands of duplicate images scattered across municipal databases, from the Department of Public Works' street-condition photo logs to the San Francisco Public Library's digitized historical collections at the Civic Center branch. The problem has reached a point where city technology officers are being pushed to make concrete decisions about deduplication tools, vendor contracts, and what gets permanently deleted — choices that carry both fiscal and archival consequences.

The timing matters. San Francisco is mid-cycle on a technology modernization push that the Department of Technology launched in earnest in early 2025, folding in AI-assisted asset management tools. With the city's general fund under continued pressure — the fiscal year 2026 budget debate stretched into June — every dollar spent on redundant cloud storage is a dollar that department heads are now being asked to justify. Duplicate image files aren't a glamorous budget line, but they compound across dozens of systems and can inflate storage costs significantly over a multi-year horizon.

Where the Redundancy Is Concentrated

The issue shows up differently depending on the agency. At SF Digital Services, which manages the city's resident-facing web properties and underlying content management systems, duplicate imagery accumulates through routine content migration — every time a page is rebuilt or a template is updated, image assets can be re-uploaded without old versions being purged. The San Francisco Recreation and Parks Department faces a parallel problem in its permit and programming databases, where event photography uploaded by staff across sites from McLaren Park to the Ferry Building has never been subject to a systematic deduplication pass.

The San Francisco History Center, housed inside the Main Library on Larkin Street, has been digitizing physical photographs since the early 2000s. Librarians there have long flagged that batch-scanning workflows sometimes produce near-identical files — slightly different exposures of the same negative — that end up indexed as separate records in the Online Archive of California. Resolving those duplicates requires both automated tooling and human curatorial judgment, a combination that is neither cheap nor fast.
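For readers wondering how software can flag "near-identical" scans at all, one common family of techniques is perceptual hashing. The sketch below uses a difference hash as an illustration; it is not drawn from any tool the library or the city is reported to use. It assumes each image has already been decoded and shrunk to a small grayscale grid (8 rows by 9 columns here), and the function names and the `max_distance` threshold are hypothetical.

```python
def difference_hash(gray: list[list[int]]) -> int:
    """Build a bit fingerprint from a small grayscale grid.

    Each bit records whether a pixel is brighter than its right-hand
    neighbor, so a uniform exposure shift tends to leave the bits unchanged.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def looks_like_near_duplicate(
    first: list[list[int]],
    second: list[list[int]],
    max_distance: int = 4,  # hypothetical threshold; would need tuning per collection
) -> bool:
    """Flag two grids as candidate near-duplicates for human review."""
    distance = hamming_distance(difference_hash(first), difference_hash(second))
    return distance <= max_distance


# Illustrative data only: a synthetic gradient and a uniformly brighter copy.
base = [[r * 7 + c * 13 for c in range(9)] for r in range(8)]
brighter = [[min(255, value + 10) for value in row] for row in base]
print(looks_like_near_duplicate(base, brighter))
```

A match from a check like this only nominates a pair of files. Deciding which exposure of a negative to keep is still the curatorial judgment the librarians describe.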

On the private side, the concentration of biotech and life sciences firms in Mission Bay and SoMa means that local technology vendors pitching deduplication solutions have a ready commercial market beyond city government. Companies in those corridors routinely manage large imaging datasets — pathology slides, lab documentation, clinical trial photography — and several have already deployed machine-learning deduplication pipelines that city IT staff have pointed to as potential models.

The Decisions That Can't Wait

Three choices are now sitting on the desk of city technology planners. First: whether to run a full audit before any deletion, or to begin automated purges immediately using hash-matching algorithms that flag byte-for-byte identical files. The audit-first approach is safer for archival integrity but could delay meaningful storage savings well beyond the fiscal year that closed June 30, 2026. Second: which vendor or open-source framework to standardize on. The city has piloted at least two commercial platforms in the past 18 months, and a third procurement cycle would require Board of Supervisors approval if it crosses the $250,000 single-contract threshold under city purchasing rules. Third: governance, specifically whether the final call on what constitutes a true duplicate in historical collections rests with the Department of Technology or with individual agency archivists.
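The hash-matching option in the first choice can be sketched in a few lines. This is a minimal illustration, not the city's tooling or either piloted platform: the choice of SHA-256, the function names, and the directory layout are all assumptions. In keeping with the audit-first approach, the sketch only reports groups of identical files and deletes nothing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash, keeping groups of two or more.

    Report-only: nothing is deleted, so the result can feed a review step.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical usage: scan the current directory and list each group.
    for digest, paths in find_exact_duplicates(Path(".")).items():
        print(digest[:12], [str(p) for p in paths])
```

Because the match is on raw bytes, two scans of the same negative at different exposures would not be grouped together. That is one reason exact hashing alone cannot settle the archival question.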

That last question is more contentious than it sounds. Deleting a file that turns out to be the only surviving digital copy of a 1906 earthquake photograph, or a unique angle on a Tenderloin street scene from the 1970s, is not a recoverable error. The San Francisco Public Library's preservation staff has been vocal internally about wanting veto power over any automated deletion that touches the History Center's collections, according to publicly available meeting notes from the Library Commission's May 2026 session.

The next Library Commission meeting, scheduled for August 2026, is expected to include a staff report on the deduplication policy framework. The Department of Technology is separately expected to bring storage cost projections to the city's Committee on Information Technology before September. Those two timelines will determine whether San Francisco moves into fiscal year 2027 with a coherent plan — or keeps paying to store the same image twice.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
