The Daily San Francisco

San Francisco news, every day


SF's Aging Digital Archive Has a Duplicate Image Problem. Here's What Happens Next.

City departments and cultural institutions face a cascade of decisions over how to clean up years of redundant digital records — and who pays for it.

By San Francisco News Desk · Published 4 July 2026, 12:16 pm


Photo: Mo Eid on Pexels

San Francisco's public institutions are sitting on a growing crisis inside their own servers. Years of digitization drives, emergency pandemic scanning projects, and decentralized departmental uploads have left city archives, library systems, and civic tech platforms riddled with duplicate image files — redundant records that drain storage budgets, confuse public-facing search tools, and complicate the legal chain of custody for official documents.

The problem isn't unique to San Francisco, but the city's particular combination of ambition and dysfunction makes the decisions ahead unusually consequential. Among the systems most visibly affected are the San Francisco Public Library's digital collections portal, run out of the Main Library on Larkin Street in Civic Center, and the SF Planning Department's property records image database. Both have gone through multiple overlapping digitization campaigns since 2018, each of which generated its own file-naming conventions and metadata schemas and, inevitably, its own duplicates.

The Cost of Doing Nothing

Cloud storage is cheap until it isn't. City IT contracts reviewed in past budget cycles have pegged per-terabyte annual costs for managed government cloud storage at rates that add up quickly when multiplied across dozens of departments. The SF Department of Technology, headquartered on Seventh Street in SoMa, has been wrestling with a broader data governance framework since at least 2023, but image deduplication has remained a lower-priority line item than cybersecurity hardening and network upgrades.

That calculus is shifting. Generative AI tools — the same wave reshaping hiring at companies from Salesforce Tower down to Mid-Market startups — are only as reliable as the data they index. When the city's 311 service or Planning's online permit portal surfaces duplicate property photos or mismatched parcel images, it erodes trust in systems that residents increasingly rely on for real decisions: whether to buy a home on 24th Street in Noe Valley, whether to contest a permit on Valencia Street in the Mission. Duplicate records aren't just an archival inconvenience. They produce concrete errors downstream.

The San Francisco History Center, which operates out of the Main Library on Larkin Street, has been working since early 2025 to reconcile digitized photograph collections that were scanned independently by at least three separate contractors between 2019 and 2022. The center's collection runs to hundreds of thousands of images, and staff have identified categories of near-duplicate files — slightly different scans of the same glass plate negative, for example — that require human curatorial judgment rather than automated deletion. That judgment costs time and money that the library system's current budget does not fully cover.

Key Decisions Still to Be Made

Several choice points are coming into focus for the second half of 2026. First, the Department of Technology is expected to publish updated data governance guidelines before the end of the third quarter — guidelines that will determine whether individual city departments are required to adopt a unified deduplication standard or left to develop their own approaches. The difference matters enormously for institutions like the SF Arts Commission, which maintains a separate image archive of public art installations across the city, from the Civic Center to the Bayview.

Second, the Board of Supervisors' budget committee will need to decide whether to fund a dedicated image remediation program or fold deduplication work into the broader IT modernization contract cycle that comes up for renewal in early 2027. Advocacy from library workers and archivists has been consistent, but it competes with louder priorities: housing permitting reform, Muni reliability, and the ongoing fentanyl crisis response centered on the Tenderloin and UN Plaza.

Third, and most technically complex, is the question of what happens to duplicate images that carry conflicting metadata. When two scans of the same 1906 earthquake photograph have different date stamps, different rights attributions, or different subject tags, deleting one isn't a neutral act. Archivists at the History Center have been pushing for a merge-and-annotate protocol rather than deletion — a more labor-intensive approach that preserves the documentary record of how files were created and catalogued in the first place.

The city has until September 30, the end of the third quarter, to commit to a framework. After that, the next budget process begins, and another year of accumulating duplicates becomes the default outcome.


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
