The Daily San Francisco

SF's Digital Records Push Hits a Wall: What Happens Next With Duplicate Image Cleanup

City agencies and nonprofits racing to digitize San Francisco's archival holdings are now confronting a costly, complicated backlog of duplicate images — and the decisions made this summer will shape public access to the city's history for decades.

By San Francisco News Desk · Published 4 July 2026, 11:57 am

Photo: Deane Bayas on Pexels

San Francisco's push to modernize its public records has run into a problem that sounds mundane but carries real consequences: tens of thousands of duplicate digital images clogging archival databases across multiple city departments, slowing searches, inflating storage costs, and in some cases burying unique historical photographs under layers of identical or near-identical scans. The question facing city archivists, technology contractors, and officials at the San Francisco Public Library and the San Francisco History Center right now is straightforward — who decides what gets deleted, and by what standard?

The timing matters. The city's Digital Equity and Access Initiative, which has drawn funding from both the Mayor's Office of Housing and Community Development and private technology partners in SoMa, was designed to make public records searchable and freely available online by the end of fiscal year 2026. Duplicate images are not a trivial obstacle. Storage fees for municipal cloud infrastructure have climbed alongside the broader AI-infrastructure buildout in the Bay Area, making the cost of doing nothing measurable and growing. The Library Commission is scheduled to take up a related budget item later this month.

The San Francisco History Center, located on the sixth floor of the Main Library on Larkin Street in Civic Center, holds one of the most actively accessed municipal photograph collections on the West Coast. Staff there have been working since early 2025 with an outside vendor to run deduplication software across roughly 400,000 scanned images, a project that was originally expected to take eight months. It has taken longer. The complication, according to public commission meeting minutes from March 2026, is that automated tools flag near-duplicates — slightly different exposures of the same scene, or images scanned twice at different resolutions — and human reviewers must adjudicate which version has higher archival value. That review process requires specialist knowledge that contract staff do not always have.

The Decisions Ahead

Three distinct choices will determine how this plays out. First, the city must decide whether to invest in expanding the History Center's in-house archival staff or to continue relying on outside contractors. The current contractor arrangement runs through September 30, 2026. Second, the San Francisco Public Library needs to set a formal retention policy for near-duplicate images, something it has not put in writing, according to publicly available commission documents. Without a written standard, individual reviewers are making judgment calls that could prove inconsistent or be legally challenged later. Third, the Mayor's Office of Civic Innovation, which has a stated interest in using AI tools to accelerate city services across departments, must determine whether large-language-model-assisted image classification is appropriate for archival decisions, or whether the risk of error is too high for irreplaceable materials.

The Prelinger Library on Ninth Street, a private research library in SoMa that holds significant San Francisco ephemera and has informally collaborated with city archivists on digitization questions before, offers one model: its staff have published internal guidelines for handling duplicate scans that emphasize human sign-off before any file deletion. That framework has not been formally adopted by any city agency.

Costs and Consequences

Cloud storage is not free. Municipal IT contracts reviewed in the city's published budget documents for fiscal year 2025-26 show storage costs for the Library system's digital assets as a line item that the budget analyst's office flagged for review. Deduplicating even a fraction of the current archive could reduce that burden, but only if the process is completed correctly: botched deduplication has led to permanent data loss in comparable projects undertaken by municipal governments in Chicago and New York in recent years.

For residents, the practical stakes are access. The History Center's online photograph portal, which draws users from the Castro, the Tenderloin, the Sunset, and far beyond the city, becomes genuinely less useful when searches return stacks of identical results. Genealogists, architects researching historical building facades on Valencia Street or Pacific Avenue, and documentary filmmakers all depend on a clean, well-curated database.

The Library Commission's July session is the next concrete opportunity for the public to weigh in. The agenda has not yet been posted, but commission meetings are held at the Main Library and are open to public comment. Anyone with a stake in how the city manages its digital heritage — and that includes every neighborhood in San Francisco — has reason to show up.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
