San Francisco's push to modernize its public records has run into a problem that sounds mundane but carries real consequences: tens of thousands of duplicate digital images clogging archival databases across multiple city departments, slowing searches, inflating storage costs, and in some cases burying unique historical photographs under layers of identical or near-identical scans. The question facing city archivists, technology contractors, and officials at the San Francisco Public Library and the San Francisco History Center right now is straightforward — who decides what gets deleted, and by what standard?
The timing matters. The city's Digital Equity and Access Initiative, which has drawn funding from both the Mayor's Office of Housing and Community Development and private technology partners in SoMa, was designed to make public records searchable and freely available online by the end of fiscal year 2026. Duplicate images are not a trivial obstacle. Storage fees for municipal cloud infrastructure have climbed alongside the broader AI-infrastructure buildout in the Bay Area, making the cost of doing nothing measurable and growing. The Library Commission is scheduled to take up a related budget item later this month.
The San Francisco History Center, located on the sixth floor of the Main Library on Larkin Street in Civic Center, holds one of the most actively accessed municipal photograph collections on the West Coast. Staff there have been working since early 2025 with an outside vendor to run deduplication software across roughly 400,000 scanned images, a project that was originally expected to take eight months. It has taken longer. The complication, according to public commission meeting minutes from March 2026, is that automated tools flag near-duplicates — slightly different exposures of the same scene, or images scanned twice at different resolutions — and human reviewers must adjudicate which version has higher archival value. That review process requires specialist knowledge that contract staff do not always have.
The Decisions Ahead
Three distinct choices will determine how this plays out. First, the city must decide whether to invest in expanding the History Center's in-house archival staff or to continue relying on outside contractors. The current contractor arrangement runs through September 30, 2026. Second, the San Francisco Public Library needs to set a formal retention policy for near-duplicate images. Publicly available commission documents show no such policy in writing. Without a written standard, individual reviewers are making judgment calls that could prove inconsistent or draw legal challenges later. Third, the Mayor's Office of Civic Innovation, which has a stated interest in using AI tools to accelerate city services across departments, must determine whether large-language-model-assisted image classification is appropriate for archival decisions, or whether the risk of error is too high for irreplaceable materials.
The Prelinger Library on Ninth Street, a private research library in SoMa that holds significant San Francisco ephemera and has informally collaborated with city archivists on digitization questions before, offers one model: its staff have published internal guidelines for handling duplicate scans that emphasize human sign-off before any file deletion. That framework has not been formally adopted by any city agency.
Costs and Consequences
Cloud storage is not free. Municipal IT contracts reviewed in the city's published budget documents for fiscal year 2025-26 show that storage costs for the Library system's digital assets are a line item that the budget analyst's office flagged for review. Deduplication of even a fraction of the current archive could reduce that burden, but only if the process is completed correctly — botched deduplication has led to permanent data loss in comparable projects undertaken by municipal governments in Chicago and New York in recent years.
For residents, the practical stakes are access. The History Center's online photograph portal, which draws users from the Castro, the Tenderloin, the Sunset, and far beyond the city, becomes less useful when searches return stacks of identical results. Genealogists, architects researching historical building facades on Valencia Street or Pacific Avenue, and documentary filmmakers all depend on a clean, well-curated database.
The Library Commission's July session is the next concrete opportunity for the public to weigh in. The agenda has not yet been posted, but commission meetings are held at the Main Library and are open to public comment. Anyone with a stake in how the city manages its digital heritage — and that includes every neighborhood in San Francisco — has reason to show up.