The Daily San Francisco

SF's Digital Archive Problem: What Officials and Experts Are Saying About Duplicate Image Chaos in City Records

From the Planning Department to the Public Library, San Francisco's public agencies are wrestling with ballooning digital storage costs and redundant image files that officials say are undermining records integrity.

By San Francisco News Desk · Published 4 July 2026, 11:44 am

San Francisco's city agencies collectively hold tens of millions of digital image files—permit photos, infrastructure scans, public health records—and a growing chorus of records managers, open-government advocates, and technology officials say duplicate images buried inside those archives are costing the city real money while degrading the reliability of public data.

The problem has sharpened this year as the Mayor's Office of Civic Innovation pushes a broader digital modernization drive. Budget documents submitted to the Board of Supervisors ahead of the fiscal year 2026–27 cycle flagged cloud storage expenditures across city departments as a category requiring consolidation. Duplicate image files—identical or near-identical photographs stored multiple times across separate departmental systems—are a leading driver of that excess, according to records governance professionals who work with municipal clients.

Where the Problem Shows Up

The San Francisco Planning Department, which processes thousands of permit applications each year from its offices at 49 South Van Ness Avenue, requires applicants to upload site photographs as part of standard submissions. Staff and technology consultants familiar with municipal permitting systems say it is common for the same project photo to land in three or four separate folders—one for the initial application, one for each inspection stage, one for the final certificate of occupancy. None of those copies are automatically deduplicated.

The San Francisco Public Library's digitization program, headquartered at the Main Branch on Larkin Street in Civic Center, faces a parallel challenge. The library has been digitizing historical photograph collections from the San Francisco History Center, a project that has produced hundreds of thousands of high-resolution image files since its expansion in 2022. In presentations at the California Library Association, archivists working in this area have publicly described the difficulty of tracking which scans are originals and which are redundant working copies created during quality-control workflows.

The Department of Technology, which oversees citywide IT infrastructure from its offices near City Hall, is coordinating a records deduplication pilot under the umbrella of the DataSF program. That initiative, which has been publicly described in Board of Supervisors committee hearings as a multi-year effort, is still in early phases for image-specific file types.

What Experts Are Recommending

Records management professionals who advise Bay Area public agencies point to perceptual hashing—a computational technique that identifies visually identical or near-identical images regardless of minor file differences—as the practical standard for large-scale deduplication. The technology is not new; major platforms have used versions of it since at least 2015. The gap, experts say, is that municipal procurement cycles and legacy system constraints have kept many city agencies from deploying it at scale.
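For readers curious how perceptual hashing works in practice, the sketch below shows one of the simplest variants, an "average hash," in plain Python. It is an illustration only, not a description of any system the city uses or plans to use. The function names, the 8x8 grid size, and the distance threshold of 5 bits are assumptions chosen for the example, and images are represented as plain grids of grayscale values rather than real files.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values (assumed input format)


def shrink(pixels: Gray, size: int = 8) -> List[List[float]]:
    """Downsample to size x size cells by averaging rectangular blocks."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    out = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Gray, size: int = 8) -> int:
    """One bit per cell: 1 if the cell is brighter than the overall mean."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Hypothetical rule of thumb: few differing bits means 'probably the same picture'."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Synthetic test images: a left-to-right gradient, a slightly brighter copy
# of it, and a top-to-bottom gradient standing in for a different photo.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[v + 3 for v in row] for row in original]
different = [[y * 16 for _ in range(16)] for y in range(16)]

print(is_near_duplicate(original, brighter))
print(is_near_duplicate(original, different))
```

The point of the design is that the hash depends on the overall pattern of light and dark rather than on exact bytes, so a re-saved or slightly adjusted copy lands on a nearby hash while an unrelated image does not. Production tools typically use more robust variants and tune the threshold against real collections.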

The financial stakes are concrete. Enterprise cloud storage for large image files—particularly uncompressed TIFF formats used in archival scanning—runs between $0.02 and $0.05 per gigabyte per month depending on the vendor tier, according to standard pricing published by major cloud providers. For an agency holding several hundred terabytes of image data, eliminating even 20 percent duplication represents tens of thousands of dollars in annual savings before accounting for retrieval and processing costs.
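The arithmetic behind that estimate can be checked with a few lines of Python. The sketch assumes a 300-terabyte archive as a stand-in for "several hundred terabytes" and treats one terabyte as 1,000 gigabytes; both figures are illustrative assumptions, not numbers reported by any city agency. The function name is likewise hypothetical.

```python
GB_PER_TB = 1000  # decimal terabytes; an assumption, since billing units vary by vendor


def annual_dedup_savings(total_tb: float, price_per_gb_month: float,
                         duplicate_fraction: float) -> float:
    """Yearly storage cost attributable to duplicate data, in dollars."""
    duplicate_gb = total_tb * GB_PER_TB * duplicate_fraction
    return duplicate_gb * price_per_gb_month * 12


# Illustrative inputs: 300 TB archive, 20 percent duplication,
# priced at the low and high ends of the range quoted above.
low = annual_dedup_savings(300, 0.02, 0.20)
high = annual_dedup_savings(300, 0.05, 0.20)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Under those assumptions the savings come to roughly $14,400 to $36,000 a year, which is consistent with the "tens of thousands of dollars" figure; larger archives or higher duplication rates scale the result proportionally.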

Open-government groups in the city, including those that track public records request fulfillment under the San Francisco Sunshine Ordinance, have raised a separate concern: when the same image exists under multiple file names and folder paths, records searches return inconsistent results. A requester asking for all photographs associated with a specific building permit on, say, Valencia Street in the Mission District may receive three versions of the same file—or miss attachments entirely if the search indexes only one storage location.

The DataSF program office has indicated through its public project dashboard that a formal policy recommendation on image deduplication standards is expected before the end of calendar year 2026. Agencies are advised in the interim to audit their existing workflows for redundant upload steps, establish a single authoritative image repository for each record type, and consult the City Librarian's office, which has developed internal protocols for exactly this problem during the library's ongoing digitization work. For residents who submit photos with permit applications or public comment filings, the practical takeaway is straightforward: keep your own copies, because what the city stores and what it can retrieve are not always the same thing.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
