San Francisco's Department of Technology and the City Clerk's office are facing mounting pressure to address a years-long accumulation of duplicate scanned images clogging the city's public records infrastructure — a problem that archivists say is consuming server capacity, slowing public records request fulfillment, and undermining the integrity of digital document archives that residents and journalists rely on daily.
The issue surfaced publicly this spring when the San Francisco Public Library's History Center on Larkin Street flagged inconsistencies in its digitized collection, discovering that hundreds of scan batches uploaded between 2021 and 2024 contained redundant image files that had never been deduplicated. The library's digital preservation team noted that the problem was not isolated to one department.
A Problem That Runs Across City Systems
City Hall's public-facing records portal, which handles everything from Planning Department permit documents to Board of Supervisors meeting archives, has long relied on third-party document management software. Technology specialists who work with municipal archives say that without automated deduplication protocols built into ingestion pipelines, agencies that batch-scan paper documents — as San Francisco departments routinely did during the COVID-19 court and office closures of 2020 and 2021 — tend to generate large volumes of identical or near-identical image files that stack up undetected.
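Below is a minimal Python sketch of what an automated deduplication check inside an ingestion pipeline can look like. It assumes each scan arrives as raw bytes and flags only byte-for-byte copies by comparing SHA-256 digests. The `ScanIngestor` class, its `ingest` method, and the file names are hypothetical. They are not drawn from any city system or vendor product.

```python
import hashlib


class ScanIngestor:
    """Hypothetical ingestion step that flags byte-identical scans.

    Assumption: scans arrive as raw bytes. Only exact copies are detected.
    """

    def __init__(self):
        # Maps SHA-256 hex digest -> name of the first file stored with that content.
        self._seen = {}

    def ingest(self, name, data):
        """Return None if the scan is new, else the name of the earlier identical copy."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            # Same bytes were already ingested under another name: a duplicate.
            return self._seen[digest]
        self._seen[digest] = name
        return None


ingestor = ScanIngestor()
ingestor.ingest("permit_001.tif", b"page bytes")          # new scan, returns None
ingestor.ingest("permit_001_copy.tif", b"page bytes")     # returns "permit_001.tif"
```

A check like this catches repeated uploads of the same file. It does not catch near-identical rescans of the same page, because those differ at the byte level. Detecting them generally requires perceptual image hashing or manual review.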
Open government advocates at the San Francisco chapter of the League of Women Voters have raised the issue in public comment sessions before the Government Audit and Oversight Committee. They argue that inflated file counts make it harder for the public to locate authoritative versions of city documents. The problem is practical: when a permit record exists in four near-identical scanned versions, a resident searching the Planning Department's online portal cannot easily determine which version is the official one.
Digital archivists consulted on background — professionals who work with city and county records systems across California — say the San Francisco situation reflects a statewide gap. California's Government Code mandates retention schedules for public records but does not specify technical standards for deduplication or file integrity verification during digitization. That leaves individual departments to set their own practices, with uneven results.
What City Officials Are Being Asked to Do
The Board of Supervisors' Budget and Legislative Analyst's Office received a formal request in May 2026 to examine the cost of a citywide duplicate-image remediation project. Estimates circulating among city IT staff put the scope of the problem at several terabytes of redundant data across the Department of Building Inspection, the Assessor-Recorder's office in City Hall, and the City Attorney's document repository.
The San Francisco Department of Technology, which oversees the DataSF platform, has not publicly released a timeline or cost figure for remediation. Experts in municipal records management say that for a city of San Francisco's size, with more than 800,000 residents and dozens of departments generating documents daily, a serious deduplication audit typically costs between $400,000 and $1.2 million, depending on the depth of the review and whether legacy systems require manual inspection.
The Lawyers' Committee for Civil Rights of the San Francisco Bay Area, which regularly files California Public Records Act requests on behalf of clients, has noted in public filings that delayed or confused responses to records requests are becoming more frequent. Advocates stop short of attributing those delays solely to duplicate image problems, but say the overall state of city digitization needs independent review.
Mayor Daniel Lurie's administration, which took office in January 2025, has not yet issued a formal policy position on the records infrastructure question. The Department of Technology's next public presentation to the Board of Supervisors' Government Audit and Oversight Committee is scheduled for September 2026, and open-government groups say they intend to push for a specific line item addressing duplicate-image cleanup in the next budget cycle.
In the meantime, archivists recommend that residents trying to pull permit histories or planning documents cross-reference the DataSF open data portal with the relevant department's own records counter, and request a certified copy when document authenticity matters.