Patent Office Improves Open Data Access to File Wrappers

“In just a year of lobbying and focusing stakeholder input, better access to Public PAIR data is now available to all stakeholders. It’s good to see America’s Innovation Agency innovating again in their public facing tools.”

Last week, the USPTO announced a public release of the Patent Center beta that will eventually replace PAIR and EFS-Web while providing improved access to prosecution histories and related data. The Office has always been on the cutting edge of technology for the Federal government. Its electronic filing system (EFS), Patent Application Information Retrieval (PAIR) system, hoteling, and other technology initiatives have put the Office out front of many government agencies. The FY2020 Congressional Justification for the USPTO states that it is the responsibility of the USPTO to “foster[] innovation, competitiveness and job growth in the United States by … delivering IP information and education worldwide.” Open access to U.S. Patent and Trademark Office (USPTO) data is core to the USPTO’s mission and aligns with the Open Data policies of the Federal government generally.

PAIR Overview

PAIR access allows users to view the status of a patent throughout the process and to download the imaged file wrapper (IFW), except for provisionals and older cases that predate online access. Prior to publication, only registered prosecutors who have power of attorney can access a patent application through Private PAIR. The USPTO locks down access through two-factor authentication using practitioner credentials through myUSPTO.gov. A registered practitioner can sponsor their paraprofessionals to grant access to all the files available to them through Private PAIR.

Public PAIR provides the same status and PDF downloads for any published application to any user worldwide. This access is invaluable to practitioners when reviewing prior art, gathering status on their portfolio or analyzing a competitor’s patent activity. Soon IP tool vendors, researchers and others wanted to automatically mine Public PAIR. As alternative sources of prosecution histories, Google Patents provides bulk IFW until 2012, and Reed Tech makes more recent patent information available. Accessing the more recent data at scale, however, ran into throttling, difficult interfaces and incomplete or slowly updated files. Many started scraping Private PAIR information, with some vendors offering limited licenses to the complete copies they had cobbled together, consuming some 100 terabytes of data.

The USPTO infrastructure that provides PAIR data soon became a victim of its own success, as demand for patent data grew once it was made available. Heavy scraping by the many who needed some or all of the information taxed that infrastructure, causing PAIR, EFS and even internal examiner tools to become unstable. Especially toward the end of the day or end of the week, when filings were heaviest, practitioners suffered from timeouts, crashes and even lost data. Countless hours of lost effort and stress were affecting the stakeholders.

Efforts to Increase Stability

Last fall, the USPTO began to take measures to increase stability by deterring automatic scraping of PAIR. The first action was to stop allowing access to Public PAIR from the Private PAIR portal, requiring users to have two browser instances open throughout the day to access their cases and prior art. Additionally, Public PAIR was protected from scraping with a CAPTCHA. Many software tools practitioners rely upon could no longer gather data automatically after these changes. With the automatic gathering of file wrapper PDFs and of data for IDS preparation stopped, patent filers and their counsel suffered lengthy revisions to workflows and increased fees.

About a year ago, a small group of tool developers called the Open PAIR Coalition, initially comprising Triangle IP, BigPatentData, Harrity & Harrity, PatentPrufer, and GreyB, began strategizing on how they could collaborate with others to inexpensively provide a copy of Public PAIR to all who were resorting to scraping. The thought was to set up a non-profit or benefit corporation that could cobble together the data for distribution in a manner that would not cause USPTO instability while reducing costs for all who were contemplating this in parallel.

As code sharing and collaboration began, we also decided to lobby the USPTO, as it was in their best interest to separate out access to Public PAIR from their infrastructure, and the existing alternative from Reed Tech was not good enough to discourage scraping. When the Office released the anti-scraping measures, it deflated the Open PAIR Coalition as their efforts seemingly backfired. Countless developer hours were spent by the IP tool industry recoding scrapers, entering CAPTCHAs and scaling up the numbers of scraping instances. One tool vendor outside the Coalition had staff throughout the office entering CAPTCHAs as Public PAIR requested them so that prior art could be gathered for automated information disclosure statement (IDS) filings for all their clients.

The increased scraping activity probably looked like a denial-of-service attack to the USPTO, but open data, if successfully made available and desirable, will always cause an “arms race” of deterrence followed by workarounds. A “death spiral” of increasing load from the scraping engines, met by countermeasures to guard the infrastructure being targeted, is to be expected. The Director of the Midwest Regional USPTO office in Detroit, Damian Porcari, is a self-professed fan of IP tools and organized symposiums late last year to gather stakeholder feedback and brainstorm the issues.

The Open PAIR Coalition organized stakeholder interest with a Slack channel that resulted in an Impact Statement sent to Director Iancu on New Year’s Day 2020 with more than 20 signatories. Among other suggestions, the Coalition asked that a separate data portal be provided that allows IFW access to Public PAIR files. Please reach out if you have interest in joining the Slack channel.

In late January, a larger symposium was organized by the USPTO in Alexandria and exclusively reported on by IP Watchdog. Shortly thereafter, the Patent Public Advisory Committee (PPAC) meeting in early February included a discussion of the PAIR issues and changes in the fall that affected tool vendors. A formal response from the Director to the Impact Statement was received in March acknowledging the Office engagement on the topic. Coalition member Blackhills IP introduced Chrome and Firefox plugins that gather performance data as you interact with PAIR to provide real-time performance statistics; all PAIR users are encouraged to add them to their browsers.

Without much fanfare or advance notice, the USPTO last week introduced a Patent Center portal with improved IFW access! The rollout schedule is excerpted below, where we have progressed to the public beta in advance of decommissioning PAIR and EFS-Web.

 

The beta version of Patent Center provides completely open access to Public PAIR records without login or CAPTCHA. Referring to the below Figure, the information is organized by tabs on the left to provide detailed information in the right pane. The interface is faster than PAIR and provides additional information all organized with an improved GUI. This is a dream come true for the Open PAIR Coalition.

The Open PAIR Coalition is ecstatic with the ease of gathering IFW from Patent Center along with the integration of other data in one simple interface. However, there is always room for improvement. Providing an application programming interface (API) or database calls to the IFW and other data will reduce the burden on the web servers designed for human interaction. Presuming modern scalable infrastructure, automated access to Patent Center should not affect other systems at the USPTO. The IP5 patent offices (i.e., US, Japan, Korea, China, & Europe) already share prosecution histories and data among each other, and the open data community should enjoy the same access to provide Office customers the necessary tools, data and analytics to easily engage with the USPTO. Stated bluntly, why should China or Japan enjoy API access to US prosecution histories while patent filers and their vendors have to resort to elaborate data gathering techniques?

Conclusion

The USPTO remains a technology leader in the Federal government and renews its commitment to open data with the release of Patent Center. In just a year of lobbying and focusing stakeholder input, better access to Public PAIR data is now available to all stakeholders. It’s good to see America’s Innovation Agency innovating again in their public facing tools. Assuming these new tools can scale to fulfill their stated purpose, there should be no reason for scraping tools to burden Public PAIR or to overtax USPTO infrastructure any longer. All should look forward to what this new Patent Center offers while enjoying more stable interaction with the USPTO systems. The Office is hosting four Patent Center update webinar sessions over the next week. To sign up for one of them, click here. For more information on the Open PAIR Coalition, to stay up to date on open PAIR developments, or to participate in their Slack channel, please reach out to [email protected].

 

Warning & Disclaimer: The pages, articles and comments on IPWatchdog.com do not constitute legal advice, nor do they create any attorney-client relationship. The articles published express the personal opinion and views of the author as of the time of publication and should not be attributed to the author’s employer, clients or the sponsors of IPWatchdog.com.

Join the Discussion

3 comments so far.

  • Thomas Franklin
    April 30, 2020 10:50 am

    Peter, There is no API unfortunately. A super-PEDS with IFW would be a dream. Of course, PEDS being updated daily would be great too. Hopefully, the USPTO will allow gathering of data without the GUI getting in the way. But, this is a huge improvement over PAIR.

  • Peter Kramer
    April 30, 2020 08:52 am

    Regarding my last comment, I have also used PEDS, but PEDS is only partially useful for bulk data acquisition because PEDS data is not sufficiently up-to-date. I wrote a Windows 10 desktop app to grab information for my purposes from the xml files I would download from PEDS. The app then makes a csv file importable into Excel. It would be great, except for the not-so-current PEDS data. I did notice that PEDS data did look a little more current a few weeks ago but I have not checked recently. Hopefully the PTO will maintain PAIR data in real time for PEDS.

  • Peter Kramer
    April 30, 2020 08:35 am

    I’m curious if the interface has provision for batch operation. I typically retrieve the PAIR application data page under the Application Data tab. Previously I would scrape 700 data pages for each of 700 respective applications. One-at-a-time manual lookup would be incredibly tedious and would drive one to insanity. Is there a web interface for batch acquisition of data or do I have to program a client app to interface with the system?
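
The workflow described in the comments above, pulling selected fields out of downloaded XML files and writing a CSV that Excel can import, might look roughly like the sketch below. It is only an illustration under loud assumptions: the element names (`application`, `applicationNumber`, `filingDate`, `status`) and the sample data are hypothetical, since neither the article nor the comments describe the actual PEDS schema, and this is not the commenter's app.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical field names chosen for illustration only;
# the real PEDS XML schema is not described in the article.
FIELDS = ["applicationNumber", "filingDate", "status"]


def xml_to_rows(xml_text):
    """Extract one dict per <application> element (assumed element name)."""
    root = ET.fromstring(xml_text)
    rows = []
    for app in root.iter("application"):
        # A missing child element becomes an empty cell instead of an error.
        rows.append({f: (app.findtext(f) or "").strip() for f in FIELDS})
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample input standing in for a downloaded file.
SAMPLE = """<applications>
  <application>
    <applicationNumber>00000001</applicationNumber>
    <filingDate>2019-01-15</filingDate>
    <status>Pending</status>
  </application>
  <application>
    <applicationNumber>00000002</applicationNumber>
    <status>Issued</status>
  </application>
</applications>"""

if __name__ == "__main__":
    print(rows_to_csv(xml_to_rows(SAMPLE)), end="")
```

Working from already-downloaded files keeps the load off the live PAIR servers, which is the concern the article raises about scraping; the trade-off, as the comment notes, is that the result is only as current as the bulk data.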