“Even with a hypothetically good training data set including large numbers of accurately checked patents, the AI algorithms have a much tougher task than in other applications [because] essentiality determination is subjective.”
Artificial Intelligence (AI) is providing enormous productivity and increased value in many applications. But AI is no panacea and is not yet sufficiently well developed to be precise or dependable everywhere. For example, much better AI training data is required to reliably estimate patent essentiality to standards such as 4G and 5G, where AI is being advocated by various experts and has already been adopted by one patent pool. There is also a lot of room for improvement in inferencing.
While most standard-essential patents (SEPs) are extensively licensed, based on comparable license benchmarks, with little or nothing in the way of essentiality checking, estimated essentiality rate metrics (i.e., the percentages of declared-essential patents that are deemed actually to be essential) are sometimes used by the courts and by patent pools to apportion royalty charges. Essentiality rates vary among patent owners, with some much more inclined than others to over-declare their patents as possibly essential. It is therefore important to make any such checks accurately and reliably, because several licensors, particularly leaders Ericsson, Nokia, InterDigital and Qualcomm, rely significantly on their SEP licensing income to fund R&D.
Magic and Reasoning
AI results can be inferred through commonalities such as semantic matches that have no technical or legal footing in patent law. Courts enable expert determinations of essentiality, validity and infringement to be challenged and explained based on patent claims and technical requirements in the standards. “Because the computer says so” does not wash there. AI algorithms do not provide reasoning for their positive and negative determinations in those terms.
There was widespread amazement a decade ago following the introduction of voice assistants, including Apple’s Siri and Amazon’s Alexa. Speech recognition algorithms usually enabled speech to be accurately turned into text. AI even overcame difficulties in dealing with accents and noisy environments. However, command responses have remained rather simple, for example, selecting music and regurgitating standard weather forecasts. “They were all dumb as a rock,” Microsoft’s Satya Nadella told the Financial Times last month.
Introduction of the chatbot ChatGPT has taken the interpretation of text to a much higher level. ChatGPT can understand complex instructions and provide sophisticated responses, such as essays good enough to pass university exams. While chatbots have started by responding to typed enquiries, the same capabilities can and will also be applied to text generated through voice recognition. However, the main difference between ChatGPT and other AI approaches that have been less successful or caused less excitement of late is the fine-tuning of the language model by human intervention. In this, experts manually rank the outputs produced, and the rankings are fed back to improve the AI model. AI outputs are thus aligned to the subjective preferences of human labellers, who are provided with guidelines, but specific decisions are down to individuals. This reinforcement learning from human feedback (RLHF) is currently considered to be very powerful in improving AI reliability.
Unfortunately, AI algorithms can get things wrong and often do. They can be unacceptably prejudicial, for example, when longlisting and shortlisting from thousands of CVs in staff recruitment. According to OpenAI, “these models can also generate outputs that are untruthful, toxic, or reflect harmful sentiments. This is in part because GPT-3 is trained to predict the next word on a large dataset of Internet text, rather than to safely perform the language task that the user wants. In other words, these models aren’t aligned with their users.”
The European Commission’s 2020 Pilot Study report on SEP essentiality assessments concluded that “automated approaches will not be able to replace human efforts in the short or medium term.”
Garbage in, Garbage Out
AI models, at their very best, are limited by the quality and quantity of the data used to train them. Many AI applications can easily be fuelled with lots of accurate training data. For example, there are numerous pictures of cats and dogs: humans can accurately distinguish between them. The “digital twin” predictions of an aircraft in flight based on physics equations and mathematical models can be continuously recalibrated with accurate measurements of position, altitude, velocity, acceleration, temperature and airframe strain.
Other prospective AI applications are much more challenging, and their results defective, where training data is scant, inaccurate or of uncertain meaning. This is a severe shortcoming where AI is used to determine and count patents that are considered standard essential.
This AI training data needs to include many accurate determinations, both of patents found essential and of patents found not essential. There is no such data set. The EC Pilot Study was significantly based on a non-random selection of 100 or so patents that had been thoroughly checked (some with the use of claim charts), most of which had been found essential.
The Alium patent pool for 4G and 5G in OpenRAN includes thousands of patents in its AI training data, but these were classified with only a cursory manual check.
Inaccuracy and Bias
My research shows that manual essentiality checks are inaccurate, with widely varying results among essentiality checking studies. For example, estimates for overall essentiality rates including all declarations range from 8% to 50% in 4G and 5G. Essentiality rate rankings by patent owner are also wildly inconsistent. Assessors frequently disagree with each other in their essentiality determinations, even when thorough checks are made.
Courts have only ruled on questions of essentiality, infringement and validity for a very small proportion and number of declared patents. Possibly the best determinations in significant volumes are those undertaken for patent pools. But those are not representative samples of declared patents. Selections skew significantly to include patents that are found to be essential. Patent owners typically bear the assessment costs that can be €5,000 to €10,000 per patent, so they tend to submit only those declared patents with a relatively good chance of being found essential.
Maximizing accuracy in manual determinations requires the use of claim charts and thorough analysis that can take days per patent. That is prohibitively costly for more than a relatively small proportion of declared patents. Making only cursory checks (typically lasting less than 30 minutes per patent) on large numbers of patents is less accurate and results in significant systemic bias in favour of over-declaration. My research also shows that the lower the accuracy of individual determinations, the more systemically biased the results will be. While sampling mitigates high per-patent essentiality checking costs, it results in substantial sampling errors, with relatively wide confidence intervals in results, particularly where essentiality rates are low.
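Simple arithmetic shows one way that inaccurate checks can produce this kind of bias. The sketch below is an illustration only and is not drawn from the studies discussed here: it assumes a true essentiality rate of 10% and a checker that errs equally often in both directions. The function name observed_rate and all of the numbers are hypothetical.

```python
def observed_rate(true_rate, false_positive, false_negative):
    """Share of patents labelled essential by a checker with the given error rates."""
    correctly_found = true_rate * (1 - false_negative)   # essential, labelled essential
    wrongly_found = (1 - true_rate) * false_positive     # not essential, labelled essential
    return correctly_found + wrongly_found


true_rate = 0.10  # assumed true essentiality rate (hypothetical)
for error in (0.0, 0.1, 0.2, 0.3):
    # Assumption: the checker is wrong equally often in either direction.
    rate = observed_rate(true_rate, error, error)
    print(f"error rate {error:.0%}: observed essentiality rate {rate:.0%}")
```

Under these assumptions, the apparent essentiality rate climbs from 10% with perfect checks to 34% with a 30% error rate. Where most declared patents are not essential, there are many more patents that can be wrongly counted in than wrongly counted out, so even even-handed errors inflate the result.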
If checks are inaccurate and biased in training data, the results of AI-based determinations will also suffer from the same shortcomings. At least the random sampling errors—introduced by checking fewer patents so as to do each more thoroughly and accurately—are unbiased.
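The width of those sampling errors can be gauged with the textbook normal approximation for a proportion. The sketch below is a rough illustration under stated assumptions: a simple random sample, perfectly accurate checks, and a hypothetical sample of 200 patents. The function name margin_of_error and the sample size are not taken from any study mentioned here.

```python
import math


def margin_of_error(rate, sample_size, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(rate * (1 - rate) / sample_size)


sample_size = 200  # hypothetical number of thoroughly checked patents
for rate in (0.50, 0.25, 0.10):
    moe = margin_of_error(rate, sample_size)
    # Relative error compares the interval half-width with the estimate itself.
    print(f"rate {rate:.0%}: +/- {moe:.1%} absolute, +/- {moe / rate:.0%} relative")
```

With these inputs, the margin of error is roughly 14% of the estimate at a 50% essentiality rate, but roughly 42% of the estimate at a 10% rate. The interval is centred on the sample result, however, so the error is as likely to fall on one side as the other, unlike the bias introduced by inaccurate checks.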
Inferencing from Subjective Decisions
Even with a hypothetically good training data set including large numbers of accurately checked patents, the AI algorithms have a much tougher task than in other applications. Essentiality determination is subjective. Even competent human experts doing a thorough job often disagree about their determinations on the same patents. Technical and legal interpretations of language may differ, as may the meaning of words in different contexts, or over the years as definitions and use of language change. In contrast, definitions for position, altitude, velocity, acceleration, temperature and airframe strain remain unchanged, and these quantities can be measured consistently and accurately.
Limitations in AI inferencing only make matters worse. AI techniques such as semantic text matching yield inferior results to human expert review. Semantic similarity and essentiality are not equivalent. Fine-tuning AI results through RLHF methods could significantly affect the performance of an AI-based essentiality assessment system, but this injects yet more subjective human judgment into the system.
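A toy example shows how text similarity and technical meaning can part company. The two sentences below are invented for illustration and do not come from any patent or standard. The bag-of-words cosine similarity used here is a deliberately simplified stand-in, not the method of any real essentiality tool, which would more likely rely on learned text embeddings.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between simple word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(n * n for n in counts_a.values()))
    norm_b = math.sqrt(sum(n * n for n in counts_b.values()))
    return dot / (norm_a * norm_b)


# Invented texts: identical wording except for the order of events.
claim = "the terminal transmits the report on the uplink channel after receiving the grant"
spec = "the terminal transmits the report on the uplink channel before receiving the grant"

print(f"similarity: {cosine_similarity(claim, spec):.2f}")
```

The two sentences score 0.96 out of a possible 1.00, yet they describe opposite sequences of events. Whether a claim with one ordering reads on a specification with the other is exactly the kind of question that turns on claim language and technical requirements, and that a similarity score alone cannot answer.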
Checking the Checks
The accuracy of results with sampling and AI depends on the extensive and in-depth manual checking upon which samples are based and AI is trained. The optimal blend of in-depth manual checking, sampling and any AI technique is an empirical question. If essentiality checking and patent counting are to be used in the determination of royalties, it is important that methodologies combining various amounts of in-depth assessment, sampling and AI inferencing substantiate their precision and cost-effectiveness.
AI tools might help assessors be more productive—for example, in scouring technical specifications for potential essentiality or prior art (e.g. a technical specification that was published before a declared patent’s priority date)—but cannot replace human decision making.
Independent essentiality assessors and competing licensing platforms such as patent pools should remain at liberty to use whatever techniques they wish in their determinations. Market forces will decide their worthiness and incentivize improvements.
However, AI-based essentiality checking should not be imposed by government edict or on the unwilling. There is not the transparency and answerability that justice requires and to which parties in licensing are entitled.