New Amazon Software Patent, Shakespeare & © Infringement

William Shakespeare

Earlier this week, on October 27, 2009, Amazon Technologies, Inc., received US Patent No. 7,610,382, which relates to a computer-implemented method of marking copies of content distributed on a network. More specifically, the patent discloses and claims a variety of embodiments of a method and associated apparatus for programmatically substituting synonyms into text content distributed through a Web service. Embodiments include a synonym substitution mechanism that replaces selected words in text with synonyms for those words, such as by substituting synonyms into excerpts of copyrighted works that are provided to clients via a Web service interface. Tip of the hat to Slashdot for finding this patent and bringing it out into the open, but the major thrust of the patent and its potential importance were unfortunately downplayed. The submitter did recognize that in one version of the invention the method can be used to identify and call out copyright infringers, but then snidely joked about a minor aspect of the patent by saying “anti-piracy measures should trump kids’ ability to spell correctly, shouldn’t they?” Perhaps it is too much to ask for the masses to take any software-related patent seriously, but there is no doubt this is an innovation, and a good example, in the run-up to the Bilski Supreme Court arguments, of why software innovations should be patentable if they satisfy the other patentability requirements; namely, if they are new and non-obvious.

On Slashdot the write-up of this patent starts: “To exist or not to exist: that is the query.” The submitter explains that this is how Hamlet’s famous soliloquy might appear to future generations thanks to this recently issued Amazon patent.  The tone and follow-up comments seem to suggest, although it is admittedly not clear, that this is not a good thing.  Beauty is in the eye of the beholder, and while many believe Shakespeare’s language is nearly sacred, others simply don’t understand what he is saying or trying to convey.  This should not come as a shock to anyone, at least not to anyone who has stopped and thought about the matter for any length of time.

Shakespeare, the author of numerous classic works and perhaps the most revered playwright of all time, hailed from England and lived from 1564 to 1616. See Shakespeare Timeline. It is hardly news that the language spoken by Shakespeare and his contemporaries does not pass for what we know as the English language today, which means that the overwhelming majority of people do not find Shakespeare approachable, readable or enjoyable today.  While it may be a sad commentary (to some) that this is the case, this truth is hardly controversial or debatable.  To the extent that Amazon has come up with a computerized way to automate the swapping of synonyms in text, it seems self-evident that there are enormous potential implications, not the least of which is the translation of writings from one language to another, or the updating of the great works of the world so that they can be read, enjoyed and understood by the masses.  Hardly something to poke fun at in my opinion.

Notwithstanding the potential social importance of an invention of this sort, much of the patent describes the use of the invention to combat copyright infringement, which we all know is rampant on the Internet, and largely unstoppable given the advent of electronic media.  To this end the patent explains:

By replacing one or more selected words in an excerpt from a textual work with synonyms for the words, illicit copies of the excerpt may be recognized by comparing a copy of the excerpt to the original. In one embodiment, particular permutations of synonym substitutions may be provided in excerpts to particular clients. The particular permutation provided to a particular client may be recorded and used to determine the client as the source of an illicit copy of the excerpt. In addition, a Web search for phrases from an excerpt that include one or more synonym substitutions may be used to detect illicit copies of an excerpt that was provided to a particular client via the Web service. Permutations of synonym substitutions may take different forms. For example, in one embodiment, a set of selected words in an excerpt may each have two or more candidate synonyms. Copies of an excerpt provided to two or more clients may each include unique permutations of the candidate synonyms for the selected words. Alternatively, synonym substitution may be performed for different sets of words in copies of an excerpt provided to multiple clients. In another embodiment, a combination of the two permutation methods may be used. These unique permutations may be recorded along with information on the particular clients that the excerpts were provided to, allowing copies of the excerpt to be traced to the particular clients.
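To make the quoted passage concrete, here is a minimal sketch of the first permutation method it describes: each client receives a unique permutation of candidate synonyms, the permutation is recorded, and a copy found later is traced back through that record. This is only an illustration under stated assumptions and not the implementation claimed in the patent. The synonym table, the client names and the helpers `mark_excerpt`, `fingerprint` and `trace` are all hypothetical, and the sketch assumes every synonym is a single word so that word positions line up.

```python
from itertools import product

# Hypothetical synonym table (not from the patent): each selected word maps
# to its candidate variants, with the original word listed first.
SYNONYMS = {
    "big": ["big", "large"],
    "began": ["began", "started"],
    "quick": ["quick", "fast"],
}

def mark_excerpt(text, client_index):
    """Return a copy of text whose synonym choices encode client_index."""
    words = text.split()
    slots = [i for i, w in enumerate(words) if w in SYNONYMS]
    # Every permutation of candidate synonyms for the selected words.
    combos = list(product(*(range(len(SYNONYMS[words[i]])) for i in slots)))
    if client_index >= len(combos):
        raise ValueError("not enough synonym slots for this many clients")
    for slot, pick in zip(slots, combos[client_index]):
        words[slot] = SYNONYMS[words[slot]][pick]
    return " ".join(words)

def fingerprint(original, copy):
    """Words that appear in copy at the positions of the selected words."""
    pairs = zip(original.split(), copy.split())
    return tuple(c for o, c in pairs if o in SYNONYMS)

ORIGINAL = "the big dog began a quick run"

# Record which permutation went to which client. Index 0 is the unmodified
# text, so numbering starts at 1 to keep every served copy distinguishable.
ledger = {}
for index, client in enumerate(["client-a", "client-b", "client-c"], start=1):
    marked = mark_excerpt(ORIGINAL, index)
    ledger[fingerprint(ORIGINAL, marked)] = client

def trace(copy):
    """Return the client recorded for this permutation, or None if unknown."""
    return ledger.get(fingerprint(ORIGINAL, copy))

print(mark_excerpt(ORIGINAL, 1))
print(trace("the big dog started a quick run"))
```

With three selected words of two candidates each, this toy version can distinguish at most seven clients. The passage's alternative of varying which words are selected for each client would enlarge that space.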

The patent then goes on to explain:

Replacing one or more selected words in excerpts from a textual work with synonyms for the words may also be effective in preventing, or at least making difficult, programmatic “excerpt chaining.” A client may possibly use a Web service to obtain multiple excerpts from a copyrighted work. These excerpts potentially overlap, with adjacent excerpts from the original work including overlapping portions. Excerpt chaining may be programmatically performed to link the overlapping excerpts and to thus generate larger portions of the copyrighted work, possibly even the entire work. In excerpt chaining, starting and ending phrases of excerpts from a textual work are compared to reconstruct larger portions of the work. In one embodiment, synonym substitution may make programmatic excerpt chaining difficult if not impossible by substituting different synonyms for the same word(s) and/or by substituting synonyms for one or more different words in an overlapping portion of two adjacent excerpts.

Should an entity be successful in chaining two or more excerpts to generate a larger portion of the work, or even the entire work, from which the excerpts come, the presence of synonym substitutions in the chained excerpts may allow the illicit copy of the work to be identified and possibly even traced back to the client that originally obtained the excerpts via the Web services interface.
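Excerpt chaining, as the patent describes it, can be sketched in a few lines. The example below is a simplified illustration, not the patent's method: the function `chain`, its `min_overlap` parameter and the sample sentences are all hypothetical. It joins two excerpts when the end of one matches the start of the next word for word, then shows that a single synonym substitution in the overlapping portion defeats this naive exact-match approach.

```python
def chain(first, second, min_overlap=3):
    """Join two excerpts if the end of first equals the start of second."""
    a, b = first.split(), second.split()
    # Try the longest possible overlap first, down to min_overlap words.
    for size in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-size:] == b[:size]:
            return " ".join(a + b[size:])
    return None  # no exact overlap long enough to chain

first = "the quick fox ran over the big hill"
second = "over the big hill and into the woods"

# Unmarked excerpts overlap on "over the big hill" and chain cleanly.
print(chain(first, second))

# The same second excerpt with one synonym substituted in the overlap.
second_marked = "over the large hill and into the woods"
print(chain(first, second_marked))
```

A more tolerant matcher that allows a mismatch or two could still link the excerpts, which is consistent with the patent's hedged "difficult if not impossible." In that case the substituted words survive in the chained text and can serve as the trace described above.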

Undoubtedly, this patent will be vilified by many who claim to be software or computer programming experts, who will whine that they could have done such a thing, which was well known.  Of course, the critical aspect that they leave out is that they could have attempted to accomplish such a thing if they had given it any thought.  Historically “obvious to try” arguments have not rendered inventions obvious, thankfully.  The undeniable truth is that if “obvious to try” were the standard then virtually nothing would ever be patented in the future.  I know that would make many folks happy, but it would hardly be good for innovation and it would destroy the economy.  If “obvious to try” is the law then anyone who sets out to accomplish a task could not be an inventor, and only those who mistakenly or erroneously stumbled on to something could obtain patents.  That is hardly something the law should encourage or even tolerate, but sadly there is support for such a naive view of the world in the US Supreme Court’s decision in KSR v. Teleflex.  Thankfully the US Patent Office and the courts have not followed that lunacy for the most part, at least not yet.  See, for example, Another KSR Retrospective.

The thing that the majority of judges on the Federal Circuit do not seem to understand, which is the thing that your average computer programmer likely will also never understand, is that computer software is far from obvious and the operation of computer code is hardly predictable.  This truth, which will be denied by many despite overwhelming evidence to the contrary, is proven every day when computers operate in peculiar ways.  A computer should respond the same way every time to the same set of inputs, but we all know that does not happen.  We have all experienced computer crashes, frozen computers, peculiar occurrences and the like, which do not happen all the time and sometimes cannot even be replicated.  Computer code presents instructions and machines should operate the same every time without deviation, but they don’t.  That is because there is a human element involved, and user error, human error and unexplainable and unforeseen incompatibility issues often throw a monkey wrench into the process.  For more see Software Engineering != Computer Science.

What every engineer knows is that math and science get you only so far.  Engineering is said to be the science of the practical, or the practical application of science in the real world.  The reason for this is simply that things do not always operate the way they should, or the way that is anticipated.  The Pontiac Quad 4 engine is a perfect example.  It was hailed as one of the greatest designs on paper.  Oh… if it could only have run on paper!  In practice it was a disaster.  I had a Grand Am with the Quad 4 and managed to get 112,000+ miles.  I was told by many that was miraculous because the engine pretty much crapped out in all cases no later than 70,000 miles, and usually before 60,000 miles.  The reality is that what is expected or obvious on paper is frequently, perhaps even usually, not what happens in real life.

On some level the fact that any software works is indicative of an innovative step, or at the very least an engineering marvel.  The fact that user error doesn’t derail all software has to suggest true innovation.  And the ability to create software that actually reliably works on a Microsoft machine… well… that is damn near miraculous!


Join the Discussion

3 comments so far.

  • Dave
    February 4, 2010 04:31 pm

    The idea of creating trackable small changes in each copy of a distributed document, with the use of a unique combination of synonyms or punctuation in each copy, is not new. This has long been done with top secret documents to provide a means of identifying the leaker. In the event of a leak, if the leaked document was recovered or reproduced, the unique clues embedded in the document could be used to find out who the document had originally been issued to.

    Whether or not doing this process through an automated system is unique is…well, up to the patent examiner to determine. But the general concept is not unique and has at least been described in a number of spy novels.

  • Pissed off Programmer
    October 31, 2009 11:13 am


    The beautiful thing about math is that it does work the same in the computer as it does on paper, excluding power failures and other such nonsense. Physics on the other hand has a lot more variability to it which goes into things like engines. I agree with the precept that computer science does not equal software engineering, but people aren’t patenting contingency planning, which is the art in software engineering. We have these great things in math called proofs. Most software “engineers” don’t know very much about the mathematical foundations they work with and their code and algorithms are very poor. So I suppose they might not get a working program together that doesn’t have a bunch of bugs in it, but it’s not like the majority of these patents contain any real technical information to alleviate this problem either.

    I read all the claims for this patent, or rather skimmed since a lot of it was duplicate text; you can really tell the parts where they include buzz phrases to ensure acceptance. I find claim 20 to be rather odd since all textual material is copyrighted, if the author so wished to fight for it. I guess they mean registered copyrighted works?

    Web service infrastructure, relational databases, sophisticated search and replace algorithms, these are nothing new. The fact that nobody has put together this particular combination yet doesn’t say anything about it except that there has been no demand. I didn’t read a single technical sentence in that entire patent that would help me implement this thing if I didn’t know how already, and I could, not just try. To me it just reads like an idea patent on a particular combination of different already existing technologies. If they were the first people to devise an algorithm that could perform sophisticated search and replace on words in a text file or stream then I would be more impressed.

    I will say this though, for all the attacks I have made against you I do agree that the future lies in a marriage of open source with proprietary components. I think the future is in modularized programs that use open source libraries to do the work merged with professional design, testing, support, and end user content.

  • Adam
    October 30, 2009 03:45 pm

    The basic idea of this patent, replacing text with synonyms on the fly for a web page, has been done for at least a decade in forum software with “swear filter” features. They add the concept of doing it for the purpose of digital watermarking, instead of for protecting the sensibilities of readers, which is an interesting idea, though one likely distasteful to both authors and readers.

    I have a legal question, though: to what extent are motivations important in determining whether someone infringes on the patent? Claim 25, for example, only seems to cover locating unauthorized copies of the data. But what if you used all of the methods in the patent to locate authorized copies of the data? The only difference is the reason you want to practice the methods. Would you still be infringing the patent?