In the past, the ease of downloading or cloning open-source code and then modifying it encouraged the development of software without authorization. At most, developers and organizations acknowledged the copyright of the open-source code. The obligation to adhere to GNU GPL, Creative Commons, FreeBSD, MIT and similar licenses is often an afterthought, which exposes them to penalties when the software is used or modified for commercial purposes.
Organizations today want to protect the copyright and intellectual property in their source code, which is labor-intensive to produce and something their developers take pride in.
A developer's or a company's ownership of source code can be established through a technique called source code watermarking. Let’s understand how to leverage it for copyright and intellectual property rights.
How it works
Source code watermarking consists of embedding a unique identifier, a watermark, within the source code to prove the author’s original ownership and to deter copyright violation. The essence of watermarking is hiding information that uniquely identifies the author or owner in such a way that it cannot be detected easily. This is what needs to be understood, both for implementation and for its relevance in business.
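One way to picture the "unique identifier" is as a short keyed digest of the owner's details. The sketch below is illustrative only: the function names, the author and project strings and the key are all hypothetical, and HMAC-SHA256 is an assumption made for the example, not a scheme the technique prescribes.

```python
import hashlib
import hmac


def make_watermark(author: str, project: str, secret_key: bytes, length: int = 16) -> str:
    """Derive a short hex identifier tied to an author and a project.

    Hypothetical helper: only someone holding secret_key can reproduce
    the identifier, which is what lets the owner later claim it.
    """
    message = f"{author}|{project}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return digest[:length]


def verify_watermark(candidate: str, author: str, project: str, secret_key: bytes) -> bool:
    """Check whether a recovered identifier matches the claimed owner."""
    expected = make_watermark(author, project, secret_key, len(candidate))
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, expected)
```

The identifier on its own reveals nothing about the author, yet the owner can regenerate it on demand from the key, which is the property a hidden watermark relies on.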
Characteristics of watermarking
Source code watermarking has four principal characteristics. Not all of them can be achieved together; which ones to prioritize depends on the specific application, domain and use case:
The watermark should be resilient. Source code may be publicly hosted (as on GitHub), used with authorization and permission or, more importantly, used without authorization. The watermark should therefore persist in compiled code and byte-code, and should survive reverse engineering, whether intentional or unintentional.
The watermark should be stealthy. The author’s information or copyright watermark must remain hidden from unauthorized detection. The output of the watermarking process, often referred to as the ‘payload’, should remain hidden too.
The watermark should be unobtrusive. It should not noticeably change the outcome of the code, and it should not be noticeable when the outcomes are compared.
The last characteristic is the data rate of the watermark, measured as the number of bits encoded in a given time period. As a rule of thumb, the higher the better.
Techniques of watermarking
Static watermarking: this is what developers typically add as ‘dead code’ or as comments specifying the author, date, ownership and references to licenses. Static watermarks can also exist within the code itself, for example in C code.
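A static watermark can be sketched in a few lines. The example below uses Python rather than C purely for brevity, and everything in it is an assumption made for illustration: the `WM:` comment tag, the `_WM_UNUSED` constant and the helper names are hypothetical. It embeds the same watermark twice, once as a comment and once as dead code, and shows that stripping comments removes only the first.

```python
import re
from typing import List


def embed_static_watermark(source: str, watermark: str) -> str:
    """Add the watermark as a header comment and as an unused constant."""
    header = f"# WM:{watermark}\n"
    # Dead code: the constant is defined but never read by the program.
    dead_code = f'\n_WM_UNUSED = "{watermark}"\n'
    return header + source + dead_code


def extract_static_watermarks(source: str) -> List[str]:
    """Return every watermark found, comments first, then constants."""
    in_comments = re.findall(r"#\s*WM:([0-9a-f]+)", source)
    in_constants = re.findall(r'_WM_UNUSED\s*=\s*"([0-9a-f]+)"', source)
    return in_comments + in_constants


def strip_comments(source: str) -> str:
    """Crude attack model: drop every line that is only a comment."""
    kept = [ln for ln in source.splitlines() if not ln.lstrip().startswith("#")]
    return "\n".join(kept) + "\n"
```

A comment is the cheapest static watermark and also the easiest to remove, which is why the dead-code copy is kept as a second carrier. Neither survives a determined reader who knows what to look for.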
Dynamic watermarking: here the watermark is hidden within the source code and can be extracted only at runtime. Depending on the complexity required, it can be layered within the user experience layer, the application layer or the data/persistence layer.
Benefitting developers and organizations alike
Source code watermarking is thus useful for protecting the author’s copyright and for tamper-proofing, which maintains the integrity of the code. Currently, however, very few organizations and developers adopt it, because there are overheads in incorporating the watermark and ensuring the code still passes full testing. It is needed in today’s competitive world, where product innovation and disruption, along with open-source collaboration, form the pivot of an organization’s strategy.
For developers, a sense of ownership makes it easy to adopt. For organizations, preventing intellectual property and copyright violations at the cost of marginal overheads in supplementary controls makes it a no-brainer.
Sr. Principal, Enterprise Architecture, LTI
As an Enterprise Architect at LTI, Shiva is responsible for Architecture and Advisory services for clients. For LTI, Shiva has led several Architecture Assessments and Application Modernization engagements and co-authored the LTI Enterprise Architecture Framework. Shiva has 24+ years of relevant industry experience with global clients across Banking & Financial Services, Defense, Education, E-Governance, Healthcare and niche areas such as Chemo-informatics.