Boosting FTO Search with Degenerate Sequence Searching
Biological sequences form the bedrock of innovation in biotechnology, with countless advancements revolving around these sequences. However, the unique nature of biological sequences poses a challenge for conventional keyword-based information retrieval methods, often leading to the oversight of crucial information and potential risks.
The sequences presented in patent claims encompass a wide range of variations, not only describing the sequences themselves but also requiring a specific level of homology. As a result, researchers heavily rely on homology sequence alignment algorithms to explore sequence databases, using predefined homology thresholds to ensure comprehensive results. This approach is widely employed in current biological sequence database searches.
Nevertheless, a pressing question remains: can these similar sequence searches genuinely identify all potential target sequences? While these methods have proven effective, their ability to capture every relevant sequence warrants further examination. It is crucial to explore the limitations of current search methodologies and strive for enhanced approaches that leave no potential target sequence undiscovered.
Special Sequences in Patents
Combining similar sequence searches with keyword based results aggregation significantly reduces the risk of overlooking crucial information and FTO issues.
However, sequences in patents differ from those found in other biological databases as they exhibit many “patent-specific” characteristics. To expand the scope of patent protection and create search barriers for competitors, patent drafters often employ a description method similar to the “Markush structure” used in chemistry. By introducing degenerate symbols, wildcards, operators, and other information between positions in the parent sequence, and describing the specific parameters of these symbols through explanatory documents, we refer to them as “Degenerate Sequences.”
The image below illustrates a degenerate sequence described in patent claims:
Degenerate sequences themselves do not possess any biological significance; they solely serve the purpose of the patent. However, when combined with the description of the homology range, such an approach not only comprehensively protects innovative achievements but also becomes a “decisive blow” against the current conventional sequence homology search methods. Let’s take a look at an example below.
Query sequence:
“EVGSYPAPSDACPSDYFYCDASGRSAGGGGTENLYFQGSGGS”
Target sequence:
“EVGSYXXXXXXCXXXXXXCXXSGRSAGGGG TENLYFQGSG GS”
The similarity score obtained from the BLAST algorithm is only 67%, but the actual similarity is 100%.
This happens because conventional sequence homology alignment algorithms do not consider scenarios involving degenerate sequences during their initial development. Therefore, without special processing, excluding degenerate sequences would lead to two situations when using conventional algorithms:
1) Inability to search for the sequence
2) Exclusion of sequences due to similarity scores falling below the threshold.
Both scenarios pose significant challenges for sequence searchers, as they not only impede the comparison of sequences with patent claims but also increase the likelihood of overlooking critical sequence information.
Patsnap’s Solution
Patsnap’s biological sequence database (Bio) statistics show that the occurrence of such special sequences in global patent literature is not insignificant. There are approximately 7.4 million nucleotide sequences, accounting for 7.12% of the total number of nucleotides, and 1.31 million protein sequences, accounting for 7.55%. This indicates a significant number of generic sequences that can affect search results due to the presence of special symbols, posing substantial risks for FTO analyses.
Therefore, to mitigate the risk of overlooking these critical sequences, Patsnap’s Algorithm Engineering Team has developed a deep learning model using in-house NLP, CV, entity recognition, and coreference resolution technologies.
This model is designed to identify and parse degenerate sequences and their substitutions in sequence listings and full-text patents, and it established a Degenerate Sequence Searching Database as part of our Bio Professional package.
Using a specialized sequence alignment algorithm, this database not only enables the retrieval of such sequences but also provides a true similarity score. Therefore, by performing searches within the degenerate sequence database, we can effectively mitigate the risk of inadvertently overlooking crucial information during freedom to operate (FTO) and novelty searches.
Given the potential scale of variations in degenerate sequences, which can reach the tens of billions, traditional sequence alignment algorithms fail to meet the real-time retrieval demands. Patsnap tackles this challenge by employing a deeply customized sequence alignment algorithm that dynamically loads substitution information for degenerate sequences during the retrieval process, ensuring precise retrieval within reasonable time frames.
During the scanning phase, Patsnap introduces a compression algorithm to construct a seed word table for heuristic searches, significantly reducing unnecessary comparisons and improving retrieval efficiency. When aligning query sequences with target sequences, Patsnap’s proprietary algorithm incorporates degenerate substitution information, resulting in more accurate alignment and query results, as well as more intuitive and visually appealing alignment outcomes for different variants of the query sequence and target sequence.
Experience Degenerate Sequence Searching Now
In June of 2023, Patsnap’s biological sequence Bio database introduced a powerful degenerate sequence search feature, causing a paradigm shift in the patent domain. This disruptive advancement provides researchers with an immensely robust tool that offers an extensive collection of degenerate sequences, allowing users to effortlessly obtain the most accurate and relevant information in their searches.
To schedule a demo or learn more, visit patsnap.com/solutions/bio.
About Patsnap: Founded in 2007, Patsnap is the company behind the world’s leading AI-powered innovation intelligence platform. Patsnap provides global businesses with a connected, easy-to-use platform that helps them make better decisions in the innovation process. Customers are innovators across multiple industry sectors, including agriculture and chemicals, consumer goods, food and beverage, life sciences, automotive, oil and gas, professional services, aviation and aerospace, and education.
Media Contact:
Antasha Durbin
Email: [email protected]
Your recommended content
-
Patsnap Surpasses US$100 Million in Annual Recurring Revenue
Category: Article | Category: News/PR
Wednesday, June 12, 2024
Patsnap has reached a significant milestone of achieving $100M in Annual Recurring Revenue (ARR), marking an impressive 20% year-over-year growth in 2023. This milestone highlights the massive and meaningful value our platform brings to over 12,000 IP and R&D teams across 50 countries, driving efficiency, productivity, and collaboration.
-
Transforming Patent Intelligence: Patsnap Partners with BizInt Smart Charts
Category: News/PR | Category: Partnership
Thursday, February 1, 2024
In a significant development for patent professionals worldwide, Patsnap has collaborated with BizInt Smart Charts to empower users with streamlined data integration. This strategic partnership facilitates the direct export of Patsnap data into BizInt Smart Charts for Patents, simplifying the transition from patent search to comprehensive analysis.
-
Patsnap’s Analytics Launches Family Mode Searching
Friday, December 8, 2023
Introducing Family Search Mode! Explore patent innovation like never before with Patsnap's Analytics, treating each patent family as a data unit for enhanced insights into interconnected technologies