If you think about the COVID vaccines developed over the last couple of years, you might wonder whether we’ve already cracked the code of having fast tech transfers. Yes and no. It was done very quickly because of the sheer amount of human power and perseverance.
In fact, there are numerous public articles about how many people Pfizer dedicated to driving the tech transfer for the COVID-19 vaccine, which, by the way, is a model of roughly 50,000 process steps, or unit operations. Try describing that on paper or in a PDF. It’s almost impossible to convey the full recipe.
The point is: Yes, it’s possible, but it’s not sustainable. If we have another pandemic, could we throw 300 people at tech transfer? Absolutely. But does that work for a commercial product? No. We need to recognize that there are technologies available that can remove as much of the human element from tech transfer as possible.
Digging into Data
Ironically, most of the data collected during the tech transfer process may have started out in a digital format. But when you start pulling together the compendium of information, which may come from multiple systems, it gets translated, or let’s say repurposed, into a PDF file: what we call paper on glass.
When you hand that to manufacturing operations, they need to reverse engineer the information to feed the enterprise resource planning (ERP) or manufacturing execution system (MES). When we looked at that challenge with our clients, we thought there had to be a better way to grab that data — digital or not — and convert it to a consistent format that can drive the digital thread.
But what kind of data are we dealing with? If you start at the highest level in Figure 1, there’s the actual material itself, whether it’s a drug substance, the active pharmaceutical ingredient, the intermediates and so on. Then we divide that by small molecule or large molecule because the constructs are a little bit different in terms of describing the actual materials that make up a drug product.
The middle lane of the figure is really about packaging. If it’s intended for an oral solid dosage form, it could be in a blister pack. If it’s a biologic that has a targeted drug delivery device, it may have a very well-defined drug delivery device, bill of material, perhaps even with connected components measuring the regimen and the dosing therapy and so on. That could be its own initiative.
Typically what we see is that these groups have very different domain information. A packaging development engineer can’t develop a drug substance and vice versa. They’re very specialized and concentrated in their areas, and they tend to go fairly deep in terms of leveraging the systems they use, which may not necessarily be that valuable to a scientist, for example. So if I’m a packaging engineer working on a 3D modeling application to develop packaging, that means nothing to a scientist who’s working on the key molecule.
The last two pieces are really about development. Think about process engineering. You’re trying to take 10 to 15 years of development efforts, whether it was done in the lab or in a pilot plant, and you’re trying to scale up to a commercial level. Then you’re also trying to address the complexity that exists across different market authorizations you may receive.
In essence, one single drug product may end up having 50 or 60 recipes behind it, depending on the number of sites it’s being manufactured at and the number of market variations you support.
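The recipe count grows multiplicatively, which is why the numbers climb so fast. A back-of-the-envelope illustration (the site and variation counts below are hypothetical, chosen only to land in the 50-to-60 range described above):

```python
# Illustrative only: recipe variants multiply across manufacturing
# sites and market variations for a single drug product.
sites = 4                 # hypothetical number of manufacturing sites
market_variations = 15    # hypothetical number of market authorizations
recipes = sites * market_variations
print(recipes)            # 60 site- and market-specific recipes, one product
```

Every one of those variants has to be transferred, interpreted and maintained, which is the scaling problem the rest of this article addresses.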
Last is the equipment layer itself. We know most pharma and biotech companies don’t have exact cloned replicas of plants. There’s variability. They might use a mixer or a bioreactor in one location that behaves and operates a little bit differently than another location. That needs to be taken into account when doing the tech transfer or scale-up.
That’s the complexity we’re trying to address through these tech transfer documents.
We’ve been doing the same work for the last 100 years in terms of how we define recipes. With the computing power we have available now, there are a couple of different pathways we can use to speed up the process of data collection and data dissemination to downstream systems.
What’s really missing is a mechanism or process that takes all of the development data collected over 10 to 15 years and converts it into a reusable format that can readily be consumed and leveraged by an ERP or an MES. The goal is to improve the current approach of having 10 to 15 people per site who work on understanding the documentation delivered to them from their development counterparts.
What makes it even more interesting, and probably more work, is if you’re a contract development and manufacturing organization (CDMO). If you need to deal with multiple pharma innovators providing data in different packages, you probably have a process development group that spends six to 18 months just trying to understand what the intent was, provide it back to the pharma innovator and ask, “Is this what you meant?”
We need a Google Translate for tech transfer. The app you can get on your phone doesn’t just convert words from one language to another. It also examines the semantics and the grammatical context. If it only translated individual words, maybe it would get 30% of the way there. It really needs to understand the intent of the phrase typed in.
That’s what we’re driving toward: looking at paper-on-glass documents, whether Word files or scanned images, and trying to infer some digital sense from them.
Figure 2 illustrates the current tech transfer process that should be familiar to those of us in the life sciences industry. On the left-hand side, you have all the systems that contribute to the definition of the product, the packaging and the process. All of that needs to be funneled through a firewall and received either by an internal manufacturing group or multiple contract manufacturers that need to interpret their relevant piece of information.
Our mission is to enable the conversion of these paper-on-glass or image-based documents into something structured and repeatable we can consistently provide to downstream systems and remove the human element of interpreting the intent of that document. Then the data can be leveraged by all downstream partners.
How We Do It
Looking at the process for accomplishing this mission, first we need to ensure the data is securely submitted. Many companies communicate via FTP, email, phone calls and websites, and it becomes very difficult from a control strategy perspective to secure intellectual property (IP). At the end of the day, everything that flows through tech transfer is your company’s IP that must be secured and made available to the right parties at the right time.
It’s not just about the conversion of the data; it’s also about tracking that data. You need an audit trail in case of an adverse event so you’re able to understand exactly what was converted, who approved it, who signed off on it, who received the data and who consumed the data.
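One common way to make such an audit trail tamper-evident is to chain the entries together with hashes, so each record embeds a digest of the one before it. The sketch below is a minimal illustration of that idea, not any particular product’s implementation; all class and field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditEvent:
    """One entry in a tech-transfer audit trail (illustrative fields)."""
    document_id: str
    action: str       # e.g. "converted", "approved", "received", "consumed"
    actor: str
    timestamp: str
    prev_hash: str    # digest of the previous entry, chaining the log

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

class AuditTrail:
    """Append-only log: each event stores the hash of its predecessor,
    so editing any earlier entry breaks the chain on verification."""
    def __init__(self):
        self.events: list[AuditEvent] = []

    def record(self, document_id: str, action: str, actor: str) -> AuditEvent:
        prev = self.events[-1].digest() if self.events else "0" * 64
        event = AuditEvent(
            document_id=document_id,
            action=action,
            actor=actor,
            timestamp=datetime.now(timezone.utc).isoformat(),
            prev_hash=prev,
        )
        self.events.append(event)
        return event

    def verify(self) -> bool:
        prev = "0" * 64
        for event in self.events:
            if event.prev_hash != prev:
                return False
            prev = event.digest()
        return True
```

With this structure, answering “who converted, who approved, who received” is a walk down the chain, and any retroactive edit is detectable because the downstream hashes no longer line up.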
Somebody is responsible for collecting all the information from all the different systems being used by the scientists and the process development engineers. Then they need to aggregate it into a single document, or maybe a compendium of documents, and orchestrate the process of delivering the data to their manufacturing organization.
What’s missing is that orchestration and conversion mechanism like Google Translate that understands the true intent of what you’re trying to communicate through tech transfer and turns it into something predictable and leverageable.
The idea is that once it parses the data into an understandable, reusable format, the downstream systems won’t require humans to type in all the information. Instead, the information is automatically pushed to the system that needs it.
Our intent is to use a natural language processing mechanism that understands the documents — the context of the words, the semantics, the grammatical intent — and has a machine-learning algorithm that can understand the intent of each document and convert it to an ISA-88 structured format. Essentially, it takes the documents and stitches them together with digital data to come up with a reusable digital data construct your systems can readily ingest.
But tech transfer documents don’t just have digital or textual data formatted in tables or in hierarchies. They also have image data. There might be chromatography analysis. There might be sampling methods and testing methods. These are unstructured data sets you can’t easily convert to digital data. But they’re related to some level of digital data, so you need to be able to understand the inherent differences across different datasets that might be buried in that document.
When you run the document through the natural language processing tool, it can take scanned images and use optical character recognition (OCR) technologies to extract data. Or, if it happens to have originated as digital data that got captured in a PDF document, it can pull the data back out again.
At this point, there’s no context behind this data. The tool has simply extracted the data and said, “I understand the volume of data that exists in this document.” The natural language processing layer then looks for key indicators so we can create tabular data sets that can be readily imported or ingested by the downstream system.
One of the benefits of this approach is the collaboration it enables. It’s very hard to collaborate with someone on a PDF document if a value is misread. How do you convey that? You send an email and say, “Hey, on page 22, paragraph three, line four has a value I can’t read.” If you’re able to extract that, the intelligence layer can tell you what’s missing or highlight the pieces you should pay attention to, making the process much more efficient.
Choose a Path
There are two pathways through this. One is to continue doing what you’re doing today, because you understand it. In the life sciences industry, it’s very hard to drive change. So you can continue working with development organizations and have them produce the same PDFs they’ve produced for years, and then use a natural language processing layer to convert them into something digital, reusable and legible. That’s one pathway.
The second pathway is to adopt digital native tools that allow you to model the process and materials very early in the development process and natively publish the digital data sets. We’re realistic in that we know it will take years — if not decades — for certain corners of the life sciences industry to adopt native digital solutions.
In the interim, we’re promoting this two-pathway approach: Start by using the computing power of AI and machine learning to convert documents into something that’s reusable, and then over time adopt digital native tools. The biggest benefit is pure labor efficiency, but it goes beyond that to:
Improved speed to clinical trials, market and market authorizations (variations or flavors)
Reduced overall cost of internal and external transfers to manufacturing
Increased speed and efficiency of process validation
Reduced latency of facility, line and equipment provisioning/start-up
Improved batch quality, and reduced scrap and waste
Improved speed of regulatory submission and approvals
Improved closed loop quality by design from development to manufacturing to regulatory
Improved traceability into batch genealogy (right country, right product)
Sachin lives in Austin, Texas with his wife Michelle and their two dogs. He enjoys cooking Indian and other Asian cuisines, playing golf, and is an active industrial designer for charity and open source. He is a technology enthusiast and builds various devices in his spare time using existing and emerging technologies such as IoT, Arduino, Particle and additive/subtractive manufacturing. Some of his designs can be found at http://thingiverse.com/Sachin/designs