What is Clotho and how did it come to be?
Clotho assists you in the performance of synthetic biology projects. Most importantly, Clotho is an App environment similar to an iPhone in the sense that anyone can create and share new tools. Using Apps, users can customize their local version of Clotho for a particular look and feel as well as the particular tasks done on their computer. These workflows can be anything including routine data entry and management, computer aided design, automated design and simulation, and sharing data. Because of a standardized data model, Apps written for Clotho intrinsically pass data seamlessly to one another allowing Apps that were never intended to work together to nevertheless function in concert. Moreover, the standardization of these objects and the enforcement of these standards help guarantee that a tool will not corrupt your data or reinterpret its meaning. So, regardless of whether you are using tools that look and operate like your familiar spreadsheets, or they are automation tools creating and manipulating thousands of objects, the tools all operate on the same data. They just present different ways of looking at the data or different functionalities to manipulate that data. Finally, standardization of the data allows the implementation of technical standards for biosafety to help you ascertain the biological risks of your designs. Clotho provides a solution to the existing problems in data management and design faced in research today while also providing an environment that will scale to support the needs of synthetic biology in the future as new technologies and capabilities become available.
Synthetic biology is the next wave in genetic engineering, in which biological systems are constructed from the ground up to perform new and useful functions. Moreover, synthetic biology seeks to develop a theory-grounded engineering discipline for genetic engineering. However, the field is still in its early days, and experimentalists continue to use software and experimental tools that, with few exceptions, have remained unchanged in 30 years. Recent ideas in how to apply simulation and computer aided design, the use of grammars to allow computers to understand genetic composition, and the advent of standard assembly methods for automating DNA fabrication on the multi-gene scale have provided a new springboard for automation tools in this space. Nevertheless, these tools today sit as discrete islands of focused research, inaccessible to and unusable by most experimentalists.
In the absence of comprehensive software toolsets for managing synthetic biology information, most experimentalists do one of several things. Many, particularly those who work at high volume or use BioBrick assembly schemes, choose to store information about sequences in spreadsheets, either in Microsoft Excel or shared through Google Docs. These packages provide stable and surprisingly user-friendly methods for maintaining, browsing, and searching this data. The difficulty is that they allow the user too much flexibility: there are endless ways of representing the data in these media, and transferring data intrinsically requires a conversation with the source user. At the scale at which experimentation is currently performed, this is rarely a problem. Consider, though, what happens when you increase the volume of information tenfold. We have in recent years experienced this head on during our efforts to automate DNA assembly. We have developed robotic protocols for assembling DNAs using a BioBrick-like approach with >95% efficiency per junction.
However, in practice the rate is much lower, and for no good reason: it invariably stems from errors human beings make in generating the commands sent to the robot. None of this should be a problem. The arraying and cherry picking of samples from source plates to reaction plates is a purely deterministic (and simple) problem, yet just complicated enough that it’s hard to do by hand. Software can do this, but the type of “soft” data currently used by experimentalists does not directly allow it. If the data is not provided in an original format sufficiently granular for the software to comprehend, the software simply cannot be used. This gets to the heart of the problem we face in the automation of synthetic biology: it’s clear that the activities we are doing are repetitive and can and should be automated. However, we are still usually working at a throughput at which it makes most sense for an individual to work on things by hand, one at a time. The solution to this contradiction, we feel, is to develop software that is useful in both modes: software sufficiently flexible that a user can work in the mode of spreadsheets for comfort and convenience, but can then switch to complex behind-the-scenes algorithms that handle the more complicated machine learning, automation, and high-throughput tasks that are the link to the future.
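To make the "purely deterministic" nature of cherry picking concrete, here is a minimal Java sketch that turns a list of picked source wells into transfer commands for a 96-well reaction plate. The class name, method names, and the `source -> destination` command format are all hypothetical and chosen only for illustration; they are not part of Clotho or of any robot's command language.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the names and command format are illustrative, not Clotho's.
class CherryPickSketch {
    static final int ROWS = 8;   // rows A-H of a 96-well plate
    static final int COLS = 12;  // columns 1-12

    /** Convert a 0-based position into a well name, filling row by row (0 is A1, 12 is B1). */
    static String wellName(int index) {
        if (index < 0 || index >= ROWS * COLS) {
            throw new IllegalArgumentException("No such well on a 96-well plate: " + index);
        }
        char row = (char) ('A' + index / COLS);
        int column = index % COLS + 1;
        return row + String.valueOf(column);
    }

    /** Assign each picked source well the next free destination well, in order. */
    static List<String> transfers(List<String> sourceWells) {
        List<String> commands = new ArrayList<>();
        for (int i = 0; i < sourceWells.size(); i++) {
            commands.add(sourceWells.get(i) + " -> " + wellName(i));
        }
        return commands;
    }

    public static void main(String[] args) {
        // Three samples picked from scattered wells of a source plate.
        for (String command : transfers(Arrays.asList("C7", "A2", "H11"))) {
            System.out.println(command);
        }
    }
}
```

The mapping is trivial for software, but only if the source wells arrive as structured data rather than as free-form notes in a spreadsheet cell.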
Clotho provides a solution to the need for a comprehensive software platform for synthetic biology tools. The first thing to understand about Clotho is that it doesn’t do anything. Clotho itself has no visual appearance, performs no useful functions, and automates no tasks. It is simply a platform upon which other software can perform those tasks. Clotho is a data model-based tool and plugin environment. In developing the data model, we incorporated ideas generated in the POBOL and SBOL projects, the JBEIR registry, the Registry of Standard Biological Parts, GenoCAD, and the new ideas being introduced in the synthetic biology community (DNA standard assembly methods, grammars, design calculators, and so forth), as well as the “neglected” data objects collected in experimental labs that are either held in the brains of the experimentalists or tucked away in notebooks or assortments of wiki pages and Word files. We took the superset of them all to make a data and plugin model in which it is possible to define any of these concepts or objects. Upon this data model sits a Java API that provides the link between tools and Clotho and between Apps and other Apps. The API provides methods for validating, creating, linking, saving, updating, and deleting the data objects. In the end, Apps sit on this API such that the underlying data is standardized and validated by the core. However, the user interface presented to a user is determined entirely by the App. This allows the data in Clotho ultimately to look like any type of tool. A GenoCAD-like pull-down / icon view could be implemented as a means of designing composite parts. Alternatively, that same functionality could be presented through a Google Docs-like spreadsheet. The general idea is that we standardize the data but allow the tools to be contorted to fit whatever workflow a user is most comfortable using. Furthermore, the act of creating a composite part might not involve a graphical user interface at all.
One type of Clotho plugin, the Algorithm, works behind the scenes but has all the same access to the API as a tool does. So, an Algorithm could design its own composite parts, and furthermore, Algorithms can be strung together to make advanced workflows that perhaps include simulations, design of experiments, the generation of documentation, and whatever else an App developer has gone to the trouble to write. Additionally, the entire functionality of the API can be accessed through scripting for the programming-savvy user. In the end, though, a composite part is a composite part regardless of which App generated it, and it can be shared by all Apps and passed from user to user.
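The idea of a core that validates and stores data while Apps supply the interface can be sketched in a few lines of Java. Everything named here (`CoreSketch`, `Part`, `createPart`, `createComposite`, `get`) is a hypothetical stand-in invented for this illustration and should not be read as the actual Clotho API; the validation rule is also deliberately simplistic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these names come from the real Clotho API.
class CoreSketch {

    /** A minimal standardized data object: a named DNA part. */
    static final class Part {
        final String name;
        final String sequence;
        final List<String> subParts = new ArrayList<>(); // names of linked parts

        Part(String name, String sequence) {
            this.name = name;
            this.sequence = sequence;
        }
    }

    private final Map<String, Part> store = new LinkedHashMap<>();

    /** Validation lives in the core, so every App gets the same guarantee. */
    static boolean isValidDna(String sequence) {
        return sequence != null && sequence.matches("[ACGTacgt]+");
    }

    /** Create and save a basic part; reject anything that fails validation. */
    Part createPart(String name, String sequence) {
        if (!isValidDna(sequence)) {
            throw new IllegalArgumentException("Not a DNA sequence: " + sequence);
        }
        Part part = new Part(name, sequence.toUpperCase());
        store.put(name, part);
        return part;
    }

    /** Create a composite by concatenating saved parts and linking back to them. */
    Part createComposite(String name, List<String> partNames) {
        StringBuilder sequence = new StringBuilder();
        for (String partName : partNames) {
            Part part = store.get(partName);
            if (part == null) {
                throw new IllegalArgumentException("Unknown part: " + partName);
            }
            sequence.append(part.sequence);
        }
        Part composite = createPart(name, sequence.toString());
        composite.subParts.addAll(partNames);
        return composite;
    }

    Part get(String name) {
        return store.get(name);
    }

    public static void main(String[] args) {
        CoreSketch core = new CoreSketch();
        // "App" 1: a spreadsheet-style tool enters two basic parts.
        core.createPart("promoter", "ttgaca");
        core.createPart("rbs", "aggagg");
        // "App" 2: a background algorithm builds a composite from them.
        Part device = core.createComposite("device", Arrays.asList("promoter", "rbs"));
        System.out.println(device.name + " = " + device.sequence + " from " + device.subParts);
    }
}
```

Because both callers go through the same core, neither needs to know the other exists, and neither can save a part that fails validation.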
Just as Clotho remains agnostic about how a user wishes to interface with their data, the data model is agnostic about methods of DNA fabrication and lab practices. Though we fully support the BioBrick approach to describing composition, we also recognize that many users use a more free-form approach involving PCR methods or a battery of DNA modification enzymes. In developing the data model, we took into account all known methods for describing genetic composition and provide a solution that can be adapted to fit them all at the full level of granularity needed to describe them.
The power of this approach is that a community of researchers can develop software tools that, independent of any active collaboration, nevertheless function synergistically. The availability of a mechanism for sharing these Apps makes the tools more accessible to users. You simply have to know how to install one App to be able to use all Apps written for this platform. Additionally, because the core takes care of the problems of data storage and validation, the task of creating a new BioCAD tool for a specific purpose is greatly simplified.
- Your data is private until you make it public
In Clotho, you choose and own the database backend. Therefore, your data is only visible to people with whom you've chosen to share it. Nevertheless, if you want to make your data available to the world, appropriate sharing Apps can be configured to translate your data into a form compatible with public tools such as the Registry of Standard Biological Parts. As long as there is a tool that can accept data in some standardized form, Clotho can be easily made to share the data with that tool.
- All your data is linked together
You're accustomed to keeping the sequences of your DNAs in a digital form. You may also be accustomed to using tools like Zotero to easily grab literature references, hold them in a standardized form on your computer, and export them as bibliographies into your papers. However, you're probably not used to seeing those two types of data linked together. When you go to publish, wouldn't it be nice to just say "Computer, give me all the papers associated with these constructs I made" and have it retrieve everything for you? Well, that is not core functionality of Clotho, but Clotho stores the references in a standardized form, it stores the Parts in a standardized form, and it stores the linkages between Parts and references, so the information is all available for your computer to fulfill a request like that one, and many more such automated tasks. A tool or script simply needs to be added to run them.
All your data in Clotho is held in a standardized and "hard" form such that tools can understand the biology of what it encodes, not just present it to you as a string of text. All the data links together as a network: your experimental data links to the samples that the data was collected on, and your Parts link to the Person who made each Part and further on to their Lab and Institution. This network of linkages can be navigated using automation Apps and also queried. Want to find all the Parts made by Suzy? Clotho can easily handle simple requests such as this: you simply type "Parts by Suzy" into a text box.
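As a rough illustration of what such a network of linked objects makes possible, the Java sketch below models the Part, Person, Lab, and Institution chain and answers the "Parts by Suzy" question by following links. The class layout, the `partsBy` helper, and all example names are assumptions made up for this sketch, not Clotho's actual data model or query syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a toy version of linked data objects, not Clotho's data model.
class LinkedDataSketch {

    static final class Institution {
        final String name;
        Institution(String name) { this.name = name; }
    }

    static final class Lab {
        final String name;
        final Institution institution;
        Lab(String name, Institution institution) {
            this.name = name;
            this.institution = institution;
        }
    }

    static final class Person {
        final String name;
        final Lab lab;
        Person(String name, Lab lab) {
            this.name = name;
            this.lab = lab;
        }
    }

    static final class Part {
        final String name;
        final Person author;
        Part(String name, Person author) {
            this.name = name;
            this.author = author;
        }
    }

    /** Answer "Parts by <name>" by following each Part's link to its author. */
    static List<String> partsBy(List<Part> parts, String authorName) {
        List<String> names = new ArrayList<>();
        for (Part part : parts) {
            if (part.author.name.equals(authorName)) {
                names.add(part.name);
            }
        }
        return names;
    }

    /** Made-up example data: two people in one lab, three parts. */
    static List<Part> exampleParts() {
        Institution university = new Institution("Example University");
        Lab lab = new Lab("Example Lab", university);
        Person suzy = new Person("Suzy", lab);
        Person raj = new Person("Raj", lab);
        return Arrays.asList(
                new Part("promoter-1", suzy),
                new Part("terminator-7", raj),
                new Part("rbs-3", suzy));
    }

    public static void main(String[] args) {
        List<Part> parts = exampleParts();
        System.out.println("Parts by Suzy: " + partsBy(parts, "Suzy"));
        // Navigate the network: Part -> Person -> Lab -> Institution.
        Part first = parts.get(0);
        System.out.println(first.name + " was made at " + first.author.lab.institution.name);
    }
}
```

The query works only because the author is a linked object rather than a name typed into a free-text column.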
- Workflows automate repetitive tasks
One of the powerful features of Clotho's core is the Algorithm framework. Algorithms are Apps just like the visible GUI-type tools that you're accustomed to working with. However, Algorithms don't look like anything. They just run in the background over and over. Where this comes in handy is when you need to do many repetitive tasks. Let's say you just did a big run of sequencing and have 1000 reads. Wouldn't it be nice to say "Computer, take this zip file, and for all the sequencing reads inside tell me whether they match their corresponding Part sequences, and finally create reports of what each one did and attach that data to each Sample?" That's an example of a Workflow, which is basically a pipeline of operations applied to one or more pieces of input data. Algorithms encode the modular units of these tasks, and workflows string together algorithms to make more complicated ensembles of operations. You can script these workflows on the fly within Clotho, and soon we'll provide Apps that allow you to "draw" your workflows on the screen. If you're going to use Clotho one Part-at-a-time this may not be particularly useful. However, once you start working with many things at a time, you'll find automation not only helpful but essential.
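A workflow of this kind can be pictured as function composition. The Java sketch below strings two small "algorithms" together with the standard `Function.andThen`: one compares each read with its expected Part sequence, the other turns the results into reports. The names (`WorkflowSketch`, `matchAgainst`, `report`, `sequencingWorkflow`) are hypothetical, and the exact string comparison is a stand-in assumption; checking real sequencing reads would call for alignment rather than equality.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: illustrates chaining algorithms, not Clotho's Algorithm framework.
class WorkflowSketch {

    /** Algorithm 1: for each sample, does the read equal the expected Part sequence? */
    static Function<Map<String, String>, Map<String, Boolean>> matchAgainst(Map<String, String> expected) {
        return reads -> {
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (Map.Entry<String, String> read : reads.entrySet()) {
                String want = expected.get(read.getKey());
                // Simplification: exact, case-insensitive equality instead of alignment.
                results.put(read.getKey(), want != null && want.equalsIgnoreCase(read.getValue()));
            }
            return results;
        };
    }

    /** Algorithm 2: turn match results into a one-line report per sample. */
    static Map<String, String> report(Map<String, Boolean> results) {
        Map<String, String> reports = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> result : results.entrySet()) {
            reports.put(result.getKey(),
                    result.getValue() ? "PASS: read matches Part" : "FAIL: read differs from Part");
        }
        return reports;
    }

    /** The workflow strings the two algorithms together into one pipeline. */
    static Function<Map<String, String>, Map<String, String>> sequencingWorkflow(Map<String, String> expected) {
        return matchAgainst(expected).andThen(WorkflowSketch::report);
    }

    public static void main(String[] args) {
        Map<String, String> partSequences = new LinkedHashMap<>();
        partSequences.put("sample-1", "ATGGCC");
        partSequences.put("sample-2", "TTGACA");

        Map<String, String> reads = new LinkedHashMap<>();
        reads.put("sample-1", "atggcc"); // same bases, different case
        reads.put("sample-2", "TTGTCA"); // one base differs

        Map<String, String> reports = sequencingWorkflow(partSequences).apply(reads);
        for (Map.Entry<String, String> entry : reports.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

The same pipeline runs unchanged whether the input holds two reads or a thousand, which is the point of encoding the steps as reusable units.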
- Inception to disclosure support for your project
Many times in synthetic biology today, the organization of data is almost an afterthought. You read some papers, you pulled some sequences from GenBank, you put them into ApE and designed some oligos, and only after all that did you write something up in your notebook encapsulating the process. Or you did some cloning in the lab, you put the cells into the plate reader and got out some data, you processed that data, and finally you put some charts into your notebook. Wouldn't it be better if your software tools stored your information concurrently with your operations rather than after the fact? With Clotho, we've included every type of data we could think of that you are likely to encounter in synthetic biology. If we missed one, let us know and we'll add it. This allows you to store information about everything you do as you do it, so that there is no need to think about it later. While you're doing some reading, the tool you're using can also be storing the literature reference. If there is some protein coding sequence related to that reference, your tools can put that "Feature" into Clotho when you first see it and link it to the literature reference. Later, when you go to make a Part containing that Feature, the act of annotating the Part with the Feature automatically links the Part to the paper you got it from. When you use a robotic automation tool that builds the DNA for you, the Plasmids and Samples generated during that process are automatically put into Clotho and linked to the Parts, Features, and references related to them. Essentially, Clotho's standardized and linked data model allows the development of a new class of research tools that make documentation less an act of publication and more an integrated step in performing an operation. What does all this mean to you? Less work organizing your information manually, and 6 months after the fact you'll have more data to remind you of exactly what you did.
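The chain from literature reference to Feature to Part to Sample can be sketched as follows. In this Java illustration a Part never stores references itself; it derives them from the Features it is annotated with, and a Sample derives them from its Part. The classes and methods are hypothetical simplifications written for this example, and the reference strings are placeholders rather than real citations.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: shows provenance following links, not Clotho's real classes.
class ProvenanceSketch {

    /** A sequence feature, linked to the references it was found in. */
    static final class Feature {
        final String name;
        final List<String> references = new ArrayList<>();
        Feature(String name) { this.name = name; }
    }

    /** A Part is annotated with Features and inherits their references. */
    static final class Part {
        final String name;
        final List<Feature> features = new ArrayList<>();
        Part(String name) { this.name = name; }

        void annotate(Feature feature) {
            features.add(feature);
        }

        /** References are derived by following links, never re-entered by hand. */
        Set<String> references() {
            Set<String> refs = new LinkedHashSet<>();
            for (Feature feature : features) {
                refs.addAll(feature.references);
            }
            return refs;
        }
    }

    /** A physical Sample links to the Part it contains. */
    static final class Sample {
        final String label;
        final Part part;
        Sample(String label, Part part) {
            this.label = label;
            this.part = part;
        }

        Set<String> references() {
            return part.references();
        }
    }

    public static void main(String[] args) {
        // While reading, a tool stores the Feature together with its source.
        Feature reporter = new Feature("reporter CDS");
        reporter.references.add("Reference A (placeholder)");

        // Later, annotating a Part with the Feature is all that is needed.
        Part device = new Part("reporter device");
        device.annotate(reporter);

        // A Sample produced by an automation tool links to the Part.
        Sample tube = new Sample("tube-1", device);
        System.out.println(tube.label + " traces back to " + tube.references());
    }
}
```

Since the references are computed from the links at the moment they are requested, a citation added to a Feature later shows up for every Part and Sample that uses it.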
Clotho’s beginnings come from the realization that ideas from electronic design automation, in particular platform-based design, could apply to the design of synthetic biological systems. In addition, it became apparent that a data model reflecting the needs of synthetic biology, and tools that take advantage of this data model, needed to be developed in an open source, collaborative environment. In particular, Clotho has always focused on separating function from implementation (what something does from how it is physically made) and data from applications (information from the use of that information). Clotho began in 2007 as a “BioCAD” prototype tool developed at UC Berkeley (led by Douglas Densmore) based on discussions with researchers at UCSF (in particular Chris Voigt), and it entered active development as part of UC Berkeley’s iGEM (International Genetically Engineered Machine competition) program. Those discussions planted the seed for UC Berkeley’s award-winning 2008 software tools team. The 2009 effort built upon that momentum with a second win for “best software tool” at iGEM. These successes paved the way for the Clotho of today. Clotho now represents an interdisciplinary, multi-university research effort.
This website was designed and developed by Noble Studios, an international web development and digital marketing company. Noble Studios was introduced to this groundbreaking project by Autodesk, a leader in 3D design, engineering, and entertainment software. Noble Studios also created a custom logo for The Clotho Project. To learn more about Noble Studios, visit www.noblestudios.com.
Clotho's Core includes several open source third-party libraries, and we gratefully acknowledge the authors for their contributions. These include the NetBeans 6.9 plugin environment, the MediaWiki parser Textile-J, the file drop handler FileDrop, JGraph, and NCBI's BLAST Java libraries. Clotho's Biosafety management employs libraries from the VFDB Virulence Factor database.