A new machine-learning-based pipeline that could help phenotype plants en masse might just break a major bottleneck in the process of understanding what plant genes really do.
Arabidopsis, the best-studied plant on the planet, has approximately 27,000 genes – and we still don’t really understand what many of them do.
But we’re trying to find out.
Over the past years and decades, molecular biology has advanced in leaps and bounds. We now have a variety of techniques that let us precisely alter genes, add them in or take them out, and even turn them on and off at will using tiny molecular switches. So to understand what genes do, we change them or break them or switch them on and off, and then track how that changes or breaks the plant.
Tracking those changes, at the molecular level, is also something that we’ve become particularly good at.
We have techniques to see how transcript numbers swell and subside (RNAseq), how proteins accumulate and how they work (proteomics, enzyme profiling), and how their molecular products are created and destroyed (metabolomics, lipidomics).
In recent years, the methods to measure these molecular responses have become more powerful, more precise, and far cheaper.
But even though we can now more easily understand what a gene does at the molecular level, the ultimate measure of its use and quality is the impact that the gene has on the survival, growth and reproduction of the plant.
And to understand that, we tend to have to look at the plant as a whole.
But, perhaps surprisingly, ‘looking’ at the plant – really observing it and seeing how the genetic changes affect the plant’s observable traits, its phenotype – is something that we’re still not that great at. At least not when it comes to doing it in a high-throughput way.
Growing the plants for observation takes space and time, even with a fairly small and fast-growing plant like Arabidopsis. Ideally, you would grow the plant under several or many different environmental conditions, as the phenotype of a plant is ultimately an interaction of the genotype (the genes) and the environment. That means more space and/or time. The type of environment – warm, cold, high light, low light etc. – can also be a major clue to gene function. For example, plants missing a gene that’s involved in protection from excess light might look normal under low-light growth conditions, but end up much closer to death under brighter suns. And the plants should be observed throughout their entire life, as some genes are only required at certain developmental stages.
So if you have the resources to grow the plants, the next bottleneck becomes the observation itself. Not just looking at them, but observing them in a measurable – and hopefully objective – way. How big the plants are, how fast they grow, how green they are.
Our beloved Arabidopsis, before it reaches the reproductive stage and hoists up its long floral stem, exists as a largely two-dimensional rosette. So photographing the plants from above is enough to capture size and structure. In fact, although it requires some dedicated equipment, high-throughput acquisition of images of growing Arabidopsis plants has become almost trivial today.
But robustly and faithfully analysing that data is not so easy.
To solve this problem, Patrick Hüther, Claude Becker and colleagues took advantage of a well-established deep learning model (DeepLabV3+), and retrained it to accurately recognise plant parts using annotated photos. They named their approach AraDEEPopsis, for ARAbidopsis DEEP-learning-based OPtimal Semantic Image Segmentation.
Try saying that ten times fast.
This first step of development – the recognition of plant parts – is itself a large part of the data analysis problem. Plant photos include not just the plant but also the soil it’s growing on, and finding the plant within its potty background isn’t always straightforward. Traditionally, image analysis software has simply searched for parts of the image (pixels) that are considered green enough, and used this binary threshold (green enough vs not green enough) to decide whether an area is plant or not plant.
Perhaps unsurprisingly, this method is less than foolproof. While plants are traditionally ‘quite green’, they can also accumulate anthocyanins – protective pigments that turn them a deep red or purple. Under other conditions, sick leaves can end up being paler and yellower than expected, while old (senescing) leaves end their life in a brown or grey.
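To make the ‘green enough’ idea and its blind spot concrete, here is a minimal sketch in plain Python. It is not the code of any particular image analysis tool: the function names, the rule (green must exceed red and blue by a margin) and the margin of 20 are all assumptions chosen for illustration, as are the pixel colours in the toy image.

```python
# Hypothetical sketch of colour-threshold segmentation; not taken from any real tool.
# A pixel is an (R, G, B) tuple with channel values from 0 to 255.

def is_green_enough(pixel, margin=20):
    """Return True if green exceeds both red and blue by at least `margin`.

    Both the rule and the default margin are illustrative assumptions.
    """
    r, g, b = pixel
    return g - max(r, b) >= margin


def threshold_mask(image, margin=20):
    """Turn an image (list of rows of RGB tuples) into a binary plant mask."""
    return [[is_green_enough(px, margin) for px in row] for row in image]


def plant_area(mask):
    """Count the pixels labelled as plant."""
    return sum(sum(row) for row in mask)


# A toy 2 x 3 'photo' with made-up colours:
# row 1: soil, healthy leaf, healthy leaf
# row 2: purple (anthocyanin-rich) leaf, yellowing leaf, soil
image = [
    [(90, 60, 40), (40, 160, 50), (50, 170, 60)],
    [(110, 40, 120), (200, 190, 60), (85, 55, 35)],
]

mask = threshold_mask(image)
print(plant_area(mask))  # 2: only the two healthy green pixels are counted
```

Four of the six toy pixels belong to the plant, but the threshold finds only two of them. The purple and yellowing leaf pixels are treated as background, so the plant looks smaller than it really is.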
These features can make it difficult for an automated system to accurately identify Arabidopsis. But not so for AraDEEPopsis: following training on a relatively small number of plant photos that the scientists annotated carefully by hand, the deep learning model was able to robustly recognise plant parts regardless of their hues.
In fact, after training their model first to recognise the entire plant rosette, the scientists then worked on training it to understand which parts of the plant were green, which were particularly pink (i.e., filled with anthocyanins), and which were brown (aging or dead). This meant that AraDEEPopsis could not only detect whole plants that other methods missed because parts of them were off-colour, but could also ‘see’ and quantify those differences.
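Once every pixel carries a class label, quantifying those differences comes down to counting. The sketch below shows one way this could look, assuming the segmentation has already produced a per-pixel label mask. The class numbering, the names and the `rosette_summary` function are hypothetical and do not reflect AraDEEPopsis’s actual output format.

```python
from collections import Counter

# Hypothetical class ids; the numbering is an assumption for illustration only.
BACKGROUND, GREEN, ANTHOCYANIN, SENESCENT = 0, 1, 2, 3
CLASS_NAMES = {GREEN: "green", ANTHOCYANIN: "anthocyanin", SENESCENT: "senescent"}


def rosette_summary(label_mask):
    """Summarise a per-pixel label mask (list of rows of class ids).

    Returns the rosette area in pixels and the fraction of that area
    assigned to each leaf-state class. Background pixels are ignored.
    """
    counts = Counter(label for row in label_mask for label in row)
    plant_pixels = sum(counts[c] for c in CLASS_NAMES)
    if plant_pixels == 0:
        # No plant found: report zero area instead of dividing by zero.
        return {"area_px": 0, "fractions": {n: 0.0 for n in CLASS_NAMES.values()}}
    fractions = {name: counts[c] / plant_pixels for c, name in CLASS_NAMES.items()}
    return {"area_px": plant_pixels, "fractions": fractions}


# A toy 3 x 4 label mask standing in for a segmented top-view photo.
label_mask = [
    [0, 1, 1, 0],
    [1, 1, 2, 2],
    [0, 3, 1, 0],
]

summary = rosette_summary(label_mask)
print(summary["area_px"])    # 8
print(summary["fractions"])  # {'green': 0.625, 'anthocyanin': 0.25, 'senescent': 0.125}
```

Tracking these fractions over a time series of photos would show, for example, when a plant starts to accumulate anthocyanins or when its leaves begin to senesce.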
As proof of AraDEEPopsis’s power, the scientists demonstrated that their model could help accurately identify plants that had variations in genes involved in anthocyanin accumulation, as well as those involved in cell death and senescence. Additionally, they tested AraDEEPopsis on other plant species, including Arabidopsis’ cousin and our favourite new oilseed plant – pennycress (Thlaspi arvense), as well as on Nicotiana tabacum (tobacco). The model worked well for both.
Overall, AraDEEPopsis is a reliable tool for image analysis, which is freely available, requires limited user input and can be run from a PC. The scientists ultimately intend to extend its capabilities by training the model to recognise other phenotypes. But for now, it’s a great start towards ramping up automatic phenotyping, overcoming the plant phenotyping bottleneck, and helping us to more rapidly understand what different genes do, and how they make plants behave.
References
ARADEEPOPSIS, an Automated Workflow for Top-View Plant Phenomics using Semantic Segmentation of Leaf States. Patrick Hüther, Niklas Schandry, Katharina Jandrasits, Ilja Bezrukov, Claude Becker. The Plant Cell, Dec 2020, 32 (12) 3674-3688; DOI: 10.1105/tpc.20.00318