Mihaela Pertea joined the Johns Hopkins Department of Biomedical Engineering as an associate professor in July 2019. In this interview, she discusses her research in computational genomics, her career journey, and advice for students.
What inspired you to pursue a career in computational genomics?
Originally, I came to Hopkins [as a graduate student] thinking that I would work on artificial intelligence, because that was all that I had been exposed to as an undergrad. Then my thesis mentor, Steven Salzberg, introduced me to computational biology and I loved it. Coming here from Romania, it was amazing to me that people were doing hands-on research. There, it is much more theoretical. I was really impressed.
Why did you choose to join Johns Hopkins BME?
Hopkins is great. It was my first introduction to the area, and of course I like it here. After my PhD, I went to The Institute for Genomic Research (TIGR) in Rockville, MD. That is where I developed my first gene finder, for the malaria genome. It was an exciting project, because malaria was, and still is, a major cause of disease, and millions of people are dying every year because of it. We wanted to look into the genes that cause malaria but had no gene finder to do it. That is how it started. I always wanted to focus more on medical applications, and I thought that the wealth of data here at Hopkins was so great, and really something that I could apply my tools to. When you interact with people from the [Johns Hopkins] School of Medicine, the results are immediate. I am a computer scientist and don’t know much about the medical part, but I love it when my tools are used and have an immediate application in the real world. I am excited to join the BME department and have the opportunity to recruit students in my area. Outside of BME, not many of the people who are working in genetic medicine have the computational skills needed for the type of work that I am doing. BME is more in line with my research, and a good place for me to grow.
What are you working on right now?
I am interested in finding the genes that are expressed in the cell. My lab is doing mostly computational work, developing computational tools for analyzing large DNA sequencing datasets. We ask questions about the exon-intron structure of genes and how their expression levels differ between conditions. Over the years, I have developed several tools that use machine learning algorithms to assemble and identify the genes that are expressed in cells.
What do you consider your biggest achievement so far?
I am really proud of StringTie, the transcriptome assembler that I developed. It has had a huge impact, with more than 1,000 citations so far, so lots of people are using it: more than 20,000 users in the last two years. It is such an efficient tool, and it was much more accurate than the state of the art at the time. When I wrote it, the existing transcriptome assembler would take 24 hours to process and assemble RNA-sequencing data, and StringTie was taking just 15 or 30 minutes, depending on the sample.
What impact would you like your work to have?
We still don’t know all of the genes in the human genome, even though humans are the most studied organism. All of these genes have tremendous impact on biomedical research, and my work helps complete this picture. We recently put together a new gene catalog, called CHESS. Many of the techniques that scientists use interrogate known gene annotations to see whether the genes have mutations or whether they cause disease, so it is important that we know where all of the exons are. We need to know that structure so that we know where to look; otherwise there are too many false positives. I am hoping to help in this process so that we have a really good annotation.
You mentioned CHESS, your database of human genes. What makes CHESS different from other gene catalogs?
CHESS is more complete than other gene catalogs. We found that there are many more genes expressed than are actually annotated. If you look at the known gene catalogs, you will see a lot of disagreement between them. We analyzed a huge amount of data, almost ten thousand RNA-seq samples, to see which of these genes showed real evidence of being expressed, because there is a large amount of transcriptional noise that needs to be cleaned out. CHESS basically takes the union of the other catalogs and serves as our new prediction of the genes that we believe are actually real, not just noise, and are expressed in many samples at significant levels.
What’s next for your research?
In my lab, we will continue to develop computational tools and analyze large datasets as we have been doing. My emphasis will be more toward the functional aspect of things. Right now, we are characterizing exon-intron structures, but we also need to attach a function to them and ask what their role is in the cell. That is the next step.
Do you have any advice for engineering students?
I would tell them to explore. Unless you have known your exact passion in life since you were little, it is better to explore other areas, especially in BME where there is so much diversity. You don’t have to focus on a small area; you can integrate and have more impact if you are aware of other areas that may or may not be related to your work. It gives you a good perspective.