Plant Ontology Database #0409 Release

Posted: April 20th, 2009 | Author: shuly | Filed under: Uncategorized | No Comments »

The Plant Ontology Consortium (POC) is excited to announce release #0409 (April 2009) of the Plant Ontology Database.

In this release, we bring you 26625 gene annotations from TAIR, Gramene, SGN and MaizeGDB, 8558 QTL annotations from Gramene, and 9832 germplasm associations from SGN, MaizeGDB and NASC.

The following annotations were added for the first time: 16016 gene annotations from TAIR, 3 gene annotations from Gramene, 2 gene annotations from SGN and 3928 germplasm annotations from MaizeGDB.

For genes curated by TAIR and SGN, you may also find links to their Gene Ontology (GO) pages through the PO browser.

The new ontology files and the database dump are available for download.

To submit plant ontology term requests, we encourage researchers to use the SourceForge PO tracker.

The Plant Ontology Consortium
web: http://www.plantontology.org
e-mail: po-dev at plantontology.org

The project is funded by the National Science Foundation, USA (Grant No. DBI-0703908).


Gramene goings on, etc.

Posted: April 9th, 2009 | Author: Ken Youens-Clark | Filed under: Uncategorized | 1 Comment »

In February Gramene released our 29th build. I won’t go into details here because you can read our release notes. Shortly afterwards I got a chance to present a poster on said release at CSHL’s “Plant Genomes” meeting. It was nice to meet some of our users and listen to the many interesting talks.

Since then, I’ve been doing lots of things. In no particular order: Read the rest of this entry »


I have a suggestion for you

Posted: January 21st, 2009 | Author: Jim Thomason | Filed under: Uncategorized | No Comments »

I’m a little bit behind the times. I was supposed to post something up here last week, but was so wrapped up in actually working on what I was going to write about that I completely forgot.

Gramene is going to be rolling out an autocomplete feature to offer suggestions to users sometime in the near future. You can sample the wonderfully suggestive goodness here until we go live.

Read the rest of this entry »


All Gramene, all the time

Posted: December 19th, 2008 | Author: Ken Youens-Clark | Filed under: Uncategorized | No Comments »

As the Gramene project manager, pretty much everything I do is directly related to that. Read the rest of this entry »


The learning curve

Posted: December 18th, 2008 | Author: Andrea Eveland | Filed under: Uncategorized | 2 Comments »

It is interesting to look at where, in my current state as a programming newbie, I fall on this curve. My first experience with Perl (or any programming language, for that matter) was during the CSHL Programming for Biology course in mid-October. I came away with a very large three-ring binder and an array of books with different animals on them…essentially the tools needed to tackle any data analysis situation that could be simplified or made manageable using Perl. I am very fortunate to have since been working alongside a group of very helpful friends in the Ware lab. Reaching my spot this far along the learning curve would have been very difficult without them.

Even so, in these last 2 months I have experienced a series of peaks and valleys corresponding to momentary jumps of joy and periods of frustration where I feel unproductive. As I move along the learning curve, although I continue to experience valleys, they are becoming increasingly complex and my intermittent peaks are actually beginning to produce useful information for my research.

For example, earlier in the week I spent an entire evening trying to figure out whether the data structure that I had constructed in my code was an array of hashes or a hash of hashes or a hash of arrays, etc. After systematically isolating elements of the code line by line and commenting on what each gave back, I felt I had made some progress and went to bed. As usual, I curled up with my camel book and a glass of wine. What was not usual was my erratic sleep and the visions of arrays and hashes looping through my head. I am not even kidding a little bit…I must have woken up about 10 times, each after hitting an error message. The next day I felt tired and stressed, but when I sat back down at the computer, I realized almost immediately that I had a hash of arrays in which one element was a hash reference!

Ok, a little weird, and probably not uncommon among programmers, since I think if you stare at anything long enough it tends to come back to haunt you in your sleep. Perhaps complete immersion is a little unhealthy, since I dreamt of hashes again last night. But I am happy about my progress along the learning curve.
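For anyone stuck on the same "what did I actually build?" question, here is a minimal sketch of the line-by-line inspection idea, written in Python rather than Perl (a Python dict plays the role of a hash and a list plays the role of an array). The `reads` structure and the `kind` and `outline` helpers are made up for illustration; they are not the code from the story above.

```python
def kind(value):
    """Name a value the way a Perl programmer would think of it."""
    if isinstance(value, dict):
        return "hash"
    if isinstance(value, (list, tuple)):
        return "array"
    return "scalar"


def outline(value, indent=0):
    """Return one line per element, indented by nesting depth."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            lines.append(f"{pad}{key!r}: {kind(item)}")
            lines.extend(outline(item, indent + 1))
    elif isinstance(value, (list, tuple)):
        for position, item in enumerate(value):
            lines.append(f"{pad}[{position}]: {kind(item)}")
            lines.extend(outline(item, indent + 1))
    return lines


# Hypothetical data: a hash of arrays in which one element is itself a hash.
reads = {
    "geneA": [12, 7, {"strand": "+"}],
    "geneB": [3, 4],
}

if __name__ == "__main__":
    print("\n".join(outline(reads)))
```

Printing the outline makes the odd element stand out immediately, because it is the only entry inside an array that is reported as a hash.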

So in a nutshell, aside from the technical aspects of things, what have I learned thus far? First of all, programming is really fun! As a biologist working with deep sequencing data, it is also really essential…at least a basic knowledge, anyway. Even knowing only what I know now would have made my research as a graduate student that much easier. There are only so many lines you can populate in an Excel spreadsheet before the computer crashes. Especially with some of these Solexa datasets…such analyses would be virtually impossible without the proper code. Also, very importantly, I learned that programmers love Starbucks. Nuff said…I’m in good company :)


Massively Parallel Sequencing Data Storage Requirements

Posted: December 5th, 2008 | Author: Jer-Ming Chia | Filed under: Uncategorized | No Comments »

Peter recently asked for estimations of our disk space requirements in the next couple of years. I came across this table (Next-Generation Sequencing Informatics Statistics) and thought it would be useful.

Don’t you find the phrase “Next-Gen Sequencing” so … “Web 2.0”? I prefer “Massively, Embarrassingly, Shamelessly Parallel Sequencing” but it is rather clunky. Suggestions?


Move to the new Lab

Posted: December 2nd, 2008 | Author: Lifang Zhang | Filed under: Uncategorized | No Comments »

We have been officially kicked out of Room 110 in the Delbruck building. Since November, we have been slowly moving, bit by bit, to our new room, 101, which used to be occupied by the Jacek lab. Read the rest of this entry »


These last couple weeks I…

Posted: December 2nd, 2008 | Author: shuly | Filed under: Uncategorized | 2 Comments »

…became an aunt to two beautiful twins, a girl and a boy, of my brother and his beautiful wife! Then the next day I heard that my other brother’s girlfriend is pregnant too, and we’re all very excited – so I guess I can finally, somehow, in a way… be part of the “gramene babies” family…

Now back to business… These last couple weeks I’ve been working on several things. I’ll briefly discuss a few.

So, right after we came out with the Plant Ontology data release on November 12 (yea!!!), I started setting up the Plant Ontology wiki. A decision was made to convert all the documentation pages that are currently hosted on the Plant Ontology website in HTML format to the wiki, in order to simplify the task of editing documents and updating the website’s repository. To convert the HTML to wiki format I used the html2wiki converter, which helped quite a bit with the task but required a few manual fixes to the pages.

The PO wiki has been set up as read-only for everyone and may be edited by registered users only. However, we also need an internal section that only registered users may read. This is a bit tricky, since MediaWiki does not support per-page access restrictions. According to the MediaWiki documentation, there are two basic possibilities:

1. Set up separate wikis with a shared user database, configure one as viewable and one as unviewable, and make interwiki links between them.
2. Install a third-party hack or extension. You will have to reapply it every time you upgrade the software, and it may not be updated immediately when new security fixes or upgrades of MediaWiki are released. Almost all hacks or patches promising to add such restrictions will likely have flaws somewhere, which could lead to exposure of confidential data.

A list of various extensions that restrict user access to specific pages or namespaces, and problems they may exhibit or, on the contrary, deal with, is found here.

In order to test the extensions for security problems, one may consult this page.

Going over the above lists, I have chosen to test Extension:Lockdown, which should allow us to use a custom namespace for our internal usage.

Using this extension, I’ve created a custom namespace, which only registered users may access. So far it seems to be working well. However, pages in this namespace do appear in search results, as well as on the “recent changes” page, although the page itself is not accessible to anonymous users and requires a login to view. I intend to do some further testing to make sure no sensitive data is exposed. Still, I feel that we should eventually use two separate wikis with interwiki links between them (despite the fact that Pankaj is reluctant to maintain two wikis).
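One way to approach that further testing is to ask the wiki, as an anonymous client, which titles from the restricted namespace show up in recent changes. The sketch below only builds the request URL for MediaWiki's `api.php` (using the `recentchanges` list with its `rcnamespace`, `rcprop` and `rclimit` parameters) and filters an already-parsed JSON reply; it does no network access itself. The API address, the namespace ID 100 and the sample reply are assumptions for illustration, not our actual setup.

```python
from urllib.parse import urlencode


def recent_changes_url(api_base, namespace_id, limit=50):
    """Build an anonymous recent-changes query limited to one namespace."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": namespace_id,
        "rcprop": "title|timestamp",
        "rclimit": limit,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def exposed_titles(response, namespace_id):
    """List titles from the given namespace present in a parsed API reply."""
    changes = response.get("query", {}).get("recentchanges", [])
    return [c["title"] for c in changes if c.get("ns") == namespace_id]


# Hypothetical values: a placeholder wiki address and custom namespace ID.
API_BASE = "http://wiki.example.org/api.php"
INTERNAL_NS = 100

# Hypothetical reply, shaped like the JSON the recentchanges list returns.
sample_reply = {
    "query": {
        "recentchanges": [
            {"type": "edit", "ns": 0, "title": "Main Page"},
            {"type": "edit", "ns": 100, "title": "Internal:Meeting notes"},
        ]
    }
}

if __name__ == "__main__":
    print(recent_changes_url(API_BASE, INTERNAL_NS))
    print(exposed_titles(sample_reply, INTERNAL_NS))
```

If the list comes back non-empty for a client that is not logged in, the titles themselves are leaking even though the page bodies are protected.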

Another task I accomplished was to copy MySQL databases (all ensembl and markers dbs) from our live database server onto a new server, ‘filetta’, designated for web services, to be used by external users. These databases were created as compressed, read-only, using the ‘myisampack’ utility. This resulted in significant savings in space, and hopefully, better performance (that we’re still testing). To make my compression task simpler, and not to forget any step along the way (such as: locking the tables before packing, running myisamchk to check the tables for errors, rebuilding the indexes after packing and then flushing the tables) I wrote a simple shell script, which I can provide upon request (I intended to post it here, but encountered serious indentation problems).
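This is not the shell script mentioned above, but the order of the steps can be sketched as a small dry-run planner in Python. It only returns the commands for one table and does not execute anything. The table path is hypothetical, and the sketch assumes the table is already locked (or the server stopped) before the commands are run, since a lock has to be held by a separate session.

```python
from pathlib import Path
from typing import List


def pack_plan(table_index: Path) -> List[List[str]]:
    """Return the commands, in order, to compress one MyISAM table.

    Assumption: the table is already locked or the server is stopped;
    this function does not take the lock itself.
    """
    table = str(table_index)
    return [
        ["myisamchk", "--check", table],  # check the table for errors
        ["myisampack", table],            # compress it (table becomes read-only)
        ["myisamchk", "-rq", table],      # rebuild the indexes after packing
        ["mysqladmin", "flush-tables"],   # make the server reopen the table
    ]


if __name__ == "__main__":
    # Hypothetical path to a table's index file.
    for command in pack_plan(Path("/var/lib/mysql/markers/marker.MYI")):
        print(" ".join(command))
```

Keeping the plan as data makes it easy to print and review the commands before handing them to something like `subprocess.run`.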

Another piece of MySQL-related work is the “house cleaning” of “cabot”, our development database server, and changing the backup strategy from backing up all databases daily, which takes a huge amount of space and a long processing time (more than 12 hours), to selectively backing up some databases daily and others weekly, as needed. The next thing would be to keep up with the performance tuning work I discussed in my previous blog entry.
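The selective schedule boils down to a simple rule, sketched here in Python. The database names and the choice of Sunday for the weekly run are placeholders for illustration, not the actual configuration on cabot.

```python
import datetime

# Hypothetical database names, grouped by how often they are backed up.
DAILY = {"po_annotations"}
WEEKLY = {"ensembl_archive", "markers_old"}

# date.weekday() counts Monday as 0, so 6 is Sunday (an arbitrary choice).
WEEKLY_DAY = 6


def databases_due(day, daily=DAILY, weekly=WEEKLY):
    """Return the sorted names of databases to back up on the given date."""
    due = set(daily)
    if day.weekday() == WEEKLY_DAY:
        due |= set(weekly)
    return sorted(due)


if __name__ == "__main__":
    print(databases_due(datetime.date.today()))
```

On most days only the daily group is returned; on the weekly day both groups are, which is where the savings in space and processing time come from.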

One other thing I’ve been working on is semantic web services for the Plant Ontology using the SSWAP infrastructure. I started by composing a simple OWL-DL ontology, using the ontology editor Protégé, to describe PO annotations. This first draft of the ontology is based on the PO database structure and the type of data we are aiming to provide, and follows the guidelines of similar ontologies hosted on the SSWAP ontologies page. The generated poAnnotation.owl ontology file is best viewed by opening it with Protégé.

Well, that’s it for now…


Liya’s trip to CSHL

Posted: November 14th, 2008 | Author: Liya | Filed under: Uncategorized | No Comments »

I visited the lab last Friday (November 7th). Since it was Friday, I didn’t meet many colleagues. The lab office was being remodeled. I’ll go to the lab next week (possibly November 19) and some telecommuters will be there too!
Doreen spent the whole day with me. We discussed my objectives for the next several months. First we discussed the protein annotation and xrefs. Will joined us on the phone. We set up a document on the protein annotation and xref pipeline in order to have consistency when the pipeline is run by different groups. We looked into the reasons for the inconsistent GO annotation results on rice, maize and sorghum. One reason is the inconsistent annotation. Rice has better annotation than maize and sorghum: it has more UniProt protein annotations. It is also possible that the annotation pipeline uses different versions of the InterPro database and the xref source databases. Another reason is biological, e.g., the corresponding region becomes a partial gene or an intron.
We also discussed some compara analysis with Josh by phone. Josh wrote a very informative Word document on the research objectives. What I need to do first is find the orthologue dataset. The summer interns have a Perl script that gets the orthologues from the compara database using the Ensembl API. I have used this script to get the orthologues. I’d like to update the script so that it is more generic. I’d also like to take a look at the mart schema to see the underlying structure. The next item is to look for the syntenic regions between rice and sorghum. Jack Chen’s lab has developed OrthoCluster (Ismael Vergara) to find syntenic regions. Josh also pointed out DiagHunter and SyMap.
The third item on the to-do list is microRNA targets, pathway analysis and enrichment for GO categories. I need to read Chris’s paper, ‘Identifying microRNAs in plant genomes’, get the protein targets for the microRNAs from Lifang, and identify possible pathways and GO categories.
The trip was very objective-oriented. I’ll visit the lab more often from now on.


Moving to Subversion

Posted: October 31st, 2008 | Author: Shiran Pasternak | Filed under: Uncategorized | No Comments »

Recently, we decided to migrate our entire codebase from CVS to Subversion (SVN). Ken did most of the groundwork. I am really excited about our new version control setup, and have been dreaming about this for a long time. But to butcher a biblical metaphor, I only played Aaron to Ken’s Moses.

Read the rest of this entry »