There has been quite a lot of literature in recent years about why classification schemes such as SCOP and CATH present a skewed view of protein space. While these classification schemes can be very useful, they do present a picture of "fold continents" and "superfamily countries" that are separate from one another, while protein space seems to be more continuous. So I'm not sure the number of SCOP categories vs. time is the best way to address whether "Protein Space is Still Ripe For Exploration".
- Mickey Kosloff
Mickey, thanks for your comment. Yes, I agree that their view is skewed. Do you have any suggestions about a better way to address the question? This way was certainly pretty quick and dirty. Also, I thought I had seen a presentation a couple of years ago stating that the number of new folds was leveling off (that didn't mention, as far as I recall, any issue or adjustment based on SCOP skew before coming to this conclusion). So I was a bit surprised by this myself.
- Ruchira S. Datta
I think the consensus has shifted rather recently and not everyone is on board yet. But I did hear, for example, that the developers of CATH now also agree that the definition of fold is limited and does not convey relationships well in the context of protein space. The problem is that (as far as I know) there is no good way (yet?) to address this question. Any method that I'm familiar...
- Mickey Kosloff
The typical approach in the machine learning / clustering community is to convert (possibly flawed) hierarchies into (possibly fuzzy) groups (or vice-versa). I am not sure what the utility/benefit of a fuzzy-clustering approach to protein classification would be apart from some probabilistic estimations on the number of folds/superfamilies/families.
- marcin
I was just curious about the dataset that was used to collect these statistics. I think the PDB contains a lot of redundant protein structures, i.e. the same protein structure solved under different conditions or as part of different complexes. If you remove those redundancies, do you still find as large an increase in folds as your plot suggests?
- sushant
suk211, I was using SCOP directly--some other commenters have been arguing with this choice. SCOP / ASTRAL is supposed to have taken care of the kind of redundancy you mention already.
- Ruchira S. Datta
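For readers unfamiliar with the redundancy filtering discussed above, here is a minimal sketch of the general idea: greedily keep an entry only if it is not too similar to anything already kept. This is not how SCOP/ASTRAL actually builds its subsets; the `similarity` function is a crude stand-in based on Python's `difflib` rather than a real sequence alignment, and the names `similarity`, `nonredundant`, and the 0.95 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for pairwise sequence identity (NOT a real alignment)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def nonredundant(seqs: dict[str, str], threshold: float = 0.95) -> list[str]:
    """Greedy filter: keep an entry only if its similarity to every
    already-kept entry is below `threshold`. Longer sequences are
    considered first; ties are broken by name."""
    kept: list[str] = []
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        if all(similarity(seqs[name], seqs[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy input: "b" duplicates "a", "c" is unrelated.
example = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQR", "c": "GGGGSSSSPP"}
representatives = nonredundant(example)
```

With the toy input, the duplicate entry is dropped and one representative of each distinct sequence remains. Counting folds over such a filtered set, rather than over every deposited structure, is the kind of check the question above is asking for.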
What's the current state-of-the-art on the "folds being conserved through evolution" thinking? A lot of work we did about a decade ago was governed by the local three-dimensional structure and chemistry being conserved, even if the macro-structure changed.
- Deepak Singh
Deepak, I'm not 100% sure this is what you mean, but our group has two absolutely amazing methods, INTREPID http://bioinformatics.oxfordjournals.org/cgi... and Discern http://bioinformatics.oxfordjournals.org/cgi..., for active site prediction that use evolutionary information. INTREPID uses sequence-based information, whereas Discern uses INTREPID along with structural information. Is that the kind of thing you were asking about?
- Ruchira S. Datta
And, there's also my own current project...which I'd better get back to work on. :-)
- Ruchira S. Datta
Ruchira, yep. I have been a little out of touch with the field for the last couple of years. Discern seems to be along the lines of work we were doing some years ago in that space (stuff like this http://www.ncbi.nlm.nih.gov/pubmed...), which was an extension of the Fetrow and Skolnick work referred to in the paper.
- Deepak Singh
Deepak, some time back I saw a commentary in PNAS by David Shortle titled "One sequence plus one mutation equals two folds", and after reading it I thought that applying a sequence similarity criterion to determine such fold conservation might not be a good idea, but I am really not sure about it. I guess that's what you meant by "folds being conserved through evolution".
- sushant
@Deepak, what did you mean? Were you talking about the idea that folds are conserved (and are discrete from one another) in different organisms, or that substructures such as active sites are conserved even between different folds?
- Mickey Kosloff
Continuing sushant's line, an additional question that arises is how "continuous" is fold space, that is, is there a series of intermediate (stable) folds between each two folds? Works from Ron Elber's group (http://www.ncbi.nlm.nih.gov/pubmed... and more recently http://www.ncbi.nlm.nih.gov/pubmed...) show that certain folds are 'sinks' and slight sequence variations can...
- Nir London
"Whereas the first human genome cost $3 billion and took more than a decade to produce, Illumina charges $48,000 for the kind of sequencing Close got."
- Mike Chelen
Our paper on Chemistry in Second Life is marked as "Highly Accessed" in Chemistry Central Journal - that's nice to discover http://www.journal.chemistryce...
Feedback request. I'm trying to make a visual overview of my experience so far and my future plans. Any comments on either the visual side or the scientific/career side are really appreciated.
- Pawel Szczesny
One quick suggestion: increase the font size and color differences. It might emphasize your primary interests a bit more.
- tim
Thanks Timothy. That's a good suggestion. I'm working on another version that would incorporate more details on past projects, but so far it's even less readable than this one.
- Pawel Szczesny
I really like the concept of this. I take it the colors correspond to particular positions you held? Have you tried using a color gradient for the categories (photography, science, programming, ...) and different fonts to distinguish each job/educational position?
- Chris Lasher
Thanks Chris, that's a great suggestion! Having a "now" line makes the current color code a little redundant.
- Pawel Szczesny
Good idea. How would you show projects that are still on? Maybe you can change the text direction?
- Marcin
Most people with color deficient vision (like mine) will not be able to tell your red from your green (I can't). Also it seems a little odd that there is only one thing in the photography category (a hobby?) and so the columns do not really line up. Maybe a third color for non-work-related? And seconding Chris' idea to use fonts to distinguish where you learned/worked on each item in the list.
- Bill Hooker
@Bill Did you notice Pawel plots the words according to two axes: skillset along the X-axis, and time of gaining that experience along the Y-axis? I missed this at first and was wondering about the Photo category, too, but then I realized how "scientific animations" and "molecular visualizations" spanned both Science and Photography, and hence, they're in between them on the X-axis. That's neat, because you can see that Pawel's interest in visual things started off in night photography...
- Chris Lasher
...and when combined with his exposure to science led him to do, say, visualization of biological networks. Really cool concept. Also, good call about reds and greens.
- Chris Lasher
Marcin, things that are still going are around the "now" line (the dashed line in the middle). I don't think the exact start/end matters; it's more about how things circulate in time. Bill, Chris already explained the idea behind the three "columns" and the placement of things. In the first version the phrases were connected (and it was easier to get the idea about various inspirations), but I removed the connections for clarity. The colors will be changed.
- Pawel Szczesny
as long as it is google accessible...and I think several of these people come from unis without repositories that are good on data but nonetheless, this ought to be the first choice...
- Cameron Neylon
How many are set up to cope with data? I should know this - but I've always tended to go off site or have our own services. It always seemed difficult for the Cambridge people to get stuff in and out.
- Cameron Neylon
D, yes that's helpful. Presumably anything with specialised metadata is also unlikely to be exposed? I think the problem is that most of these are collections of files, sometimes with some sort of descriptive file that a specialist could make sense of. Google Data Sets would have been perfect for this....[sigh]
- Cameron Neylon
maybe archive.org? also think direct p2p release will become an increasingly viable option
- Mike Chelen
So key message is talk to your local IR people... ;-)
- Cameron Neylon
OAI-ORE is built for this kind of thing of course but I don't know if there are any tools for packaging arbitrary things up that way.
- Cameron Neylon
You need an institutional affiliation to use it, but Dataverse (http://thedata.org/) is pretty good -- easy to set up, web or local, lots of options. For instance, each lab could have its own dataverse for the sorts of "odds and sods" that Clare mentions.
- Bill Hooker
I think that a torrent tracker for scientific datasets would be sweet
- Marcos de Carvalho
A torrent is kind of a cool/amusing idea, but if no one is seeding the torrent, it is effectively gone.
- Richard Akerman
mininova provides free torrent hosting & seeding through their content distribution program http://www.mininova.org/distrib... however researchers may feel opposed to the use of consumer grade solutions
- Mike Chelen
Web seeding from one of the providers cited above could be an alternative, in addition to the swarm. This could help relieve the bandwidth load on a single source and also act as a backup in case everybody stops seeding a specific torrent or the default provider goes down.
- Marcos de Carvalho
Although Mininova is a cool service I agree that people would be a little afraid to use it. However, searching SourceForge I found lots of opensource torrent trackers, LAMP based, ready to install.
- Marcos de Carvalho
nice thing about mininova is that they provide the file hosting, and maintain at least 1 seed indefinitely. a web seed can accomplish something similar now too, given how inexpensive hosting plans have become. here's an example of scientific data (fasta dna from ensembl project) being distributed: http://www.mininova.org/tor...
- Mike Chelen
another promising p2p filesharing system is made by http://wuala.com using a modified bittorrent protocol, and features filesystem integration allowing the distributed data to appear as a standard network drive. however, it isn't backwards compatible with bittorrent
- Mike Chelen
Data torrents was something that came up in a discussion about how to preserve DNA sequencing data - seems like there is a potential there - also perhaps some built in measure of peer review? Clare, I suspect if you have a proactive and friendly IR manager they will be happy to take data from multiple institutions. The main thing is to check with those other institutions (and the people you worked with) to make sure they are happy with it. People often get very strange about posting "their" data.
- Cameron Neylon
There are some issues about data copyright/re-use. I use satellite imagery and the re-use constraints can be a nightmare. Whenever I purchase it I try to get the licensee to be as big as possible (at least university level, but possibly wider; does anyone have experience of this?). So open sharing is not completely possible. One way I have looked at this is to use an institutional repository...
- Ant Beck
@Cameron, I think that some sort of peer review could be implemented through a comment/rating system coupled with a user trust certification (like the one used by Advogato) as well as by the number of seeders, in the case of data torrents. Taken together, these metrics may help identify datasets that can be trusted.
- Marcos de Carvalho
Ant, this is a big problem. Which is why some of us are arguing for research data to be made explicitly public domain wherever possible. I think you're doing the right thing by trying to make it as widely useable as possible. Do you have permission to re-publish to support derivative results that you are putting into journals? Some interesting questions raised by this, e.g. can people actually trust your claims if they can't see the original data?
- Cameron Neylon
Marcos, that's true, I was thinking more of the peer review of archiving that is implicit in whether people are seeding particular data sets. Not sure this is a good way of determining what gets kept but nonetheless it is a possible mechanism...
- Cameron Neylon
SQL databases can be safely imported and exported as a series of text statements. Also, other systems such as SQLite exist natively as files.
- Mike Chelen
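As an illustration of the point about exporting a database as text statements, here is a minimal sketch using Python's standard `sqlite3` module, whose `Connection.iterdump()` yields the SQL statements needed to recreate a database. The table name `readings`, its columns, and the helper names `dump_sql` and `restore_sql` are made up for this example; other database systems have their own dump tools with different details.

```python
import sqlite3


def dump_sql(conn: sqlite3.Connection) -> str:
    """Serialize a whole SQLite database as plain-text SQL statements."""
    return "\n".join(conn.iterdump())


def restore_sql(script: str) -> sqlite3.Connection:
    """Rebuild an in-memory database from a text dump."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


# Hypothetical example data.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE readings (sample TEXT, value REAL)")
src.executemany("INSERT INTO readings VALUES (?, ?)", [("a", 1.5), ("b", 2.25)])
src.commit()

script = dump_sql(src)        # plain text, suitable for archiving
copy = restore_sql(script)    # round trip back into a database
rows = copy.execute(
    "SELECT sample, value FROM readings ORDER BY sample"
).fetchall()
```

The resulting `script` is an ordinary text file, so it can be deposited in a repository or distributed like any other document.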
Cameron: One advantage of BitTorrent for peer review is the unique identification of data sets through integrated checksum verification (SHA-1 hashes of each piece). Another is resilience, because as long as at least 1 seed exists the files can always be duplicated and distributed by anyone.
- Mike Chelen
D0r0th34: Yup indeed, at the file and block levels. Most clients are designed to treat such mismatches as data corruption, which is corrected by re-downloading the appropriate sections; however, an interface could be designed on top of the checksums and protocol to compare different versions.
- Mike Chelen
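To make the block-level comparison idea concrete, here is a minimal sketch, assuming a simplified model of the original BitTorrent scheme in which data is split into fixed-size pieces and each piece is identified by its SHA-1 hash. The function names `piece_hashes` and `changed_pieces` and the tiny 16-byte piece length are illustrative assumptions; this is not code from any BitTorrent client, and real torrents use much larger pieces.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int = 16) -> list[str]:
    """Split data into fixed-size pieces and return the SHA-1 hex digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).hexdigest()
        for i in range(0, len(data), piece_length)
    ]


def changed_pieces(old: bytes, new: bytes, piece_length: int = 16) -> list[int]:
    """Return indices of pieces that differ between two versions,
    including pieces present in only one of them."""
    a = piece_hashes(old, piece_length)
    b = piece_hashes(new, piece_length)
    longest = max(len(a), len(b))
    return [
        i for i in range(longest)
        if i >= len(a) or i >= len(b) or a[i] != b[i]
    ]


# Hypothetical data set versions: piece 1 edited, piece 3 appended.
old_version = b"A" * 16 + b"B" * 16 + b"C" * 16
new_version = b"A" * 16 + b"X" * 16 + b"C" * 16 + b"D" * 4
diff = changed_pieces(old_version, new_version)
```

A client treats a hash mismatch as corruption and re-downloads the piece; the same list of mismatching indices could instead be read as a coarse diff between two versions of a data set.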
Cameron: AFAIK all providers allow publication of derivatives (web and traditional print) but with onerous provider-specific constraints. However, the common format for derivatives is JPEG. This degrades the data structure in the imagery and means that re-users cannot conduct reconstructive image analysis. I discussed this in a paper here: http://www.univie.ac.at/aarg...
- Ant Beck
Mammalian genomes have been losing mobile DNA elements since the K-T extinction. The connection to the extinction is somewhat tenuous but interesting.
- Iddo Friedberg
"...Gert Vriend at Radboud University Medical Centre in Nijmegen, the Netherlands, and his colleagues are writing software that they hope will eventually automatically re-refine, at the click of a mouse, all the data deposited in the PDB."
- Wladimir Labeikovsky
Imagine how much more validation and cleanup they could do if everyone also deposited raw data (e.g. raw diffraction images, or unprocessed NMR spectra)! This is why fledgling initiatives like TARDIS are important (http://www.tardis.edu.au/). Personally, I would love to run my structures through their pipeline during analysis and at submission - this way everyone's data is polished,...
- Andrew Perry
@Neil Heh, ego aside, Gert's brand of diplomacy is a bit... rough-edged (EDIT and his sense of humour too)
- Andrew Clegg
:-) A lot of the work I did in grad school was done using WHATIF. Wonder how much he's added on to it in the last decade
- Deepak Singh
"University College London (UCL) has become the latest institution to adopt an open-access publishing policy, adding to a rapid increase in such mandates over the past year."
- Björn Brembs
Too much to think about to respond sensibly at the moment, but just one point is that there are two issues: one is the technical issue (can it be managed legitimately?) and the other is the social confusion issue (how many people do you lose because they assume that you're not allowed?). I would certainly make the assumption that I can't mix CC-BY, GFDL, and ODbL together so would walk away....
- Cameron Neylon
The thing is, we have pretty much all the technology to not have to mix the data. That is really the point I want to make. Look at what Bio2RDF does... they have a common (SPARQL) interface, but the data in different databases. Hence, they have no need to mix the data, but only link the data. And linking the data can go even via a clean, independent interface... InChI, for small molecules, rdf.openmolecules.net for InChI's as URLs...
- Egon Willighagen
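A rough sketch of the "link, don't mix" idea described above: two data sources stay in separate stores, and a query-time view joins them through a shared identifier. This is only an illustration in plain Python, not how Bio2RDF or a SPARQL endpoint is implemented; the store names, their contents, and the function `linked_view` are hypothetical, and whether such a view sidesteps any particular licence is a question the sketch does not answer.

```python
from typing import Mapping

# Two hypothetical sources under different licences, never merged into one file.
PROPERTIES: dict[str, dict] = {
    "InChI=1S/CH4/h1H4": {"name": "methane"},
    "InChI=1S/H2O/h1H2": {"name": "water"},
}
ASSAYS: dict[str, dict] = {
    "InChI=1S/H2O/h1H2": {"assay": "solvent control"},
}


def linked_view(
    inchi: str, sources: Mapping[str, Mapping[str, dict]]
) -> dict[str, dict]:
    """Look up one compound in every source that knows it, keeping each
    source's record separate and labelled by where it came from."""
    return {label: store[inchi] for label, store in sources.items() if inchi in store}


sources = {"properties": PROPERTIES, "assays": ASSAYS}
water = linked_view("InChI=1S/H2O/h1H2", sources)
methane = linked_view("InChI=1S/CH4/h1H4", sources)
```

The InChI string acts as the independent key: each source can be updated, licensed, or withdrawn on its own without touching the others.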
ah man I need to think about this properly. Are you going to be online for the OKF working group meeting this afternoon?
- Cameron Neylon
Required reading for anyone interested in Open Data. I would still say PDDL/CC0 makes things simpler by removing all question, but I *think* you may be right about clean interfaces making PDDL/CC0 not strictly necessary.
- Bill Hooker
Found via a search in PLoS Biology for "kinesin." I only read the abstract but learned that Tetrahymena has two nuclei--one for sexual reproduction (MIC) and one for gene expression (MAC). This is fascinating to me, and hopefully someday I can learn about it. At this point, I have no idea what benefits this could have for the organism. I also wonder how it impacts DNA damage repair and genome stability. PLoS Biol, Vol. 4, No. 9. (29 August 2006), e286. The macronuclear genome of Tetrahymena thermophila is sequenced and analyzed. Conservation in this single-celled ciliate of some features normally observed in only multicellular organisms sheds light on early eukaryotic evolution. Jonathan Eisen, Robert Coyne, Martin Wu, Dongying Wu, Mathangi Thiagarajan, Jennifer Wortman, Jonathan Badger, Qinghu Ren, Paolo Amedeo, Kristie Jones, Luke Tallon, Arthur Delcher, Steven Salzberg, Joana Silva, Brian Haas, William Majoros, Maryam Farzad, Jane Carlton, Roger Smith, Jyoti Garg,...
- Steve Koch
@Steve. Just be grateful I didn't actually link the gluteal crease.
- Iddo Friedberg
Shirley, sometimes you worry me... :-)
- Bill Hooker
@Bill, don't worry, I'm merely appreciative of a fine turn of phrase ;). Thankfully, long shirts that cover the bum are in these days.
- Shirley Wu
Did the subjects use gluteal crease floss first? Sisqo recommends it to keep away the microbiome creeps.
- Steve Koch
@Neil: regarding the existence of a core microbiome: see http://bytesizebio.net/index... Jeffrey Gordon from WashU claims that there isn't much of a core microbiome, at least for the gut.
- Iddo Friedberg
"They bought minicows -- compact cattle with stocky bodies, smaller frames and relatively tiny appetites. Their miniature Herefords consume about half that of a full-sized cow yet produce 50% to 75% of the rib-eyes and fillets, according to researchers and budget-conscious farmers. "We get more sirloin and less soup bone," Ali said. "People used to look at them and laugh. Now, they want to own them." In the last few years, ranchers across the country have been snapping up mini Hereford and Angus calves that fit in a person's lap. Farmers who raise mini Jerseys brag how each animal provides 2 to 3 gallons of milk a day ... one animal needed less than an acre for grazing. Because the minicows could be grass fed, the couple were spending at least half the amount on feed than they would have on regular-sized animals. The minicows also reached their mature weight faster, so they could be sold for meat sooner."
- Paul Buchheit
I kind of want to own a mini-farm now.
- Paul Buchheit
They're cows the size of schnauzers but they're cattle!
- Akiva
I could swear I saw this on a Jack in the Box commercial....
- Alan Chamberlain
Paul, you've never struck me as a farmer, but if you get a mini-farm, I definitely want to see it.
- Clare Dibble
Yeah, I think the "taking care of it every day" part might be a problem for me. I'll probably have to hire a mini-rancher. I've been thinking about finally getting some ducks too. I think our yard is big enough that the mess won't be too bad, and we'll have a lot of fresh eggs.
- Paul Buchheit
I'm thinking the proportion of meat to bone would be identical, "less soup bone". LOL.
- Kevin Gamble
Reaching their mature weight faster suggests to me that they develop even less flavour than regular cows. OTOH, being half grass-fed can't hurt.
- Andrew C (✓)
Andrew: calves are immature, yet people seem to really like veal for some reason.
- Gabe
Fair enough, but veal is deliberately raised to be mild and tender and unlike beef. (OTOH, speaking of flavourless beef, people do seem to like filet mignon.)
- Andrew C (✓)
"They also had a tough time finding collars for ID tags small enough to stay put on their calves. So the owners of the Sonoma Little Cattle Co. in Santa Rosa, Calif., went to a pet store and bought dog collars. "It wasn't until later that we realized they had tiny hamburger and hot dog designs on them," Mintun said."
- Clare Dibble
I wonder how much tri-tip one of these makes relative to a normal size cow.
- Clare Dibble
I think this is the sort of cow that White Castle uses for their little burgers.
- Gabe
Well done. Buy one on me (seriously -- will pay you back when I next come by SF)
- Jonathan Eisen
Congratulations! Until the concept of journal prestige dies out completely (if it ever does), that's the best bio journal in the world.
- Bill Hooker
@Jonathan, what, you buying beers for all authors in PLoS Biology? _Definitely_ need to put that in the marketing materials... (I bet the I.F. would go through the roof -- sorry Bill...)
- Andrew Su
Well, the plan was that this was specific for Pedro (and what is I think his first PLoS Bio paper but he can confirm ...). But sure, if there are other first time PLoS Bio authors I am more than happy to buy a beer for them. However, after Pedro there will have to be some restrictions - like I have to be there ....
- Jonathan Eisen
"Ever wanted to see the entire conversation surrounding a post? Now you can! This simple bookmarklet will load comments from Twitter, FriendFeed, Digg, Reddit, HackerNews and any blog mentioning the article and will load it in a handy sidebar"
- Andrew Perry
The example says: "could not retrieve results".
- Björn Brembs
Must be a temporary outage... it worked for me ~24 hours ago.
- Andrew Perry
Our mission is simple: To enable the public to fund pilot research projects. Accomplishing this goal has immense benefits. First we're providing research funds to a whole new generation of researchers that are our future. Secondly we're walking the public through the scientific process, from grant writing to funding, all the way to the results. Finally we are creating an ecosystem for scientists to collaborate with each other as well as the public on shaping future research projects. Tags: science.funding Posted by: cwhooker
- Bill Hooker
This + sciflies = good links, thanks Bill. I was thinking about this very idea recently - having a PhD student conduct their studies in the open, with funding from the public. There are a lot of issues with such a model. "The public" encompasses such a wide range of background scientific knowledge. And science is expensive - one PhD is $100K+. Maybe a good model would be to use public...
- Matthew Todd
"Brick by brick, Lego has been building its way out of the near bankruptcy it suffered around the turn of the century. It has done this by a seemingly simple strategy — making awesome product after awesome product. Now it is releasing the almost ridiculously fitting Architecture series, beginning with the Frank Lloyd Wright Collection, six planned sets including the Guggenheim in New York and Fallingwater, the iconic cantilevered waterfall-house outside Pittsburgh, Pennsylvania."
- Shirley Wu
Twitter account works, the website not yet ;)
- Pawel Szczesny
Hopefully, I'll be free to attend again! Diary is clear at the moment...
- David Bradley
Cameron - we would have preferred a later September date as well, but August 22 was the only available date at the Royal Institution (and even that hasn't been officially confirmed yet). Announcement will come later this week!
- Victor / Mendeley Team
You know, when I moved from New Orleans to San Diego, I thought I would be less jealous of people living near a great community of science bloggers, but you Londoners really have us all beat!
- Mr. Gunn
Victor, I'll cope, just have to cope with some more muttered comments about someone being a "blog widow" :-)
- Cameron Neylon
Gunn: that is why online presence of conferences is vital, so interested parties in entirely different regions can participate
- Mike Chelen
Mm, I wonder how I can spin it that attending this will increase sales. Definitely want to go.
- Frank
any funding for out of country presenters?
- Kevin Z
Website is due to go live Tuesday, after the bank holiday. There's barely enough funding for the conference itself -- there will be a small registration fee, and they didn't pay me anything last year!
- Richard P Grant