"Mahesh, I do agree that some of those things are indeed needed to operate the service, but the ones I've emphasized sound a bit off. Actually, there are 3 things in the ToS that make me uncomfortable: 1. Granting these rights to "those Google works with": who exactly are they? 2. Granting the rights to "modify", "create derivative work", "publicly perform", and "publicly display". If these are just operational requirements, why not make it clear when these actions are necessary? 3. Granting all these rights forever (even if I stop using their services)."
- Alex Popescu
"Eric, it is very difficult to go through your post and point out each inaccurate bit, but I'd say that what you did was use very generic terms to name the trade-offs, when in reality they apply only to very specialized cases or not at all. From your post: "Data integrity—In order to achieve high performance despite massive size, non-relational database systems compromise data correctness guarantees. The traditional rules about writing data are loosened, making it far more likely that data can be lost or overwritten." Data integrity is a very generic term that covers many different aspects (e.g. referential integrity, type integrity, data correctness, etc.). Taken as a whole, the above paragraph is correct only when referring to referential integrity. The lack of referential integrity checks in systems that encourage denormalization or offer different data modeling approaches should be pretty obvious to everyone. By doing this generalization exercise I think your post..."
- Alex Popescu
"Andy, thanks for the clarifications. I must confess that using tps in this context sounds a bit odd: in the first case it reflects the number of submitted operations (fire-and-forget), while in the latter it counts served requests."
- Alex Popescu
"Nikita, I always read the posts I link to. And the quote above is what you call the fundamental flaw. But in my vocabulary, that's the hypothesis or assumption under which Hadoop was architected. If that's a fundamental flaw, then anyone could call GridGain's assumptions fundamental flaws too. Actually, all software solutions would have more "fundamental flaws" than features, as most of the time we design solutions tailored to specific problems. I hope you'll agree that even if it's only about terms, the difference is significant."
- Alex Popescu
"Nikita, I hope I don't have this misconception :-). "GridGain stores in memory only the data that is needed for the actual processing, while storing "remaining" data in any offline storage like SQL, ERP, HDFS, etc." Hadoop/HDFS: "HDFS is a filesystem designed for storing very large files. Each analysis will involve a large proportion, if not all, of the dataset." So, two different assumptions, neither of them a fundamental flaw. As regards In-Memory Data Grids/Elastic Caches and disk storage strategies, most of the time this means integration with solutions that read from/write to disk. But this doesn't imply optimizations for the access patterns required by the applications using the Data Grid/Elastic Cache."
- Alex Popescu
"That's an interesting approach and quite different from what I've been imagining. My approach would have to be built into the graph db or could be implemented as an extra access layer. Basically, every piece of information stored in the graph would be versioned and every access would be based on a specific version. Some examples: 1. give me nodes as per version X 2. traverse the graph at version Y 3. purge "entities" older than version Z. The advantage of this approach is that historical data would be accessible in the same graph format. The obvious drawback is the continuously growing amount of data to be stored (with a possible negative impact on performance too)."
- Alex Popescu
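The versioning idea in the comment above can be sketched in Python. This is a minimal, hypothetical model, not an existing graph database API: `VersionedGraph`, `Record`, and their methods are illustrative names, and the sketch assumes a single global version counter where every change closes the old record instead of overwriting it. The three methods map onto the three examples: `nodes_at` (nodes as per version X), `traverse` (traversal at version Y), and `purge_older_than` (purging entries older than version Z).

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Record:
    """One version of a node or edge, visible in [since, until)."""
    since: int
    until: Optional[int] = None  # None means the record is still live

    def visible_at(self, version: int) -> bool:
        return self.since <= version and (self.until is None or version < self.until)


class VersionedGraph:
    """Hypothetical versioned graph: every change bumps a global version
    counter, and old records are closed rather than overwritten."""

    def __init__(self) -> None:
        self.version = 0
        self.nodes: Dict[str, List[Record]] = {}
        self.edges: Dict[Tuple[str, str], List[Record]] = {}

    def _bump(self) -> int:
        self.version += 1
        return self.version

    def add_node(self, node: str) -> int:
        v = self._bump()
        self.nodes.setdefault(node, []).append(Record(since=v))
        return v

    def add_edge(self, src: str, dst: str) -> int:
        v = self._bump()
        self.edges.setdefault((src, dst), []).append(Record(since=v))
        return v

    def remove_edge(self, src: str, dst: str) -> int:
        v = self._bump()
        for rec in self.edges.get((src, dst), []):
            if rec.until is None:
                rec.until = v  # close the record; the history is kept
        return v

    def nodes_at(self, version: int) -> Set[str]:
        """1. give me nodes as per version X"""
        return {n for n, recs in self.nodes.items()
                if any(r.visible_at(version) for r in recs)}

    def traverse(self, start: str, version: int) -> List[str]:
        """2. breadth-first traversal of the graph as it was at version Y"""
        if start not in self.nodes_at(version):
            return []
        seen, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            # A linear scan over all edges keeps the sketch short, not fast.
            for (src, dst), recs in self.edges.items():
                if src == node and dst not in seen and any(r.visible_at(version) for r in recs):
                    seen.add(dst)
                    queue.append(dst)
        return order

    def purge_older_than(self, version: int) -> None:
        """3. drop records invisible at `version` and at every later version"""
        for store in (self.nodes, self.edges):
            for key in list(store):
                store[key] = [r for r in store[key] if r.until is None or r.until > version]
                if not store[key]:
                    del store[key]
```

Because nothing is removed until `purge_older_than` runs, every historical state stays queryable in graph form, which is also exactly where the growing storage cost comes from.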
"James, let's not try to hide behind a finger :-). You talk to some journalists because you want the story told and read (by the uninitiated, at least) in a particular way: "Wow! Not only does Couchbase have/sell a scalable NoSQL database, but they can also offer/sell consulting services for scaling. Wow!" As for me posting about it: 1) Someone *was* wrong on the Internet http://xkcd.com/386/ 2) A couple of my readers tipped me off about the story and *this is exactly how I read it* 3) I hope you'll do a better job next time by: a) telling the story to the right people (yep, I include myself in that group) b) telling the real/complete story"
- Alex Popescu
"James, what I'm actually saying is that this is a bad story (or at least it was told badly). It reads as "OMGPOP uses X and they had to call X to scale", where X is in this case Couchbase but could easily be Oracle, or MySQL, or pretty much anything else. A good story would instead have said: 1) why Couchbase and, most importantly, 2) why they had to hire Couchbase to help them scale."
- Alex Popescu
"Like any distributed system, Datomic comes with its own trade-offs, and there is nothing wrong with having a different opinion about some aspects. It definitely has nothing to do with the amount of code or its quality. Data locality is a well-known principle in the space (just think for a second about MapReduce). And you are most probably familiar with the "principle": There are only two hard problems in Computer Science: cache invalidation and naming things. :-)"
- Alex Popescu
"1. Riak is a key-value store (values are opaque to Riak). The only reason I haven't commented on that part is that I wanted to stay focused on the main topic: net splits. 2. MongoDB is a distributed database (replica sets, auto-sharding, etc.). But it is not part of this list."
- Alex Popescu
"So your hypothesis goes like this: "Those working on and using Hadoop are a bunch of idiots that do not realize its complexity. Its adoption is caused only by the fact that idiots reproduce faster than smart ones." I do not have any monetary interest in seeing Hadoop used or not. My only interest is in helping others form an educated picture of the market. And as a side note, I don't like the old marketing gimmick of throwing mud at competitors. I'm looking forward to learning about your solution. Meanwhile, there's so much happening in the Hadoop space that I need to pay attention to."
- Alex Popescu
""instantly increasing performance on the one hand and losing no data on the other" Unfortunately, I don't think this can ever actually happen :-). But I could try to imagine some scenarios for collaborative, loosely-coupled participants."
- Alex Popescu
"Ben, what you are saying is correct from the perspective of accessing data. But let's try to get on the same page :-). 1. What I wrote in the post and in the above comments refers to MapReduce implementations, and it holds for the MongoDB, CouchDB, and Riak MapReduce implementations. 2. It is correct that MongoDB offers two types of "queries" (native and MapReduce) while in CouchDB all "queries" are MapReduce. But if we compare MongoDB queries with CouchDB views, we will notice more differences than the ones you mention (which, on a quick read, sound like major benefits of MongoDB). Just to give you a quick example: in CouchDB, view results are cached and only new/updated data is re-processed. In MongoDB, all queries are re-executed. Anyway, this is a completely different discussion."
- Alex Popescu
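To illustrate the caching difference mentioned above, here is a toy Python model. It assumes a simplified update-sequence scheme loosely inspired by CouchDB; it is not CouchDB's actual view engine (which persists view indexes on disk), and `Database`, `IncrementalView`, and `map_calls` are hypothetical names. The view remembers the last sequence number it indexed and re-runs the map function only for documents written since then.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, object]
MapFn = Callable[[dict], Iterable[Row]]


class Database:
    """Toy document store that stamps every write with a sequence number."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.seq = 0
        self.last_change: Dict[str, int] = {}  # doc id -> seq of its latest write

    def put(self, doc_id: str, doc: dict) -> None:
        self.seq += 1
        self.docs[doc_id] = doc
        self.last_change[doc_id] = self.seq

    def changed_since(self, seq: int) -> List[str]:
        return [d for d, s in self.last_change.items() if s > seq]


class IncrementalView:
    """Caches map output per document; a query re-maps only documents
    written since the previous query."""

    def __init__(self, db: Database, map_fn: MapFn) -> None:
        self.db = db
        self.map_fn = map_fn
        self.indexed_seq = 0  # last sequence number reflected in the cache
        self.rows_by_doc: Dict[str, List[Row]] = {}
        self.map_calls = 0  # how many times map_fn actually ran

    def query(self) -> List[Row]:
        for doc_id in self.db.changed_since(self.indexed_seq):
            self.map_calls += 1
            self.rows_by_doc[doc_id] = list(self.map_fn(self.db.docs[doc_id]))
        self.indexed_seq = self.db.seq
        rows = [row for doc_rows in self.rows_by_doc.values() for row in doc_rows]
        return sorted(rows, key=lambda row: row[0])  # sort by emitted key
```

A query re-executed from scratch on every call would instead invoke the map function once per document each time. Deletions are left out to keep the sketch short.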
"Ben, I'm not really sure what you are referring to. But I'm sure my comments in the posts relate to the MapReduce implementation used by most of the NoSQL databases (serializing objects to a JavaScript engine that passes back the emitted key-values) :-). And as far as I know, none of the teams behind these implementations are actually happy with their performance. They have worked very hard to find workarounds: 1. managing pools of JavaScript engines 2. providing a set of native map/reduce functions 3. performing pre-filtering at the node level"
- Alex Popescu
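The first and third workarounds listed above (engine pools and node-level pre-filtering) can be sketched as follows. This is a hypothetical Python illustration, not code from any of these databases: `FakeJsEngine` stands in for an embedded JavaScript engine that is assumed to be expensive to create, and its `call_map` method fakes a user map function that emits `[doc["type"], 1]`. The JSON round trip mimics the serialization cost described in the comment.

```python
import json
import queue
from contextlib import contextmanager
from typing import Callable, Iterator, List


class FakeJsEngine:
    """Hypothetical stand-in for an embedded JavaScript engine.
    Creating one is assumed to be expensive, so instances get reused."""

    created = 0  # counts how many engines were ever instantiated

    def __init__(self) -> None:
        FakeJsEngine.created += 1

    def call_map(self, payload: str) -> list:
        # A real engine would run the user's JavaScript map function on the
        # serialized document; here we decode it and emit [type, 1] instead.
        doc = json.loads(payload)
        return [[doc["type"], 1]]


class EnginePool:
    """Fixed-size pool: engines are created once and handed out on demand."""

    def __init__(self, size: int) -> None:
        self._idle: "queue.Queue[FakeJsEngine]" = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeJsEngine())

    @contextmanager
    def engine(self) -> Iterator[FakeJsEngine]:
        eng = self._idle.get()  # blocks while every engine is busy
        try:
            yield eng
        finally:
            self._idle.put(eng)  # return the engine instead of discarding it


def map_on_node(pool: EnginePool, docs: List[dict],
                pre_filter: Callable[[dict], bool]) -> list:
    """Map phase on one node: filter natively first, then serialize only
    the surviving documents into a pooled engine."""
    emitted = []
    with pool.engine() as eng:
        for doc in docs:
            if not pre_filter(doc):
                continue  # rejected documents never cross the JSON boundary
            emitted.extend(tuple(kv) for kv in eng.call_map(json.dumps(doc)))
    return emitted
```

The pool caps how many engines are created no matter how many map jobs run, and the native pre-filter means rejected documents never pay the serialization cost.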
"No problem. I've actually tried to get a copy of the paper, and I'm pretty sure I have one saved somewhere, but I still wouldn't have the rights to distribute it."
- Alex Popescu
"Klint, a different way to put it is that for many NoSQL databases security wasn't (yet?) a priority. But that doesn't change much for those putting their apps and data at risk. And I can agree that this issue is not limited to NoSQL, but this is the space I'm focusing on. Just to be clear about a couple of things: 1. I'm no security expert, so I might be missing a lot of other security-related issues. But sometimes I recognize patterns that should be avoided and try to warn against them. Awareness is the first step towards knowledge. 2. I know it's wishful thinking, but I'd like to see every NoSQL database and hosted NoSQL solution prominently link to its security documentation."
- Alex Popescu
"The title is that of the linked article. And my commentary was meant to clarify that all of his complaints are either about the Ruby SDK or come from ignoring the DynamoDB documentation."
- Alex Popescu
"A while ago I posted a bookmarklet that provides the same enhanced functionality. It actually works for capturing snippets from websites and works with GMail (in both Safari and Chrome). Here is the post: http://jots.mypopescu.com/post..."
- Alex Popescu
"It is less about the tools and more about the approach. Predictive models are based on historical data while this approach uses a single point in time."
- Alex Popescu
"MongoDB lacks the knobs that would allow configuring how the system uses the available memory, and that can lead to unpredictable behavior (take a look at MySQL and all its various memory configuration options)."
- Alex Popescu