"Indeed, my post makes it sound like _all_ the relationship (or at least all the critical ones) are part of your datasets. I can see scenarios where data coming from outside--should we call these (meta)data augmentation?--is as important as your data. But I still believe that the core value needs to be centered around your data as otherwise everyone would get the same insight."
- Alex Popescu
"Is this idea similar to G+ circles? From a followers perspective this seems like a good idea. But I'm wondering how many would actually decide for each twit what channel it belongs to after creating a dozen of them."
- Alex Popescu
"Michael, I already linked to that Reactions to MySQL 5.6: Couchbase and DataStax's Reaction to MySQL 5.6: Oracle’s MySQL Misses the NoSQL Mark. This is the "rest" of the reactions."
- Alex Popescu
"We're improving the client protocol now and we'll also look into improving and simplifying the API. It would be great to learn from you what difficulties you've seen so we can do something about them. alex @ rethinkdb"
- Alex Popescu
"ddorian, thanks for your support. Please keep it polite and no personal attacks on this blog. Could you please update your comment? Many thanks"
- Alex Popescu
"You'll probably be disappointed to read that: "Beginning with version 2.2, MongoDB implements locks on a per-database basis for most read and write operations. Some global operations, typically short lived operations involving multiple databases, still require a global “instance” wide lock. Before 2.2, there is only one “global” lock per mongod instance." That's from the official documentation:http://docs.mongodb.org/manual..."
- Alex Popescu
"Thanks Matt. Unfortunately having an absolute number and a growth number, doesn't help at all to formulate any hypothesis about conversion rates, users that actually use DaaS in production vs experimentation."
- Alex Popescu
"Tools are one thing and the Hadoop ecosystem is not lacking them. What is lacking is simplicity or friendliness or at least ways to avoid complexity."
- Alex Popescu
"1. Logs are already stored locally so saving them in yet another place just in case logstash is failing doesn't seem like a very good reason. 2. If storing them in a centralized location is to scale this solution, I'd say this is just postponing solving it effectively. Basically if producers generate more events than the consumer can consume adding a mid-layer would not solve the actual problem. By the way I'm not criticizing the solution per se, but trying to understand it and sharing my thoughts on what feels to me to be its weaknesses"
- Alex Popescu
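The second point in the comment above can be illustrated with a toy calculation. This is only a sketch under stated assumptions: steady per-second rates, a single consumer, and a hypothetical `backlog_after` helper that is not part of logstash or any other tool. It shows that a buffering mid-layer only decides where the surplus accumulates (or gets dropped), not whether it exists.

```python
def backlog_after(seconds, produce_rate, consume_rate, buffer_capacity=None):
    """Simulate a buffer between producers and one consumer.

    All parameters are illustrative assumptions (events per second).
    Returns (backlog, dropped) after `seconds` of steady traffic.
    """
    backlog = 0
    dropped = 0
    for _ in range(seconds):
        # Producers append their events to the mid-layer.
        backlog += produce_rate
        # The consumer drains at most `consume_rate` events.
        backlog -= min(backlog, consume_rate)
        # A bounded buffer has to drop whatever does not fit.
        if buffer_capacity is not None and backlog > buffer_capacity:
            dropped += backlog - buffer_capacity
            backlog = buffer_capacity
    return backlog, dropped


if __name__ == "__main__":
    # Unbounded buffer: the backlog grows by the rate difference every second.
    print(backlog_after(10, produce_rate=100, consume_rate=80))
    # Bounded buffer: the backlog is capped, and the surplus is lost instead.
    print(backlog_after(10, produce_rate=100, consume_rate=80, buffer_capacity=50))
```

With a faster consumer (`consume_rate >= produce_rate`) the backlog stays at zero, which is the case where the mid-layer is merely a safety net for short bursts.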
"> Not sure I understand what you mean. You don't consider 2) to be a rolling upgrade? I don't think Ian's blog post gives enough insight to see the clear winner. I'm actually saying that the standard name for 2) is a **rolling upgrade** (the author calls it "migration instance by instance")."
- Alex Popescu
"Merging would be the ideal situation, but I'm not suggesting it's the expected behavior. The expected behavior would be to signal a conflict: **table already exists and has not been replicated from this master**."
- Alex Popescu
"Matt, I've looked very carefully over the screenshot and the only thing I've seen was InfoChimps. Derivative work? I'd be interested to read your report if you could send me a copy. Thanks."
- Alex Popescu
"I've linked to this post from myNoSQL blog and left my comment there: http://nosql.mypopescu.com/pos... (tl;dr: afaik Redis supports multiple "databases" per instance)"
- Alex Popescu
"Mahesh, I do agree that some of those things are indeed needed to operate the service, but the ones I've emphasized sound a bit off. Actually there are 3 things in the ToS that make me feel uncomfortable: 1. Granting these rights to "those Google works with" -- who exactly are they? 2. Grating the rights to "modify", "create derivative work", "publicly perform", "publicly display". If these are just operational requirements why not making it clear when these actions are necessary? 3. Grating all these rights forever (even if I stop using their services)."
- Alex Popescu
"Eric, It is very difficult to go through your post and point out each inaccurate bit, but I'd say that what you did is using very generic terms when naming the trade-off when actually the reality applies to very specialized cases or does not apply at all. From your post: "Data integrity—In order to achieve high performance despite massive size, non-relational database systems compromise data correctness guarantees. The traditional rules about writing data are loosened, making it far more likely that data can be lost or overwritten." Data integrity is a very generic term that covers many different aspects (e.g. referential integrity, type integrity, data correctness, etc.). Taken as a whole the above paragraph is correct only when referring to referential integrity. The lack of referential integrity checks in systems that encourage denormalization/offer different data modeling approaches should be pretty obvious to everyone. By doing this generalization exercise I think your post..."
- Alex Popescu
"Andy, Thanks for clarifications. I must confess that using tps in this context sounds a bit weird: in the first case it reflects the number of submitted operations (fire-and-forget) while in the latter it counts served requests."
- Alex Popescu
"Nikita, I always read the posts I'm linking to. And the quote above is what you call the fundamental flaw. But in my vocabulary that's the hypothesis or assumption under which Hadoop was architected. If that's a fundamental flaw then anyone could call GridGain's assumptions fundamental flaws too. Actually all software solutions will have more "fundamental flaws" than features as most of the time we design solutions that are customized to specific problems. I hope you'll agree that even if it's only about terms, the difference is significant."
- Alex Popescu
"Nikita, I hope I'm not having this misconception :-). "GridGain stores in memory only the data that is needed for the actual processing, while storing "remaining" data in any offline storage like SQL, ERP, HDFS, etc." Hadoop/HDFS: HDFS is a filesystem designed for storing very large files. Each analysis will involve a large proportion, if not all, of the dataset So two different assumptions, none being a fundamental flow. As regards In-Memory Data Grids/Elastic Caches and disk storage strategies, most of the time this means integration with solutions that read from/write to disk. But this doesn't imply optimizations for the access patterns required by the applications using the Data Grid/Elastic Cache."
- Alex Popescu
"That's an interesting approach and quite different to what I've been imagining. My approach would have to be built in the graph db or could be imagined as an extra access layer. Basically each information stored in the graph would be versioned and each access would be based on a specific version. Some examples: 1. give me nodes as per version X 2. traverse the graph at version Y 3. purge "entities" older than version Z The advantage of this approach is that historical data would be accessible in the same graph format. The obvious drawback is continuously growing amount of data to be stored (with a possible negative impact on the performance too)"
- Alex Popescu
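The three example operations from the comment above can be sketched as a toy in-memory structure. This is a minimal illustration, not how any particular graph database implements versioning: the `VersionedGraph` class and all of its method names are hypothetical, every write simply bumps a global version counter, and each node or edge keeps its full write history.

```python
from collections import deque


class VersionedGraph:
    """Toy graph in which every write is stamped with a new version number."""

    def __init__(self):
        self.version = 0
        self._nodes = {}  # node_id -> [(version, properties, or None if deleted)]
        self._edges = {}  # (src, dst) -> [(version, True if present else False)]

    def _record(self, store, key, value):
        # Append-only history: nothing is overwritten in place.
        self.version += 1
        store.setdefault(key, []).append((self.version, value))
        return self.version

    def put_node(self, node_id, **props):
        return self._record(self._nodes, node_id, dict(props))

    def delete_node(self, node_id):
        return self._record(self._nodes, node_id, None)

    def add_edge(self, src, dst):
        return self._record(self._edges, (src, dst), True)

    def remove_edge(self, src, dst):
        return self._record(self._edges, (src, dst), False)

    @staticmethod
    def _value_at(history, version):
        # Latest value written at or before `version`, or None if there is none.
        value = None
        for written_at, candidate in history:
            if written_at > version:
                break
            value = candidate
        return value

    def nodes_at(self, version):
        """1. Give me nodes as per version X."""
        result = {}
        for node_id, history in self._nodes.items():
            props = self._value_at(history, version)
            if props is not None:
                result[node_id] = props
        return result

    def traverse_at(self, start, version):
        """2. Traverse the graph (breadth-first) at version Y."""
        alive = self.nodes_at(version)
        if start not in alive:
            return []
        seen, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for src, dst in sorted(self._edges):
                present = self._value_at(self._edges[(src, dst)], version)
                if src == node and present and dst in alive and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
        return order

    def purge_before(self, version):
        """3. Purge history entries superseded at or before version Z."""
        for store in (self._nodes, self._edges):
            for key, history in list(store.items()):
                older = [i for i, (v, _) in enumerate(history) if v <= version]
                if older:
                    # Keep the newest entry at or before Z as the baseline.
                    store[key] = history[older[-1]:]
```

The append-only histories make the drawback mentioned in the comment visible: storage grows with every write until `purge_before` is called, and reads have to scan a history instead of fetching a single value.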
"James, Let's try not to hide behind fingers :-). You talk to some journalists because you want the story told and read (by uninitiated at least) in a particular way: "Wow! Not only Couchbase has/sells a scalable NoSQL database, but they can also offer/sell consulting services for scaling. Wow!" As for me posting about it: 1) Someone *was* wrong on the Internet http://xkcd.com/386/ 2) A couple of my readers tipped me about the story and *this is exactly how I read it* 3) I hope you'll do a better job next time by: a) telling the story to the right people (yep, I include myself in that group) b) telling the real/complete story"
- Alex Popescu
"James, What I'm actually saying is that this is a bad story (or at least it was told badly). It reads as "OMGPOP uses X and they had to call X to scale" where X is in this case Couchbase but could easily be Oracle, or MySQL, or pretty much anything else. A good story would have said instead: 1) why Couchbase and most importantly 2) why they had to hire Couchbase to help scale."
- Alex Popescu
"As any distributed system, Datomic comes with its own trade-offs and it is nothing wrong with having a different opinion about some aspects. It definitely has nothing to do with the amount of code or its quality. Data locality is a well known principle in the space (just think for a second MapReduce). And you are most probably familiar with the "principle": There are only two hard problems in Computer Science: cache invalidation and naming things. :-)"
- Alex Popescu