"Ashish Thusoo presents the data scalability issues at Facebook and the data architecture evolution from EDW to Hadoop to Puma."
- imabonehead
from Bookmarklet
"Sean Comerford unveils ESPN.com’s architecture, what components are used and why, and the current changes the website goes through."
- imabonehead
The CS 442 Distributed Systems & Algorithms course is being offered for once in a blue moon, and I will most likely take it. The instructor is a hardcore wireless person, but what can you do, there is nobody else in the country to teach it.
"This is first of the three blog posts talking about how we build scalable, feature-rich, front-end apps for LinkedIn with an emphasis on site speed, performance, and developer productivity. Today, we'll tell you about our move from server-side templates, such as JSPs, ERBs, and GSPs, to client-side JavaScript templates powered by dust.js."
- imabonehead
"It turns out that there are many client-side templating solutions to choose from. Some are based on Resig's microtemplating (e.g. underscore.js), some are logic-less (e.g. mustache), some use a Haml syntax (e.g. Jade), and some are Java friendly (e.g. Google Closure). All told, we evaluated 26 different templating technologies, scoring them on a variety of factors: DRY, JS library...
- imabonehead
You are designing for Mongo exactly the way you would design a SQL database. Instead of making things easier, that puts a huge burden on you. In Mongo, if you want something like Post.Owner (for example), you do not create a Post table and an Owner table; you design the entity as Post{ prop1, prop2, Owner{ propx, propy } }, so you proceed by designing the entity directly as a single whole rather than through foreign key, primary key, and similar relationships. At least that is what the recommendations say.
- AdemC
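To make the embedded layout described above concrete, here is a minimal Python sketch that uses plain dicts in place of MongoDB documents. The field names (`title`, `name`, `avatar`) and the helper functions are made up for illustration; nothing here talks to a real database.

```python
# Relational-style layout: two "tables" linked by a foreign key.
owners = {1: {"name": "alp", "avatar": "alp.png"}}
posts_relational = [{"id": 10, "title": "hello", "owner_id": 1}]


def load_post_relational(post_id):
    """Needs two lookups: the post row, then the owner row it points to."""
    post = next(p for p in posts_relational if p["id"] == post_id)
    owner = owners[post["owner_id"]]
    return {"title": post["title"], "owner_name": owner["name"]}


# Document-style layout: the owner is embedded in the post itself.
posts_embedded = [
    {"id": 10, "title": "hello", "owner": {"name": "alp", "avatar": "alp.png"}}
]


def load_post_embedded(post_id):
    """Needs one lookup: everything lives in a single document."""
    post = next(p for p in posts_embedded if p["id"] == post_id)
    return {"title": post["title"], "owner_name": post["owner"]["name"]}
```

Both loaders return the same result; the difference is that the embedded version never has to follow a key into a second collection.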
Probably the worst thing about MongoDB is that you cannot fire off ad hoc queries as you would in SQL and quickly get specific results. Writing native queries is not that easy, and building an infrastructure for them in your program is even harder.
- AdemC
But if I put the User inside the post, won't the other posts ask why they don't have it too?
- AlpB.
You don't need MongoDB. Create a table, make one column a blob, and write JSON into it. Find a way to derive a signature from the fields you want to filter on, and write that into another column. Using MongoDB for data you cannot denormalize does not make sense; there is no need. Of course, how often the part you denormalize changes also matters.
- Burcu Dogan
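One possible reading of the "JSON in a blob column plus a signature column" suggestion, sketched with Python's built-in `sqlite3`. The table name, the choice of `type` and `user` as filter fields, and the SHA-1 signature are all assumptions made for this example; the comment does not say how the signature should be derived, and a TEXT column stands in for the blob.

```python
import hashlib
import json
import sqlite3


def signature(doc, fields=("type", "user")):
    """Hash the fields we want to filter on into one short, indexable string."""
    key = "|".join(str(doc.get(f, "")) for f in fields)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def make_store():
    """Create an in-memory table: an id, the signature, and the JSON body."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, sig TEXT, body TEXT)")
    conn.execute("CREATE INDEX idx_entities_sig ON entities (sig)")
    return conn


def save(conn, doc):
    """Store the whole document as JSON next to its signature."""
    conn.execute(
        "INSERT INTO entities (sig, body) VALUES (?, ?)",
        (signature(doc), json.dumps(doc)),
    )


def find(conn, **criteria):
    """Look up documents by the same fields that feed the signature."""
    rows = conn.execute(
        "SELECT body FROM entities WHERE sig = ? ORDER BY id", (signature(criteria),)
    )
    return [json.loads(body) for (body,) in rows]
```

A usage example: after `save(conn, {"type": "checkin", "user": "alp", "place": "Ankara"})`, calling `find(conn, type="checkin", user="alp")` returns that document. The trade-off is that only exact matches on the signature fields are cheap; anything else means scanning and parsing the JSON bodies.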
To be honest, I wrote those without thinking about the user and post concepts at all. I picked two words (Post and Owner) at random from your example. Of course the logical order is User first, then Posts. If you need a value outside this basic structure (for example User{name,pass,avatar,Posts{title,content,Tags{"",""}}}), you can create an additional entity with map reduce. There are simple examples in the patterns section at http://cookbook.mongodb.org/index...
- AdemC
What you say seems quite reasonable. I will apparently need to do a fair bit of research on whether this can be achieved with an RDBMS, and to benefit from MapReduce I will naturally have to write quite a few native queries. These were valuable comments, thank you.
- AlpB.
On the contrary, you need to design the model independently of relational SQL thinking, and that is the hardest part. In a denormalized structure the goal is not easier queries but the minimum number of queries. So you may prefer to duplicate data just to avoid having to query another collection (table). In short: "up to your business requirements"...
- kirpit
The structure Burcu describes is this one: http://bret.appspot.com/entry.... We used a similar structure in a fairly large project; for search/filtering you need to look at alternatives such as Solr, of course. That is also where your WHERE clauses will go.
- moss
Kirpit, look at Solr, man :) It is actually not easy to comment without knowing what you want to query and how much. Sometimes all you want is to read chronologically, print, and move on. You need to state clearly what you want to do with this data, Alp.
- Burcu Dogan
Going back to the questions in the topic: 1. Don't use an ORM for MongoDB. 2. The way you store a user's posts makes sense to me. The maximum size of a document was 8 MB, which is huge for text data. It should be fine unless a user posts a few billion times, but keep it in mind. 3. You can build the m2m structure the same way as the user <-> post relationship you gave in 6hax. 4. Indexes in Mongo are created with ensureIndex() anyway. If such an index already exists, it does not create a new one.
- moss
For Solr, use a single schema and a single index. Generate the PKs of the entities you send to Solr globally, in one place, with a SEQ. For integration, build a pooled async mechanism on top of any Solr client.
- Burcu Dogan
Thanks a lot, these look like good ideas. You can actually think of my use case as something like Facebook. That is, each post has very different metadata: some are check-ins, some are status updates, some are app posts. That is why I thought an RDBMS would need different schemas. Bret's post had completely slipped my mind, thanks @moss.
- AlpB.
If you want to use it with MVC, the Lithium (li3) framework makes the job quite easy. They are also planning model relationships for the future. PHP 5.3+ lithify.me
- Hidayet Doğan
At the last minute I gave up on implementing Bret's method and read the MongoDB definitive guide book from cover to cover. I liked it a lot and it seems to meet my needs quite well. I had reservations about the Java driver, but it turns out to be the first driver and the one they care about most. For example, I was wondering how I would do connection pooling; it turns out it comes by default and is thread-safe. That also...
- AlpB.
No, nothing against Bret, of course, but the problem with Bret's approach is that managing the inner indexes and so on would not be easy. Besides, sharding/replication and the like are no child's play in MySQL, and since we are children...
- AlpB.
I did that last year too, but it is not enough. I tried writing the question particle "mi" attached to the word, joining it to the sentence with a conjunction and ending without a period, misspelling words, and even separating the case and plural suffixes of ordinary nouns with a space; no, it still doesn't work, I swear.
- AlpB.
What is written at the link @buremba gave is correct in general principle (although much of it is specific to old versions). Under a high write/read ratio MongoDB, in a word, craps out; I tested it myself and made it crap out. Those who use it, myself included, are very happy because we have no serious write load. We have at most 50-100 GB, maybe a few TB of data, and we generally need read-heavy systems. In this direction...
- kirpit
I swear I saw with my own eyes how brutally mongod goes down when you lean on writes a bit.. I would still think twice about whether to use Mongo for critical data..
- kirpit
In my use case it will probably be at most 5 qps or so for a very long time, so I don't think it will be a serious problem.
- AlpB.
Right on cue with this conversation, my indexer cron hammered it again and Mongo is nowhere to be found..
- kirpit
eheheuheuhe :D What do you do for replication and sharding?
- AlpB.
"Netflix has been rolling out the Apache Cassandra NoSQL data store for production use over the last six months. As part of our benchmarking we recently decided to run a test designed to validate our tooling and automation scalability as well as the performance characteristics of Cassandra. Adrian presented these results at the High Performance Transaction Systems workshop last week."
- imabonehead
"Recently at Surge 2011, the annual conference on scalability and performance, Google's CIO Ben Fried gave an illuminating keynote address. His main insight was that generalists are the people that will lead engineering teams in successfully scaling the web."
- imabonehead
"On Sept 22nd, 2011 at the first Twilio <Conference> we had the opportunity to share details on how the engineering team has scaled up the Twilio infrastructure and organization over the past three years. Embedded below are the slides from the talk."
- imabonehead
"With more than 25 photos & 90 likes every second, we store a lot of data here at Instagram. To make sure all of our important data fits into memory and is available quickly for our users, we’ve begun to shard our data—in other words, place the data in many smaller buckets, each holding a part of the data."
- imabonehead
"Our application servers run Django with PostgreSQL as our back-end database. Our first question after deciding to shard out our data was whether PostgreSQL should remain our primary data-store, or whether we should switch to something else. We evaluated a few different NoSQL solutions, but ultimately decided that the solution that best suited our needs would be to shard our data across a set of PostgreSQL servers."
- imabonehead
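The "many smaller buckets" idea in the quote above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shard count, the host names `pg-a` and `pg-b`, and the modulo-based routing are invented here and are not taken from the quoted post. It also assumes the number of logical shards divides evenly by the number of servers.

```python
LOGICAL_SHARDS = 16          # assumed number of buckets
SERVERS = ["pg-a", "pg-b"]   # hypothetical PostgreSQL hosts


def logical_shard(user_id):
    """All data for one user lands in the same bucket."""
    return user_id % LOGICAL_SHARDS


def server_for(user_id):
    """Map the logical shard to a physical server (contiguous ranges)."""
    shards_per_server = LOGICAL_SHARDS // len(SERVERS)
    return SERVERS[logical_shard(user_id) // shards_per_server]
```

Keeping more buckets than servers means a bucket can later be moved to a new machine without changing which bucket a given user belongs to.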
Looking For The Source Code Of Life, LINUX and MORE...: Building A Highly Available Linux Cluster Using Wackamole - http://blog.adityapatawari.com/2011...
"Wackamole is an application which manages a bunch of IPs which should be accessible from outside all the time. Given a set of machines and a IPs, wackamole will ensure that if any machine goes down, other machine will take up its IP almost instantly and outside world will see no impact. It tries to balance the number of IPs across the number of machines available. Wackamole uses Spread network messaging system."
- imabonehead
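The balancing behaviour described in the quote (every IP always held by some live machine, spread roughly evenly) can be illustrated with a small Python sketch. This is not Wackamole's actual algorithm and does not involve Spread; `assign_ips` is a hypothetical helper that simply recomputes a round-robin assignment from scratch whenever the set of live machines changes.

```python
def assign_ips(ips, live_machines):
    """Spread virtual IPs round-robin over the machines that are still up."""
    if not live_machines:
        raise ValueError("no live machines to hold the IPs")
    machines = sorted(live_machines)
    owners = {m: [] for m in machines}
    for i, ip in enumerate(sorted(ips)):
        owners[machines[i % len(machines)]].append(ip)
    return owners
```

Calling it with four IPs and machines `a` and `b` gives each machine two IPs; calling it again with only `a` alive hands all four to `a`. A real implementation would try to move as few IPs as possible on each change, which this sketch does not attempt.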
The last sentence sent me into some deep thinking: "Would “developer-driven culture” work at your company?" :)
- gkhn
Did a synthesis come out of your deep thoughts, @gkhn? :)
- İnanç Gümüş
Before it could, the realities of Turkey shook me back to my senses, as they do every day. :) And the answer is unfortunately a very clear-cut "NO" :)
- gkhn
The conclusion seems to have turned out a bit pessimistic, a bit realistic... :p
- İnanç Gümüş
Then let me soften the conclusion a bit: companies in Turkey are unfortunately not ready for this culture yet :)
- gkhn
Is it only the companies that are not ready? In my experience the employees may not be ready either... :(
- İnanç Gümüş
Off topic -> the blog's design uses the same theme as mserdark's blog.
- erd
Of course, those who have been made to work with a civil-servant mentality in this sector for years are not ready yet either. That is because no such culture has taken shape. If companies that put this culture into practice appear, there will be plenty of employees in the market who adapt quickly. Of course, we should not expect all companies to be like this. Let an alternative remain for employees who would have trouble adapting.
- gkhn
@gkhn "Let an alternative remain for employees who would have trouble adapting" is great :)
- İnanç Gümüş
"Graylog2 is an open source log management solution that stores your logs in MongoDB. It consists of a server written in Java that accepts your syslog messages via TCP, UDP or AMQP and stores it in the database. The second part is a web interface that allows you to manage the log messages from your web browser. Take a look at the screenshots or the latest release info page to get a feeling of what you can do with Graylog2."
- imabonehead
"Hazelcast and GridGain are the best choice for an easily-parallelized, low-data, CPU-intensive tasks. Moreover, they are even better choice, when some unexpected node failures can happen."
- İnanç Gümüş
Deferred Deletes is a technique where deleted items are marked as deleted but not garbage collected until some days or preferably weeks later. James Hamilton describes this strategy in his classic On Designing and Deploying Internet-Scale Services: Never delete anything. Just mark it deleted. When new data comes in, record the requests on the way. Keep a rolling two week (or more) history of all changes to help recover from software or administrative errors. If someone makes a mistake ...
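A minimal Python sketch of the mark-then-purge idea described above, using an in-memory store. The `Store` class, its method names, and the 14-day retention value are assumptions chosen for illustration; the excerpt itself only says "some days or preferably weeks".

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=14)  # assumed retention window


class Store:
    def __init__(self):
        self.items = {}       # key -> value
        self.deleted_at = {}  # key -> time the item was marked deleted

    def put(self, key, value):
        self.items[key] = value
        self.deleted_at.pop(key, None)

    def delete(self, key, now):
        """Mark only; the data stays until purge() runs."""
        if key in self.items:
            self.deleted_at[key] = now

    def get(self, key):
        """Marked items are hidden from normal reads."""
        return None if key in self.deleted_at else self.items.get(key)

    def undelete(self, key):
        """Recover from a mistaken delete by clearing the mark."""
        self.deleted_at.pop(key, None)

    def purge(self, now):
        """Garbage-collect items whose mark is older than the retention window."""
        expired = [k for k, t in self.deleted_at.items() if now - t >= RETENTION]
        for k in expired:
            del self.items[k]
            del self.deleted_at[k]
        return expired
```

Passing `now` explicitly keeps the sketch easy to test; a real system would run `purge` from a scheduled job and would usually persist the deletion marks alongside the data.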
This is a guest post by Alvaro Videla describing their architecture for Poppen.de, a popular German dating site. This site is very much NSFW, so be careful before clicking on the link. What I found most interesting is how they manage to successfully blend a little of the old with a little of the new, using technologies like Nginx, MySQL, CouchDB, Erlang, Memcached, RabbitMQ, PHP, Graphite, Red5, and Tsung. What is Poppen.de? Poppen.de (NSFW) is the top dating ...
The 2008 presidential election was the most closely monitored election in US history, thanks in part to the efforts of tech entrepreneurs like David Troy. Troy speaks at the 2009 Emerging Communications Conference about how he and his teams were able to create the Twitter Vote Report, which allowed people to report on poll conditions, and the Inauguration Report 09, a firsthand documentation of people's experiences at the 2009 presidential inauguration.
If you were pestered with a dozen or more phone calls a day from people in your locality asking what was on your menu for the day, or inquiring about the delivery of their order, and you were definitely not the local Thai Kitchen but just an innocent guy or gal, what would you do? Well, you'd listen to Danny Sullivan in this program. Danny discusses why local search is such a difficult task, and what some of the major ...
PowerPivot, a new add-in for Excel, can absorb and analyze vast quantities of data. And it can ingest that data from sources that support the Atom-based OData protocol. John Hancock, who led the charge to create PowerPivot, tells host Jon Udell how it works, why it supports OData, and what this will mean not only for corporate business intelligence but also for the analysis of open public data.
Dr. Moira Gunn chats with author and science journalist Shankar Vedantam about his new book, The Hidden Brain: How Our Unconscious Minds Elect Presidents, Control Markets, Wage Wars, and Save Our Lives, touching on how the brain functions automatically during times of heightened emotion.
Some pundits have declared the iPad a “toy”, or suited only for content consumption. I disagreed with the latter a few days ago, and I think it will have some very interesting business uses. If I come up with a sufficiently novel one, Oasis Digital might implement it. But sadly I must mostly agree, for at [...]
Like 300,000 other people, I have a shiny new iPad in hand. Here is my very short review: Apple is going to sell a huge pile of these things. Expanding on that: Lots of other companies are going to sell an enormous pile of apps for the iPad. From the point of view of a maker of software, [...]
In A Brief History of the Internet it was revealed that the Internet was based on the idea that there would be multiple independent networks of rather arbitrary design. The Internet as we now know it embodies a key underlying technical idea, namely that of open architecture networking. In this approach, the choice of any individual network technology was not dictated by a particular network architecture but rather could be selected freely by a provider and made to interwork with ...
In this presentation from eComm 2009, LiMo's Open Source Committee Chairman, David "Lefty" Schlesinger, discusses the meaning of governance, and the advantage of LiMo's approach over those of Google and Apple for their Android and iPhone application development platforms, before opening the floor to questions.
Today Amazon Web Services takes another big step in making it easier to migrate legacy storage systems to the cloud through AWS Import/Export support for ingesting Punch Cards. AWS Import/Export accelerates moving large amounts of data into and out of AWS using portable storage media for transport. Punch cards are paper-based storage media that represent data using the presence or absence of holes in specific positions. With AWS Import/Export for Punch Cards, enterprises can begin using the service to preserve ...