—–Start of Advertisement——-
BUILD CAR POOL SOLUTIONS ON ANY DEVICE TO RUN ANYWHERE (www.mcruiseon.com). Introducing mCruiseOn, the java library /json api’s that you can use to build a car pool solution. Be the next avego.com, carpooling.com, zimride.com. mCruiseOn is your one stop API on EC2.
——End of Advertisement——-
Thats the point right, if something ain’t that easy to understand, how easy will it be to implement and maintain ? The challenge is to understand it in one go.
Memcached is a distributed memory object caching system. No, that is not the way I am going to explain it, so chill.
We all know that our webservers have web applications that run database query’s to get data. They also call query’s to insert, delete and update data. As simple as it sounds, its crazy to get such a system working when millions of connections on the webserver get requests. Webservers need to be added in parallel to handle load, databases need to be replicated so that in parallel data can be accessed, and then starts the complication of ensuring that data is consistent across all the requests. Then comes the challenge of ensuring data is available quickly enough.
Bad examples are http://www.irctc.co.in. But its important to understand how difficult it is to get it right. Especially if your data is as complex as the Indian Railways. If this site is down, the nation stops.
Now, for some basic computer fundamentals. Just as a quick reminder. We all know that tape drives are slower than floppy disks. Floppy disks are slower than hard disks. Hard disks are slower than memory. And memory is slower than CPU memory. CPU memory is much more expensive than memory, and the story continues. Faster the memory the more expensive it will be.
What is serialization ? A object that can be stored on a disk byte by byte, and read back to form that object again, is called a Serializable object.
Now, imagine. If I store a file on a harddisk, and read it. Will it be faster if I read it from the memory ? Obviously. We dont use fread (file read, File.read()) for every operation we want to perform. We first read it into a String, and then read it from that String. Memcached does something similar. It stores all data given to it in memory. So that you can read from it faster as compared to a harddisk.
So, if you have a piece of code doing a lot of read from harddisks, (mind you databases are also stored in big harddisks), AND if that data is needed to be read multiple times, then you can write that data to memcached. So instead of reading data from the harddisk all reads can goto memcached. There by preventing the harddisk from the extra effort for each read. It can do something more important in that time.
Memcached provides you with a way to identify the data that you just wrote with a identifier, a name or a key. We call it key value pair.
When you load memcached, you need to specify the amount of memory that this machine will dedicate for memcached. And memcached reserves that much memory for itself (the -m command). So, what happens when u run out of memory.
Simple load memcached on another machine, and get both these memcached machines to know each other (this is called a cluster). Then you can refer to the cluster and add objects. Memcached will store objects in round robin between the servers on the cluster.
Now, your application needs to identify from its request if similar data is being requested between its clients. If it is, then the first call can save that common data on memcached, and subsequent calls can retrieve the data from memcached.
Just remember, that memcached is a “explicit cache”. Which means you need to add stuff to memcached, remove it, updated it. It does “nothing” automatically. At this point many people get turned off, but the concept of memcached is to save your trip to the database, and reduce the need for clustering a database and webservers. Since clustering memcached is very cheap, as compared to database and webservers.
This should help you understand the fundamentals of memcached.