What is Memcached?



That’s the point, right? If something isn’t easy to understand, how easy will it be to implement and maintain? The challenge is to understand it in one go.

Memcached is a distributed memory object caching system. No, that is not the way I am going to explain it, so chill :).

We all know that our webservers have web applications that run database queries to get data. They also run queries to insert, delete and update data. As simple as it sounds, it’s crazy hard to keep such a system working when the webserver is handling millions of connections. Webservers need to be added in parallel to handle the load, databases need to be replicated so that data can be accessed in parallel, and then starts the complication of ensuring that data is consistent across all the requests. Then comes the challenge of ensuring data is available quickly enough.

A bad example is http://www.irctc.co.in. But it’s important to understand how difficult it is to get this right, especially if your data is as complex as that of the Indian Railways. If this site is down, the nation stops.

Now, for some basic computer fundamentals, just as a quick reminder. We all know that tape drives are slower than floppy disks. Floppy disks are slower than hard disks. Hard disks are slower than memory (RAM). And RAM is slower than the cache memory on the CPU. CPU cache is much more expensive than RAM, and the story continues: the faster the memory, the more expensive it will be.

What is serialization? An object that can be written out byte by byte (to a disk, for example) and read back to form that object again is called a Serializable object.
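To make the round trip concrete, here is a minimal sketch in Python. The `Booking` class and its fields are made up for this illustration, and JSON is just one possible byte format; the point is only object to bytes and back.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Booking:
    # Hypothetical record, used only for this illustration.
    passenger: str
    train_no: int


def to_bytes(booking):
    """Serialize: turn the object into plain bytes that can be stored anywhere."""
    return json.dumps(asdict(booking)).encode("utf-8")


def from_bytes(raw):
    """Deserialize: rebuild an equal object from the stored bytes."""
    return Booking(**json.loads(raw.decode("utf-8")))


original = Booking("Asha", 12627)
raw = to_bytes(original)
restored = from_bytes(raw)
print(restored == original)  # True
```

This matters for memcached because it stores bytes, so an object has to be serialized on the way in and rebuilt on the way out.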

Now, imagine I store a file on a hard disk and read it. Will it be faster if I read it from memory? Obviously. We don’t call fread (file read, File.read()) for every operation we want to perform. We first read the file into a String, and then work from that String. Memcached does something similar. It stores all the data given to it in memory, so that you can read it back faster than from a hard disk.

So, if you have a piece of code doing a lot of reads from hard disks (mind you, databases are also stored on big hard disks), AND that data needs to be read multiple times, then you can write that data to memcached. Instead of reading the data from the hard disk, all reads can go to memcached, thereby saving the hard disk the extra effort for each read. It can do something more important in that time.

Memcached lets you identify the data that you just wrote with an identifier, a name, or a key. We call this a key value pair.

When you start memcached, you need to specify the amount of memory that this machine will dedicate to it (the -m option, in megabytes), and memcached limits itself to that much memory. So, what happens when you run out of memory?

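Simple: run memcached on another machine, and give your client library the addresses of both memcached machines (this pool of servers is what people call a cluster). Then you can refer to the cluster and add objects. The memcached servers do not talk to each other; the client hashes each key to decide which server stores it, so objects end up spread across the servers in the cluster.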
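A note on how a key ends up on a particular machine: with the usual client libraries it is the client that picks the server for each key, typically by hashing the key. Here is a minimal sketch of that idea. The server addresses are made up, `pick_server` is a hypothetical helper, and the simple modulo scheme is an assumption for illustration (real clients often use consistent hashing so that adding a server moves fewer keys).

```python
import hashlib

# Hypothetical addresses; 11211 is memcached's default port.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211"]


def pick_server(key, servers):
    """Map a key to one server, so every client agrees on where the key lives."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


for key in ("schedule:12627", "user:42", "fare:BLR-MAS"):
    print(key, "->", pick_server(key, SERVERS))
```

The same key always maps to the same server, which is what lets a later read find what an earlier write stored.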

Now, your application needs to work out from its requests whether its clients are asking for similar data. If they are, then the first call can save that common data in memcached, and subsequent calls can retrieve the data from memcached.
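The “first call saves, later calls retrieve” flow can be sketched like this. `FakeMemcached` is an in-process stand-in so the example runs without a server, and `SlowDatabase`, `get_schedule` and the key format are all hypothetical names for this illustration.

```python
class FakeMemcached:
    """In-process stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)  # None means a cache miss

    def set(self, key, value):
        self._store[key] = value


class SlowDatabase:
    """Pretend database that counts how many times it is actually queried."""

    def __init__(self):
        self.reads = 0

    def query_schedule(self, train_no):
        self.reads += 1
        return {"train_no": train_no, "departs": "06:00"}


def get_schedule(cache, db, train_no):
    key = f"schedule:{train_no}"
    value = cache.get(key)
    if value is None:                 # first call: go to the database
        value = db.query_schedule(train_no)
        cache.set(key, value)         # save it for everyone else
    return value


cache, db = FakeMemcached(), SlowDatabase()
first = get_schedule(cache, db, 12627)
second = get_schedule(cache, db, 12627)
print(db.reads)  # 1
```

Two requests came in, but the database was only asked once.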

Just remember that memcached is an “explicit cache”, which means you need to add stuff to memcached, remove it, and update it yourself. It does “nothing” automatically to keep itself in sync with your database. At this point many people get turned off, but the concept of memcached is to save you the trip to the database and reduce the need for clustering databases and webservers, since clustering memcached is very cheap compared to clustering databases and webservers.
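Here is a small sketch of what “explicit” means on the write path: when the data changes, your code has to drop or refresh the cached copy itself. The dictionary standing in for the database, the `FakeMemcached` class and the function names are all assumptions made for this illustration.

```python
class FakeMemcached:
    """In-process stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value

    def delete(self, key):
        self._store.pop(key, None)


def read_seats(cache, db, train_no):
    key = f"seats:{train_no}"
    seats = cache.get(key)
    if seats is None:
        seats = db[train_no]          # cache miss: read the source of truth
        cache.set(key, seats)
    return seats


def update_seats(cache, db, train_no, seats_left):
    db[train_no] = seats_left             # write to the database first
    cache.delete(f"seats:{train_no}")     # then remove the stale cached copy


cache, db = FakeMemcached(), {12627: 42}
before = read_seats(cache, db, 12627)
update_seats(cache, db, 12627, 41)
after = read_seats(cache, db, 12627)
print(before, after)  # 42 41
```

If `update_seats` skipped the `delete`, the second read would still return 42 from the cache.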

This should help you understand the fundamentals of memcached.


Designing a Data Model, process and tools to use…


Normalization is generally learnt during our college days as part of database concepts, but we tend to forget the very basic need for normalization as we acquire new knowledge over the years. We ignore normalization to avoid taking the time to design our database. I mean, why have a bunch of tables? Why not just 1 table with everything in it?

So why normalize our database?

The answer is simple: databases need to be designed so that the following issues are “prevented” at the root, by the design itself.

  1. If a record is deleted, the rows in other tables that hold that record’s details should be deleted too, in a cascade.
  2. Every table must have exactly 1 “intent”. Customer_Id, Payment_1, Payment_2, Payment_3 is NOT the way to design your database, simply because retrieving data will be really slow. Right now you are considering only a few transactions, but in future you will not be given a dedicated server for just this one app and its small set of users. In the real world, databases, filesystems, storage, network, cpu and memory are all shared. If your one app takes most of the cpu/memory resources, very soon it will be “out” of production.
  3. Performance. Every database indexes tables to ensure fast retrieval. What does indexing mean, you ask? Well, the database reads your table and builds a separate structure of pointers to your records, typically a B-tree, so that a lookup does not have to be a sequential (linear) search.
  4. Foreign key fundamentals help keep your data “integrity” intact (an extension of point 1). Integrity here means that your database has GOOD data, not stale, not useless. Imagine a database that retains the salary information of all employees even after they have resigned from the organization. Crazy, yes, but it’s possible if you don’t form the correct foreign key relationships.
  5. “Oh forget all this, I’ll just take care of it in code.” Well, if you take 2 mins to think about it, you will not need to write, test, maintain and bug fix 100-500 lines of code, if you just design your db/tables right.
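Points 1 to 4 can be seen working together in a few lines. This is a minimal sketch using Python’s built-in SQLite as a stand-in for MySQL, with made-up `customer` and `payment` tables: one row per payment instead of Payment_1, Payment_2, Payment_3 columns, an index on the lookup column, and a foreign key with a cascading delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- One row per payment, not Payment_1, Payment_2, Payment_3 columns.
    CREATE TABLE payment (
        payment_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customer(customer_id) ON DELETE CASCADE,
        amount      REAL NOT NULL
    );
    -- Index so that finding a customer's payments is not a linear scan.
    CREATE INDEX idx_payment_customer ON payment(customer_id);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.executemany(
    "INSERT INTO payment (customer_id, amount) VALUES (?, ?)",
    [(1, 250.0), (1, 99.5), (1, 10.0)],
)

# Integrity: a payment for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO payment (customer_id, amount) VALUES (99, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Cascade: deleting the customer removes their payments too.
conn.execute("DELETE FROM customer WHERE customer_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
print(orphan_rejected, remaining)  # True 0
```

No application code was written to clean up the payments or to reject the orphan row; the table design did it.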

I am not trying to convince you of the benefits of normalization, I am just reminding you. You are already convinced; deep in your mind you believe that it’s needed, you are just too lazy to put in the time. And I understand: database design needs time, thinking time. It is a step by step process, since it does take your whole brain to design it right.

The right tools play a very important role in designing your database. Remember, it takes both sides of your brain to design a database. So slow down.

I use Ubuntu for my development work. For db design we need GUI tools. After some research I have zeroed in on mysqlworkbench. It is a nice tool for reverse engineering your tables and reviewing the schema graphically. It’s quick, painless, and the diagram is reverse engineered from the existing tables. Not many tools produce this result.

For changing relationships between tables, though, mysqlworkbench sucks big time. It’s a pain to use. So for this I use phpmyadmin. It’s clumsy at best, but does the job.

So I use phpmyadmin to edit relationships, and to review the change I reverse engineer with mysqlworkbench.

In phpmyadmin, if you need to create a foreign key in table A that refers to a column in table B, you first need to create an index on that column in table B. Then go to table A, open “relation view”, and use the drop down to select your column from table B. Also set your “on delete” and “on update” rules.

I hope this is helpful. If not, please comment.

Interesting Links
For understanding workbench er diagram (http://www.smartdraw.com/resources/tutorials/cardinality-notations/)