Thursday, November 26, 2009

When you have a hammer, everything looks like a nail.

I once discussed a candidate architecture for a high-performance system with a colleague. In brief, the system should receive a large number of small incoming requests and provide an API for querying this data.

The numbers:

  • number of incoming requests: 5000 per second
  • number of select queries: 5 per 10 seconds
  • maximum latency between registering a request and its availability through the API: 5 seconds
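
To get a feeling for what these numbers imply, here is a quick back-of-the-envelope calculation. The derived figures (write-to-read ratio, requests that may still be "in flight" within the latency budget, daily volume) are my own illustration and assume the load is constant around the clock, which the requirements do not actually state.

```python
# Back-of-the-envelope figures derived from the numbers above.
# Assumption: the load is constant (no peaks, no quiet hours).
INCOMING_PER_SECOND = 5000     # incoming requests per second
SELECTS_PER_WINDOW = 5         # select queries ...
SELECT_WINDOW_SECONDS = 10     # ... per this many seconds
MAX_LATENCY_SECONDS = 5        # a request must be queryable within this time

selects_per_second = SELECTS_PER_WINDOW / SELECT_WINDOW_SECONDS
writes_per_read = INCOMING_PER_SECOND / selects_per_second
# Requests that may be received but not yet visible through the API.
max_pending_requests = INCOMING_PER_SECOND * MAX_LATENCY_SECONDS
requests_per_day = INCOMING_PER_SECOND * 60 * 60 * 24

print(selects_per_second)    # 0.5
print(writes_per_read)       # 10000.0
print(max_pending_requests)  # 25000
print(requests_per_day)      # 432000000
```

In other words, the workload is overwhelmingly write-heavy: roughly ten thousand writes for every read.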

My colleague is a DB guy who builds big and rather complex automation solutions. He proposed taking the most enterprisey (and expensive) database, doing all of the processing in the DB, and exposing table views as the external API of the system.

This solution introduces a lot of problems.

1. Vendor affinity

Once you have exposed a part of your DB as an external API, you can never change it (neither the DB vendor nor the DB structure).

2. Performance optimization

You have absolutely no control over the queries that will run against your system. You can buy several top-end servers and still some idiot will invent a query that will bog them down. Stored procedures are a solution, but they refute the argument that exposing views lets us not worry about queries in the external API.

3. Scalability

It's a fact that SQL database servers don't scale well, horizontally at least. The system is not that complex, and consistency checks, joins and complex select queries are not required here.

Better solution

The system can be divided into 3 logical parts: Request Receiver, Request Store and External API.

The Receiver and the External API are pretty simple: protobuf and web services. Protobuf is fast enough for the thin channel of each sending client, and a web service is simple for API consumers to use.

But the most uncertainty lies in the Request Store. As its organization will change during load tests and API modifications, it should be hidden behind very strict and simple interfaces. It's handy that the system doesn't have complex objects spread across many tables. We can use POCOs everywhere.

And the storage itself can be implemented using traditional SQL databases or distributed storage. Thanks to the interfaces, this will not affect the remaining two parts.
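
A minimal sketch of what such an interface for the Request Store could look like, written in Python for brevity. All names here (`Request`, `RequestStore`, `save`, `find_by_client`, the fields of the data object) are hypothetical and only illustrate the idea; the real interface would be shaped by the actual API requirements. The in-memory implementation stands in for whatever backend is chosen later.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Request:
    """Plain data object (the analogue of a POCO); the fields are hypothetical."""
    client_id: str
    received_at: float  # seconds since epoch
    payload: bytes


class RequestStore(ABC):
    """The only thing the Receiver and the External API know about storage."""

    @abstractmethod
    def save(self, request: Request) -> None:
        """Register an incoming request."""

    @abstractmethod
    def find_by_client(self, client_id: str, since: float) -> List[Request]:
        """Return requests of one client received at or after `since`."""


class InMemoryRequestStore(RequestStore):
    """Stand-in backend; an SQL or distributed store would implement the same interface."""

    def __init__(self) -> None:
        self._by_client: Dict[str, List[Request]] = {}

    def save(self, request: Request) -> None:
        self._by_client.setdefault(request.client_id, []).append(request)

    def find_by_client(self, client_id: str, since: float) -> List[Request]:
        stored = self._by_client.get(client_id, [])
        return [r for r in stored if r.received_at >= since]


# Usage: callers depend only on the RequestStore type, not on the backend.
store: RequestStore = InMemoryRequestStore()
store.save(Request("client-1", 100.0, b"a"))
store.save(Request("client-1", 200.0, b"b"))
store.save(Request("client-2", 150.0, b"c"))
recent = store.find_by_client("client-1", since=150.0)
print([r.payload for r in recent])  # [b'b']
```

Swapping the backend then means writing another `RequestStore` subclass; the Receiver and the External API stay untouched.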
