Roundtrips and the real bottlenecks
Now that yesterday's discussion about Dynamic SQL and Stored Procedures is slowing down, I'd like to mention a little thing that popped up during the debate: roundtrips.
A roundtrip is a request from the client to the database server plus the response from that server back to the client. This takes time; there is no doubt about that. I'll admit that a batch of statements in a stored procedure is easier to work with than a set of statements that build a batch of dynamic SQL queries. Some argued that a call to a single stored procedure requires just 1 roundtrip, while creating and executing a bunch of dynamic queries would sometimes require more than 1 roundtrip (e.g., 1 roundtrip for each dynamic query).
This triggered me (pun intended) to think about this further. I wondered: if these people are so focused on roundtrips and the cost of a roundtrip, are they also using DataReader.GetBytes()? Are they also batching all the calls required for, say, a webform into 1 stored procedure? A wild guess from my side: no, I don't think a lot of these people use GetBytes() for efficient blob retrieval, and no, they're not batching the multiple calls of a page into 1 stored proc. Why are these 2 seemingly random features related to roundtrips? Well, focusing on the time a roundtrip over an already opened connection takes (a few milliseconds) implies that all other bottlenecks in the conversation between client and server have already been removed, and that 'hidden' extra roundtrips have been optimized away. I have my doubts about that. Performance is important, but if you want to discuss the benefits of method A compared to method B, the roundtrip argument is perhaps not the best one you can pick, because the time a roundtrip takes is far less than, for example, the time it takes to fetch large blobs in resultsets.
I can imagine a lot of people simply ignore DataReader.GetBytes() and read the complete blob at once, stored procedure or not. This can be very inefficient. When roundtrips concern you because they take time, make certain you have removed every other bottleneck first, starting with the unnecessary fetching of blob data, after removing the overhead of using a DataAdapter, of course.
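To illustrate what I mean, here's a minimal sketch of chunked blob retrieval with GetBytes(); the table and column names (Document, FileData) are made up for the example:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

class BlobFetch
{
    static void WriteBlobToStream(SqlConnection connection, int documentId, Stream output)
    {
        using (SqlCommand command = new SqlCommand(
            "SELECT FileData FROM Document WHERE DocumentID = @id", connection))
        {
            command.Parameters.AddWithValue("@id", documentId);

            // SequentialAccess makes the reader stream column data
            // instead of buffering the entire row in memory.
            using (SqlDataReader reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
            {
                if (!reader.Read())
                {
                    return;
                }

                byte[] buffer = new byte[8192];
                long offset = 0;
                long bytesRead;

                // Pull the blob in 8KB chunks rather than materializing
                // the complete value at once.
                while ((bytesRead = reader.GetBytes(0, offset, buffer, 0, buffer.Length)) > 0)
                {
                    output.Write(buffer, 0, (int)bytesRead);
                    offset += bytesRead;
                }
            }
        }
    }
}
```

Compare that to pulling the complete blob into memory in one go: for large files, the difference dwarfs the few milliseconds a roundtrip costs.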
Once you have optimized the data retrieval of individual rows and batches of rows, there is another roundtrip issue to be discussed: when, for example, a webform requests 3 or 4 batches of data to build its content, are these requests bundled into 1 call? If not, you have 3 or 4 roundtrips to the server where there could be 1. When you are concerned about roundtrips and the time each one takes, you definitely shouldn't let these 2 or 3 unnecessary requests go unnoticed; you should optimize them away. After all: a roundtrip seems to be important.
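For the sake of argument, a minimal sketch of what such bundling could look like: one hypothetical stored procedure, pr_GetCustomerPage, returning all the result sets a page needs in a single call:

```csharp
using System.Data;
using System.Data.SqlClient;

class PageData
{
    static DataSet LoadCustomerPage(SqlConnection connection, int customerId)
    {
        DataSet pageData = new DataSet();

        using (SqlCommand command = new SqlCommand("pr_GetCustomerPage", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@customerId", customerId);

            // One call, one roundtrip; the proc returns multiple result sets.
            using (SqlDataReader reader = command.ExecuteReader())
            {
                DataTable customer = new DataTable("Customer");
                customer.Load(reader);   // Load() advances to the next result set

                DataTable orders = new DataTable("Order");
                orders.Load(reader);

                pageData.Tables.Add(customer);
                pageData.Tables.Add(orders);
            }
        }
        return pageData;
    }
}
```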
Ah, you can't? Because that would mix GUI design into the DAL implementation; it would tie the DAL to the GUI's layout? That's right. You can't, or better: you shouldn't. A DAL is a generic service layer that shouldn't contain any GUI-related design aspects; after all, changing the GUI shouldn't affect the DAL. That makes the whole roundtrip discussion a little silly, don't you think? Focusing on 1 or 2 extra roundtrips for Dynamic SQL statements (which can be batched as well, as sketched below, but that aside) as a 'disadvantage', while roundtrips that are obvious are not optimized away because of the generic approach of layered design, doesn't seem like a valid argument to me. Either you optimize the cr*p out of every line of code and remove every roundtrip there is, or you accept that roundtrips exist and deal with the other bottlenecks to keep performance as high as possible. You can't do both, because optimization requires compromises in the design and vice versa.
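To back up that aside: here's a minimal sketch of batching dynamic SQL statements into a single roundtrip. The table names and statements are made up for the example; the point is simply that multiple generated statements can travel in one command:

```csharp
using System.Data.SqlClient;

class DynamicSqlBatch
{
    static void SaveOrder(SqlConnection connection, int customerId, decimal total)
    {
        // Two generated statements, one command text, one roundtrip.
        string batch =
            "UPDATE Customer SET LastOrderTotal = @total WHERE CustomerID = @customerId;" +
            "INSERT INTO AuditLog (CustomerID, Action) VALUES (@customerId, 'OrderSaved');";

        using (SqlCommand command = new SqlCommand(batch, connection))
        {
            command.Parameters.AddWithValue("@total", total);
            command.Parameters.AddWithValue("@customerId", customerId);
            command.ExecuteNonQuery();
        }
    }
}
```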
I'm all for discussions with fair arguments. To me, using 'the number of roundtrips' of technique A compared to technique B isn't a fair argument, because it is irrelevant when the other aspects of a software system are taken into account. Optimization doesn't start with optimizing tiny loops; it starts with optimizing the general algorithms used.