Ever wanted to have an easy way to get the most recent entry for any given key on a massive database? I ran into this problem a few days ago at work and I wanted to attempt to explain how I solved it.

We have a large database that contains the lifetime history of our trucks GPS locations. The data in these tables is entered once every 10 seconds per truck, given the fact that we have about 30 trucks on average there are about 31536000 entries into this table per year and we have been running this program for several years now. We need a quick way to get the most recent location for every truck on the fleet regardless of whether it was last reported 10 seconds ago or 10 days ago.

For a while we had created a simple SQL query that returned the MAX(date) record for each given truck and this seemed to work all right. That is until yesterday evening, at that time the query that I just mentioned was taking somewhere in the realm of 10-15 minutes to execute, just long enough for our program to time out. Our table now contains somewhere in the realm of two billion records and this query is just not efficient enough. We attempted to optimize the query and gained some performance but not nearly enough to make a sustainable difference in the future.

The program that relies on this query is a real time monitoring system and we cannot sit there and wait for this to process during several minutes. An idea I had was to intercept the incoming stream and tag it as “most-recent” before insertion thus allowing a simple query such as “SELECT * FROM TABLE WHERE MOST-RECENT=true” would work. The challenge with this approach was, the interception of the data. We knew that we could write a simple database trigger, but had no idea if  would allow us to modify the incoming data stream.

Most of the database triggers happen after insertion, or update so this may have posed a problem. However it appears that Microsoft had thought about this for us and gave us “Instead of Insert” triggers. This triggers take the incoming data and allow you to do something with it other than actually inserting it into the database. We decided to take the incoming data, update the table to clear the “most-current” flag and then insert the data while updating the current records most-current flag. See the implementation below, we went from a 10 minute query to a milliseconds long query. And the overhead placed on the insert hasn’t posed a problem for the amount of data we receive.

-- =============================================
-- Author:		Jose C Gomez
-- Create date: 5/10/2010
-- Description:	Trigger that allows us to keep track of the Vehicles most recent location
-- =============================================
CREATE TRIGGER InsteadofInsert
   ON  dbo.current_location
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.

    -- Insert statements for trigger here
UPDATE current_location
SET most_current=0
WHERE most_current =1 AND  Device_ID in (SELECT Device_ID from Inserted);

INSERT INTO current_location
	SELECT Device_ID, Lat, Lon, date, ip_address, direction, speed ,1 
	FROM Inserted


Tags: , , , , ,

2 Comments on Instead of Insert Trigger

  1. Lamar says:

    A simpler method would be to create a current location table that is keyed using the vehicle id and let the other table be your location history table. Then, create a trigger that would fire after the insert into the history table. This trigger would update the vehicle’s current location in the new current location table or create a new row for any new vehicles that have just been added to the system. Of course you would have to modify your program to pull the current location from the new table.

  2. Jose C Gomez says:

    Thanks Lamar cool idea, although I still like this approach better I do appreciate the feedback.