I have queries inside a cfloop that makes the process very slow. Is there a way to make this query faster?
<cfquery name="GetCheckRegister" datasource="myDB"> SELECT * FROM CheckRegister, ExpenseType Where PropertyID=10 and ExpenseType.ExpenseTypeID=CheckRegister.ExpenseTypeID </cfquery> <CFOUTPUT query=GetCheckRegister> <cfquery name="GetVendorName" datasource="myDB"> SELECT * FROM Vendors WHERE VendorID=#VendorID#</cfquery> <!--- I use the vendor name here ---> <cfset local.CreditDate = "" /> <cfquery name="getTenantTransactionDateFrom" dataSource="myDB"> Select TenantTransactionDate as fromDate From TenantTransactions Where CheckRegisterID = #CheckRegisterID# Order By TenantTransactionDate Limit 1 </cfquery> <cfquery name="getTenantTransactionDateTo" dataSource="myDB"> Select TenantTransactionDate as ToDate From TenantTransactions Where CheckRegisterID = #CheckRegisterID# Order By TenantTransactionDate desc Limit 1 </cfquery> <cfif getTenantTransactionDateFrom.fromDate neq "" AND getTenantTransactionDateTo.ToDate neq ""> <cfif getTenantTransactionDateFrom.fromDate eq getTenantTransactionDateTo.ToDate> <cfset local.CreditDate = DateFormat(getTenantTransactionDateFrom.fromDate, 'mm/dd/yyyy') /> <cfelse> <cfset local.CreditDate = DateFormat(getTenantTransactionDateFrom.fromDate, 'mm/dd/yyyy') & " - " & DateFormat(getTenantTransactionDateTo.ToDate, 'mm/dd/yyyy') /> </cfif> </cfif> <!--- I use the local.CreditDate here ---> <!--- Here goes a table with the data ---> </CFOUTPUT>
cfoutput works like a loop.mysqlsqlcoldfusion
It has been a long time since I did any ColdFusion development but a common rule of thumb would be to not call queries within a loop. Depending on what you are doing, loops can be considered an RBAR (row by agonizing row) operation.
You are essentially defining one query and looping over each record. For each record, you are doing three additional queries aka three additional database network calls per record. The way I see it, you have a couple options:
getTenantTransaction...and see if performance improves. This reduces the RBAR database calls from three to two.
Using the main query and the loop to process the data could be faster if:
SELECT VendorID, CheckRegisterId, ... FROM CheckRegister, ExpenseType ...
SELECT VendorID, CheckRegisterId, VendorName ... FROM CheckRegister, ExpenseType, Vendors ...
GetVendorNameusing a valid VendorId
getTenantTransactionDateFromusing a valid CheckRegisterID
getTenantTransactionDateTousing a valid CheckRegisterID
TOTAL = ROWS * (TIME_V+ TIME_TD1 + TIME_TD2).
RESULT = 10000 * (30 + 15 + 15) = 10000 * 60 = 600000 (ms) = 600 (sec) = 10 min
So, for 10000 rows, one milisecond of the loop results in 10 seconds added to the process.
When you have many resulting rows for the main query, you need to minimize the query time of each element in the loop. Each milisecond affects in the performance of the loop. So you need to make sure there are the right indexes for each field filtered for each query in the loop.
You always want to avoid having queries in a loop. Whenever you query the database, you have roundtrip (from server to database and back from database to server) which is slow by nature.
A general approach is to bulk data by querying all required information with as few statements as possible. Joining everything in a single statement would be ideal, but this obviously depends on your table schemes. If you cannot solve it using SQL only, you can transform your queries like this:
GetCheckRegister... (no loop) <cfquery name="GetVendorName" datasource="rent"> SELECT * FROM Vendors WHERE VendorID IN (#valueList(GetCheckRegister.VendorID)#) </cfquery> <cfquery name="getTenantTransactionDateFrom" dataSource="rent"> Select TenantTransactionDate as fromDate From TenantTransactions Where CheckRegisterID IN (#valueList(GetCheckRegister.CheckRegisterID)#) </cfquery> etc.
valueList(query.column) returns a comma delimited list of the specified
column values. This list is then used with MySQL's
IN (list) selector to retrieve all records that belong to all the listed values.
Now you would only have a single query for each statement in your loop (4 queries total, instead of 4 times number of records in
GetCheckRegister). But all records are clumped together, so you need to match them accordingly. To do this we can utilize ColdFusion's Query of Queries (QoQ), which allows you to query on already retrieved data. Since the retrieved data is in memory, accessing them is quick.
GetCheckRegister, GetVendorName, getTenantTransactionDateFrom, getTenantTransactionDateTo etc. <CFOUTPUT query="GetCheckRegister"> <!--- query of queries ---> <cfquery name="GetVendorNameSingle" dbType="query"> SELECT * FROM [GetVendorName] WHERE VendorID = #GetCheckRegister.VendorID# </cfquery> etc. </CFOUTPUT>
You basically moved the real queries out of the loop and instead query the result of the real queries in your loop using QoQ.
Regardless of this, make sure your real queries are fast by profiling them in MySQL. Use indices!
As others have said, you should get rid of the loop and use joins. Looking at your inner loop, the code retrieves the earliest and latest date for each CheckRegisterID. Instead of using LIMIT, use aggregate functions like MIN and MAX and GROUP BY CheckRegisterID. Then wrap that result in a derived query so you can join the results back to CheckRegister ON id.
Some of the columns in the original query aren't scoped, so I took a few guesses. There's room for improvement, but something like is enough to get you started.
-- select only needed columns SELECT cr.CheckRegisterID, ... other columns FROM CheckRegister cr INNER JOIN ExpenseType ex ON ex.ExpenseTypeID=cr.ExpenseTypeID INNER JOIN Vendors v ON v.VendorID = cr.VendorID LEFT JOIN ( SELECT CheckRegisterID , MIN(TenantTransactionDate) AS MinDate , MAX(TenantTransactionDate) AS MaxDate FROM TenantTransactions GROUP BY CheckRegisterID ) tt ON tt.CheckRegisterID = cr.CheckRegisterID WHERE cr.PropertyID = 10
I'd highly recommend reading up on JOIN's as they're critical to any web application, IMO.
You should get all of your data in one query, then work with that data to output what you want. Multiple connections to a database almost always be more resource-intensive than getting the data in one trip and working with it. To get your results:
Initial Schema Setup:
CREATE TABLE CheckRegister ( checkRegisterID int, PropertyID int, VendorID int, ExpenseTypeID int ) ; CREATE TABLE ExpenseType ( ExpenseTypeID int ) ; CREATE TABLE Vendors ( VendorID int ) ; CREATE TABLE TenantTransactions ( checkRegisterID int, TenantTransactionDate date, note varchar(20) ); INSERT INTO CheckRegister ( checkRegisterID, PropertyID, VendorID, ExpenseTypeID ) VALUES (1,10,1,1),(1,10,1,1),(1,10,2,1),(1,10,1,2),(1,5,1,1),(2,10,1,1),(2,5,1,1) ; INSERT INTO ExpenseType ( ExpenseTypeID ) VALUES (1), (2) ; INSERT INTO Vendors ( VendorID ) VALUES (1), (2) ; INSERT INTO TenantTransactions ( checkRegisterID, TenantTransactionDate, note ) VALUES (1,'2018-01-01','start') , (1,'2018-01-02','another') , (1,'2018-01-03','another') , (1,'2018-01-04','stop') , (2,'2017-01-01','start') , (2,'2017-01-02','another') , (2,'2017-01-03','another') , (2,'2017-01-04','stop') ;
SELECT cr.* , max(tt.TenantTransactionDate) AS startDate , min(tt.TenantTransactionDate) AS endDate FROM CheckRegister cr INNER JOIN ExpenseType et ON cr.ExpenseTypeID = et.ExpenseTypeID INNER JOIN Vendors v ON cr.vendorID = v.VendorID LEFT OUTER JOIN TenantTransactions tt ON cr.checkRegisterID = tt.CheckRegisterID WHERE cr.PropertyID = 10 GROUP BY cr.CheckRegisterID, cr.PropertyID, cr.VendorID, cr.ExpenseTypeID
| checkRegisterID | PropertyID | VendorID | ExpenseTypeID | startDate | endDate | |-----------------|------------|----------|---------------|------------|------------| | 1 | 10 | 1 | 1 | 2018-01-04 | 2018-01-01 | | 1 | 10 | 1 | 2 | 2018-01-04 | 2018-01-01 | | 1 | 10 | 2 | 1 | 2018-01-04 | 2018-01-01 | | 2 | 10 | 1 | 1 | 2017-01-04 | 2017-01-01 |
I only added 2 check registers, but CheckRegisterID 1 has 2 vendors and 2 Expense Types for Vendor 1. This will look like repeated data in your query. If your data isn't set up that way, you won't have to worry about it in the final query.
JOIN syntax to get the related data you need. Then you can aggregate that data to get the
toDate. If your data is more complex, you may need to look at Window Functions. https://dev.mysql.com/doc/refman/8.0/en/window-functions.html
I don't know what your final output looks like, but the above query gives you all of the query data in one pass. And each row of that data should give you what you need to output, so now you've only got one query to loop over.