Sunday, March 25, 2012

Calling sp_oa* in function

I'm faced with a situation where I will need to calculate a column for
a resultset by calling a component written as a VB6 DLL, passing
parameters from the resultset to the component and setting (or
updating) a column with the result. I thought that perhaps the best
way out would be to create a UDF that passes the parameters to the VB
component using the sp_oa* OLE stored procs.

For a test, I created an ActiveX DLL in VB6 (TestDLL) with some
properties and methods. I then created a function that creates the
object, sets the required properties and returns a result. I use
sp_oaDestroy at the end of the function to remove the object
reference. The function seems to work surprisingly well except for a
small problem; when I use the function to calculate a column for a
resultset with more than one row, the DLL appears to stay locked up
("the file is being used by another person or program"). This leaves
me with the impression that the object reference is not being
destroyed. I have to stop/restart the SQL Server in order to free the
DLL.

Question:
Is the UDF approach the best way? I don't like the idea of creating
and destroying the object at every pass which is what the UDF does.
As an alternative, I suppose that I could have a single SP where I
create the OLE object once, loop through the result set with a cursor
and do my processing/updating, then close the OLE object. I must say
that I'm not too fond of that approach either.

Thanks for your help,

Bill E.
Hollywood, FL

(code is below)
___________________________
--Test the function
Create Table #TestTable(Field1 int)
INSERT INTO #TestTable VALUES (1)
INSERT INTO #TestTable VALUES (2)
SELECT Field1, dbo.fnTest(Field1,4) AS CalcCol
FROM #TestTable
Drop Table #TestTable

___________________________

CREATE FUNCTION dbo.fnTest
/*
This function calls a VB DLL
*/
(
--input variables
@intValue1 smallint,
@intValue2 smallint
)
RETURNS integer
AS
BEGIN
--Define the return variable
Declare @intReturnValue int
Set @intReturnValue=0

--Define other variables
Declare @intObject int
Declare @intError int

Set @intError=0

If @intError = 0
	exec @intError = sp_OACreate 'TestDLL.Convert', @intObject OUTPUT
If @intError = 0
	exec @intError = sp_OASetProperty @intObject, 'Input1', @intValue1
If @intError = 0
	exec @intError = sp_OASetProperty @intObject, 'Input2', @intValue2
If @intError = 0
	exec @intError = sp_OAMethod @intObject, 'Multiply'
If @intError = 0
	exec @intError = sp_OAGetProperty @intObject, 'Output', @intReturnValue OUTPUT
If @intError = 0
	exec @intError = sp_OADestroy @intObject
RETURN @intReturnValue

END

Bill Ehrreich (billmiami2@netscape.net) writes:
> Question:
> Is the UDF approach the best way? I don't like the idea of creating
> and destroying the object at every pass which is what the UDF does.
> As an alternative, I suppose that I could have a single SP where I
> create the OLE object once, loop through the result set with a cursor
> and do my processing/updating, then close the OLE object. I must say
> that I'm not too fond of that approach either.

While the UDF may give you slicker SQL code, I would definitely recommend
the stored-procedure approach, as this appears to be a lot more effective.
After all, using a scalar UDF in a set-based query, more or less converts
it to a cursor behind the scenes, so the difference is not that large.
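
A minimal sketch of that stored-procedure approach might look like the following. It reuses the TestDLL.Convert object and #TestTable from the example above, and assumes a CalcCol column has been added to #TestTable to receive the result; error handling is abbreviated. The key point is that the OLE object is created once, reused for every row, and destroyed once at the end.

```sql
-- Sketch only: assumes TestDLL.Convert and a #TestTable(Field1 int, CalcCol int).
CREATE PROCEDURE dbo.spCalcWithDLL
AS
BEGIN
    DECLARE @intObject int, @intError int
    DECLARE @intValue1 int, @intResult int

    -- Create the OLE object once, outside the loop.
    EXEC @intError = sp_OACreate 'TestDLL.Convert', @intObject OUTPUT
    IF @intError <> 0 RETURN @intError

    DECLARE crsRows CURSOR FOR
        SELECT Field1 FROM #TestTable
    OPEN crsRows
    FETCH NEXT FROM crsRows INTO @intValue1

    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC sp_OASetProperty @intObject, 'Input1', @intValue1
        EXEC sp_OASetProperty @intObject, 'Input2', 4
        EXEC sp_OAMethod @intObject, 'Multiply'
        EXEC sp_OAGetProperty @intObject, 'Output', @intResult OUTPUT

        UPDATE #TestTable SET CalcCol = @intResult
        WHERE Field1 = @intValue1

        FETCH NEXT FROM crsRows INTO @intValue1
    END

    CLOSE crsRows
    DEALLOCATE crsRows

    -- Destroy the object once, after all rows are processed.
    EXEC sp_OADestroy @intObject
END
```
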

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Thanks Erland.

I have found UDFs to be very appealing in that I can seemingly package
functionality into a nice reusable box--a little like what we all
strive to do in object oriented programming. However, I suppose that
I should be much more conservative in using UDFs.

Do you have any good tips (guidelines) for when and when not to use a
scalar-valued UDF or table-valued UDF?

Also, while you say that the calculation of the UDF is a bit like a
cursor behind the scenes (which makes sense), how is it different from
an expression (calculated column) calculated on one or more columns
of a query? For example,

SELECT Column1, Column2, Column1*Column2+3 AS CalculatedColumn
FROM tbl1

or even

SELECT Column1, Column2, CASE WHEN Column1>20 THEN 1 ELSE 0 END AS
CalculatedColumn
FROM tbl1

How does the SQL Engine process this logic?

Thanks,

Bill

Bill Ehrreich (billmiami2@netscape.net) writes:
> I have found UDFs to be very appealing in that I can seemingly package
> functionality into a nice reusable box--a little like what we all
> strive to do in object oriented programming. However, I suppose that
> I should be much more conservative in using UDFs.
> Do you have any good tips (guidelines) for when and when not to use a
> scalar-valued UDF or table-valued UDF?

First of all, the performance problem is *only* with scalar-value
functions.

Table-valued functions come in two kinds. Inline functions are in fact
not really functions at all, but macros. That is, the optimizer
considers the expanded query and may recast the computation order (as
long as that does not affect the final result, of course). A
multi-statement function is like first loading a temp table and then
using that temp table in a query. (Except that there are no statistics,
so the optimizer has to guess.)
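
To make the distinction concrete, here is a sketch of the two kinds, written against the hypothetical TestAssignment table used later in this thread (the names are illustrative, not from any real schema):

```sql
-- Inline function: a single RETURN (SELECT ...).
-- The optimizer expands it into the calling query, like a view or macro.
CREATE FUNCTION dbo.fnEvenRowsInline (@maxID int)
RETURNS TABLE
AS
RETURN (SELECT TestAssignmentID
        FROM   TestAssignment
        WHERE  TestAssignmentID <= @maxID
          AND  TestAssignmentID % 2 = 0)
GO

-- Multi-statement function: the body fills a table variable, which the
-- caller then reads like a temp table that has no statistics.
CREATE FUNCTION dbo.fnEvenRowsMulti (@maxID int)
RETURNS @result TABLE (TestAssignmentID int)
AS
BEGIN
    INSERT @result (TestAssignmentID)
    SELECT TestAssignmentID
    FROM   TestAssignment
    WHERE  TestAssignmentID <= @maxID
      AND  TestAssignmentID % 2 = 0
    RETURN
END
```
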

But scalar functions can really wreck performance. The one guideline I
have is simple: benchmark!

Generally, if you have

SELECT col1, col2, dbo.udf(col3, col4)
FROM tbl
JOIN tbl2 ...

and there are two million rows in each table, but the query only hits
10 rows, then the UDF is not likely to be a problem. But if you have:

SELECT *
FROM tbl
WHERE dbo.udf(col1) = @.value

not only do you get the cost of a table scan, but the query also gets
serialized.

> Also, while you say that the calculation of the UDF is a bit like a
> cursor behind the scenes (which makes sense), how is it different from
> an expression (calculated colummn) calculated on one or more columns
> of a query? For example,

In the latter case, SQL Server does not have to build a call stack and
all that.

Again, the best way to compare is to benchmark.

The good news is that in SQL 2005, Microsoft has addressed several of
these issues, and the cost of a UDF is not as severe there. In fact,
for a complex expression, a UDF written in a CLR language may be faster
than the corresponding expression using built-in T-SQL functions.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Erland,

I did a little test case in my query designer on a table with just
under 300,000 rows as follows:

--Scenario A - Query with calculated column--
SELECT TestAssignmentID,
CASE WHEN TestAssignmentID % 2=0 THEN 1 ELSE 0 END AS CalcColumn
FROM TestAssignment

--Scenario B - Query with calculated column as criterion--
SELECT TestAssignmentID,
CASE WHEN TestAssignmentID % 2=0 THEN 1 ELSE 0 END AS CalcColumn
FROM TestAssignment
WHERE CASE WHEN TestAssignmentID % 2=0 THEN 1 ELSE 0 END=1

--Scenario C - Query using scalar UDF--
SELECT TestAssignmentID,
dbo.fnIsEven(TestAssignmentID) AS CalcColumn
FROM TestAssignment

--Scenario D - Query using scalar UDF as criterion--
SELECT TestAssignmentID,
dbo.fnIsEven(TestAssignmentID) AS CalcColumn
FROM TestAssignment
WHERE dbo.fnIsEven(TestAssignmentID)=1

ALTER FUNCTION dbo.fnIsEven
(
@intValue int
)
RETURNS bit
AS
BEGIN
Declare @bitReturnValue bit

If @intValue % 2 = 0
	Set @bitReturnValue=1
Else
	Set @bitReturnValue=0
RETURN @bitReturnValue
END

--Scenario E - Cursor but no resultset--
Declare @intCurrentValue int
Declare @bitIsEven bit
Declare @crsrTest cursor

Set @crsrTest = cursor for
SELECT TestAssignmentID FROM TestAssignment

Open @crsrTest
Fetch Next From @crsrTest INTO @intCurrentValue

While @@Fetch_Status = 0
Begin
If @intCurrentValue % 2=0
	Set @bitIsEven=1
Else
	Set @bitIsEven=0

Fetch Next From @crsrTest INTO @intCurrentValue
End

Close @crsrTest
Deallocate @crsrTest

--Scenario F - Cursor with resultset, 30,000 out of 295,310 rows
set nocount on
Declare @intCurrentValue int
Declare @crsrTest cursor

SELECT TestAssignmentID, Null AS CalcColumn
INTO #Temp
FROM TestAssignment
WHERE TestAssignmentID<30000

Set @crsrTest = cursor for
SELECT TestAssignmentID
FROM #Temp

Open @crsrTest
Fetch Next From @crsrTest INTO @intCurrentValue

While @@Fetch_Status = 0
Begin
If @intCurrentValue % 2=0
	UPDATE #Temp SET CalcColumn=1 WHERE TestAssignmentID=@intCurrentValue
Else
	UPDATE #Temp SET CalcColumn=0 WHERE TestAssignmentID=@intCurrentValue

Fetch Next From @crsrTest INTO @intCurrentValue
End

Close @crsrTest
Deallocate @crsrTest

SELECT * FROM #Temp
DROP TABLE #Temp

--Results--
Scenario Time(ms)
A 1608
B 940
C 4091
D 4535
E 8773
F 52946

Note that the column TestAssignmentID is an integer type.

The scalar UDF was significantly slower than the calculated column,
but it didn't seem to move as slowly as the cursor in E. Using the
UDF in the WHERE clause increased processing time, but not as badly as
I was expecting. In contrast, using the calculated column in the
WHERE clause decreased processing time.

In F, I was trying to get a resultset with the cursor in the fastest
possible way (maybe you could suggest a faster way?) but that took
almost a minute for only 30,000 rows. Scenarios E and F seem to be
telling me that the cursor itself isn't so bad; it's creating a
resultset with successive INSERT or UPDATE statements that is by far
the most costly.
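
For comparison, the same result set can be produced without any per-row statements at all. This set-based equivalent of Scenario F (over the same TestAssignment table) fills the column with a single UPDATE instead of 30,000 of them:

```sql
-- Set-based equivalent of Scenario F: one UPDATE instead of a loop.
SELECT TestAssignmentID, CAST(NULL AS int) AS CalcColumn
INTO   #Temp
FROM   TestAssignment
WHERE  TestAssignmentID < 30000

UPDATE #Temp
SET    CalcColumn = CASE WHEN TestAssignmentID % 2 = 0 THEN 1 ELSE 0 END

SELECT * FROM #Temp
DROP TABLE #Temp
```
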

Is this a reasonable test? Did you see what you expected to see?

Bill

Bill Ehrreich (billmiami2@netscape.net) writes:
> The scalar UDF was significantly slower than the calculated column,
> but it didn't seem to move as slowly as the cursor in E. Using the
> UDF in the WHERE clause increased processing time, but not as badly as
> I was expecting. In contrast, using the calculated column in the
> WHERE clause decreased processing time.
> In F, I was trying to get a resultset with the cursor in the fastest
> possible way (maybe you could suggest a faster way?) but that took
> almost a minute for only 30,000 rows. Scenarios E and F seem to be
> telling me that the cursor itself isn't so bad; it's creating a
> resultset with successive INSERT or UPDATE statements that is by far
> the most costly.
> Is this a reasonable test? Did you see what you expected to see?

Yes, this is a good test. The one thing I would have done differently
is that I would have made the cursor INSENSITIVE (and I would not have
used a cursor variable). INSENSITIVE is not likely to have any
significant impact, but I always go with INSENSITIVE, since the default
keyset-driven cursors have sometimes given me completely horrible query
plans. (That was in SQL 6.5, but I'm not taking a chance that things
have changed.)
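
Concretely, Scenario E rewritten with a named INSENSITIVE cursor would look something like this sketch (same TestAssignment table as in the test above):

```sql
-- INSENSITIVE copies the result rows into tempdb when the cursor is
-- opened, so later fetches cannot pick up a keyset-driven plan.
Declare @intCurrentValue int

DECLARE crsTest INSENSITIVE CURSOR FOR
SELECT TestAssignmentID FROM TestAssignment

OPEN crsTest
FETCH NEXT FROM crsTest INTO @intCurrentValue

WHILE @@Fetch_Status = 0
BEGIN
    -- per-row processing goes here
    FETCH NEXT FROM crsTest INTO @intCurrentValue
END

CLOSE crsTest
DEALLOCATE crsTest
```
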

Yes, the times are about what I would expect.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Erland,

I never use a cursor if I can help it--I've been trained to avoid them.
I would normally use a SELECT loop as in

--
Declare @intCurrentValue int
Declare @bitIsEven bit

Set @intCurrentValue=0

While @intCurrentValue Is Not Null
Begin
SELECT @intCurrentValue=Min(TestAssignmentID) FROM TestAssignment
WHERE TestAssignmentID>@intCurrentValue

If @intCurrentValue % 2=0
	Set @bitIsEven=1
Else
	Set @bitIsEven=0

End
--
However, the execution time for this loop turns out to be very close to
the execution time for my cursor loop in Scenario E so perhaps I
shouldn't be so afraid of using cursors.

Bill

(billmiami2@netscape.net) writes:
> I never use a cursor if I can help it--I've been trained to avoid them.

Good!

> I would normally use a SELECT loop as in

But you didn't learn the lesson!

The reason that you should avoid cursors is that you should foremost
look for a set-based solution.

But once you need to iterate, a cursor is probably the best way. Some
of my colleagues appear to prefer a "poor man's cursor" like the one in
your example. If there is an index on your control column, the
difference from a cursor may not be significant. But if you do this on
an indexless temp table with tens of thousands of rows, the penalty is
severe. A cursor sets up the iteration once, so for a cursor the index
does not matter.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

> But you didn't learn the lesson!
> The reason that you should avoid cursors is that you foremost look for
> a set-based solution.
> But once you need to iterate, the cursor is probably the best way. Some
> of my colleagues appear to prefer a "poor man's cursor" like in your
> example. If there is an index on your control column, the difference to
> a cursor may not be significant. But if you do this on an indexless
> temp table with tens of thousands of rows, the penalty is severe. A cursor
> sets up the iteration once, so for a cursor the index does not matter.

In fact, I did learn the lesson, Erland, and yes, I can see clearly
that the set-based solution is the answer if at all possible. The
experiment and this discussion have been very enlightening.

Bill
