Friday, February 10, 2012

calculated rows?

I've been waiting for this for a while. I'd like to be able to define a
table by specifying a set of numbers with the option of giving or not giving
bounds. Like, "all the integers between 200 and 500", or "all integers
greater than 2". Or, "all the Sundays between now and next year". There are
ways to do this already, but none of them are particularly elegant or can
flexibly represent unbounded sets. Is there any talk of adding a "calculated
rows" feature to SQL?
Hmmm. Just had an interesting thought- mix MatLab with SQL...
PaulPJ6 wrote:
> I've been waiting for this for a while. I'd like to be able to define a
> table by specifying a set of numbers with the option of giving or not givi
ng
> bounds. Like, "all the integers between 200 and 500", or "all integers
> greater than 2".
Huh? Where do you store an infinity of numbers? Presumably you just
mean a virtual table. In that case surely you would have to disallow
queries like the one you specified, since they will never return a
result:
SELECT * FROM numbers WHERE num >2 ;
-- how many rows does this return? !!!
Which queries are permitted against such tables and which ones aren't?
Given the potentially insoluble complexity of answering that question,
what is the advantage of "unbounded sets" over finite ones? These
requirements are commonly solved using auxiliary tables:
SELECT * FROM numbers
WHERE num >2
AND num <= 1000000 ;
SELECT dt
FROM Calendar
WHERE day_of_w = 'Sunday'
AND dt >= CURRENT_TIMESTAMP
AND dt < '20070101' ;
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||what do you mean by calculated rows?|||> I've been waiting for this for a while. I'd like to be able to define a
> table by specifying a set of numbers with the option of giving or not
> giving bounds. Like, "all the integers between 200 and 500", or "all
> integers greater than 2". Or, "all the Sundays between now and next year".
> There are ways to do this already, but none of them are particularly
> elegant or can flexibly represent unbounded sets. Is there any talk of
> adding a "calculated rows" feature to SQL?
Steve Kass has posted a UDF that will generate a table of sequential
numbers - obviously a finite set. I've found it to be very useful. Search
this NG if you want it.|||"PJ6" <nobody@.nowhere.net> wrote in message
news:%23n5v%23UwXGHA.1348@.TK2MSFTNGP05.phx.gbl...
> I've been waiting for this for a while. I'd like to be able to define a
> table by specifying a set of numbers with the option of giving or not
> giving bounds. Like, "all the integers between 200 and 500", or "all
> integers greater than 2". Or, "all the Sundays between now and next year".
> There are ways to do this already, but none of them are particularly
> elegant or can flexibly represent unbounded sets. Is there any talk of
> adding a "calculated rows" feature to SQL?
> Hmmm. Just had an interesting thought- mix MatLab with SQL...
>
With table valued user-defined functions you could always do this. But only
with the CLR integration does this get interesting.
Here's a TSQL function that enumerates an integer range. You could do
similar things for other types, or just wrap this function and do
conversions.
create function IntegerRange(@.start int, @.end int)
returns @.range table(i int primary key)
as
begin
declare @.i int
set @.i = @.start
while @.i <= @.end
begin
insert @.range(i) values (@.i)
set @.i = @.i + 1
end
return
end
This is OK for small ranges, but is quite expensive for larger ones.
Here's a CLR version that's vastly more efficient.
using System;
using System.Collections;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.None,
FillRowMethodName =
"IntegerRangeCLR_FillRow",
IsPrecise = true,
IsDeterministic = true,
TableDefinition = "i int",
Name = "IntegerRangeCLR")]
public static IEnumerable IntegerRangeCLR_Init(int begin, int end)
{
for (int i = begin; i <= end; i+=1)
{
yield return i;
}
}
public static void IntegerRangeCLR_FillRow(Object obj, out int i)
{
i = (int)obj;
}
};
Although these functions have the divantage that SQL can't gathere
statistics on them, and so a physical table with the enumarated range might
give you better overall query performance.
David|||"David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message
news:1144939158.812211.120070@.z34g2000cwc.googlegroups.com...

> Which queries are permitted against such tables and which ones aren't?
A trivial difficulty since SQL is already a set-based language; only a few
new error conditions need to be detected so if the user really does SELECT *
from an infinite set, SQL could simply detect the problem up front and throw
an error such as:
Server: Msg 4208, Level 16, State 1, Line 1
Cannot query virtual tables representing unbounded sets without a completely
bounding WHERE clause.
Certainly there will still be cases where a user can make a rediculous yet
legal query from a virtual table, but that really adds no new danger as
problems like these are already possible, i.e. accidentally generating a
cartesian product in your result set.

> Given the potentially insoluble complexity of answering that question,
> what is the advantage of "unbounded sets" over finite ones?
Unbounded sets are better because the definition of the set is all that's
required - why waste space with real rows? I bet with a little tweaking the
query optimizer could take this new kind of table into account without any
problem, possibly even increasing performance over similar queries that use
traditional tables.
I see from another post that this becomes a little more practical to try
this using the CLR integration in SS2K5. That will be useful... but I would
argue that this should really be something built right into the database
engine; working with vitual sets (which currently must first be
materialized) is a design pattern that sees enough common usage that it's
perfectly reasonable to expect a deidcated feature with optimizations.
Paul
P.S. I know about "sqlwish", just posting to discuss. :)|||PJ6 wrote:
> "David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message
> news:1144939158.812211.120070@.z34g2000cwc.googlegroups.com...
>
> A trivial difficulty since SQL is already a set-based language; only a few
> new error conditions need to be detected so if the user really does SELECT
*
> from an infinite set, SQL could simply detect the problem up front and thr
ow
> an error such as:
> Server: Msg 4208, Level 16, State 1, Line 1
> Cannot query virtual tables representing unbounded sets without a complete
ly
> bounding WHERE clause.
Is this really a trivial difficulty? When is an inner join against an
infinite set completely bounded? What about outer joins? What about a
WHERE clause based on a correlated subquery? I doubt it will be
feasible to detect any but the simplest queries that don't logically
terminate. It is certainly impossible to detect all recursive queries
that won't logically terminate (= the Halting Problem). Those queries
must fail at runtime due to stack and memory constraints.
A cartesian product against finite tables doesn't suffer quite the same
issue. The result is always finite and so the query will always
complete, given sufficient time and resources.
Recursive queries do have some of the qualities you are looking for. In
SQL Server 2005, create a view like this:
CREATE VIEW numbers AS
WITH n
AS (
SELECT CAST(0 AS INT) AS num
UNION ALL
SELECT n.num + 1
FROM n
WHERE num < 32767
)
SELECT num
FROM n;
GO
Now you can query it as follows:
SELECT num
FROM numbers
WHERE num<10 OPTION (MAXRECURSION 32767);
You still need the upper bound in the view though because SQL Server
won't rewrite the CTE query to include the num<10 condition. If the
optimiser was smart enough to push the WHERE clause down into the view
then an unbounded view might be a real possibility.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||What would a real world example be of where this would be useful?
I know a numbers table or a calendar table are often used for purposes like
this, and from what I have seen they work quite well. In the case of
calendar tables, they are often used because there is some business logic
attached to different dates, days of the w, etc.
I suppose you could use a complex query attached to a virtual numbers table
and then join to the results, which would do what you are proposing.
However, this will be time consuming and resource intensive (due to the
added logic in creating the result set) and would defeat the purpose for
using a calendar table to begin with.
I can see some use for a virtual numbers table, perhaps with a requirement
that it be bounded on both sides to prevent infinite queries, but when
creating one is so simple and requires so little resources, why worry about
a virtual one? It would certainly be nice to have, I'm just not certain
that it is at all necessary.
"PJ6" <nobody@.nowhere.net> wrote in message
news:%23n5v%23UwXGHA.1348@.TK2MSFTNGP05.phx.gbl...
> I've been waiting for this for a while. I'd like to be able to define a
> table by specifying a set of numbers with the option of giving or not
giving
> bounds. Like, "all the integers between 200 and 500", or "all integers
> greater than 2". Or, "all the Sundays between now and next year". There
are
> ways to do this already, but none of them are particularly elegant or can
> flexibly represent unbounded sets. Is there any talk of adding a
"calculated
> rows" feature to SQL?
> Hmmm. Just had an interesting thought- mix MatLab with SQL...
> Paul
>|||Here's a recursive sequence-generating function that's
very efficient and will generate as many integers as you'll
ever need (up to 2^64 of them). I got the idea from Itzik
Ben-Gan. On my 3 GHz desktop, it generates about
250,000 rows a second (from Query Analyzer with
the "Discard Results" option).
CREATE FUNCTION Sequence(
@.from BIGINT,
@.to BIGINT
) RETURNS TABLE AS RETURN
WITH L0(c) AS (
SELECT 1 UNION ALL SELECT 1
), L1(c) AS (
SELECT 1 FROM L0 AS A, L0 AS B
), L2(c) AS(
SELECT 1 FROM L1 AS A, L1 AS B
), L3(c) AS(
SELECT 1 FROM L2 AS A, L2 AS B
), L4(c) AS(
SELECT 1 FROM L3 AS A, L3 AS B
), L5(c) AS(
SELECT 1 FROM L4 AS A, L4 AS B
), L6(c) AS(
SELECT 1 FROM L5 AS A, L5 AS B
), Nums AS (
SELECT
ROW_NUMBER() OVER (ORDER BY c) AS n
FROM L5
)
SELECT n+@.from-1 AS Nbr
FROM Nums
WHERE n <= @.to-@.from+1
GO
-- Steve Kass
-- Drew University
PJ6 wrote:

>I've been waiting for this for a while. I'd like to be able to define a
>table by specifying a set of numbers with the option of giving or not givin
g
>bounds. Like, "all the integers between 200 and 500", or "all integers
>greater than 2". Or, "all the Sundays between now and next year". There are
>ways to do this already, but none of them are particularly elegant or can
>flexibly represent unbounded sets. Is there any talk of adding a "calculate
d
>rows" feature to SQL?
>Hmmm. Just had an interesting thought- mix MatLab with SQL...
>Paul
>
>|||"Steve Kass" <skass@.drew.edu> wrote in message
news:eB80wf2XGHA.3604@.TK2MSFTNGP02.phx.gbl...
> Here's a recursive sequence-generating function that's
> very efficient and will generate as many integers as you'll
> ever need (up to 2^64 of them). I got the idea from Itzik
> Ben-Gan. On my 3 GHz desktop, it generates about
> 250,000 rows a second (from Query Analyzer with
> the "Discard Results" option).
> CREATE FUNCTION Sequence(
> @.from BIGINT,
> @.to BIGINT
> ) RETURNS TABLE AS RETURN
> WITH L0(c) AS (
> SELECT 1 UNION ALL SELECT 1
> ), L1(c) AS (
> SELECT 1 FROM L0 AS A, L0 AS B
> ), L2(c) AS(
> SELECT 1 FROM L1 AS A, L1 AS B
> ), L3(c) AS(
> SELECT 1 FROM L2 AS A, L2 AS B
> ), L4(c) AS(
> SELECT 1 FROM L3 AS A, L3 AS B
> ), L5(c) AS(
> SELECT 1 FROM L4 AS A, L4 AS B
> ), L6(c) AS(
> SELECT 1 FROM L5 AS A, L5 AS B
> ), Nums AS (
> SELECT
> ROW_NUMBER() OVER (ORDER BY c) AS n
> FROM L5
> )
> SELECT n+@.from-1 AS Nbr
> FROM Nums
> WHERE n <= @.to-@.from+1
> GO
>
Well, I'm not often surprised by SQL. But wow. That's super-.
David

No comments:

Post a Comment