Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Bypassing the keyword DISTINCT Expand / Collapse
Author
Message
Posted Thursday, November 20, 2008 6:49 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Monday, June 30, 2014 8:31 AM
Points: 647, Visits: 760
Hello,

I have been told that the keyword DISTINCT is rather costly for sqlserver and was wondering with witch statement we could bypass this keyword and still become the resultset

consider this:

CREATE TABLE [dbo].[tbl_1](
[tbl1_id] [int] NOT NULL,
[field1] [varchar](50) COLLATE Latin1_General_BIN NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[tbl_2](
[tbl2_id] [int] NOT NULL,
[tbl1_id] [int] NOT NULL,
[tbl2_actief] [bit] NOT NULL CONSTRAINT [DF_tbl_2_tbl2_actief] DEFAULT ((1))
) ON [PRIMARY]

INSERT INTO [tbl_1] (
[tbl1_id],
[field1]
)
SELECT 1,'A'
UNION ALL
SELECT 2,'B'
UNION ALL
SELECT 3,'C'



INSERT INTO [tbl_2] (
[tbl2_id],
[tbl1_id],
[tbl2_actief]
)
SELECT 1,1,1
UNION ALL
SELECT 2,1,1
UNION ALL
SELECT 3,2,0


SELECT DISTINCT [t].[tbl1_id], [field1] FROM [tbl_1] AS t INNER JOIN [tbl_2] AS t2 ON t.[tbl1_id] = t2.[tbl1_id] WHERE [tbl2_actief] = 1

shows me only:
tbl1_id field1
1 A

This works for me because it shows me al fields from table1 for witch there is a relation in table 2 and this relation is active in table2
Because i need only all fields from table 1 and none from table 2 the distinct does what it should for me

But can we make another select statement that does not need the distinct en still gives me the same results back .?

PS: is it really really true that the distinct keyword is very expensive.? or am i asking a stupid question here?

wkr,
Eddy
Post #605768
Posted Thursday, November 20, 2008 7:38 AM


Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Yesterday @ 8:36 AM
Points: 772, Visits: 1,183
so you want only UNIQUE Field1 from Table1? is that the requirement?
and it's okay to have duplicate entries in Table2?

GROUP BY or ROW_NUMBER can mimic DISTINCT as well


SQLServerNewbie

MCITP: Database Administrator SQL Server 2005
Post #605816
Posted Thursday, November 20, 2008 8:11 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Monday, June 30, 2014 8:31 AM
Points: 647, Visits: 760
That is idd whats the demand,

DISTINCT works fine, don't understand me wrong, but cause i have been told its more expensive i'm looking for a more faster query method if possible
because my resultsets in production will contain several thousands of rows and will be called multiple times a minute during peak hours.

So i have just created my full statement for a view, and for now with the test data is runs lightning fast
Queryplan seems fine to
but still i want to be proactive for when we will reach production numbers.

Total View select statement looks like:

SELECT b.BRL_ID, b.PAR_ID_FIRMA, b.OFF_ID, b.PAR_ID_TYPE, b.PAR_ID_DISC,
b.ADR_ID_KLANT,b.ARG_ID, b.ART_ID, b.ADR_ID_LAADNR, b.ADR_ID_LOSNR,
b.PLS_ID_LADEN, b.PLS_ID_LOSSEN, b.PAR_ID_EENHEID, b.PAR_ID_MUNT,
b.BRL_BASISNR, b.BRL_BEGDATUM, b.BRL_ENDDATUM, b.BRL_REMOVED,
b.BRL_REGDATUM, dbo.ADRES.ADR_NAAM AS KLANTNAAM, ADRES_1.ADR_NAAM AS LAADPLAATS,
ADRES_2.ADR_NAAM AS LOSPLAATS, dbo.PLAATS.PLS_CODE AS PLAATSCODELADEN, dbo.PLAATS.PLS_OMSCHR AS PLAATSLADEN,
PLAATS_1.PLS_CODE AS PLAATSCODELOSSEN, PLAATS_1.PLS_OMSCHR AS PLAATSLOSSEN, dbo.ADRSRT.ADS_NR AS KLANTNR,
dbo.MASTERREL.PAR_ID_TYPE AS PAR_ID_TYPE_MASTER, dbo.ARTIKEL.ART_OMS_NED AS ARTIKELNAAM
FROM


(SELECT DISTINCT
b.BRL_ID, b.OFF_ID, b.MRL_ID, b.PAR_ID_FIRMA, b.PAR_ID_TYPE, b.PAR_ID_DISC, b.BRL_REGDATUM, b.BRL_BEGDATUM, b.BRL_ENDDATUM,
b.BRL_REMOVED, b.BRL_REMOVED_LISA, b.BRL_BASISNR, b.ADR_ID_KLANT, b.ARG_ID, b.ART_ID, b.ADR_ID_LAADNR, b.ADR_ID_LOSNR,
b.PAR_ID_EENHEID, b.PAR_ID_MUNT, b.PLS_ID_LADEN, b.PLS_ID_LOSSEN, b.BRL_FACTLIJST, b.BRL_OPM_FACTLIJST, b.BRL_OPM_FACTUUR,
b.BRL_OPM_NED_LAAPLAATS, b.BRL_OPM_FRA_LAADPLAATS, b.BRL_OPM_NED_LOSPLAATS, b.BRL_OPM_FRA_LOSPLAATS,
b.BRL_OPL_EIGENDOM
FROM dbo.BASISREL AS b INNER JOIN
dbo.DEELREL AS d ON b.BRL_ID = d.BRL_ID
WHERE (d.DLR_REMOVED = 0)) AS b

INNER JOIN
dbo.MASTERREL ON b.MRL_ID = dbo.MASTERREL.MRL_ID INNER JOIN
dbo.ARTIKEL ON b.ART_ID = dbo.ARTIKEL.ART_ID LEFT OUTER JOIN
dbo.ADRES AS ADRES_2 ON b.ADR_ID_LOSNR = ADRES_2.ADR_ID LEFT OUTER JOIN
dbo.ADRES AS ADRES_1 ON b.ADR_ID_LAADNR = ADRES_1.ADR_ID LEFT OUTER JOIN
dbo.PLAATS AS PLAATS_1 ON b.PLS_ID_LOSSEN = PLAATS_1.PLS_ID LEFT OUTER JOIN
dbo.PLAATS ON b.PLS_ID_LADEN = dbo.PLAATS.PLS_ID LEFT OUTER JOIN
dbo.ADRES ON b.ADR_ID_KLANT = dbo.ADRES.ADR_ID LEFT OUTER JOIN
dbo.ADRSRT ON dbo.ADRES.ADR_ID = dbo.ADRSRT.ADR_ID AND dbo.ADRSRT.ADS_TYPE = 1


If any expert see's something that could make this "always" run fast no mather how many thousand records there are in Basisrel then please tell me how to improve this one.

Wkr,
Eddy

Post #605849
Posted Thursday, November 20, 2008 8:33 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 9:56 AM
Points: 7,120, Visits: 13,491
How many rows are returned by this...
SELECT DISTINCT 
                      
b.BRL_ID, b.OFF_ID, b.MRL_ID, b.PAR_ID_FIRMA, b.PAR_ID_TYPE, b.PAR_ID_DISC, b.BRL_REGDATUM, b.BRL_BEGDATUM, b.BRL_ENDDATUM,
                      
b.BRL_REMOVED, b.BRL_REMOVED_LISA, b.BRL_BASISNR, b.ADR_ID_KLANT, b.ARG_ID, b.ART_ID, b.ADR_ID_LAADNR, b.ADR_ID_LOSNR,
                      
b.PAR_ID_EENHEID, b.PAR_ID_MUNT, b.PLS_ID_LADEN, b.PLS_ID_LOSSEN, b.BRL_FACTLIJST, b.BRL_OPM_FACTLIJST, b.BRL_OPM_FACTUUR,
                      
b.BRL_OPM_NED_LAAPLAATS, b.BRL_OPM_FRA_LAADPLAATS, b.BRL_OPM_NED_LOSPLAATS, b.BRL_OPM_FRA_LOSPLAATS,
                      
b.BRL_OPL_EIGENDOM
FROM dbo.BASISREL AS b
INNER JOIN dbo.DEELREL AS d ON b.BRL_ID = d.BRL_ID
WHERE d.DLR_REMOVED = 0


...with and without the DISTINCT?


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #605866
Posted Thursday, November 20, 2008 9:27 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Monday, June 30, 2014 8:31 AM
Points: 647, Visits: 760
in my testtable
i have 4 rows when run without the distinct and 3 rows if i use the distinct keyword
what is exactly what it should return cause i have put 2x DEELREL With the same BASISREL.BRL_ID in it.

But we need to create a selection view for our planning team and they do not need the DEELREL Data at that point, thats why i want only the "Distinct" BASISREL returned here

In our dataschema there is ALWAYS AT LEAST 1 DEELREL for each BASISREL,

Wkr,
Eddy
Post #605913
Posted Thursday, November 20, 2008 9:34 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 9:56 AM
Points: 7,120, Visits: 13,491
How many rows in DEELREL and BASISREL?

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #605919
Posted Thursday, November 20, 2008 9:47 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, April 22, 2014 9:38 AM
Points: 1,525, Visits: 4,071
Can you make this join 1 to 1 ? b.BRL_ID = d.BRL_ID

Where did you read that DISTINCT had a high cost associated with it?


Seth Phelabaum
Consistency is only a virtue if you're not a screwup.

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation
Post #605943
Posted Thursday, November 20, 2008 10:27 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Yesterday @ 2:44 PM
Points: 42,443, Visits: 35,498
Sorts are expensive operations and distinct requires a sort, as do most other methods of removing duplicates from a resuuklt set.
The best 'workaround' is to find out why there are duplicated (bad data, bad database design, bad query) and fix that. If you don't need distinct, don't use it.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #605977
Posted Thursday, November 20, 2008 10:40 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, April 22, 2014 9:38 AM
Points: 1,525, Visits: 4,071
Makes sense. That said, I've read that in certain circumstances, such as IN clauses, it's usually beneficial to add DISTINCT, even if you shouldn't really need it. Not sure if it's accurate as I've never done any testing of it. It should help if it eliminates duplicates in an IN clause though.

Seth Phelabaum
Consistency is only a virtue if you're not a screwup.

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation
Post #605989
Posted Thursday, November 20, 2008 10:46 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Yesterday @ 2:44 PM
Points: 42,443, Visits: 35,498
Garadin (11/20/2008)
Makes sense. That said, I've read that in certain circumstances, such as IN clauses, it's usually beneficial to add DISTINCT, even if you shouldn't really need it. Not sure if it's accurate as I've never done any testing of it. It should help if it eliminates duplicates in an IN clause though.


It's not only not beneficial, it can be harmful.

Duplicates in an IN are ignored. All that's important to the IN is what values exist, not how many times the value repeats. The optimiser typically uses a semi-join on an in with a subquery where only the presence of the row is important, the join isn't actually done.

Most of the time the optimiser ignores distinct in IN, because it's irrelevant. Sometimes it does run distinct and it's a waste of resources.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #605991
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse