SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Bypassing the keyword DISTINCT


Bypassing the keyword DISTINCT

Author
Message
Van Heghe Eddy
Van Heghe Eddy
Ten Centuries
Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)

Group: General Forum Members
Points: 1007 Visits: 905
Hello,

I have been told that the keyword DISTINCT is rather costly for sqlserver and was wondering with witch statement we could bypass this keyword and still become the resultset

consider this:

CREATE TABLE [dbo].[tbl_1](
[tbl1_id] [int] NOT NULL,
[field1] [varchar](50) COLLATE Latin1_General_BIN NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[tbl_2](
[tbl2_id] [int] NOT NULL,
[tbl1_id] [int] NOT NULL,
[tbl2_actief] [bit] NOT NULL CONSTRAINT [DF_tbl_2_tbl2_actief] DEFAULT ((1))
) ON [PRIMARY]

INSERT INTO [tbl_1] (
[tbl1_id],
[field1]
)
SELECT 1,'A'
UNION ALL
SELECT 2,'B'
UNION ALL
SELECT 3,'C'



INSERT INTO [tbl_2] (
[tbl2_id],
[tbl1_id],
[tbl2_actief]
)
SELECT 1,1,1
UNION ALL
SELECT 2,1,1
UNION ALL
SELECT 3,2,0


SELECT DISTINCT [t].[tbl1_id], [field1] FROM [tbl_1] AS t INNER JOIN [tbl_2] AS t2 ON t.[tbl1_id] = t2.[tbl1_id] WHERE [tbl2_actief] = 1

shows me only:
tbl1_id field1
1 A

This works for me because it shows me al fields from table1 for witch there is a relation in table 2 and this relation is active in table2
Because i need only all fields from table 1 and none from table 2 the distinct does what it should for me

But can we make another select statement that does not need the distinct en still gives me the same results back .?

PS: is it really really true that the distinct keyword is very expensive.? or am i asking a stupid question here?

wkr,
Eddy
Jerry Hung
Jerry Hung
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1818 Visits: 1208
so you want only UNIQUE Field1 from Table1? is that the requirement?
and it's okay to have duplicate entries in Table2?

GROUP BY or ROW_NUMBER can mimic DISTINCT as well

SQLServerNewbie

MCITP: Database Administrator SQL Server 2005
Van Heghe Eddy
Van Heghe Eddy
Ten Centuries
Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)

Group: General Forum Members
Points: 1007 Visits: 905
That is idd whats the demand,

DISTINCT works fine, don't understand me wrong, but cause i have been told its more expensive i'm looking for a more faster query method if possible
because my resultsets in production will contain several thousands of rows and will be called multiple times a minute during peak hours.

So i have just created my full statement for a view, and for now with the test data is runs lightning fast
Queryplan seems fine to
but still i want to be proactive for when we will reach production numbers.

Total View select statement looks like:

SELECT b.BRL_ID, b.PAR_ID_FIRMA, b.OFF_ID, b.PAR_ID_TYPE, b.PAR_ID_DISC,
b.ADR_ID_KLANT,b.ARG_ID, b.ART_ID, b.ADR_ID_LAADNR, b.ADR_ID_LOSNR,
b.PLS_ID_LADEN, b.PLS_ID_LOSSEN, b.PAR_ID_EENHEID, b.PAR_ID_MUNT,
b.BRL_BASISNR, b.BRL_BEGDATUM, b.BRL_ENDDATUM, b.BRL_REMOVED,
b.BRL_REGDATUM, dbo.ADRES.ADR_NAAM AS KLANTNAAM, ADRES_1.ADR_NAAM AS LAADPLAATS,
ADRES_2.ADR_NAAM AS LOSPLAATS, dbo.PLAATS.PLS_CODE AS PLAATSCODELADEN, dbo.PLAATS.PLS_OMSCHR AS PLAATSLADEN,
PLAATS_1.PLS_CODE AS PLAATSCODELOSSEN, PLAATS_1.PLS_OMSCHR AS PLAATSLOSSEN, dbo.ADRSRT.ADS_NR AS KLANTNR,
dbo.MASTERREL.PAR_ID_TYPE AS PAR_ID_TYPE_MASTER, dbo.ARTIKEL.ART_OMS_NED AS ARTIKELNAAM
FROM


(SELECT DISTINCT
b.BRL_ID, b.OFF_ID, b.MRL_ID, b.PAR_ID_FIRMA, b.PAR_ID_TYPE, b.PAR_ID_DISC, b.BRL_REGDATUM, b.BRL_BEGDATUM, b.BRL_ENDDATUM,
b.BRL_REMOVED, b.BRL_REMOVED_LISA, b.BRL_BASISNR, b.ADR_ID_KLANT, b.ARG_ID, b.ART_ID, b.ADR_ID_LAADNR, b.ADR_ID_LOSNR,
b.PAR_ID_EENHEID, b.PAR_ID_MUNT, b.PLS_ID_LADEN, b.PLS_ID_LOSSEN, b.BRL_FACTLIJST, b.BRL_OPM_FACTLIJST, b.BRL_OPM_FACTUUR,
b.BRL_OPM_NED_LAAPLAATS, b.BRL_OPM_FRA_LAADPLAATS, b.BRL_OPM_NED_LOSPLAATS, b.BRL_OPM_FRA_LOSPLAATS,
b.BRL_OPL_EIGENDOM
FROM dbo.BASISREL AS b INNER JOIN
dbo.DEELREL AS d ON b.BRL_ID = d.BRL_ID
WHERE (d.DLR_REMOVED = 0)) AS b

INNER JOIN
dbo.MASTERREL ON b.MRL_ID = dbo.MASTERREL.MRL_ID INNER JOIN
dbo.ARTIKEL ON b.ART_ID = dbo.ARTIKEL.ART_ID LEFT OUTER JOIN
dbo.ADRES AS ADRES_2 ON b.ADR_ID_LOSNR = ADRES_2.ADR_ID LEFT OUTER JOIN
dbo.ADRES AS ADRES_1 ON b.ADR_ID_LAADNR = ADRES_1.ADR_ID LEFT OUTER JOIN
dbo.PLAATS AS PLAATS_1 ON b.PLS_ID_LOSSEN = PLAATS_1.PLS_ID LEFT OUTER JOIN
dbo.PLAATS ON b.PLS_ID_LADEN = dbo.PLAATS.PLS_ID LEFT OUTER JOIN
dbo.ADRES ON b.ADR_ID_KLANT = dbo.ADRES.ADR_ID LEFT OUTER JOIN
dbo.ADRSRT ON dbo.ADRES.ADR_ID = dbo.ADRSRT.ADR_ID AND dbo.ADRSRT.ADS_TYPE = 1


If any expert see's something that could make this "always" run fast no mather how many thousand records there are in Basisrel then please tell me how to improve this one.

Wkr,
Eddy
ChrisM@Work
ChrisM@Work
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21568 Visits: 19693
How many rows are returned by this...
SELECT DISTINCT 
                      
b.BRL_ID, b.OFF_ID, b.MRL_ID, b.PAR_ID_FIRMA, b.PAR_ID_TYPE, b.PAR_ID_DISC, b.BRL_REGDATUM, b.BRL_BEGDATUM, b.BRL_ENDDATUM,
                      
b.BRL_REMOVED, b.BRL_REMOVED_LISA, b.BRL_BASISNR, b.ADR_ID_KLANT, b.ARG_ID, b.ART_ID, b.ADR_ID_LAADNR, b.ADR_ID_LOSNR,
                      
b.PAR_ID_EENHEID, b.PAR_ID_MUNT, b.PLS_ID_LADEN, b.PLS_ID_LOSSEN, b.BRL_FACTLIJST, b.BRL_OPM_FACTLIJST, b.BRL_OPM_FACTUUR,
                      
b.BRL_OPM_NED_LAAPLAATS, b.BRL_OPM_FRA_LAADPLAATS, b.BRL_OPM_NED_LOSPLAATS, b.BRL_OPM_FRA_LOSPLAATS,
                      
b.BRL_OPL_EIGENDOM
FROM dbo.BASISREL AS b
INNER JOIN dbo.DEELREL AS d ON b.BRL_ID = d.BRL_ID
WHERE d.DLR_REMOVED = 0



...with and without the DISTINCT?

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Van Heghe Eddy
Van Heghe Eddy
Ten Centuries
Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)

Group: General Forum Members
Points: 1007 Visits: 905
in my testtable
i have 4 rows when run without the distinct and 3 rows if i use the distinct keyword
what is exactly what it should return cause i have put 2x DEELREL With the same BASISREL.BRL_ID in it.

But we need to create a selection view for our planning team and they do not need the DEELREL Data at that point, thats why i want only the "Distinct" BASISREL returned here

In our dataschema there is ALWAYS AT LEAST 1 DEELREL for each BASISREL,

Wkr,
Eddy
ChrisM@Work
ChrisM@Work
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21568 Visits: 19693
How many rows in DEELREL and BASISREL?

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Garadin
Garadin
Hall of Fame
Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)

Group: General Forum Members
Points: 3661 Visits: 4107
Can you make this join 1 to 1 ? b.BRL_ID = d.BRL_ID

Where did you read that DISTINCT had a high cost associated with it?

Seth Phelabaum
Consistency is only a virtue if you're not a screwup. ;-)

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation
GilaMonster
GilaMonster
SSC Guru
SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)

Group: General Forum Members
Points: 114395 Visits: 45494
Sorts are expensive operations and distinct requires a sort, as do most other methods of removing duplicates from a resuuklt set.
The best 'workaround' is to find out why there are duplicated (bad data, bad database design, bad query) and fix that. If you don't need distinct, don't use it.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


Garadin
Garadin
Hall of Fame
Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)

Group: General Forum Members
Points: 3661 Visits: 4107
Makes sense. That said, I've read that in certain circumstances, such as IN clauses, it's usually beneficial to add DISTINCT, even if you shouldn't really need it. Not sure if it's accurate as I've never done any testing of it. It should help if it eliminates duplicates in an IN clause though.

Seth Phelabaum
Consistency is only a virtue if you're not a screwup. ;-)

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation
GilaMonster
GilaMonster
SSC Guru
SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)SSC Guru (114K reputation)

Group: General Forum Members
Points: 114395 Visits: 45494
Garadin (11/20/2008)
Makes sense. That said, I've read that in certain circumstances, such as IN clauses, it's usually beneficial to add DISTINCT, even if you shouldn't really need it. Not sure if it's accurate as I've never done any testing of it. It should help if it eliminates duplicates in an IN clause though.


It's not only not beneficial, it can be harmful.

Duplicates in an IN are ignored. All that's important to the IN is what values exist, not how many times the value repeats. The optimiser typically uses a semi-join on an in with a subquery where only the presence of the row is important, the join isn't actually done.

Most of the time the optimiser ignores distinct in IN, because it's irrelevant. Sometimes it does run distinct and it's a waste of resources.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search