Lenteur de l'ETC sur la jointure gauche

Question

Lenteur de l'ETC sur la jointure gauche

Demandé el 18 de Juillet, 2019: Quand la question a-t-elle été
997 affichage: Nombre de visites la question a
2 Réponses: Nombre de réponses aux questions
Résolu: Situation réelle de la question

J'ai besoin d'un rapport qui montre tous les utilisateurs d'un tableau et leurs scores. Tous les utilisateurs de cette table n'auront pas de score, donc dans ma solution je calcule d'abord le score en utilisant quelques CTE, puis dans un dernier CTE je tire une liste complète et assigne un score par défaut aux utilisateurs sans score réel.

Si les CTE ne sont pas excessivement complexes, ils ne sont pas simples non plus. Lorsque j'exécute séparément la partie calcul de l'ETC pour les utilisateurs ayant un score réel, le temps d'exécution est inférieur à une seconde. Lorsque je passe à l'ETC final qui saisit l'ensemble de la liste et attribue des scores par défaut là où apparaissent les scores nuls (pas de score réel), les roues s'arrêtent complètement et l'ETC ne s'achève jamais.

J'ai essayé de modifier les index et de les actualiser, mais en vain. J'ai remarqué que la jointure à l'agent_effectiveness, lorsqu'elle est commutée en INNER, s'exécute en une seconde, mais j'ai besoin qu'il s'agisse d'une jointure LEFT afin d'obtenir l'ensemble de la liste même si aucun score n'est présent.

EDIT*

Plan d'exécution Jointure interne

Plan d'exécution Jointure à gauche

WITH agent_split_stats AS ( 
Select
    racf,
    agent_stats.SkillGroupSkillTargetID,
    aht_target.EnterpriseName,
    aht_target.target,
    Sum(agent_stats.CallsHandled) as n_calls_handled,
    CASE WHEN (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) = 0 THEN 1 ELSE
        (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) END
    AS total_handle_time
from tblAceyusAgntSklGrp as agent_stats
-- GET TARGETS
INNER JOIN tblCrosswalkWghtPhnEffTarget as aht_target
  ON aht_target.SgId = agent_stats.SkillGroupSkillTargetID
  AND agent_stats.DateTime BETWEEN aht_target.StartDt and aht_target.EndDt
-- GET RACF 
INNER JOIN tblAgentMetricCrosswalk as xwalk
  ON xwalk.SkillTargetID = agent_stats.SkillTargetID
--GET TAU DATA LIKE START DATE AND GRADUATED FLAG
INNER JOIN tblTauClassList AS T
  ON T.SaRacf = racf
WHERE
--FILTERS BY A ROLLING 15 BUSINESS DAYS UNLESS THE DAYS BETWEEN CURRENT DATE AND TAU START DATE ARE <15
agent_stats.DateTime >=
    CASE WHEN dbo.fn_WorkDaysAge(TauStart, GETDATE()) <15 THEN TauStart ELSE
        dbo.fn_WorkDate15(TauStart) 
    END
And Graduated = 'No'
--WPE FILTERS TO ENSURE ACCURATE DATA
AND CallsHandled <> 0
AND Target is not null
Group By
racf, agent_stats.SkillGroupSkillTargetID, aht_target.EnterpriseName, aht_target.target
),
agent_split_stats_with_weight AS (
-- calculate weights
-- one row = one advocate + split
SELECT 
    agent_split_stats.*,
    agent_split_stats.n_calls_handled/SUM(agent_split_stats.n_calls_handled) OVER(PARTITION BY agent_split_stats.racf) AS [weight]
FROM agent_split_stats
),
agent_split_effectiveness AS (
-- calculate the raw Effectiveness score for each eligible advocate/split
-- one row = one agent + split, with their raw Effectiveness score and the components of that
SELECT 
    agent_split_stats_with_weight.*,
    -- these are the components of the Effectiveness score
    (((agent_split_stats_with_weight.target * agent_split_stats_with_weight.n_calls_handled) / agent_split_stats_with_weight.total_handle_time)*100)*agent_split_stats_with_weight.weight AS effectiveness_sum
FROM agent_split_stats_with_weight
), -- this is where we show effectiveness per split  select * from agent_split_effectiveness
agent_effectiveness AS (
-- sum all of the individual effectiveness raw scores for each agent to get each agent's raw score
SELECT 
    racf AS SaRacf,
    ROUND(SUM(effectiveness_sum),2) AS WpeScore
FROM agent_split_effectiveness
GROUP BY racf
),
--GET FULL CLASS LIST, TAU DATES, GOALS FOR WHOLE CLASS
tau AS (
Select L.SaRacf, TauStart, Goal as WpeGoal 
,CASE WHEN agent_effectiveness.WpeScore IS NULL THEN 1 ELSE WpeScore END as WpeScore
FROM tblTauClassList AS L
LEFT JOIN agent_effectiveness
  ON agent_effectiveness.SaRacf = L.SaRacf
LEFT JOIN tblCrosswalkTauGoal AS G
  ON G.Year = TauYear
  AND G.Bucket = 'Wpe'
WHERE TermDate IS NULL
AND Graduated = 'No'
)
SELECT tau.*,
CASE WHEN dbo.fn_WorkDaysAge(TauStart, GETDATE()) > 14 --MUST BE AT LEAST 15 DAYS TO PASS
        AND WpeScore >= WpeGoal THEN 'Pass'
    ELSE 'Fail' END 
from tau

Ce type de requête fonctionne parfaitement dans 3 autres types de calculs différents (différents types de scores). Je ne sais donc pas pourquoi il fonctionne si mal ici. Les résultats réels devraient être une liste d'individus, une date, un score, un but et un score. Si aucun score n'existe, un score par défaut est fourni. De plus, il y a une métrique réussite/échec qui utilise le score/objectif.

Demandé el 18 de Juillet, 2019 par cdscivic

Answer 1

2 Réponses

Answer 2

2voto

Alan Burstein Points 21

Comme @Habo l'a mentionné, nous avons besoin du plan d'exécution réel (par exemple, exécutez la requête avec "include actual execution plan" activé.) J'ai regardé ce que vous avez posté et il n'y a rien qui explique le problème. La différence entre le plan réel et le plan estimé est que le nombre réel de lignes extraites est enregistré, ce qui est essentiel pour dépanner les requêtes peu performantes.

Cela dit, les deux requêtes posent un énorme problème. C'est un problème qui, une fois corrigé, améliorera les deux requêtes à moins d'une seconde. Votre requête utilise deux fonctions scalaires définies par l'utilisateur (UDF) : dbo.fn_WorkDaysAge et dbo.fn_WorkDate15. Les UDF scalaires ruinent tout. Non seulement elles sont lentes, mais elles imposent un plan d'exécution en série qui ralentit considérablement toutes les requêtes dans lesquelles elles sont utilisées.

Je n'ai pas le code pour dbo.fn_WorkDaysAge ou dbo.fn_WorkDate15. J'ai ma propre fonction "WorkDays" qui est en ligne (code ci-dessous). La syntaxe est un peu différente, mais les avantages en termes de performances en valent la peine. Voici la différence de syntaxe :

-- Scalar 
SELECT d.*, workDays = dbo.countWorkDays_scalar(d.StartDate,d.EndDate)
FROM   <sometable> AS d;

-- Inline version
SELECT d.*, f.workDays
FROM   <sometable> AS d
CROSS APPLY dbo.countWorkDays(d.StartDate,d.EndDate) AS f;

Voici un test de performance que j'ai réalisé pour montrer la différence entre une version inline et une version scalaire :

-- SAMPLE DATA
IF OBJECT_ID('tempdb..#dates') IS NOT NULL DROP TABLE #dates;

WITH E1(x)  AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS x(x)),
     E3(x)  AS (SELECT 1 FROM E1 a, E1 b, E1 c),
     iTally AS (SELECT N=ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM E3 a, E3 b)
SELECT TOP (100000) 
  StartDate = CAST(DATEADD(DAY,-ABS(CHECKSUM(NEWID())%1000),GETDATE()) AS DATE),
  EndDate   = CAST(DATEADD(DAY,+ABS(CHECKSUM(NEWID())%1000),GETDATE()) AS DATE)
INTO #dates
FROM iTally;

-- PERFORMANCE TESTS
PRINT CHAR(10)+'Scalar Version (always serial):'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @workdays INT;
  SELECT @workdays = dbo.countWorkDays_scalar(d.StartDate,d.EndDate)
  FROM   #dates AS d;
PRINT DATEDIFF(MS,@st,GETDATE());
GO 3

PRINT CHAR(10)+'Inline Version:'+CHAR(10)+REPLICATE('-',60);
GO
DECLARE @st DATETIME = GETDATE(), @workdays INT;
  SELECT @workdays = f.workDays
  FROM   #dates AS d
  CROSS APPLY dbo.countWorkDays(d.StartDate,d.EndDate) AS f
PRINT DATEDIFF(MS,@st,GETDATE());
GO 3

Résultats :

Scalar Version (always serial):
------------------------------------------------------------
Beginning execution loop
380
363
350
Batch execution completed 3 times.

Inline Version:
------------------------------------------------------------
Beginning execution loop
47
47
46
Batch execution completed 3 times.

Comme vous pouvez le constater, la version en ligne est environ 8 fois plus rapide que la version scalaire. Remplacer ces UDFs scalaires par une version en ligne accélérera très certainement cette requête, quel que soit le type de jointure.

Les autres problèmes que je rencontre sont les suivants :

Je vois beaucoup de balayages d'index, c'est un signe que vous avez besoin de plus de filtrage et/ou de meilleurs index.
dbo.tblCrosswalkWghtPhnEffTarget n'a pas d'index, ce qui signifie qu'elle sera toujours analysée.

Fonctions utilisées pour le test de performance :

-- INLINE VERSION
----------------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.countWorkDays') IS NOT NULL DROP FUNCTION dbo.countWorkDays;
GO
CREATE FUNCTION dbo.countWorkDays (@startDate DATETIME, @endDate DATETIME) 
/*****************************************************************************************
[Purpose]:
 Calculates the number of business days between two dates (Mon-Fri) and excluded weekends.
 dates.countWorkDays does not take holidays into considerations; for this you would need a 
 seperate "holiday table" to perform an antijoin against.

 The idea is based on the solution in this article:
   https://www.sqlservercentral.com/Forums/Topic153606.aspx?PageIndex=16

[Author]:
 Alan Burstein

[Compatibility]:
 SQL Server 2005+

[Syntax]:
--===== Autonomous
 SELECT f.workDays
 FROM   dates.countWorkDays(@startdate, @enddate) AS f;

--===== Against a table using APPLY
 SELECT t.col1, t.col2, f.workDays
 FROM dbo.someTable t
 CROSS APPLY dates.countWorkDays(t.col1, t.col2) AS f;

[Parameters]:
  @startDate = datetime; first date to compare
  @endDate   = datetime; date to compare @startDate to

[Returns]:
 Inline Table Valued Function returns:
 workDays = int; number of work days between @startdate and @enddate

[Dependencies]:
 N/A

[Developer Notes]:
 1. NULL when either input parameter is NULL, 

 2. This function is what is referred to as an "inline" scalar UDF." Technically it's an
    inline table valued function (iTVF) but performs the same task as a scalar valued user
    defined function (UDF); the difference is that it requires the APPLY table operator
    to accept column values as a parameter. For more about "inline" scalar UDFs see this
    article by SQL MVP Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/91724/
    and for more about how to use APPLY see the this article by SQL MVP Paul White:
    http://www.sqlservercentral.com/articles/APPLY/69953/.

    Note the above syntax example and usage examples below to better understand how to
    use the function. Although the function is slightly more complicated to use than a
    scalar UDF it will yield notably better performance for many reasons. For example,
    unlike a scalar UDFs or multi-line table valued functions, the inline scalar UDF does
    not restrict the query optimizer's ability generate a parallel query execution plan.

 3. dates.countWorkDays requires that @enddate be equal to or later than @startDate. Otherwise
    a NULL is returned.

 4. dates.countWorkDays is NOT deterministic. For more deterministic functions see:
    https://msdn.microsoft.com/en-us/library/ms178091.aspx

[Examples]:
 --===== 1. Basic Use
 SELECT f.workDays 
 FROM   dates.countWorkDays('20180608', '20180611') AS f;

---------------------------------------------------------------------------------------
[Revision History]: 
 Rev 00 - 20180625 - Initial Creation - Alan Burstein
*****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT workDays =
    -- If @startDate or @endDate are NULL then rerturn a NULL
  CASE WHEN SIGN(DATEDIFF(dd, @startDate, @endDate)) > -1 THEN
                (DATEDIFF(dd, @startDate, @endDate) + 1) --total days including weekends
               -(DATEDIFF(wk, @startDate, @endDate) * 2) --Subtact 2 days for each full weekend
    -- Subtract 1 when startDate is Sunday and Substract 1 when endDate is Sunday: 
    -(CASE WHEN DATENAME(dw, @startDate) = 'Sunday'   THEN 1 ELSE 0 END)
    -(CASE WHEN DATENAME(dw, @endDate)   = 'Saturday' THEN 1 ELSE 0 END)
  END;
GO    

-- SCALAR VERSION
----------------------------------------------------------------------------------------------
IF OBJECT_ID('dbo.countWorkDays_scalar') IS NOT NULL DROP FUNCTION dbo.countWorkDays_scalar;
GO
CREATE FUNCTION dbo.countWorkDays_scalar (@startDate DATETIME, @endDate DATETIME) 
RETURNS INT WITH SCHEMABINDING AS
BEGIN
  RETURN
  (
    SELECT workDays =
        -- If @startDate or @endDate are NULL then rerturn a NULL
      CASE WHEN SIGN(DATEDIFF(dd, @startDate, @endDate)) > -1 THEN
                    (DATEDIFF(dd, @startDate, @endDate) + 1) --total days including weekends
                   -(DATEDIFF(wk, @startDate, @endDate) * 2) --Subtact 2 days for each full weekend
        -- Subtract 1 when startDate is Sunday and Substract 1 when endDate is Sunday: 
        -(CASE WHEN DATENAME(dw, @startDate) = 'Sunday'   THEN 1 ELSE 0 END)
        -(CASE WHEN DATENAME(dw, @endDate)   = 'Saturday' THEN 1 ELSE 0 END)
      END
  );
END
GO

MISE À JOUR BASÉE SUR LA QUESTION DE L'OP DANS LES COMMENTAIRES :

D'abord pour la version de chaque fonction valorisée par un tableau en ligne. Notez que j'utilise mes propres tableaux et que je n'ai pas le temps de faire correspondre les noms à votre environnement, mais j'ai fait de mon mieux pour inclure des commentaires dans le code. Notez également que si, dans votre fonction, workingday = '1' est simplement d'extraire les jours de la semaine, alors vous trouverez que ma fonction ci-dessus est une alternative beaucoup plus rapide à votre fonction dbo.fn_WorkDaysAge. Si la fonction dbo.fn_WorkDaysAge workingday = '1' filtre également les vacances, il ne fonctionnera pas.

CREATE FUNCTION dbo.fn_WorkDaysAge_itvf
(
 @first_date  DATETIME,
 @second_date DATETIME
)
RETURNS TABLE AS RETURN
SELECT  WorkDays = COUNT(*)
FROM    dbo.dimdate -- DateDimension
WHERE   DateValue   -- [date]
BETWEEN @first_date AND @second_date
AND     IsWeekend = 0 --workingday = '1'
GO

CREATE FUNCTION dbo.fn_WorkDate15_itvf
(
 @TauStartDate DATETIME
)
RETURNS TABLE AS RETURN
WITH DATES AS 
(
  SELECT 
  ROW_NUMBER() OVER(Order By DateValue Desc) as RowNum, DateValue
  FROM dbo.dimdate -- DateDimension
  WHERE DateValue BETWEEN @TauStartDate AND --GETDATE() testing below 
   CASE WHEN GETDATE() < @TauStartDate + 200 THEN GETDATE() ELSE @TauStartDate + 200 END
  AND IsWeekend = 0 --workingday = '1'
)
--Get the 15th businessday from the current date
SELECT DateValue
FROM  DATES
WHERE RowNum = 16;
GO

Maintenant, pour remplacer vos UDF scalaires par les fonctions en ligne évaluées par des tables, vous devez faire ceci (notez mes commentaires) :

WITH agent_split_stats AS ( 
Select
    racf,
    agent_stats.SkillGroupSkillTargetID,
    aht_target.EnterpriseName,
    aht_target.target,
    Sum(agent_stats.CallsHandled) as n_calls_handled,
    CASE WHEN (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) = 0 THEN 1 ELSE
        (Sum(agent_stats.TalkInTime) + Sum(agent_stats.IncomingCallsOnHoldTime) + Sum(agent_stats.WorkReadyTime)) END
    AS total_handle_time
from tblAceyusAgntSklGrp as agent_stats
INNER JOIN tblCrosswalkWghtPhnEffTarget as aht_target
  ON aht_target.SgId = agent_stats.SkillGroupSkillTargetID
  AND agent_stats.DateTime BETWEEN aht_target.StartDt and aht_target.EndDt
INNER JOIN tblAgentMetricCrosswalk as xwalk
  ON xwalk.SkillTargetID = agent_stats.SkillTargetID
INNER JOIN tblTauClassList AS T
  ON T.SaRacf = racf
-- INLINE FUNCTIONS HERE:
CROSS APPLY dbo.fn_WorkDaysAge_itvf(TauStart, GETDATE()) AS wd
CROSS APPLY dbo.fn_WorkDate15_itvf(TauStart)             AS w15
-- NEW WHERE CLAUSE:
WHERE       agent_stats.DateTime >= 
              CASE WHEN wd.workdays < 15 THEN TauStart ELSE w15.workdays END
And Graduated = 'No'
AND CallsHandled <> 0
AND Target is not null
Group By
racf, agent_stats.SkillGroupSkillTargetID, aht_target.EnterpriseName, aht_target.target
),
agent_split_stats_with_weight AS (
SELECT 
    agent_split_stats.*,
    agent_split_stats.n_calls_handled/SUM(agent_split_stats.n_calls_handled) OVER(PARTITION BY agent_split_stats.racf) AS [weight]
FROM agent_split_stats
),
agent_split_effectiveness AS 
(
  SELECT 
      agent_split_stats_with_weight.*,
      (((agent_split_stats_with_weight.target * agent_split_stats_with_weight.n_calls_handled) / 
         agent_split_stats_with_weight.total_handle_time)*100)*
         agent_split_stats_with_weight.weight AS effectiveness_sum
  FROM agent_split_stats_with_weight
),
agent_effectiveness AS
(
  SELECT 
      racf AS SaRacf,
      ROUND(SUM(effectiveness_sum),2) AS WpeScore
  FROM agent_split_effectiveness
  GROUP BY racf
),
tau AS
(
  SELECT L.SaRacf, TauStart, Goal as WpeGoal 
  ,CASE WHEN agent_effectiveness.WpeScore IS NULL THEN 1 ELSE WpeScore END as WpeScore
  FROM tblTauClassList AS L
  LEFT JOIN agent_effectiveness
    ON agent_effectiveness.SaRacf = L.SaRacf
  LEFT JOIN tblCrosswalkTauGoal AS G
    ON  G.Year   = TauYear
    AND G.Bucket = 'Wpe'
  WHERE TermDate IS NULL
  AND   Graduated = 'No'
)
SELECT tau.*,
-- NEW CASE STATEMENT HERE: 
CASE WHEN wd.workdays > 14 AND WpeScore >= WpeGoal THEN 'Pass' ELSE 'Fail' END 
from tau
-- INLINE FUNCTIONS HERE:
CROSS APPLY dbo.fn_WorkDaysAge_itvf(TauStart, GETDATE()) AS wd
CROSS APPLY dbo.fn_WorkDate15_itvf(TauStart)             AS w15;

Notez que je ne peux pas tester cela pour le moment, mais cela devrait être correct (ou presque).

Répondu el 18 de Juillet, 2019 par Alan Burstein (21 Points )

Answer 3

1voto

cdscivic Points 75

MISE À JOUR

J'ai accepté la réponse d'Alan et j'ai fini par faire ce qui suit. J'ai posté des exemples en espérant que le formatage aidera quelqu'un, cela m'a un peu ralenti... ou peut-être que je suis juste lent heh heh.

1. Changement de mon UDF scalaire en InlineTVF

SCALAR Fonction 1-

    ALTER FUNCTION [dbo].[fn_WorkDaysAge]
(
    -- Add the parameters for the function here
    @first_date DATETIME,
    @second_date DATETIME
)
RETURNS int
AS
BEGIN
    -- Declare the return variable here
    DECLARE @WorkDays int

    -- Add the T-SQL statements to compute the return value here
SELECT @WorkDays = COUNT(*)
FROM DateDimension
WHERE Date BETWEEN @first_date AND @second_date
AND workingday = '1' 

    -- Return the result of the function
    RETURN @WorkDays

END

Fonction de l'iTVF 1-

    ALTER FUNCTION [dbo].[fn_iTVF_WorkDaysAge] 
(   
    -- Add the parameters for the function here
 @FirstDate as Date, 
 @SecondDate as Date
)
RETURNS TABLE  AS RETURN 

SELECT WorkDays = COUNT(*)
FROM DateDimension
WHERE Date BETWEEN @FirstDate AND @SecondDate
AND workingday = '1'

J'ai ensuite mis à jour ma fonction suivante de la même manière. J'ai ajouté le CROSS APPLY (quelque chose que je n'ai personnellement pas utilisé, je suis encore un débutant) comme indiqué ci-dessous et j'ai remplacé les UDF par les noms de champs dans ma déclaration de cas.

Ancien code

INNER JOIN tblTauClassList AS T
  ON T.SaRacf = racf
WHERE
--FILTERS BY A ROLLING 15 BUSINESS DAYS UNLESS THE DAYS BETWEEN CURRENT DATE AND TAU START DATE ARE <15
agent_stats.DateTime >=
    CASE WHEN dbo.fn_WorkDaysAge(TauStart, GETDATE()) <15 THEN TauStart ELSE
        dbo.fn_WorkDate15(TauStart) 
    END

Nouveau code

INNER JOIN tblTauClassList AS T
  ON T.SaRacf = racf
--iTVFs
CROSS APPLY dbo.fn_iTVF_WorkDaysAge(TauStart, GETDATE()) as age
CROSS APPLY dbo.fn_iTVF_WorkDate_15(TauStart) as roll
WHERE
--FILTERS BY A ROLLING 15 BUSINESS DAYS UNLESS THE DAYS BETWEEN CURRENT DATE AND TAU START DATE ARE <15
agent_stats.DateTime >=
    CASE WHEN age.WorkDays <15 THEN TauStart ELSE
        roll.Date 
    END

Le nouveau code s'exécute en 3-4 secondes. Je vais revenir en arrière et indexer les tables appropriées selon vos recommandations et je gagnerai probablement en efficacité à ce niveau.

Je ne saurais trop vous remercier !

Répondu el 19 de Juillet, 2019 par cdscivic (75 Points )

Lenteur de l'ETC sur la jointure gauche

Réponses

Questions en vedette

Top Tags

Prograide.com

Powered by:

Lenteur de l'ETC sur la jointure gauche

Réponses

Questions en vedette

Top Tags

Dans notre réseau

Prograide.com

Powered by: