Module 6: Query Optimization
This module covers methodologies for optimizing distributed query plans.
Apply a cost model and heuristics to query trees to find near-optimal query plans.
Discuss the benefits of the semi-join algorithm.
Chapter Principles of Distributed Database Systems, Özsu and Valduriez, 2011
1. Describe the conditions under which a semi-join is a better choice than a traditional join on distributed tables.
Assume the database contains the following tables:
CUSTOMER (CNO, NAME, AGE, STATE, BALANCE)
ORDER (CNO, ORDERNO, DATE, COST, QTY)
Calculate the cost of executing the following query using the
NAME, AGE, STATE, BALANCE, QTY
CUSTOMER.CNO = ORDER.CNO AND
The following statistics about the database are given:
(values range from 20 to 59)
(values from $2000)
The CUSTOMER table has a hashed index on CNO and has
The ORDER table has a hashed index on CNO
The cost of reading a record from a table
The cost of writing a record into a table
The cost of setting up a transfer
The cost of transferring a byte: TTR
Assume the semi-join will be implemented as:
CUSTOMER ⋈ ORDER = CUSTOMER ⋈ (ORDER ⋉ CUSTOMER)
The cost of a join between tables A and B that are resident on the same site, where A has a hashed index and B has no index, can be calculated as follows. Serially read the records from B and, for each record, determine if it participates in the result. If it does, project the desired attributes from the record of B and perform a constant-time read for the corresponding join record from table A.
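The same-site join procedure described above can be sketched in Python. This is only an illustration under stated assumptions, not the assignment's official cost formula: the function name `local_join`, the sample rows, and the unit read cost `t_read` are hypothetical, a dict stands in for the hashed index on A, and testing whether a record participates is assumed to be free.

```python
def local_join(a_rows, b_rows, key, t_read=1.0):
    """Join B (no index) with A (hashed index on `key`) at a single site.

    Returns (result_rows, cost). The cost charges one t_read per record of B
    read serially, plus one t_read per constant-time fetch of a matching A
    record. Assumes `key` is unique in A.
    """
    # Stand-in for the existing hashed index on A (building it is not charged).
    a_index = {row[key]: row for row in a_rows}

    result, cost = [], 0.0
    for b in b_rows:                 # serial scan of B
        cost += t_read
        a = a_index.get(b[key])      # does this B record participate?
        if a is not None:
            cost += t_read           # constant-time read of the A record
            result.append({**a, **b})
    return result, cost


# Hypothetical sample data, not the statistics from the assignment.
customers = [{"CNO": 1, "NAME": "Ann"}, {"CNO": 2, "NAME": "Bob"}]
orders = [{"CNO": 1, "QTY": 5}, {"CNO": 1, "QTY": 2}, {"CNO": 3, "QTY": 9}]
rows, cost = local_join(customers, orders, "CNO")
print(len(rows), cost)
```

With three ORDER records scanned and two of them matching, the sketch charges three serial reads plus two indexed reads.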
1) Assume the following tables and attributes:
CUST is located on site 1; ITEM is located on site 2.
There is a one-to-many relationship between CUST and ITEM, where cid is the foreign key of CUST in the ITEM table.
CUST has a hashed index on cid.
There is NO index on the ITEM table.
There are 100 CUST records and 1000 ITEM records.
TTR is the network transfer cost per byte.
TCPU is the cost of a CPU instruction (including disk I/O).
TMSG is the cost of initiating and receiving a message.
There are only two distinct states in the CUST table records (MD and VA).
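One common way to combine the three unit costs listed above (per-byte transfer, CPU instruction, and message) is a weighted sum over the work each plan step performs. The sketch below assumes that additive form; the function name and all sample figures are made up for illustration and are not values from the assignment.

```python
def total_cost(n_insts, n_msgs, n_bytes, t_cpu, t_msg, t_tr):
    """Weighted-sum cost: CPU work (incl. disk I/O) + messages + bytes moved."""
    return t_cpu * n_insts + t_msg * n_msgs + t_tr * n_bytes


# Hypothetical figures: ship 100 records of 40 bytes each in one message,
# then spend 1000 CPU units on the join at the receiving site.
cost = total_cost(n_insts=1000, n_msgs=1, n_bytes=100 * 40,
                  t_cpu=1.0, t_msg=50.0, t_tr=0.5)
print(cost)
```

Each intermediate step of a plan can be costed this way, and the step costs are then summed to get the plan's total.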
Given the query:
name, state, description
CUST.age 50 and
ITEM.price ? 30 and
ITEM.price <= 50
a) Estimate the total cost of the query by moving the appropriate parts of the CUST table from
site 1 to site 2 and performing the join at site 2. To estimate the total cost, sum the cost
estimates of each intermediate step. Assume that you are done when you have completed the
join. (This is NOT a semi-join.)
b) Estimate the total cost of the query by moving the appropriate parts of the ITEM table from
site 2 to site 1 and performing the join at site 1. To estimate the total cost, sum the cost
estimates of each intermediate step. Assume that you are done when you have completed the
join. (This is NOT a semi-join.)
c) What single modification to the database would enable an even lower-cost query plan? (I am
not looking for a semi-join.)
1. Question 1: Describe the conditions under which a semi-join is a better choice than a traditional join on distributed tables.
a. A semi-join is used when we want to reduce the number of tuples in a relation before transferring it to another site.
i. We want to join 2 tables, S and R, over attribute A. R is stored at site 1 and S is stored at site 2.
ii. We assume that size(S) > size(R)
iii. We note that
R ⋈A S ⇔ (R ⋉A S) ⋈A S ⇔ R ⋈A (S ⋉A R) ⇔ (R ⋉A S) ⋈A (S ⋉A R)
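The join and semi-join forms listed above can be checked against each other on a small example. The following is a minimal sketch, assuming relations are represented as lists of dicts; the helper names `join`, `semijoin`, and `as_set` and the sample tuples are hypothetical.

```python
def join(r, s, attr):
    """Join two lists of dicts on a single shared attribute."""
    return [{**x, **y} for x in r for y in s if x[attr] == y[attr]]


def semijoin(r, s, attr):
    """R semi-join S: the tuples of R that have a join partner in S."""
    s_values = {y[attr] for y in s}
    return [x for x in r if x[attr] in s_values]


def as_set(rows):
    """Order-insensitive view of a result so that results can be compared."""
    return {frozenset(row.items()) for row in rows}


# Hypothetical sample relations joined over attribute A.
R = [{"A": 1, "B": "r1"}, {"A": 2, "B": "r2"}, {"A": 4, "B": "r4"}]
S = [{"A": 1, "C": "s1"}, {"A": 2, "C": "s2"}, {"A": 3, "C": "s3"}]

full = as_set(join(R, S, "A"))
# Each semi-join rewriting yields the same result as the plain join.
assert full == as_set(join(semijoin(R, S, "A"), S, "A"))
assert full == as_set(join(R, semijoin(S, R, "A"), "A"))
assert full == as_set(join(semijoin(R, S, "A"), semijoin(S, R, "A"), "A"))
print(len(full), len(semijoin(R, S, "A")))
```

In this example the semi-join drops the one tuple of R that has no partner in S, which is the reduction a semi-join plan relies on before shipping a relation to another site.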
1. Traditional join:
a. We have to move all of R from site 1 to site 2. It costs TMSG + TTR * size(R) ...