CS 5342 / 4342 Practice Questions
In addition to reviewing all the questions in the homework assignments and the ones worked out in class, attempt the following:
1. Questions on the time complexity of k-means and DBSCAN
2. Questions on the convergence properties (guaranteed or not) of k-means and DBSCAN
3. Questions on whether k-means finds a local or a global optimum, and on how its result depends on the choice of the initial centroids and on the order of the points
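As a warm-up for topic 3, the following is a minimal sketch of plain Lloyd-style k-means in pure Python, run twice on the same four points with different initial centroids. The function name `kmeans`, the toy data, and the two starting configurations are illustrative assumptions, not taken from the course material.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations from the given initial centroids.

    Returns (final_centroids, sse). Hypothetical helper for illustration.
    """
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sq_dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # no change, so the run has converged
            break
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Corners of a 4 x 1 rectangle (toy data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A: one centroid on each short side of the rectangle.
centroids_a, sse_a = kmeans(points, [(0, 0), (4, 0)])

# Start B: centroids on the midpoints of the two long sides.
centroids_b, sse_b = kmeans(points, [(2, 0), (2, 1)])

print(centroids_a, sse_a)
print(centroids_b, sse_b)
```

Both runs stop because the centroids no longer move, yet they end with different sums of squared errors. Comparing the two is a useful starting point for the convergence and local-vs.-global optimum questions.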
The book by Aggarwal:
Chap 2: Questions 1, 13
Chap 3: Questions 1, 5, 15
Tan et al.:
Chap 2: Questions 16, 17, 23 (see below):
16. Consider a document-term matrix, where tf_ij is the frequency of the ith word (term) in the jth document and m is the number of documents. Consider the variable transformation that is defined by
tf'_ij = tf_ij * log(m / df_i)
where df_i is the number of documents in which the ith term appears and is known as the document frequency of the term. This transformation is known as the inverse document frequency transformation.
(a) What is the effect of this transformation if a term occurs in one document? In every document?
(b) What might be the purpose of this transformation?
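To experiment with Question 16, here is a minimal sketch that applies the transformation tf'_ij = tf_ij * log(m / df_i) to a small made-up term-by-document matrix. The function name `idf_transform`, the toy counts, the use of the natural logarithm, and the handling of terms that appear in no document are all assumptions for illustration; the question itself does not fix the log base.

```python
import math


def idf_transform(tf):
    """Apply tf'_ij = tf_ij * log(m / df_i) to a term-by-document matrix.

    tf[i][j] is the frequency of term i in document j.
    Assumes the natural log; the question does not specify a base.
    """
    m = len(tf[0])  # number of documents
    weighted = []
    for row in tf:
        # df_i: number of documents in which term i appears at least once.
        df_i = sum(1 for count in row if count > 0)
        # Assumption: a term that appears nowhere gets weight 0
        # (avoids division by zero; not covered by the question).
        weight = math.log(m / df_i) if df_i else 0.0
        weighted.append([count * weight for count in row])
    return weighted


# Toy matrix: 3 terms (rows) x 4 documents (columns).
tf = [
    [3, 0, 0, 0],  # term that occurs in exactly one document
    [1, 2, 1, 4],  # term that occurs in every document
    [2, 0, 5, 0],  # term that occurs in half of the documents
]

weighted = idf_transform(tf)
for row in weighted:
    print(row)
```

Comparing the first two rows of the output is one way to check an answer to part (a).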
17. Assume that we apply a square root transformation to a ratio attribute x to obtain the new attribute x*. As part of your analysis, you identify an interval (a, b) in which x* has a linear relationship to another attribute y.
(a) What is the corresponding interval in terms of x?
(b) Give an equation that relates y to x.
23. Given a similarity measure with values in the interval [0, 1], describe two ways to transform this similarity value into a dissimilarity value in the interval [0, ∞].
Chap 8: Questions 5, 6, 9, 16 (see the pdf file for Chap 8)