Monday, March 7, 2016

Midterm prep

I felt like I was going into the midterm in Data Mining (this Thursday) a bit blind. I don't feel like I'm behind; I've done all the reading and homework. I just didn't have a good feel for what to expect on the test. I guess I wasn't alone: today the prof posted this. Going through all of them between now and Thursday will be a lot of work, but well worth it, I'm sure.

CS 5342 / 4342 Practice Questions
In addition to reviewing all the questions in the homework assignments and the ones worked out in class, attempt the following:
1. Questions on the time complexity of k-means and DBSCAN
2. Questions on the convergence properties (guaranteed or not) of k-means and DBSCAN
3. Questions on k-means: local vs. global optimum finding, the effect of the choice of initial centroids, and the effect of the order of the points
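To get a feel for the initial-centroid question, here is a minimal sketch of plain k-means (Lloyd's iterations) in pure Python. The four points and both starting configurations are made up for illustration, and the function name kmeans is my own, not anything from the course materials. Both runs converge, but to different sums of squared errors, so the second one is stuck in a local optimum.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations; returns the final centroids and the SSE."""
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sq_dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Hypothetical data: corners of a wide, short rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start with one centroid on each short side: finds the left/right split.
good_centroids, good_sse = kmeans(points, [(0, 0), (4, 0)])

# Start with both centroids in the middle: gets stuck on a top/bottom split.
bad_centroids, bad_sse = kmeans(points, [(2, 0), (2, 1)])

print(good_sse, bad_sse)
```

The only thing that changes between the two runs is the starting centroids, which is exactly the effect question 3 is asking about.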
The book by Aggarwal:
Chap 2:  Questions  1, 13
Chap 3:  Questions  1, 5, 15
Tan et al.:

Chap 2: Questions 16, 17, 23 (see below):

16. Consider a document-term matrix, where tf_ij is the frequency of the ith word (term) in the jth document and m is the number of documents. Consider the variable transformation that is defined by

tf'_ij = tf_ij * log(m / df_i)

where df_i is the number of documents in which the ith term appears and is known as the document frequency of the term. This transformation is known as the inverse document frequency transformation.

 (a) What is the effect of this transformation if a term occurs in one document? In every document?
 (b) What might be the purpose of this transformation?
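A quick way to check intuition on question 16 is to apply the formula to a tiny matrix. This is a minimal sketch, assuming the matrix is stored with one row per term and one column per document, to match tf_ij in the question. The function name idf_transform and the toy counts are made up for illustration.

```python
import math


def idf_transform(tf):
    """Apply tf'_ij = tf_ij * log(m / df_i) to a term-by-document matrix.

    tf[i][j] is the frequency of term i in document j (layout is an assumption).
    """
    m = len(tf[0])  # number of documents
    transformed = []
    for row in tf:
        df = sum(1 for count in row if count > 0)  # documents containing the term
        weight = math.log(m / df) if df > 0 else 0.0  # unused terms stay at zero
        transformed.append([count * weight for count in row])
    return transformed


# Toy data: 2 terms, 4 documents.
tf = [
    [3, 0, 0, 0],  # term 0 occurs in one document only
    [1, 2, 1, 4],  # term 1 occurs in every document
]
result = idf_transform(tf)
print(result[0])  # scaled up by log(4), the largest possible weight for m = 4
print(result[1])  # log(4 / 4) = 0, so the whole row becomes zero
```

Note that the base of the logarithm only rescales the weights; the natural log is used here as an assumption since the question does not specify one.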

17. Assume that we apply a square root transformation to a ratio attribute x to obtain the new attribute x*. As part of your analysis, you identify an interval (a, b) in which x* has a linear relationship to another attribute y.

(a) What is the corresponding interval in terms of x?
(b) Give an equation that relates y to x.

23. Given a similarity measure with values in the interval [0, 1], describe two ways to transform this similarity value into a dissimilarity value in the interval [0, ∞].
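For question 23, two candidate transformations that come to mind are d = -log(s) and d = (1 - s) / s. The sketch below just checks that both map s = 1 to 0, grow without bound as s approaches 0, and decrease as s increases. The function names are my own, and these are not necessarily the answers the book has in mind.

```python
import math


def neg_log_dissimilarity(s):
    """d = -log(s): s = 1 gives 0, and d grows without bound as s approaches 0."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return math.inf if s == 0 else -math.log(s)


def reciprocal_dissimilarity(s):
    """d = (1 - s) / s: s = 1 gives 0, and d grows without bound as s approaches 0."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return math.inf if s == 0 else (1.0 - s) / s


for s in (1.0, 0.5, 0.1, 0.0):
    print(s, neg_log_dissimilarity(s), reciprocal_dissimilarity(s))
```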



Chap 8: Questions 5, 6, 9, 16 (see the pdf file for Chap 8)
