HW1

Due Date: Oct. 24

Submission requirements:

Please submit your solutions to our class website. Only hand in what is required below.

Upload the Clementine stream containing the assignment execution to our class website so that we may refer to it if necessary.

Part I: Written assignment: for questions 1-3, see the PPT slides.

4. Suppose a data warehouse contains four dimensions: date, product, vendor, location; and two measures: sale_number and sales_cost.

(a) Draw the star schema diagram for this data warehouse.

(b) Starting from the base cuboid [date, product, vendor, location], list all sales_cost for vendor Wal-Mart in Los Angeles for each year.

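(c) Bitmap indexing is useful for data warehouses. Taking this cube as an example, briefly discuss the advantages and problems of using a bitmap index structure.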
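As background for the bitmap index question, the following is a minimal Python sketch of the idea, not a description of how any particular warehouse engine implements it. The rows, the column values, and the helper names `build_bitmap_index` and `matching_rows` are hypothetical and chosen only for illustration. Each distinct value of a dimension gets one bit vector, and a conjunctive selection becomes a bitwise AND.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def matching_rows(bitmap):
    """Decode a bitmap back into the list of row numbers whose bit is set."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical fact-table columns (illustration only, not assignment data).
vendor = ["Wal-Mart", "Target", "Wal-Mart", "Costco", "Wal-Mart"]
location = ["Los Angeles", "Los Angeles", "Chicago", "Los Angeles", "Los Angeles"]

vendor_idx = build_bitmap_index(vendor)
location_idx = build_bitmap_index(location)

# Rows where vendor is Wal-Mart AND location is Los Angeles: one bitwise AND.
hits = vendor_idx["Wal-Mart"] & location_idx["Los Angeles"]
print(matching_rows(hits))
```

Note that the index holds one bitmap per distinct value of a column, which is worth keeping in mind when comparing dimensions with few distinct values against dimensions with many.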

5. Below are the sales figures for one product in a supermarket over 24 consecutive months (unit: hundreds of yuan).

21,16,19,24,27,23,22,21,20,17,16,20,23,22,18,24,26, 25,20,26,23,21,15,17

(a) Apply equal-depth binning with depth 6 to the data above, then smooth the data by bin medians and by bin boundaries, respectively.

(b) Give the results of normalizing 16 and 23 to the interval [0,1] using min-max normalization.
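The techniques named in question 5 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the `toy` list is made-up data rather than the assignment's sales series, and in boundary smoothing a value equally close to both boundaries is sent to the lower one, which is one possible convention.

```python
from statistics import median


def equal_depth_bins(values, depth):
    """Sort the values and cut them into consecutive bins holding `depth` items each."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_median(bins):
    """Replace every value in a bin with that bin's median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the nearer of its bin's min and max (ties go to the min)."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are the boundaries
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


def min_max(v, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map v from [old_min, old_max] onto [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


# Made-up toy data (not the assignment's series), binned with depth 3.
toy = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_depth_bins(toy, 3)
print(bins)
print(smooth_by_median(bins))
print(smooth_by_boundaries(bins))
print(min_max(19, min(toy), max(toy)))
```

The same functions can be applied to the 24-month series with `depth=6`, using the series' own minimum and maximum as `old_min` and `old_max`.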

6. Consider the data set shown in Table 1 (min_sup = 40%, min_conf = 75%).

(a) Find all frequent itemsets using Apriori and FP-growth, respectively, by treating each transaction ID as a market basket. Compare the efficiency of the two mining processes.

(b) Use the results in part (a) to compute the confidence for the association rules {a, d} → {e} and {e} → {a, d}. Is confidence a symmetric measure?

(c) List all of the strong association rules (with support s and confidence c) matching the following metarule, where X is a variable representing customers, and item_i denotes variables representing items (e.g. “A”, “B”, etc.):

Table 1. Example of market basket transactions.
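For reference when working through question 6, support and confidence can be computed directly from their definitions. The sketch below is an assumption-laden illustration: the function names are hypothetical and `toy_baskets` is made-up data, not the contents of Table 1.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Made-up baskets (not Table 1); each transaction is a set of items.
toy_baskets = [
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c"},
    {"b"},
    {"b", "c"},
]

print(support({"a", "b"}, toy_baskets))
print(confidence({"a"}, {"b"}, toy_baskets))
print(confidence({"b"}, {"a"}, toy_baskets))
```

Calling `confidence` with the antecedent and consequent swapped, as in the last two lines, is a quick way to compare the two directions of a rule.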

PAGE 1 4/14/2013

Part II: Lab assignment: Recommendation Systems

The goal of this assignment is to learn the use of market basket analysis for the purpose of making product purchase recommendations to the customers.

The data set contains transactions from a large supermarket. Each transaction is made by someone holding the loyalty card. We limited the total number of categories in this supermarket data to 20 categories for simplicity. The field value for a certain product in the transaction basket is 1 if the customer has bought it and 0 if he/she has not. The file named “Transactions” has data for 46243 transactions. The data are available from the class web page.

Your written submission should consist only of the deliverables marked “Hand-in”.

Market basket analysis aims to discover individual products, or groups of products, that tend to occur together in transactions. A business can use the knowledge obtained from a market basket analysis to recognize products frequently sold together and so determine recommendations and cross-sell and up-sell opportunities. It can also be used to improve the efficiency of a promotional campaign.

Run Apriori on the “Transactions” data set. Set the “Type” of “COD” to “Typeless”; for all the other 20 categories, set the “direction” to “Both” and the “Type” to “Flag”. In the modeling node (Apriori node), set “Minimum antecedent support” to 5%, “Minimum confidence” to 50%, and “Maximum number of antecedents” to 5. In general, you should explore different values of these parameters to see what type of rules you get.

• Hand-in: The list of association rules generated by the model.

• Sort the rules by lift, support, and confidence, respectively, to see the rules identified.

Hand-in: For each case, choose the top 5 rules (note: make sure there are no redundant rules among the 5) and give 2-3 lines of comments. Many of the rules will be logically redundant and will therefore have to be eliminated after you think carefully about them.
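To make the three sort keys concrete, here is a small Python sketch that scores single-antecedent rules on 0/1 flag data and ranks them by lift. It is an illustration outside Clementine, not a substitute for the Apriori node: the category names, the `toy_rows` data, and the function name `rule_metrics` are hypothetical, and the sketch assumes "support" means antecedent support, with the joint frequency reported separately as rule support.

```python
from itertools import permutations


def rule_metrics(rows, antecedent, consequent):
    """Score the rule antecedent -> consequent on rows of 0/1 flags."""
    n = len(rows)
    n_a = sum(r[antecedent] for r in rows)
    n_c = sum(r[consequent] for r in rows)
    n_both = sum(r[antecedent] * r[consequent] for r in rows)
    conf = n_both / n_a  # assumes the antecedent occurs at least once
    return {
        "rule": (antecedent, consequent),
        "antecedent_support": n_a / n,
        "rule_support": n_both / n,
        "confidence": conf,
        "lift": conf / (n_c / n),  # confidence relative to the consequent's base rate
    }


# Made-up flag data with hypothetical category names (not the class data set).
toy_rows = [
    {"bread": 1, "milk": 1, "beer": 0},
    {"bread": 1, "milk": 1, "beer": 0},
    {"bread": 1, "milk": 0, "beer": 1},
    {"bread": 0, "milk": 1, "beer": 0},
    {"bread": 0, "milk": 0, "beer": 1},
]

items = ["bread", "milk", "beer"]
rules = [rule_metrics(toy_rows, a, c) for a, c in permutations(items, 2)]
by_lift = sorted(rules, key=lambda m: m["lift"], reverse=True)

for m in by_lift:
    print(m["rule"], round(m["confidence"], 3), round(m["lift"], 3))
```

Changing the `key` to `"antecedent_support"` or `"confidence"` gives the other two orderings. Pairs such as A → B and B → A share the same lift, which is one source of the redundancy mentioned above.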