Python代写:CS414Scalica


完善一个叫Scalica的项目,实现要求的新功能,最终需要部署到Google Cloud Engine上。

Goal

The final result we want to achieve is to extend the function of Scalica so
that it can automatically match two users for a dinner date.
In general the project could be divided into three subsections:

  • Requesting and storing user input
  • Analyzing user input and generating a matching rank
  • Presenting matching information to user
    Firstly, we will want to augment the functionality in Scalica. We would like
    to ask the users some questions. There are 20 rating questions such as “Do you
    like spicy food?” “Do you like French food?” each question with a scale from 1
    to 5 (1: dislike – 5: Like a lot). The service randomly chooses 3 questions,
    and ask the user to indicate their preference. These data will be stored in a
    table with userID as the primary key, and each column representing a question
    storing the rating value. All these data are considered to be private data. So
    we will also have to restrict the viewing or changing of them to unauthorized
    users. For example, some pervert who is trying to setup a date for himself and
    another person he fancies will not be able to do that.
    A is responsible for this part (augmenting Scalica (and its data models) to
    store additional info), and he would like to use either Java or Python and
    Django for this part. More on database design: to make the database scalable,
    we will use horizontal partitioning to partition the user info data. Even
    though we only get one physical node for database, we will create some logical
    shards so that it will be easy when the amount of data grow and we need to
    redistribute them across different physical nodes. Each time a new entry is
    created, its ID will contain an identifier marking the logical shard which
    it’s in.
    Secondly, we will want to use these data to match these people. We will be
    using the latent factor model to achieve this. Normally, the latent factor
    model will have a data table that store each user and their preference level
    to multiple labels. In our ca se, this matrix A will be a matrix of users vs.
    all the different labels created by users. These labels are the favourite
    books, users’ jobs, their favourite movies and etc. In the actual matrix, each
    value of the matrix element is the rating we got from first step corresponding
    to (user, label). Since it is unrealistic for users to rate their preference
    level for each of the label in the matrix, there will be many blanks in the
    matrix. We will then compute the latent factor for each user to the other
    labels to substitute these blanks. To compute the latent factors. We would
    need to create two matrix out of the original matrix A. One would be the
    matrix that represents users vs. flavors. The other one would be flavors vs.
    labels. By doing a gradient descent on the product of these two matrices, we
    would be able to get the latent factor matrix. We could then use the latent
    factor matrix to get labels that were not rated by the user but now it is. For
    the completed matrix, each row would be the interpreted preference array for
    the user against the labels. We then find out for each user, whose preference
    array is most alike. We would do this by computing the dot product of the
    difference array between two users by itself. The users with lowest three such
    values will be recommended to each other. Depending on the number of users we
    have and the frequency they choose to update their preference information, we
    will need to update the preference matrix in different windows of time. B will
    be responsible for the second part. For the language used here. He is
    considering either Python or C++. Former for the reason that it is easy to do
    matrix arithmetic with Python. The later been that it might be faster to use
    C++. All the above service will be run on a cloud server from Google’s Cloud
    Engine. We will be using Redis to store the list of suggestions for it is
    better supported than memcached.
    Thirdly, we will have two different database, one is for the user information
    and the other is for the result of survey. Both of them will be horizontal
    partitioned and logical sharded for the expansion in the future. The host of
    matching algorithm will analyze the data from the survey database. At the end
    of every finished calculation, it will update all changes to the suggestion
    list in Redis. The algorithm will run every time when there is a change in
    survey database. C is responsible to construct survey database.
    Optional: We will need to be presenting these informations to the users. We
    will need to guarantee that both user’s personal information is not leaked in
    anyway to the other prior to their meeting. First, we will present the
    suggested restaurant, the price range, and the dining time to both users in
    the match. If both users agree to meet, the system will provide them more
    detailed and personal information, such as interests, favorite food and etc.
    If one of the user in the match decides to not meet, the system will hide all
    the information presented before from both users and suggest a new match to
    both users according to the next higher rank in the system. The third part
    require techniques in visualizing the data. After achieving basic function
    mentioned above, another main focus is beautifying the user interface and the
    data visualization. C is responsible for this part, and he would like to use
    python and Django for this part. The packages I might use are Bokeh for
    visualization with Python.

文章作者: SafePoker
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 SafePoker !
  目录