代写基于知识的AI,解决特定的图像识别问题。不需要完成整个Project,需要尽量通过测试。
Goal
The goal of Knowledge-Based AI is to develop human-level, human-like
intelligence. To that end, one way of evaluating the success of KBAI agents is
to have them take human intelligence tests. In this project, you will develop
agents that can address a specific intelligence test. In particular, in this
project, your agents will solve 2x2 visual analogy problems using verbal and
visual representations.
Overview
Design an agent that can answer 2x2 visual analogy problems based on both
verbal and visual representations. You will have access to 24 sample problems
to use in designing your agent: 12 Basic problems and 12 Challenge problems.
After you submit your code, your agent will be run against these 24 problems,
as well as 24 additional problems that you have not seen before: 12 Test
problems and 12 Raven’s problems. Your grade will be based on your agent’s
performance on the 12 Basic and 12 Test problems. The 12 Challenge and 12
Raven’s problems will not impact your grade. You will also write a design
report of roughly 1500 words describing the way your algorithm works, its
relative strengths, and its relative weaknesses.
Getting Started
To get started, download the code package. The code package is available in
two languages: Python (compatible with either Python 2 or Python 3) or Java.
Python is strongly recommended unless you already have expertise in Java. Over
the last few years Python has gained extensive popularity for academic
research (specially AI) and our TA staff has more expertise in this language
so they can be more helpful with suggestions.
Contained in the package are three things: the code, the API, and the sample
problems. Note that the same code package can be used for all three projects
in this course: the only difference is which problems your agent addresses on
each project.
The Problems
The only difference between Projects 1, 2, and 3 is which problems your agent
is run against. In Project 1, your agent is run against Problem Set B – Basic,
Test, Challenge, and Raven’s. Your agent is only graded on its performance on
Basic and Test – Challenge and Raven’s are used for your curiosity.
You are provided with the Basic and Challenge problems to use in designing
your agent. The Test and Raven’s problems are hidden and will only be used
when grading your project. This is to test your agents for generality: it
isn’t hard to design an agent that can answer questions it has already seen,
just as it would not be hard to score well on a test you have already taken
before. However, performing well on problems you and your agent haven’t seen
before is a more reliable test of intelligence.
Your grade is based solely on your agent’s performance on the Basic and Test
problems. The Challenge and Raven’s problems are optional. The Test problems
are written to be directly and closely analogous to the Basic problems. For
example, Test Problem B-03 will use similar transformations and reasoning to
Basic Problem B-03. Thus, in designing your agent, you should rest assured
that it will not be tested on anything radically different from what you have
seen in the Basic problems. Similarly, you can use these direct comparisons to
evaluate your agent’s generality: if your agent correctly answers Basic
Problem B-03 but misses on Test Problem B-03, that suggests its reasoning on
these problems may be relatively brittle. Generally, your agent will first run
on all available Basic problems, then all available Test problems, then all
available Challenge problems, then all available Raven’s problems.
Both the Basic and the Test problems are also closely analogous to the
corresponding Raven’s problem. Basic and Test Problems B-03 borrow the same
reasoning from Raven’s Problem B-03. The Raven’s problems, however, often
involve an additional layer of visual complexity, such as unrecognizable
shapes, textures, or patterns. The Challenge problems are sometimes (but not
always) written specifically to bridge the gap between Basic and Raven’s
problems. Although you are not graded on the Challenge and Raven’s sets, if
your agent actually manages to perform better on these sets than the Basic and
Test sets, we may award some extra points. After all, the ultimate goal is to
perform well on the true Raven’s test.
All problems are contained within the Problems folder of the downloadable.
Problems are divided into sets, and then into individual problems. Each
problem’s folder has three things:
- The problem itself, for your benefit.
- A ProblemData.txt file, containing information about the problem, including its correct answer, its type, and its verbal representation (if applicable).
- Visual representations of each figure, named A.png, B. png, etc.
You should not attempt to access ProblemData.txt directly; its filename will
be changed when we grade projects. Generally, you need not worry about this
directory structure; all problem data will be loaded into the RavensProblem
object passed to your agent’s Solve method, and the filenames for the
different visual representations will be included in their corresponding
RavensFigures.
Working with the Code
The framework code is available here as Project-Code-Java.zip or Project-Code-
Python.zip.
The Code
The downloadable package has a number of either Java or Python files:
RavensProject, ProblemSet, RavensProblem, RavensFigure, RavensObject, and
Agent. Of these, you should only modify the Agent class. You may make changes
to the other classes to test your agent, write debug statements, etc. However,
when we test your code, we will use the original versions of these files as
downloaded here. Do not rely on changes to any class except for Agent to run
your code. In addition to Agent, you may also write your own additional files
and classes for inclusion in your project.
In Agent, you will find two methods: a constructor and a Solve method. The
constructor will be called at the beginning of the program, so you may use
this method to initialize any information necessary before your agent begins
solving problems. After that, Solve will be called on each problem. You should
write the Solve method to return its answer to the given question:
- 2x2 questions have six answer options, so to answer the question, your agent should return an integer from 1 to 6.
- 3x3 questions have eight answer options, so your agent should return an integer from 1 to 8.
- If your agent wants to skip a question, it should return a negative number. Any negative number will be treated as your agent skipping the problem.
You may do all the processing within Solve, or you may write other methods and
classes to help your agent solve the problems.
When running, the program will load questions from the Problems folder. It
will then ask your agent to solve each problem one by one and write the
results to ProblemResults.csv. You may check ProblemResults.csv to see how
well your agent performed. You may also check SetResults.csv to view a summary
of your agent’s performance at the set level.
The API
Included in the downloadable is the API for interacting with the code
(API/index.html in the downloadable). You may use this and the in-line
comments to understand the structure of the problems. Briefly, however:
- RavensProject: The main driver of the project. This file will load the list of problem sets, initialize your agent, then pass the problems to your agent one by one.
- RavensGrader: The grading file for the project. After your agent generates its answers, this file will check the answers and assign a score.
- Agent: The class in which you will define your agent. When you run the project, your Agent will be constructed, and then its Solve method will be called on each RavensProblem. At the end of Solve, your agent should return an integer as the answer for that problem (or a negative number to skip that problem).
- ProblemSet: A list of RavensProblems within a particular set.
- RavensProblem: A single problem, such as the one shown earlier in this document. This is the most complicated and important class in the project, so let’s break it into parts. RavensProblem includes:
- A HashMap (Java) or Dictionary (Python) of the individual Figures (that is, the squares labeled “A”, “B”, “C”, “1”, “2”, etc.) from the problem. The RavensFigures associated with keys “A”, “B”, and “C” are the problem itself, and those associated with the keys “1”, “2”, “3”, “4”, “5”, and “6” are the potential answer choices.
- A String representing the name of the problem and a String representing the type of problem (“2x2” or “3x3”).
- Variables hasVisual and hasVerbal indicating whether that problem has a visual or verbal representation (all problems this semester have visual representations, only some have verbal representations).
- RavensFigure: A single square from the problem, labeled either “A”, “B”, “C”, “1”, “2”, etc. All RavensFigures have a filename referring to the visual representation (in PNG form) of the figure’s contents. Problems with verbal representations also contain dictionaries of RavensObjects. In the example above, the squares labeled “A”, “B”, “C”, “1”, “2”, “3”, “4”, “5”, and “6” would each be separate instances of RavensFigure, each with a list of RavensObject.
- RavensObject: A single object, typically a shape such as a circle or square, within a RavensFigure. For example, in the problem above, the Figure “C” would have one RavensObject, representing the square in the figure. RavensObjects contain a name and a dictionary of attributes. Attributes are key-value pairs, where the key is the name of some general attribute (such as ‘size’, ‘shape’, and ‘fill’) and the value is the particular characteristic for that object (such as ‘large’, ‘circle’, and ‘yes). For example, the square in figure “C” could have three RavensAttributes: shape:square, fill:no, and size:very large. Generally, but not always, the representation will provide the shape, size, and fill attributes for all objects, as well as any other relevant information for the particular problem.
The API is ultimately somewhat straightforward, but it can be complicated when
you’re initially getting used to it. The most important things to remember
are: - Every time Solve is called, your agent is given a single problem. By the end of Solve, it should return an answer as an integer. You don’t need to worry about how the problems are loaded from the files, how the problem sets are organized, or how the results are printed. You need only worry about writing the Solve method, which solves one question as a time.
- RavensProblems have a dictionary of RavensFigures, with each Figure representing one of the image squares in the problem and each key representing its letter (squares in the problem matrix) or number (answer choices). All RavensFigures have filenames so your agent can load the PNG with the visual representation. If the problem has a verbal representation as well (hasVerbal or hasVerbal() is true), then each RavensFigure has a dictionary of RavensObjects, each representing one shape in the Figure (such as a single circle, square, or triangle). Each RavensObject has a dictionary of attributes, such as “size”:”large”, “shape”:”triangle”, and “fill”:”yes”.
Libraries
No external libraries are permitted in Java. In Python, the only permitted
libraries are the latest version of the Python image processing library Pillow
and the latest version of numpy. For installation instructions on Pillow, see
this page. For installation instructions on numpy, see this page. No other
libraries are permitted.
Image Processing
Generally, we do not allow external libraries. For Java, you may use anything
contained within the default Java 8 installation. Java 8 has plenty of image
processing options. We recommend using BufferedImage, and we have included a
bit of sample code below for loading images into BufferedImage. If you have
other suggestions, please bring them up on Piazza!
Python has no native support for image processing, so an external library must
be used. The only external library we support for image processing for Python
is Pillow. You can install pillow simply by running easy_install pillow. More
comprehensive information on installing Pillow can be found here. We have
included a code segment below on loading an image from a file with Pillow.
Your Submission
Submission
For instructions on submitting your agent’s code, see the project submission
instructions.
Project Reflection
In addition to completing your agent, you are also asked to complete a project
reflection of roughly 1500 words. The project reflection serves two purposes:
(a) to help you reflect on and learn from your experience during the project,
and (b) to help communicate your ideas to your peers in the class as well as
the graders.
In your project reflection, you should answer the following questions. You can
separate your project reflection into multiple sections each answering a
question, or you can write a more general project reflection that covers these
questions:
- How does your agent reason over the problems it receives? What is its overall problem-solving process? Did you take any risks in the design of your agent, and did those risks pay off?
- How does your agent actually select an answer to a given problem? What metrics, if any, does it use to evaluate potential answers? Does it select only the exact correct answer, or does it rate answers on a more continuous scale?
- What mistakes does your agent make? Why does it make these mistakes? Could these mistakes be resolved within your agent’s current approach, or are they fundamental problems with the way your agent approaches these problems?
- How does your approach connects (or can be related) to specific KBAI methods discussed in class? (if you agent’s approach is not directly based on KBAI methods, briefly explain how your approach is related to or could potentially be translated to KBAI methods)
- What improvements could you make to your agent given unlimited time and resources? How would you implement those improvements? Would those improvements improve your agent’s accuracy, efficiency, generality, or something else?
- How well does your agent perform across multiple metrics? Accuracy is important, but what about efficiency? What about generality? Are there other metrics or scenarios under which you think your agent’s performance would improve or suffer?
- Which reasoning method did you choose? Are you relying on verbal representations or visual? If you’re using visual input, is your agent processing it into verbal representations for subsequent reasoning, or is it reasoning over the images themselves?
- Finally, what does the design and performance of your agent tell us about human cognition? Does your agent solve these problems like a human does? How is it similar, and how is it different? Has your agent’s performance given you any insights into the way people solve these problems?
As each project builds on the previous one, it is likely that much of your
information will be the same from project to project. In this event, what is
important is to remember the learning goals: this is a reflection, and you are
meant to reflect on the progress since the previous project. When applicable,
it’s fine to give only a short description of your previous project and spend
most of your time focusing on your progress on the new project. Please mention
when you’re doing so, though, so that the graders and your peers know when
you’re intentionally summarizing rather than skipping information.
Your project reflection will be evaluated on a scale of 0 to 40. Each of the
above questions will be evaluated on a scale of 0 to 5 to determine your
score. As with other assignments, a 90% should not be considered the threshold
for an ‘A’ on the project reflection – make sure to check the stats posts when
grades are posted to have context for your grade.
Grading
Your grade will be based on three components:
- Your agent’s score on the 12 problems in Basic Problems B (20% of your grade).
- Your agent’s score on the 12 problems in Test Problems B (40% of your grade).
- Your project reflection (40% of your grade).
In calculating your agent’s score on the Basic and Test problems, your agent
will receive 1 point for each correct response. Your agent will also lose
1/5th of a point for each incorrect response, and lose nothing for each
skipped problem. To skip a problem, your agents should give a negative number
as its answer. This is to correct for the effect of randomness, as well as to
encourage you to equip your agent with some metacognition to decide how
confident it is on a given answer and whether it is worth answering.
In addition to these grading criteria, your agent may be awarded some extra
points if its performance on the Challenge and Raven’s problems outperforms
its performance on the Basic and Test problems. Your agent may also be awarded
some extra points if we regard your agent’s approach as particularly novel and
unique. Check out the Take Chances, Make Mistakes section of the Overall
Project Guidelines for more on this.
Note that many of the problems are very difficult and your agent is certainly
not expected to solve all of them. The current state of the art in this field
still only answers 70% or so of the problems correct. Thus, a 90/100 should
not be considered the threshold for an A. Grades generally will be normalized
after the fact based on the class’s performance as a whole, so make sure to
read the announcements accompanying the grades to understand how to interpret
your grade, and don’t freak out when you see a score that would usually
translate to a C or worse (last semester, the class average on the last
project was a 65). We will also be more generous in our interpretation of
Project 1 since there’s a larger learning curve and the standards for success,
both in this class and for these problems, has not yet been set.
Recovery Credit
The ultimate goal of this project is to design an agent that can perform well
on all 192 problems. Thus, your submissions for each project will run on the
previous projects’ problems as well; Project 2 will run on sets B and C, and
Project 3 will run on all four sets, B, C, D, and E.
Previously, we graded each agent’s performance on all these problems. However,
students in the past have pointed out that penalizes students who did poorly
on Project 1 – their agent is running against the same problems, and so their
grade is already lower than others who did better on Project 1. So, this
semester, we’re revising this so that it can only help you. If your agent
performs better on problem set B in Project 2 than in Project 1, you will
receive half credit back. So, if you get a score of 4 on Basic Problem Set B
in Project 1, and a score of 8 on Basic Problem Set B in Project 2, then your
Project 1 grade will be based on a score of 6 on this set.
For more on this, please see the Repeated Problem Set section of the Overall
Project Guidelines.
Best Projects
At the conclusion of each project, a handful of the best projects will be
selected and, with the students’ permission, posted for public viewing. The
selection of the “best” will be made in large part based on how many problems
each student’s agent gets correct, but it may also be based partially on
subjective analysis by the graders. If a particular project takes a
particularly unique approach, for example, it may be selected as an exemplary
project even if other projects technically performed better.
Tips for Getting Started
This project can be a bit overwhelming at first. Don’t get discouraged! We’ve
heard time and time again students report that they had no clue how they were
going to succeed at first, but by the end of it they had a solid grasp of the
concepts and workflows necessary. Note in the Overall Project Guidelines the
note on authenticity. This project can seem overwhelming to you because it’s a
big, open question facing the AI community today. You’re working on a real
problem.
After two semesters and almost 500 students, a few common tips have started to
emerge for how to get started on the project. These tips aren’t meant to
pigeonhole you for the entire duration of the project, but they’re meant to
help you overcome that initial hump and get something working. You’re free to
ignore these; these are mostly supplied in case you’re having trouble getting
started.
- Instead of trying to design an agent that can answer every problem right from the beginning, try instead to write an agent that can solve one problem. Then, look at why that approach is failing on a second problem, and see if you can tweak it to get that second problem right. Continue that iteration and you’ll come to an idea for a broader plan, but you’ll also have an agent that’s already partially successful.
- Look for common problem types or feasible heuristic approaches. Are there three problems that use similar transformations? Try to focus on those to get the most progress for the time you invest.
- Remember, you’re only graded on Basic and Test problem sets. Test problem sets are written to be very, very closely analogous to the Basic problem set, so don’t let yourself get distracted by the Challenge problems. Those are provided for those that want to really get into the project and prepare for the real Raven’s problems, but they shouldn’t concern you if you’re having trouble.