Write a CAPTCHA recognizer that identifies numeric CAPTCHAs.
CAPTCHA Explained
Your task in this assignment is to write a C program which automatically
recognizes the digits in a CAPTCHA image.
A CAPTCHA is an attempt to determine whether or not a user is human. It is in
effect a reverse Turing test. CAPTCHAs are designed to be difficult to
recognize with a computer - the design and assessment of this assignment
recognizes this difficulty.
The input to your program will be a black-and-white (monochrome) image in a
simple format described below. Each image will contain either 1 or 4 digits.
The output of your program should be the digits in the image.
Image Format
Common image formats such as JPEG and PNG are complex, and decoding them would
be too difficult a task for this assignment.
Instead, this assignment uses the portable bitmap format (PBM) for images. This is
a very simple ASCII format. The first line of each file will contain the
characters “P1” identifying its format. The next line will contain 2 integers,
the width and height of the image. The remainder of the lines in the file
contain ‘1’s and ‘0’s specifying the pixel values of the image. Here is an
example
Here is read_pbm.c which contains a function to read PBM files:
int read_pbm(char filename[], int height, int width, int pixels[height][width]);
It is strongly recommended you use read_pbm in your assignment rather than
writing your own code.
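To make the format concrete, here is a minimal sketch of how plain PBM text maps onto a 2D array. It is for illustration only; the supplied read_pbm remains the recommended way to load images. The name parse_pbm_text is hypothetical, the sketch ignores PBM comment lines, and it assumes the first row of the file is stored in pixels[0], which may differ from the row order read_pbm uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

// Hypothetical helper, for illustration only: parses plain PBM ("P1") text
// held in a string into pixels[height][width].
// Returns 1 on success, 0 if the text is malformed or the size differs.
int parse_pbm_text(const char *text, int height, int width,
                   int pixels[height][width]) {
    assert(height > 0 && width > 0);

    int file_width = 0;
    int file_height = 0;
    int consumed = 0;

    // Header: the characters "P1", then the width, then the height.
    if (sscanf(text, "P1 %d %d%n", &file_width, &file_height, &consumed) != 2) {
        return 0;
    }
    if (file_width != width || file_height != height) {
        return 0;
    }

    // Body: height * width characters, each '0' or '1'.
    // Whitespace between them is skipped; anything else is an error.
    const char *p = text + consumed;
    int count = 0;
    while (*p != '\0' && count < height * width) {
        if (*p == '0' || *p == '1') {
            pixels[count / width][count % width] = *p - '0';
            count++;
        } else if (!isspace((unsigned char)*p)) {
            return 0;
        }
        p++;
    }
    return count == height * width;
}
```

Calling parse_pbm_text on the text of a 3-wide, 2-high image with height 2 and width 3 fills a pixels[2][3] array, one row of the file per array row.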
Part 1 - Digit Cracking
The first part of this assignment is to write a C program crack_digit.c.
crack_digit will be given one command line argument, an image filename.
The image will contain a single digit.
Your program should print only a single line of output.
This line of output should contain only the digit in the image.
For example:
$ ./crack_digit digit/3_42.pbm
$ ./crack_digit digit/7_99.pbm
$ ./crack_digit digit/0_12.pbm
A dataset of 1000 example digit images is available to help you develop your
program.
The week 7 lab exercises take you through getting started on crack_digit.c.
Part 2 - CAPTCHA Cracking
The second part of this assignment is to write a C program crack_captcha.c.
crack_captcha will be given one command line argument, an image filename.
The image will contain 4 digits.
Your program should print only a single line of output.
This line of output should contain only the 4 digits in the image.
For example:
$ ./crack_captcha captcha/4224.pbm
$ ./crack_captcha captcha/9264.pbm
$ ./crack_captcha captcha/0053.pbm
A dataset of 1000 example captcha images is available to help you develop your
program.
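One possible starting point for a 4-digit image, sketched here under a strong assumption, is to split it into digits wherever a column contains no black pixels and then treat each piece like a single-digit image. The function name find_column_segments is hypothetical, and nothing in the assignment guarantees that digits are separated by blank columns: touching digits or stray pixels will defeat this sketch, particularly in harder images.

```c
#include <assert.h>

// Hypothetical helper: finds runs of adjacent columns that each contain at
// least one black (1) pixel. Records the first column and the width of each
// run, and returns the number of runs recorded (at most max_segments).
int find_column_segments(int height, int width, int pixels[height][width],
                         int max_segments, int starts[], int widths[]) {
    assert(height > 0 && width > 0 && max_segments > 0);

    int n_segments = 0;
    int in_segment = 0;

    for (int column = 0; column < width; column++) {
        // Does this column contain any black pixel?
        int has_black = 0;
        for (int row = 0; row < height; row++) {
            if (pixels[row][column] == 1) {
                has_black = 1;
            }
        }

        if (has_black && !in_segment) {
            if (n_segments == max_segments) {
                break;  // no room to record another run
            }
            starts[n_segments] = column;
            widths[n_segments] = 1;
            n_segments++;
            in_segment = 1;
        } else if (has_black) {
            widths[n_segments - 1]++;  // extend the current run
        } else {
            in_segment = 0;            // blank column ends the run
        }
    }
    return n_segments;
}
```

If the number of runs found is not 4, the assumption has failed for that image and some fallback is needed, for example splitting the widest run or dividing the image into equal-width strips.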
Challenge CAPTCHA Cracking
The challenge part of this assignment is to identify more difficult captcha
images with crack_captcha.c.
A dataset of 1000 example challenge captcha images is available to help you
develop your program.
Testing
The script ~cs1511/bin/captcha_test will automatically test your programs on a
random subset of a specified size of the supplied images:
$ ~cs1511/bin/captcha_test --digit -n 10 crack_digit.c captcha.h other_C_files
dcc crack_digit.c read_pbm.c -o crack_digit
dcc crack_digit.c read_pbm.c --valgrind -o crack_digit-valgrind
Running 10 tests
Test digit/5_95.pbm passed
…
$ ~cs1511/bin/captcha_test --captcha -n 20 crack_captcha.c captcha.h other_C_files
dcc crack_captcha.c read_pbm.c --valgrind -o crack_captcha-valgrind
Running 20 tests
Test captcha/8119.pbm passed
…
$ ~cs1511/bin/captcha_test --challenge -n 30 crack_captcha.c captcha.h other_C_files
dcc crack_captcha.c read_pbm.c -o crack_captcha
dcc crack_captcha.c read_pbm.c --valgrind -o crack_captcha-valgrind
Running 30 tests
Test captcha_challenge/1936.pbm passed
…
Hints
You should follow discussion about the assignment in the class forums.
Questions about the assignment should be posted there so all students can see
the answer.
Don’t panic!
Don’t expect digit or captcha identification to be perfect, just identify as
many images as possible correctly.
The week 7 lab exercises showed you how to use one attribute (horizontal
balance) to separate some images, giving you a program that recognizes 20% of
digits. Check out the sample solutions when they are released, immediately
after the lab is due.
Think about other digit attributes you might calculate.
Here are some possibilities that aren’t too hard to calculate:
Attribute | Description |
---|---|
Tallness | height/width of the bounding box |
Density | fraction of pixels in the bounding box that are black |
Vertical balance | vertical equivalent of horizontal balance |
Holes | number of holes in the image |
Hole Fraction | area of white pixels in holes as a fraction of the bounding box |

There are more possibilities (and no right way to approach this).
If you invent attributes, try to make them not depend on the size of the digit
(scale invariant).
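As a rough sketch of how three of these attributes might be computed, the code below finds the bounding box of the black pixels and derives tallness, density and vertical balance from it. All function names are hypothetical. The vertical balance formula is an assumption (the mean row position of black pixels, scaled by the box height, so 0.5 means evenly balanced), since the lab's definition of horizontal balance is not given here. Whether row 0 is the top or bottom of the digit depends on how read_pbm stores rows.

```c
#include <assert.h>

// Hypothetical helper: finds the smallest box enclosing every black (1)
// pixel. Returns 1 on success, 0 if the image has no black pixels.
int get_bounding_box(int height, int width, int pixels[height][width],
                     int *start_row, int *start_column,
                     int *box_height, int *box_width) {
    assert(height > 0 && width > 0);
    int min_row = height, max_row = -1;
    int min_column = width, max_column = -1;
    for (int row = 0; row < height; row++) {
        for (int column = 0; column < width; column++) {
            if (pixels[row][column] == 1) {
                if (row < min_row) min_row = row;
                if (row > max_row) max_row = row;
                if (column < min_column) min_column = column;
                if (column > max_column) max_column = column;
            }
        }
    }
    if (max_row < 0) {
        return 0;
    }
    *start_row = min_row;
    *start_column = min_column;
    *box_height = max_row - min_row + 1;
    *box_width = max_column - min_column + 1;
    return 1;
}

// Tallness: height / width of the bounding box.
double get_tallness(int box_height, int box_width) {
    assert(box_width > 0);
    return (double)box_height / box_width;
}

// Density: fraction of pixels in the bounding box that are black.
double get_density(int height, int width, int pixels[height][width],
                   int start_row, int start_column,
                   int box_height, int box_width) {
    assert(box_height > 0 && box_width > 0);
    assert(start_row + box_height <= height && start_column + box_width <= width);
    int n_black = 0;
    for (int row = start_row; row < start_row + box_height; row++) {
        for (int column = start_column; column < start_column + box_width; column++) {
            n_black += pixels[row][column];
        }
    }
    return (double)n_black / (box_height * box_width);
}

// Vertical balance (assumed definition): mean row offset of black pixels
// within the box, measured to pixel centres and divided by the box height.
double get_vertical_balance(int height, int width, int pixels[height][width],
                            int start_row, int start_column,
                            int box_height, int box_width) {
    assert(box_height > 0 && box_width > 0);
    assert(start_row + box_height <= height && start_column + box_width <= width);
    double row_sum = 0;
    int n_black = 0;
    for (int row = start_row; row < start_row + box_height; row++) {
        for (int column = start_column; column < start_column + box_width; column++) {
            if (pixels[row][column] == 1) {
                row_sum += (row - start_row) + 0.5;
                n_black++;
            }
        }
    }
    assert(n_black > 0);
    return row_sum / n_black / box_height;
}
```

All three values are ratios over the bounding box, so they should change little if the same digit is drawn larger or smaller, which is the scale invariance suggested above.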
Assumptions
You may assume digit images are 70 pixels high and 50 pixels wide.
You may assume captcha and challenge captcha images are 70 pixels high and 200
pixels wide.
You can assume there is one and only one digit in the digit images and only 4
digits in the captcha images.
You can assume digits are roughly vertically oriented, in other words, more or
less the right way up.
You can assume digits are a similar size to the digits in the supplied test
images.
You cannot assume that the digits do not touch the edge of the image.
Otherwise, make as few assumptions as you can about the images. In particular,
you should try not to make assumptions about the exact pattern of pixels used
for a particular digit.
The images used to test your programs will be different to the images you have
been supplied.
You can however assume there will be no major difference in the depiction of
digits. The test images give a reasonable indication of the type of variation
in the depiction of digits that your program should handle.
Submission of Work
You are required to submit intermediate versions of your assignment.
Every time you work on the assignment and make some progress you should copy
your work to your CSE account and submit it using the give command below.
It is fine if intermediate versions do not compile or otherwise fail
submission tests.
Only the final submitted version of your assignment will be marked.
Submitting intermediate versions will allow you to retrieve earlier versions of your code if needed.
You submit your work like this:
give cs1511 ass1 crack_digit.c crack_captcha.c captcha.h other files
You may submit other .c or .h files.
When crack_digit.c and crack_captcha.c are compiled all other submitted C
files will be compiled with them.
This will allow you to define functions for use in both crack_digit.c and
crack_captcha.c.
It is fine if these files contain functions used only by one of crack_digit.c
or crack_captcha.c. Unused functions will not affect the compilation of the
other program.
Only crack_digit.c and crack_captcha.c should contain main functions.
No other function name should be used twice.
You do not need to submit read_pbm.c. It will be automatically compiled with
your programs. It is strongly recommended you use read_pbm.c unchanged. If you
ignore this advice, do not create functions with the same names as functions
in read_pbm.c.
Blogging
You must blog every time you work on this assignment, recording how much time
you spent working on the assignment and what this time was spent doing
(reading, designing, coding, testing, debugging, …).
You must blog about all significant bugs in your assignment including what
test found the bug, how the bug was tracked down and fixed, how long this took
and any lessons learnt.
You may create one big blog post and edit each time, or multiple small blog
posts for the assignment.