Implement a file analysis program in C that performs statistical analysis on daily temperature records.
Learning Outcomes
In this project you will demonstrate your understanding of loops and if
statements by writing a program that sequentially processes a file of text
data. You are also expected to make use of functions (Chapters 5 and 6) and
arrays (Chapter 7, and covered in lectures in Weeks 6 and 7). The sample
solution that will be provided to you will also make use of structures
(Chapter 8), and you may do likewise if you wish. But there is no requirement
for you to make use of struct types.
Sequential Data
Vast numbers of scientific and engineering datasets are stored in text files
using comma separated values format, usually with a one-line header describing
the contents of the columns. A key requirement is to be able to process this
data, looking for trends, patterns, and insights.
The dataset used in this project was generated by the Bureau of Meteorology,
and was accessed 27 March 2017 from
http://www.bom.gov.au/climate/data/index.shtml (dataset IDCJAC0010, station
86282 “Melbourne Airport”), and then edited to make the test files provided on
the LMS. The editing included removing some of the columns, replacing 19
missing values by -999, and then extracting three different subsets into three
test files: data00031.txt, Melbourne temperature data for one month, March
1990; data00365.txt, Melbourne temperature data for one year, 1971; and
data16802.txt, Melbourne temperature data for 46 years, 1971 to 2016
inclusive. The first few lines and last two lines of the file data00031.txt
are
Product code,BoM station,Year,Month,Day,Maximum (C),Minimum (C)
IDCJAC0010,86282,1990,3,1,24.9,18
IDCJAC0010,86282,1990,3,2,30.2,15.7
IDCJAC0010,86282,1990,3,3,28.2,17.2
IDCJAC0010,86282,1990,3,4,28.6,18
IDCJAC0010,86282,1990,3,5,25.2,17.5
[etc]
IDCJAC0010,86282,1990,3,30,19.2,-999
IDCJAC0010,86282,1990,3,31,19.8,9
where, as already noted, -999 indicates missing entries (perhaps equipment
failure, or similar issues).
Stage 1 - Control of Reading and Printing (marks up to 4/10)
Your program should read the entire input dataset into a collection of
parallel arrays (or, if you are adventurous, an array of struct), counting the
items as it goes. The first line of the input file should be discarded without
being retained. When the entire dataset is in memory, your program should
print the first and last of the input records. The output this stage produces
when given data00031.txt is shown by this interaction:
Stage 1
-------
Input has 31 records
First record in data file:
date: 01/03/1990
min : 18.0 degrees C
max : 24.9 degrees C
Last record in data file:
date: 31/03/1990
min : 9.0 degrees C
max : 19.8 degrees C
To read the comma-separated data lines from the rest of the input file, you
should use this recipe:
scanf("IDCJAC0010,%d,%d,%d,%d,%lf,%lf\n", &location, &yy, &mm, &dd, &max, &min)
That is, you may assume that the “Product code” value is fixed, but not the
“BoM station”. You will need a separate loop, for example
while (getchar() != '\n'), to consume the first line.
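The reading recipe above can be sketched as follows. This is a minimal illustration rather than the sample solution: the function name read_records, its parameters, and the choice to read from a FILE * (instead of standard input via scanf and getchar) are all assumptions made so that the fragment can be exercised on its own.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_DAYS 50000   /* upper bound on records given in the spec */

/* Discard the header line, then read data lines into parallel arrays.
   Returns the number of records stored. All names are hypothetical. */
int
read_records(FILE *fp, int loc[], int yy[], int mm[], int dd[],
        double max[], double min[], int limit) {
    int ch, n = 0;
    assert(fp != NULL && limit <= MAX_DAYS);
    /* consume characters up to and including the first newline */
    while ((ch = getc(fp)) != EOF && ch != '\n') {
        /* nothing to do, the header is not retained */
    }
    /* a successful call converts exactly six values */
    while (n < limit &&
            fscanf(fp, "IDCJAC0010,%d,%d,%d,%d,%lf,%lf\n",
                &loc[n], &yy[n], &mm[n], &dd[n], &max[n], &min[n]) == 6) {
        n++;
    }
    return n;
}
```

Calling read_records(stdin, ...) with arrays of MAX_DAYS elements would match the way the program is expected to receive its input. Checking the return value of fscanf against 6 stops the loop cleanly at end of input or at a malformed line.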
You may (and should) assume that at most 50,000 days will be covered by the
input data. Notice how the output formatting of the first record is the same
as the output formatting for the last one. You need to be thinking about
functions at every opportunity.
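Because the first and last records share one layout, a single helper can produce both. The sketch below is one possible shape for such a helper. The name format_record and the decision to write into a buffer with snprintf (rather than printing directly) are assumptions, and any leading indentation in the required output should be checked against the examples on the LMS.

```c
#include <assert.h>
#include <stdio.h>

/* Write one record in the Stage 1 layout into buf. Returns the value
   reported by snprintf. Hypothetical helper, not part of the spec. */
int
format_record(char *buf, size_t size, int dd, int mm, int yy,
        double min, double max) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
        "date: %02d/%02d/%04d\n"
        "min : %.1f degrees C\n"
        "max : %.1f degrees C\n",
        dd, mm, yy, min, max);
}
```

The same function can then be called once with index 0 and once with index n-1 of the arrays, so the formatting logic appears in only one place.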
Stage 2 - Computing Stuff (marks up to 6/10)
Of course, the goal is to try and compute average temperatures, and see if
they have changed over the years. In this stage your program should accumulate
the average minimum temperature for each year represented in the input file,
and the average maximum. Note that due to recording errors, some temperatures
show as the value -999. These values should be ignored when computing the
averages, as if the corresponding years were each a little shorter.
For example, for the file data00365.txt, the required output is just two
lines:
Stage 2
-------
1971: average min: 9.37 degrees C (365 days)
average max: 19.45 degrees C (365 days)
because all of the records in that file fall into a single year, 1971 (William
McMahon was Prime Minister of Australia, Richard Nixon was President of the
United States, Leonid Brezhnev ruled the Soviet Union [and the Cold War was
active], Mao Zedong was paramount Leader of China [and no foreigners could
enter the country at all], Edward Heath was Prime Minister of the United Kingdom, and
Alistair was a middle-school student). Multi-line output for the larger file
data16802.txt is given on the assignment FAQ page.
Wherever appropriate, code should be shared between the stages through the use
of functions. In particular, there shouldn’t be long (or even short) stretches
of repeated or similar code appearing in different places in your program.
You may assume that the data records are presented in strictly increasing date
order, and that you are not required to sort them. You must not assume that
there will be any particular year range in the input data, and must not assume
that the months and dates will be exhaustive (there may be missing days,
missing months and maybe even whole missing years).
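Since the records arrive in date order, a change in the year field is enough to detect the start of a new year, with no assumption about which years are present. The sketch below illustrates that idea. The function group_by_year and its output arrays are hypothetical names, and the caller is assumed to supply output arrays large enough for every distinct year.

```c
#include <assert.h>

#define MISSING (-999.0)   /* sentinel used in the data files */

/* Summarise vals[] by year, assuming yy[] never decreases. For each
   distinct year, records the year, the sum of its valid readings, and
   how many there were. Returns the number of distinct years. */
int
group_by_year(const int yy[], const double vals[], int n,
        int years[], double sums[], int counts[]) {
    int i, nyears = 0;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        if (nyears == 0 || yy[i] != years[nyears - 1]) {
            /* first record of a new year: open a fresh slot */
            years[nyears] = yy[i];
            sums[nyears] = 0.0;
            counts[nyears] = 0;
            nyears++;
        }
        if (vals[i] != MISSING) {
            sums[nyears - 1] += vals[i];
            counts[nyears - 1]++;
        }
    }
    return nyears;
}
```

Calling it once with the minimum array and once with the maximum array reuses the same code for both averages. A year that is absent from the input never gets a slot, so gaps in the data need no special handling.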
Stage 3 - Make A Picture (marks up to 8/10)
Modify your program so that it also generates a “by the month” horizontal
graph with (always) twelve rows showing the range between average observed
minimum and average observed maximum temperatures, where the averages are
computed over every line in the input file that corresponds to each of the
months. Leave a row blank if there are no readings for that month in the input
data file.
These numbers are the long-term average monthly minimum and the long-term
average monthly maximum. For example, on the input file data00365.txt, this
graph should be generated, where the numbers show how many min and max
observations went into each of the average temperatures that are plotted:
Stage 3
-------
Jan ( 31, 31) | ************************
Feb ( 28, 28) | ************************
Mar ( 31, 31) | ************************
Apr ( 30, 30) | *********************
May ( 31, 31) | ******************
Jun ( 30, 30) | **************
Jul ( 31, 31) | ******************
Aug ( 31, 31) | ********************
Sep ( 30, 30) | ********************
Oct ( 31, 31) | **********************
Nov ( 30, 30) | **********************
Dec ( 31, 31) | ***************************
+---------+---------+---------+---------+---------+---------+
0 5 10 15 20 25 30
Further examples showing the full output that is required for the three
different test files are provided on the LMS. You should also make your own
test files, by editing out different subsets of the data that is provided,
and/or creating them by hand.
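One way to approach this stage is to accumulate twelve sums and counts, and then turn each pair of averages into a run of spaces followed by a run of stars. The sketch below makes several assumptions that should be checked against the LMS examples: the names monthly_totals and make_bar are hypothetical, the scale of two characters per degree is a guess, and truncating the scaled values to whole columns is only one possible rounding rule.

```c
#include <assert.h>

#define MONTHS 12
#define MISSING (-999.0)      /* sentinel used in the data files */
#define CHARS_PER_DEGREE 2    /* assumed scale, verify against LMS output */

/* Accumulate per-month sums and counts of the valid readings in vals[],
   where mm[i] is the month (1..12) of reading i. */
void
monthly_totals(const int mm[], const double vals[], int n,
        double sums[], int counts[]) {
    int i, m;
    for (m = 0; m < MONTHS; m++) {
        sums[m] = 0.0;
        counts[m] = 0;
    }
    for (i = 0; i < n; i++) {
        assert(1 <= mm[i] && mm[i] <= MONTHS);
        if (vals[i] != MISSING) {
            sums[mm[i] - 1] += vals[i];
            counts[mm[i] - 1]++;
        }
    }
}

/* Build the bar for one row in row[]: spaces up to the average minimum,
   then stars up to the average maximum, then a terminating null byte. */
void
make_bar(char *row, int size, double avg_min, double avg_max) {
    int start = (int)(avg_min * CHARS_PER_DEGREE);
    int end = (int)(avg_max * CHARS_PER_DEGREE);
    int i;
    assert(0 <= start && start <= end && end < size);
    for (i = 0; i < start; i++) {
        row[i] = ' ';
    }
    for (i = start; i < end; i++) {
        row[i] = '*';
    }
    row[end] = '\0';
}
```

A month whose count is zero has no average, so the caller would print its label and leave the rest of the row blank instead of calling make_bar.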
Stage 4 - Climate Science In Action (marks up to 10/10)
Now for some climate science. Suppose we suspect that there is a general trend
for temperatures to be rising with time. To get evidence of that, we decide to
count, for each year, the number of months in which the average minimum for
that month is greater than the long-term average minimum for the same month,
plus the number of months in which the average maximum for that month is
greater than the long-term average maximum. That is, each year gets given a
score of between zero
(meaning, every month that year both the average minimum and average maximum
were below the corresponding long-term averages) and 24 (meaning, every month
that year both the average minimum and average maximum were above the
corresponding long-term averages). Of course, most years will be somewhere in
between these extremes.
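The scoring rule can be sketched as two independent comparisons per month, each contributing at most one point. The function name year_score and its four array parameters are hypothetical. Each array is assumed to hold twelve monthly averages indexed from 0 (January) to 11 (December).

```c
#include <assert.h>

#define MONTHS 12

/* Count how many of one year's 24 monthly averages (12 minimums and
   12 maximums) are strictly above the long-term average for the same
   month. The result is between 0 and 24 inclusive. */
int
year_score(const double yr_min[], const double yr_max[],
        const double lt_min[], const double lt_max[]) {
    int m, score = 0;
    for (m = 0; m < MONTHS; m++) {
        if (yr_min[m] > lt_min[m]) {
            score++;
        }
        if (yr_max[m] > lt_max[m]) {
            score++;
        }
    }
    assert(0 <= score && score <= 2 * MONTHS);
    return score;
}
```

A monthly average exactly equal to the long-term value earns no point here, since the text asks for "greater than".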
Modify your program so that it prints out the score associated with each of
the first five years and each of the last five years in the period covered by
the data file (or fewer years, if there are fewer than ten years in total
covered). It only really makes sense to apply this computation to the biggest
data file, data16802.txt, and for this stage you may assume that each year
that is represented in the input will have data for all twelve months (so that
the scores are always out of 24). You should also ensure (as a minimum
requirement) that your program does not generate a runtime error on the other
two data files that are provided. Here is the required output for the file
data16802.txt:
Stage 4
-------
1971: score is 8/24
1972: score is 14/24
1973: score is 10/24
1974: score is 7/24
1975: score is 13/24
–
2012: score is 14/24
2013: score is 17/24
2014: score is 22/24
2015: score is 14/24
2016: score is 18/24
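Choosing which years to report can be reduced to a test on the position of each year within the list of years found. The sketch below assumes a hypothetical helper is_reported that takes a zero-based index. It does not print the separator line between the two groups, which would need its own check.

```c
#include <assert.h>

#define EDGE 5   /* years reported at each end of the range */

/* Returns 1 if year index i (0-based, out of nyears) is among the first
   five or the last five years, and 0 otherwise. */
int
is_reported(int i, int nyears) {
    assert(0 <= i && i < nyears);
    return i < EDGE || i >= nyears - EDGE;
}
```

When fewer than ten years are present the two conditions overlap, so every year is reported exactly once and the smaller test files cause no out-of-range access.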
If there was no upward or downward trend in the data, these scores could be
expected to average around 12 out of 24, as typical fluctuations around the
mean. What this data suggests is that Melbourne is hotter now than it was in the
1970s. Still sceptical about climate change? Ready to convince a politician?