text file is loaded using another text editor, such as Vim, the text is displayed as shown in
Figure 10-4. As you can see, Vim has loaded the text file without any formatting errors.
■Note Vim is available from http://vim.org. It is a vi-derived clone that can be used on
Windows systems.
Figure 10-4. Vim loads the text file in a nicely formatted display.
The real pressing problem lies in the structure of the data, which is illustrated in Figure 10-5.
Here, the data has new formatting, with extra columns, and the first column is not always in
the proper data format. And to make matters worse, the badly formatted data has repeating
information.
The challenge of the application is to read the stream and fix all of the problems. This requires
a thorough understanding of string processing and the different ways that text can be stored,
as discussed in Chapter 3. When you are processing data streams, you need to be aware of the
format of the data stream. In this example, we are processing ASCII text, and thus will be
manipulating bits according to the rules of the ASCII lookup table.
Whitespace characters are special characters in the text lookup table. They are associated
with numbers, but their representation is in the form of an action that the user can see. For
example, the character between single quotation marks (' ') is a space, the character \t is a tab,
and the character \n is a newline. Notepad does not format the lottery text file nicely
(Figure 10-3) because of the whitespace characters used to indicate a newline. In Figure 10-6, the
highlighted buffer entry 0A is the hexadecimal character that indicates a linefeed, or newline, in
the lottery text file.
CHAPTER 10 ■ LEARNING ABOUT PERSISTENCE
Figure 10-5. Structural problems of this data stream
Figure 10-6. Newline character used in lotto.txt
Figure 10-7 is a file created by Notepad. Notepad expects not a single whitespace character,
but two whitespace characters to indicate a newline: 0D (carriage return) and 0A (linefeed).
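The difference between the two newline conventions can be made visible with a few lines of code. The following is a minimal sketch, not part of the lottery application: the module name NewlineBytes and the helper ToHex are hypothetical names introduced only for this illustration, and the strings are made-up test values rather than data from lotto.txt.

```vb
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module NewlineBytes
    ' Hypothetical helper: returns the ASCII bytes of a string
    ' as space-separated hexadecimal pairs.
    Function ToHex(ByVal text As String) As String
        Dim bytes As Byte() = Encoding.ASCII.GetBytes(text)
        ' BitConverter.ToString produces "61-0A-62"; swap the dashes for spaces.
        Return BitConverter.ToString(bytes).Replace("-", " ")
    End Function

    Sub Main()
        ' A single linefeed, the convention found in lotto.txt.
        Console.WriteLine(ToHex("a" & ControlChars.Lf & "b"))    ' prints 61 0A 62
        ' Carriage return plus linefeed, the convention Notepad expects.
        Console.WriteLine(ToHex("a" & ControlChars.CrLf & "b"))  ' prints 61 0D 0A 62
    End Sub
End Module
```

The second line of output contains the extra 0D byte that Notepad looks for before it breaks a line.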
Figure 10-7. Newline characters used by Notepad
Deciphering the Format
The echo has served its purpose of providing a way to develop an application in a top-down
manner. The next step is to remove the echo code and start writing the code that will fix the
data stream.
Fixing the data stream is not a trivial undertaking, because you are yet again faced with a
state problem. You don't want to fix one part of the stream, only to end up with a problem in
another part of the stream. Thus, you need to incrementally fix the stream and make sure at
each step that there are no ramifications.
The first step is to break the data stream into individual fields (each value in a column is a
field in this case). In Figure 10-5, the data stream had two parts, where the upper part seemed
to have a single space between the numbers and the lower part had the amount of space
necessary to align the numbers. The difference between the upper and lower parts is the whitespace
characters used. So, the first step will be to clean up the whitespace.
The following is the code that reads the buffer, splits it up, and reassembles the content
into a new buffer. The code is intermediate code that adds special bracket markers to indicate
what the text contains.
Imports System.IO
Imports System.Text

' TODO: Fix up this class
Public Class LottoTicketProcessor : Implements IProcessor
    Public Function Process(ByVal input As String) As String _
        Implements IProcessor.Process
        Dim reader As TextReader = New StringReader(input)
        Dim retval As New StringBuilder()
        ' Peek() returns -1 once the end of the input is reached
        Do While reader.Peek() <> -1
            ' Split the current line into fields on spaces and tabs
            Dim splitUpText As String() = _
                reader.ReadLine().Split(New Char() {" "c, ControlChars.Tab})
            Dim c1 As Integer
            ' Wrap each field in brackets so its boundaries are visible
            For c1 = 0 To splitUpText.Length - 1
                retval.Append("(" & splitUpText(c1) & ")")
            Next
            retval.Append(ControlChars.NewLine)
        Loop
        Return retval.ToString()
    End Function
End Class
In the implementation of Process(), the text is parsed line by line, and each line
is then split into the individual fields. You could write the parsing routines yourself, but to parse a
buffer line by line, it is more efficient to use StringReader. StringReader accepts the string to
parse, and the instance is assigned to a variable of type TextReader, its abstract base class.
As each line of text is parsed, the most efficient approach to building a buffer is to use
StringBuilder. You could keep appending data to the string, but if you do that too often the
application's performance will suffer.
The String type is an immutable type, which means once an object is initialized, you
cannot change the state of the object. The advantage of immutable types is that they increase
the speed of your application, because code can assume once an object has been assigned, it
will never change. The downside is that once an object is assigned, to modify the object state
even slightly, you must instantiate a new object, which would be the case if we used the + and
+= operators. The StringBuilder type is like String, except the referenced text can be modified.
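To make the contrast concrete, here is a small sketch that builds the same bracketed text in two ways. The module name BufferBuilding, the two function names, and the sample field values are all hypothetical and exist only for this illustration. The sketch uses &=, Visual Basic's string concatenation assignment operator, for the String version.

```vb
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module BufferBuilding
    ' String version: every &= creates a brand-new String object,
    ' because the existing one cannot be modified.
    Function JoinWithConcat(ByVal fields As String()) As String
        Dim result As String = ""
        For Each field As String In fields
            result &= "(" & field & ")"
        Next
        Return result
    End Function

    ' StringBuilder version: Append() adds to one modifiable buffer,
    ' and a String is created only once, in ToString().
    Function JoinWithBuilder(ByVal fields As String()) As String
        Dim builder As New StringBuilder()
        For Each field As String In fields
            builder.Append("(").Append(field).Append(")")
        Next
        Return builder.ToString()
    End Function

    Sub Main()
        ' Made-up field values, not taken from lotto.txt.
        Dim fields As String() = {"a", "b", "c"}
        Console.WriteLine(JoinWithConcat(fields))   ' prints (a)(b)(c)
        Console.WriteLine(JoinWithBuilder(fields))  ' prints (a)(b)(c)
    End Sub
End Module
```

Both functions return the same text. For three fields the difference is negligible, and it only starts to matter when the loop runs many times, as it would over a large file.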
In the Process() implementation, the Do While loop calls the method Peek(), which reads,
but does not remove, a character value from the stream. If there is nothing more to read, a -1
value is returned. Otherwise, data is available, and the method ReadLine() can be called.
ReadLine() will read a buffer of characters until a newline or return character is encountered.
Once a line of text has been read, it is split into the individual fields using the Split() method. The split
characters are the space and tab character (ControlChars.Tab).
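The Peek() and ReadLine() loop can be tried out on its own. The sketch below assumes a made-up input string that mixes a single linefeed with a Notepad-style carriage return plus linefeed. The module name ReadLinesDemo and the function ReadAllLines are hypothetical names used only here.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic

Module ReadLinesDemo
    ' Collects every line of the input using the Peek()/ReadLine() loop.
    Function ReadAllLines(ByVal input As String) As List(Of String)
        Dim lines As New List(Of String)()
        Dim reader As TextReader = New StringReader(input)
        ' Peek() returns -1 when no characters are left to read.
        Do While reader.Peek() <> -1
            ' ReadLine() returns the line without its terminator.
            lines.Add(reader.ReadLine())
        Loop
        Return lines
    End Function

    Sub Main()
        ' Hypothetical input: first line ends in LF, second in CR+LF.
        Dim input As String = "1 2 3" & ControlChars.Lf & _
                              "4 5 6" & ControlChars.CrLf & "7 8 9"
        For Each textLine As String In ReadAllLines(input)
            Console.WriteLine("[" & textLine & "]")
        Next
        ' prints [1 2 3], [4 5 6], and [7 8 9] on separate lines
    End Sub
End Module
```

Because ReadLine() treats a linefeed, a carriage return, or the pair together as the end of a line, the same loop handles both newline conventions discussed earlier.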
When the Split() method returns, the individual fields are assigned to the array splitUpText.
Those array elements are iterated and appended to the StringBuilder variable retval, but each
element is surrounded by a set of brackets. The brackets provide a set of boundaries that you
can inspect to see what data has been found. I include the brackets purely for debugging purposes.
Because I am trying to reformat the stream, I append a newline character (ControlChars.NewLine) to
the variable retval.
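The value of the brackets becomes clear when a line contains more than one whitespace character in a row. The following sketch applies the same split-and-bracket step to a single line. The module name BracketDemo, the function MarkFields, and the input strings are hypothetical and serve only as an illustration.

```vb
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module BracketDemo
    ' Splits one line on spaces and tabs and wraps each field in brackets.
    Function MarkFields(ByVal textLine As String) As String
        Dim builder As New StringBuilder()
        Dim fields As String() = _
            textLine.Split(New Char() {" "c, ControlChars.Tab})
        For Each field As String In fields
            builder.Append("(" & field & ")")
        Next
        Return builder.ToString()
    End Function

    Sub Main()
        ' Single spaces between values: one field per value.
        Console.WriteLine(MarkFields("1 2 3"))   ' prints (1)(2)(3)
        ' Two spaces in a row, then a tab: Split() reports an empty
        ' field between the two adjacent spaces.
        Console.WriteLine(MarkFields("1  2" & ControlChars.Tab & "3"))  ' prints (1)()(2)(3)
    End Sub
End Module
```

An empty pair of brackets marks a spot where two separators sit next to each other, which is the kind of detail that is hard to see in a plain dump of the text.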
When all of the lines of text and fields within the lines of text are iterated, a string
representation of the StringBuilder instance is returned using the ToString() method. Running the
code shows how many fields each line of text has and how you should format the text file. This
gives you an understanding of how the file is structured.
The following is sample output from the lotto.txt file.
(2000.01.1