CHAPTER 10 ■ LEARNING ABOUT PERSISTENCE
The sample output shows that we have the following items to fix:
o There are empty lines of text where no data has been defined.
o Some lines of text have empty fields at the end.
o Some fields have an incorrect date format.
o Some dates have duplicates, which need to be removed.
o Some lines of text have too many fields. We need to figure out which fields we want to
keep and which we can discard.
■Note When processing streams and cleaning them up, it is important to take the stream apart first and
see what you are up against. Do not make assumptions until you have looked at the individual pieces of data.
Then you will be able to determine the steps you need to undertake to fix the stream.
Fixing the Stream
The final solution uses the same code used to parse the lines of text and individual fields, as
follows (note, however, that we need the individual lines if the date format is correct, so we
store each one in the lineOfText variable):
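Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Public Class LottoTicketProcessor : Implements IProcessor
    Private _dates As IList(Of String) = New List(Of String)()
    . . .
    Public Function Process(ByVal input As String) As String _
        Implements IProcessor.Process
        Dim reader As TextReader = New StringReader(input)
        Dim retval As StringBuilder = New StringBuilder()
        ' Peek() returns -1 when no characters remain
        Do While reader.Peek() <> -1
            Dim lineOfText As String = reader.ReadLine()
            Dim splitUpText As String() = _
                lineOfText.Split(New Char() {" "c, ControlChars.Tab})
            ' Skip a date that has already been processed
            If _dates.Contains(splitUpText(0)) Then
                Continue Do
            End If
            ' Skip empty lines
            If splitUpText(0).Length = 0 Then
                Continue Do
            End If
            If splitUpText(0).Contains("-") Then
                ' Rewrite a hyphenated date with period separators
                Dim dateSplit As String() = splitUpText(0).Split(New Char() {"-"c})
                Dim newDate As String = _
                    dateSplit(0) & "." & dateSplit(1) & "." & dateSplit(2)
                If _dates.Contains(newDate) Then
                    Continue Do
                End If
                ' Assumed completion, inferred from the explanations that follow:
                ' record the date, then copy the date and the number fields
                _dates.Add(newDate)
                retval.Append(newDate)
                ' Field 0 is the date (already appended); fields 1 to 7 are the numbers
                For c1 As Integer = 1 To 7
                    retval.Append(" " & splitUpText(c1))
                Next
                retval.Append(ControlChars.NewLine)
            Else
                ' Assumed completion: the date format is correct, so keep the line as read
                _dates.Add(splitUpText(0))
                retval.Append(lineOfText & ControlChars.NewLine)
            End If
        Loop
        Return retval.ToString()
    End Function
    . . .
End Class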
■Note In the downloadable source code, the individual steps taken to clean up the data stream are
demonstrated. For reference, the intermediate development steps in the source code are called Process01()
through Process05().
Let’s review how this code fixes the five problems we discovered。
Empty Lines of Text
The following code removes the empty lines of text.
If splitUpText(0).Length = 0 Then
    Continue Do
End If
When lotto.txt was processed, the output data stream generated a single-field array for
an empty line. So, we know that if the first field element has a length of zero, the line of text
should be ignored.
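The single-field array is easy to confirm in isolation. The following is a minimal sketch, not taken from the book's source: the module name SplitCheck and the function IsEmptyLine() are hypothetical, and the delimiters are assumed to match the Split() call used in Process().

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SplitCheck
    ' Hypothetical helper: True when the first field of the line is empty.
    Function IsEmptyLine(ByVal lineOfText As String) As Boolean
        Dim fields As String() = _
            lineOfText.Split(New Char() {" "c, ControlChars.Tab})
        ' Splitting an empty string yields one element, the empty string,
        ' so fields(0) always exists.
        Return fields(0).Length = 0
    End Function

    Sub Main()
        ' Number of fields produced for an empty line
        Console.WriteLine("".Split(New Char() {" "c, ControlChars.Tab}).Length)
        Console.WriteLine(IsEmptyLine(""))
        ' Made-up sample line in the date-plus-numbers format
        Console.WriteLine(IsEmptyLine("2000.05.06 1 2 3 4 5 6 7"))
    End Sub
End Module
```

Note that a line beginning with a space or tab also produces an empty first field, so this test would skip such a line as well.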
Empty Fields and Too Many Fields
The next problem in our list is that some lines have empty text fields at the end. Solving this
problem would probably entail a solution similar to the previous one, but you should think of
the big picture and understand that solving one problem might also solve another problem. In
this case, solving the problem of the empty fields also helps solve the problem of having too
many fields.
Both of these problems are solved by knowing the data that is being manipulated. As you've
seen, the data stream assumes the following format: date, then lottery numbers 1 to 6, and then
the bonus number. The parts of the data stream that are not correct have the same format, with
some extra information like replay number and empty fields. Thus, the fix is to copy the date
and append the remaining fields, as follows:
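retval.Append(newDate)
For c1 As Integer = 1 To 7
    retval.Append(" " & splitUpText(c1))
Next
The first line of code appends the date to the StringBuilder buffer (retval). Then in the
For loop that follows, a space and each of the fields 1 to 7 (the six lottery numbers and the
bonus number) are copied to the StringBuilder buffer. Fields beyond index 7 are never copied,
which discards both the extra information and the trailing empty fields.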
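To see the effect on a single line, here is a small self-contained sketch. It assumes that index 0 holds the date and indexes 1 through 7 hold the six lottery numbers and the bonus number, as in the format described above. The module name FieldTrimmer, the function KeepFirstEightFields(), and the sample line are made up for illustration.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module FieldTrimmer
    ' Hypothetical helper: keep the date plus the seven number fields.
    ' Assumes the line has at least eight fields; a shorter line would
    ' raise an IndexOutOfRangeException.
    Function KeepFirstEightFields(ByVal lineOfText As String) As String
        Dim fields As String() = _
            lineOfText.Split(New Char() {" "c, ControlChars.Tab})
        Dim retval As StringBuilder = New StringBuilder()
        retval.Append(fields(0))
        For c1 As Integer = 1 To 7
            retval.Append(" " & fields(c1))
        Next
        Return retval.ToString()
    End Function

    Sub Main()
        ' Made-up line with one extra field and two trailing empty fields
        Console.WriteLine(KeepFirstEightFields("2000.05.06 1 2 3 4 5 6 7 99  "))
    End Sub
End Module
```

Because the loop stops at index 7, the extra field and the empty fields produced by the trailing spaces are simply never read.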
Incorrect Data Format
In some of the fields, the date has a period separator; in others, it has a hyphen. The correct
format is a period, and the code that fixes the date format is as follows:
If splitUpText(0).Contains("-") Then
    Dim dateSplit As String() = splitUpText(0).Split(New Char() {"-"c})
    Dim newDate As String = _
        dateSplit(0) & "." & dateSplit(1) & "." & dateSplit(2)
A fix is needed if the first field contains a hyphen. The If statement tests for this using the
Contains() method. If a fix is needed, the first field is separated again into three subfields, where
each subfield represents a part of the date (year, month, day). Then those three subfields are
recombined and separated using the period and assigned to the variable newDate.
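The same conversion can be tried on its own. This is a sketch under stated assumptions rather than the book's code: the module name DateFixer, the function NormalizeDate(), and the sample dates are hypothetical, and String.Join() is swapped in for the explicit concatenation so that the subfields do not have to be indexed one by one.

```vbnet
Imports System

Module DateFixer
    ' Hypothetical helper: return the date with period separators.
    Function NormalizeDate(ByVal dateField As String) As String
        ' A date without hyphens is assumed to be in the correct format already
        If Not dateField.Contains("-") Then
            Return dateField
        End If
        Dim dateSplit As String() = dateField.Split(New Char() {"-"c})
        ' Rejoin the subfields (year, month, day) with periods
        Return String.Join(".", dateSplit)
    End Function

    Sub Main()
        ' Made-up sample dates
        Console.WriteLine(NormalizeDate("2000-05-06"))
        Console.WriteLine(NormalizeDate("2000.05.06"))
    End Sub
End Module
```

Both calls print 2000.05.06, which is why a hyphenated date has to be converted before it can be compared against the dates already collected.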
Duplicate Dates
The last problem that needs to be solved is having duplicate dates in the data stream. The
following code fixes this problem (the duplicate date code is bolded).
If _dates.Contains(splitUpText(0)) Then
    Continue Do
End If
If splitUpText(0).Length = 0 Then
    Continue Do
End If
If splitUpText(0).Contains("-") Then
    Dim dateSplit As String() = splitUpText(0).Split(New Char() {"-"c})
    Dim newDate As String = _
        dateSplit(0) & "." & dateSplit(1) & "." & dateSplit(2)
    If _dates.Contains(newDate