Date Converter: Transform a date into different formats

One very common data transformation is the standardization of dates. This date converter code can be customized to your needs: you can use month names in other languages or change the order of day, month and year. You could also add other standardizations, for example between '01' and '1'.
[sourcecode language="python"]# Date Converter
# Write a procedure date_converter which takes two inputs. The first is
# a dictionary and the second a string. The string is a valid date in
# the format month/day/year. The procedure should return
# the date written in the form <day> <name of month> <year>.
# For example, if the
# dictionary is in English,
english = {1: "January", 2: "February", 3: "March", 4: "April", 5: "May",
           6: "June", 7: "July", 8: "August", 9: "September", 10: "October",
           11: "November", 12: "December"}
# then "5/11/2012" should be converted to "11 May 2012".
# If the dictionary is in Swedish
swedish = {1: "januari", 2: "februari", 3: "mars", 4: "april", 5: "maj",
           6: "juni", 7: "juli", 8: "augusti", 9: "september", 10: "oktober",
           11: "november", 12: "december"}
# then "5/11/2012" should be converted to "11 maj 2012".
# Hint: int('12') converts the string '12' to the integer 12.
def date_converter(dic, string):
    # Locate the two slashes that separate month, day and year
    first_split = string.find('/')
    month = string[0:first_split]
    second_split = string.find('/', first_split + 1)
    day = string[first_split + 1:second_split]
    year = string[second_split + 1:]

    # Look up the month name by its number
    month_name = dic[int(month)]

    return day + ' ' + month_name + ' ' + year

print(date_converter(english, '5/11/2012'))
#>>> 11 May 2012
print(date_converter(english, '5/11/12'))
#>>> 11 May 12
print(date_converter(swedish, '5/11/2012'))
#>>> 11 maj 2012
print(date_converter(swedish, '12/5/1791'))
#>>> 5 december 1791[/sourcecode]
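
The customizations mentioned above (a different field order, treating '01' and '1' alike) could be sketched roughly as follows. The function name `format_date` and its `order` and `pad_day` parameters are assumptions made for illustration; they are not part of the original exercise.

```python
# Hypothetical extension of date_converter; names and parameters are illustrative.
english = {1: "January", 2: "February", 3: "March", 4: "April", 5: "May",
           6: "June", 7: "July", 8: "August", 9: "September", 10: "October",
           11: "November", 12: "December"}

def format_date(dic, string, order="dmy", pad_day=False):
    """Convert 'month/day/year' into a string whose fields follow `order`."""
    month, day, year = string.split('/')
    # int() maps '01' and '1' to the same number
    day_number = int(day)
    day_text = "%02d" % day_number if pad_day else str(day_number)
    parts = {'d': day_text, 'm': dic[int(month)], 'y': year}
    # Emit the fields in the requested order, e.g. "dmy" or "mdy"
    return ' '.join(parts[key] for key in order)

print(format_date(english, '05/01/2012'))
#>>> 1 May 2012
print(format_date(english, '5/1/2012', order='mdy', pad_day=True))
#>>> May 01 2012
```

Using `split('/')` instead of two `find` calls keeps the sketch short; it assumes the input always contains exactly two slashes.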

Reducing text to its components

This short Python program takes the source of a web page as input and reduces it to its components, which are the words on the page. You can customize it to fit your purpose. The code can be applied in web crawlers, text analytics and other fields. For example, if you want to leave out stop words, you would define a dictionary of these words and check against it with another if statement. This is useful if you want to reduce patent data to its components and leave out generic terms like 'a', 'this' or 'innovation', because words like these carry no information value.

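[sourcecode language="python"]
def remove_tags(source):
    output = []
    # True when the next character starts a new word
    atsplit = True
    splitlist = [' ', '>', '<', '\n']
    i = 0
    while i < len(source):
        if source[i] == '<':
            # Jump to the end of the tag
            i = source.find('>', i + 1)
            if i == -1:
                # Unclosed tag: stop instead of looping forever
                break
        if source[i] in splitlist:
            atsplit = True
        else:
            if atsplit:
                # Start a new word
                output.append(source[i])
                atsplit = False
            else:
                # Extend the current word
                output[-1] = output[-1] + source[i]
        i = i + 1
    return output[/sourcecode]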
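
The stop-word idea described above could look roughly like this. It is only a sketch: the names `STOP_WORDS` and `remove_stop_words` are hypothetical, the word list is a placeholder, and the filter runs as a separate step on the finished word list rather than as an extra if statement inside the loop.

```python
# Hypothetical stop-word list; a real one would be larger and domain-specific.
STOP_WORDS = {'a', 'this', 'innovation'}

def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return the words that are not stop words (compared in lower case)."""
    return [word for word in words if word.lower() not in stop_words]

words = ['This', 'patent', 'describes', 'a', 'new', 'innovation']
print(remove_stop_words(words))
#>>> ['patent', 'describes', 'new']
```

A set is used here instead of a dictionary because only membership is tested; the output of `remove_tags` could be passed in as `words`.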

 
