Big Data News December 7, 2012

  • Podcast Preview of Big Data Analytics Report

    How are organizations approaching big data? What challenges are they experiencing? What are the commonalities in big data projects across industries and geographies? These questions and more are answered in a podcast with a lead researcher on the report “Analytics: The real-world use of big data.”

  • Addressing the Big Data Skills Gap

    Closing the big data talent gap requires tackling the problem from both sides: the people and the technology. Adequately training the data scientists of tomorrow is an obvious and necessary step, but what about the non-data scientists? And what about the technology side? What can we do to make the technology more accessible to the people? If companies are saying that they don’t have the in-house skills to do something with big data, then doesn’t that imply that the existing big data technologies are just too complicated?

  • Beijing Spirit Leads Enterprises to Continuous Progress

    A country needs a great national spirit, and so does a city. That is why Beijing, the capital of China, announced the “Beijing Spirit” on November 2, 2011. The Beijing Spirit comprises Patriotism, Innovation, Inclusiveness and Virtue. It summarizes the spiritual wealth that Beijingers have built up through their development and practice, and it has guided the conduct of Beijing citizens ever since. As an advanced local enterprise in Beijing, Raqsoft integrates Beijing Spirit into its…

  • Get Rid of Mistaken Thoughts in OLAP

    OLAP is a type of BI software that emerged 20 years ago and has developed gradually since. OLAP can be used to handle complex computations flexibly and rapidly according to the requirements of analysts, and to present the results to decision-makers in an intuitive and understandable style. Decision-makers can thus grasp the operating status of the enterprise accurately, understand what is required, and choose the right plan.



    The original intention of OLAP is the arbitrary…

Date Converter: Transform a date into different formats

One very common data transformation is the standardization of dates. This Date Converter code can be customized to your needs. You can use other languages or change the order of day, month and year. You could also include other standardizations, such as treating '01' and '1' as the same day.
[sourcecode language="python"]# Date Converter
# Write a procedure date_converter which takes two inputs. The first is
# a dictionary and the second a string. The string is a valid date in
# the format month/day/year. The procedure should return
# the date written in the form <day> <name of month> <year>.
# For example, if the
# dictionary is in English,
english = {1:"January", 2:"February", 3:"March", 4:"April", 5:"May",
6:"June", 7:"July", 8:"August", 9:"September",10:"October",
11:"November", 12:"December"}
# then "5/11/2012" should be converted to "11 May 2012".
# If the dictionary is in Swedish
swedish = {1:"januari", 2:"februari", 3:"mars", 4:"april", 5:"maj",
6:"juni", 7:"juli", 8:"augusti", 9:"september",10:"oktober",
11:"november", 12:"december"}
# then "5/11/2012" should be converted to "11 maj 2012".
# Hint: int('12') converts the string '12' to the integer 12.
def date_converter(dic, string):
    # Locate the two '/' separators of month/day/year.
    first_split = string.find('/')
    month = string[0:first_split]
    second_split = string.find('/', first_split + 1)
    day = string[first_split + 1:second_split]
    year = string[second_split + 1:]

    # Look up the month name by its number.
    month_name = dic[int(month)]

    return day + ' ' + month_name + ' ' + year

print(date_converter(english, '5/11/2012'))
#>>> 11 May 2012
print(date_converter(english, '5/11/12'))
#>>> 11 May 12
print(date_converter(swedish, '5/11/2012'))
#>>> 11 maj 2012
print(date_converter(swedish, '12/5/1791'))
#>>> 5 december 1791[/sourcecode]
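
The customizations mentioned in the introduction (a different order of day, month and year, and treating '01' and '1' alike) could be sketched roughly as follows. This is only a minimal sketch, not part of the original exercise: the function name `date_converter_flexible` and the parameters `order` and `pad_day` are assumptions made up for illustration.

```python
# Hypothetical variant of date_converter. The name and the `order` and
# `pad_day` parameters are illustrative assumptions, not an existing API.
english = {1: "January", 2: "February", 3: "March", 4: "April", 5: "May",
           6: "June", 7: "July", 8: "August", 9: "September", 10: "October",
           11: "November", 12: "December"}

def date_converter_flexible(dic, string, order='dmy', pad_day=False):
    # The input is still expected in the format month/day/year.
    month, day, year = string.split('/')
    # int() drops leading zeros, so '01' and '1' are treated the same.
    day_number = int(day)
    if pad_day:
        day_text = '%02d' % day_number
    else:
        day_text = str(day_number)
    parts = {'d': day_text, 'm': dic[int(month)], 'y': year}
    # `order` is a permutation of the letters d, m and y.
    return ' '.join(parts[letter] for letter in order)

print(date_converter_flexible(english, '05/01/2012'))
#>>> 1 May 2012
print(date_converter_flexible(english, '5/1/2012', order='mdy', pad_day=True))
#>>> May 01 2012
```

Passing the order as a short string keeps the function signature small; a different dictionary (for example the Swedish one) can be passed in exactly as before.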

Reducing text to its components

This short Python program takes a web page as input and reduces it to its components, that is, the words on the page. You can customize it to fit your purpose. The code can be applied in web crawlers, text analytics and other fields. For example, if you want to leave out stop words, you would define a dictionary of these words and check against it with another if statement. This could be applied if you want to reduce patent data to its components and leave out generic terms like 'a', 'this', 'innovation', etc. You would do this because such words carry no information value.

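[sourcecode language="python"]
def remove_tags(source):
    output = []
    atsplit = True
    # Characters that end the current word.
    splitlist = [' ', '>', '<', '\n']
    i = 0
    while i < len(source):
        if source[i] == '<':
            # Jump to the end of the tag so that its contents are skipped.
            i = source.find('>', i + 1)
            if i == -1:
                # Unclosed tag: stop instead of looping forever.
                break
        if source[i] in splitlist:
            atsplit = True
        else:
            if atsplit:
                # Start a new word.
                output.append(source[i])
                atsplit = False
            else:
                # Extend the current word.
                output[-1] = output[-1] + source[i]
        i = i + 1
    return output[/sourcecode]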
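
The stop-word filtering described above could look roughly like the following. It is a minimal sketch under stated assumptions: the helper name `remove_stop_words` and the sample stop-word set are made up for illustration, and the function works on any list of words, such as the one returned by `remove_tags`.

```python
# Hypothetical helper: the name remove_stop_words and the sample
# stop-word set are illustrative assumptions.
STOP_WORDS = {'a', 'this', 'innovation'}

def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return the words that are not stop words (case-insensitive)."""
    kept = []
    for word in words:
        # The extra if statement: skip words without information value.
        if word.lower() not in stop_words:
            kept.append(word)
    return kept

words = ['This', 'patent', 'describes', 'a', 'battery', 'innovation']
print(remove_stop_words(words))
#>>> ['patent', 'describes', 'battery']
```

A set is used for the stop words because membership checks stay fast even when the list of generic terms grows.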

 


Programming like Google, Facebook … : Getting the Basics

It is a new era. Programming today has shifted radically from functionality to usability. A customer expects to be served in milliseconds, and the amount of data is increasing every second. This requires high-performance programming. Internet companies like Google faced this issue early on and developed tools and methods to overcome these challenges. Many of us call this Big Data programming.

Before we can understand Big Data programming, we need to understand why it was developed. So I recommend that you learn the basics of Google's business: building a search engine. This is also a good start for anybody who has never programmed before. Afterwards, in a second post, I will introduce you to state-of-the-art Big Data technologies that help us apply these basic principles on a large scale. Keywords are Hadoop, NoSQL, parallel programming and a shared-nothing architecture.

 

1. Learn how to build your own search engine

Fortunately, there are great resources out there that help you in a very professional manner. I recommend the Python course by Udacity, which is taught by Sebastian Thrun, a Stanford professor and Google Fellow. The course is online and can be taken for free. Take a look:

 

Now it is your turn: sign up and learn to build your own search engine.

 

This post will be continued in the next few weeks with Big Data and Web Intelligence. The topics Hadoop and parallel programming will follow.