
10 Tools for the Novice Data Scientist

Original English article: 10 Tools for the Novice Data Scientist
Tags: Data analysis
Posted by admin on 2017-04-25 10:56:23 (12 paragraphs)
Translators (1): cyt5969858

Data scientists use their knowledge of statistics to turn collected data into ideas for product development, customer retention, and new business opportunities. Such analysis can even help a dissertation writing service with its work. Data scientist has been dubbed the sexiest job of the 21st century, and demand for data scientists keeps increasing. To become one, you have to gain the skills needed to enter the world of data science. Once you have them, here are some tools you can practice with:



RapidMiner

RapidMiner began in 2006 as an open-source program under the name Rapid-I. Over the years it was renamed RapidMiner, and the company raised 35 million US dollars in funding. Older versions of the tool are open source, while the latest version is offered as a 14-day trial, after which a license must be bought. RapidMiner covers the whole life cycle of predictive modeling, including validation and deployment. Its graphical user interface uses a block-diagram approach, similar to MATLAB Simulink.




BigML

BigML is another platform with a good graphical user interface, and it is used in six easy steps:

Sources – Use various sources of data
Datasets – Use the defined sources to create a dataset
Models – Build predictive models
Predictions – Generate predictions from a model
Ensembles – Build ensembles of different models
Evaluation – Test a model against validation sets

The BigML platform gives users attractive visualizations of the results. It also offers algorithms for regression, clustering, classification, and association discovery problems.
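The six steps above can be illustrated with a small, library-free Python sketch. It does not call BigML's real API. The function names (`load_source`, `make_dataset`, `train_stump`, `train_ensemble`, `evaluate`) and the toy CSV are hypothetical, and a one-feature decision stump stands in for BigML's models.

```python
import csv
import io
import random
from collections import Counter

# Hypothetical toy source: one numeric feature and a class label.
SOURCE = """x,label
1.0,a
1.5,a
2.0,a
2.5,a
6.0,b
6.5,b
7.0,b
7.5,b
"""


def load_source(text):
    """Sources: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def make_dataset(rows):
    """Datasets: convert raw rows into (feature, label) pairs."""
    return [(float(row["x"]), row["label"]) for row in rows]


def train_stump(data):
    """Models: fit a decision stump, i.e. a single threshold on x."""
    labels = [y for _, y in data]
    majority = Counter(labels).most_common(1)[0][0]
    # Fallback model: an infinite threshold always predicts the majority label.
    best = (sum(y != majority for y in labels), float("inf"), majority, majority)
    xs = sorted({x for x, _ in data})
    # Candidate thresholds are midpoints between consecutive distinct values.
    for t in ((u + v) / 2 for u, v in zip(xs, xs[1:])):
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        lo = Counter(left).most_common(1)[0][0]
        hi = Counter(right).most_common(1)[0][0]
        errors = sum(y != lo for y in left) + sum(y != hi for y in right)
        if errors < best[0]:
            best = (errors, t, lo, hi)
    _, t, lo, hi = best
    return t, lo, hi


def predict(model, x):
    """Predictions: apply a stump to one value."""
    t, lo, hi = model
    return lo if x <= t else hi


def train_ensemble(data, n_models=5, seed=0):
    """Ensembles: bagging, i.e. stumps trained on bootstrap resamples."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]


def predict_ensemble(models, x):
    """Majority vote over the ensemble members."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]


def evaluate(predict_fn, data):
    """Evaluation: accuracy on a held-out validation set."""
    return sum(predict_fn(x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    data = make_dataset(load_source(SOURCE))
    train, valid = data[::2], data[1::2]
    stump = train_stump(train)
    ensemble = train_ensemble(train)
    print("stump:", stump, "accuracy:", evaluate(lambda x: predict(stump, x), valid))
    print("ensemble accuracy:", evaluate(lambda x: predict_ensemble(ensemble, x), valid))
```

The sketch uses alternating rows for training and validation. With that split, the stump learns a threshold of 4.0 and classifies all four validation rows correctly.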




DataRobot

DataRobot is a high-end machine learning platform developed by Owen Zhang, Thomas DeGodoy, and Jeremy Achin. This platform is said to have made data scientists almost obsolete.

This is evident from a quotation on their website: “Data Science requires math and stats aptitude, programming skills, and business knowledge. With DataRobot, you bring the business knowledge and data, and our cutting-edge automation takes care of the rest.”

DataRobot claims that its platform can automatically determine the most efficient feature engineering and data preprocessing, using text mining, imputation, encoding, scaling, and transformation.
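Three of the preprocessing steps named above (imputation, encoding, and scaling) can be sketched with the standard library. This is not DataRobot's automation, which chooses such steps itself. The helpers and the toy table are hypothetical. Mean imputation and min-max scaling are only one common choice for each step.

```python
from statistics import mean


def impute_mean(values):
    """Imputation: replace missing numbers (None) with the mean of the rest."""
    fill = mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Scaling: map numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(column, values):
    """Encoding: one 0/1 indicator column per distinct category."""
    return {f"{column}={c}": [int(v == c) for v in values] for c in sorted(set(values))}


def preprocess(table, numeric, categorical):
    """Impute and scale numeric columns; one-hot encode categorical ones."""
    out = {}
    for column in numeric:
        out[column] = min_max_scale(impute_mean(table[column]))
    for column in categorical:
        out.update(one_hot(column, table[column]))
    return out


if __name__ == "__main__":
    table = {"age": [20, None, 40], "city": ["Oslo", "Rome", "Oslo"]}
    print(preprocess(table, numeric=["age"], categorical=["city"]))
    # {'age': [0.0, 0.5, 1.0], 'city=Oslo': [1, 0, 1], 'city=Rome': [0, 1, 0]}
```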




Paxata

Paxata is one of the companies that focus on data preparation and cleaning rather than on statistical modeling or machine learning. It looks like a Microsoft Excel application, but it is much easier to use.

The program includes a visual guide that makes it easier to bring data together and to find and fix missing or dirty data. Teams can also share and reuse data projects. Like the other tools mentioned here, it removes the need for scripting or coding.

As a result, it is very effective at overcoming technical obstacles in handling data. Paxata also follows a set process that includes a Clean and Change step. That step cleans data through operations such as normalizing similar values, detecting duplicates with NLP, and imputation.
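The cleaning operations named above can be approximated in plain Python. The sketch covers normalizing similar values, removing duplicates, and imputation. It is not Paxata's method. String similarity from `difflib.SequenceMatcher` stands in for the NLP-based matching. The 0.85 threshold, the helper names, and the sample rows are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """Fuzzy match: ignore case and surrounding spaces, then compare characters."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def normalize_values(values, threshold=0.85):
    """Map near-identical spellings onto the most frequent one; keep None as is."""
    counts = Counter(v for v in values if v is not None)
    canonical, mapping = [], {}
    for value, _ in counts.most_common():  # most frequent spellings first
        match = next((c for c in canonical if similar(value, c, threshold)), None)
        if match is None:
            canonical.append(value)
            match = value
        mapping[value] = match
    return [None if v is None else mapping[v] for v in values]


def impute_mode(values):
    """Fill missing entries with the most common observed value."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def clean_column(rows, column):
    """Normalize, impute, then de-duplicate one text column of a table."""
    values = impute_mode(normalize_values([row[column] for row in rows]))
    return drop_duplicates([{**row, column: v} for row, v in zip(rows, values)])


if __name__ == "__main__":
    rows = [
        {"id": 1, "city": "New York"},
        {"id": 2, "city": "new york"},
        {"id": 3, "city": "Boston"},
        {"id": 4, "city": "Bostn"},
        {"id": 5, "city": None},
        {"id": 1, "city": "New York"},  # exact duplicate of the first row
    ]
    for row in clean_column(rows, "city"):
        print(row)
```

Duplicates are removed after normalization and imputation. This order lets rows that differed only by spelling or a missing value collapse together.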

It also includes a built-in technology called SmartFusion that lets users combine data frames in one click. Paxata is a good choice if your work requires more intensive data cleaning.
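SmartFusion automates combining two tables on a shared key. A minimal stdlib sketch of a left join shows what that combination means. The `left_join` helper and the sample tables are hypothetical, and the sketch makes no claim about how SmartFusion itself finds matching columns.

```python
def left_join(left, right, key):
    """Combine two tables (lists of row dicts) on a shared key column.

    Every left row is kept. It is repeated once per matching right row,
    and left values win when both tables have the same column name.
    """
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            joined.extend({**match, **row} for match in matches)
        else:
            joined.append(dict(row))  # no match: keep the left row unchanged
    return joined


if __name__ == "__main__":
    customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
    orders = [{"id": 1, "total": 30}, {"id": 1, "total": 12}]
    for row in left_join(customers, orders, "id"):
        print(row)
```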




Google Cloud Prediction API

Google Cloud Prediction API offers RESTful APIs for building machine learning models into applications. The platform is aimed specifically at mobile apps that run on the Android operating system.

One example is a recommendation engine that predicts which products or movies a user might enjoy, based on the user's past viewing habits.

Another example is spam detection, which classifies emails as spam or non-spam. Purchase prediction, in turn, estimates how much a user is likely to spend per day based on his or her spending history.
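Spam detection is a standard text-classification task. The sketch below is a multinomial naive Bayes classifier with add-one smoothing, written with the standard library only. It shows the kind of model such a service might train. It is not the Prediction API's own algorithm, and the training messages are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict(self, text):
        scores = {}
        for c in self.classes:
            # Work in log space to avoid multiplying many small probabilities.
            score = math.log(self.priors[c])
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training carry no evidence
                p = (self.word_counts[c][word] + 1) / (self.totals[c] + len(self.vocab))
                score += math.log(p)
            scores[c] = score
        return max(scores, key=scores.get)


if __name__ == "__main__":
    texts = [
        "win money now", "free money offer", "win a free prize",
        "meeting at noon", "lunch tomorrow at noon", "project meeting notes",
    ]
    labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("free money"))       # spam
    print(model.predict("meeting at noon"))  # ham
```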




Narrative Science

Narrative Science is built on the idea of automatically generating reports from data. It works like a storytelling tool, using advanced natural language processing to write reports much like a consultant's report. So far, Narrative Science has been used in the insurance, finance, e-commerce, and government sectors. Its customers include MasterCard, PayScale, Deloitte, Forbes, and many more.
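At its simplest, turning data into a written report means filling numbers into sentence templates. The sketch below is far simpler than the natural language technology Narrative Science describes. The `quarterly_report` function and the sales figures are hypothetical.

```python
def describe_change(region, old, new):
    """Turn one region's before/after numbers into a sentence."""
    if old == 0:
        return f"Sales in {region} reached {new:,}, with no earlier figure to compare."
    change = (new - old) / old * 100
    if abs(change) < 1:
        return f"Sales in {region} stayed roughly flat at {new:,}."
    direction = "rose" if change > 0 else "fell"
    return f"Sales in {region} {direction} {abs(change):.1f}% to {new:,}."


def quarterly_report(sales):
    """Build a short narrative from {region: (last_quarter, this_quarter)}."""
    sentences = [describe_change(r, old, new) for r, (old, new) in sales.items()]
    best = max(sales, key=lambda r: sales[r][1] - sales[r][0])
    if sales[best][1] > sales[best][0]:  # only mention a gain if there was one
        sentences.append(f"{best} posted the largest gain.")
    return " ".join(sentences)


if __name__ == "__main__":
    print(quarterly_report({"North": (1000, 1200), "South": (800, 760)}))
    # Sales in North rose 20.0% to 1,200. Sales in South fell 5.0% to 760. North posted the largest gain.
```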




Trifacta

Trifacta focuses specifically on data preparation. It has two main products:

  • Wrangler - Free
  • Wrangler Enterprise - Paid

For data cleaning, Trifacta provides a distinctive graphical user interface. It takes data as input and produces a simple summary with a variety of statistics for each column. It also recommends transformations automatically, and each one can be applied with a single click.
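A per-column summary like the one described can be computed with Python's `statistics` module. The statistics chosen here are an assumption about what such a summary contains, not Trifacta's exact output: count, missing values, distinct values, and mean/min/max for numeric columns. The `profile` helper and the sample rows are hypothetical.

```python
from statistics import mean


def profile(rows):
    """Per-column summary of a table given as a list of row dicts."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    summary = {}
    for name, values in columns.items():
        # Treat None and empty strings as missing.
        present = [v for v in values if v is not None and v != ""]
        stats = {
            "count": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        numeric = [v for v in present if isinstance(v, (int, float))]
        if present and len(numeric) == len(present):  # purely numeric column
            stats.update(mean=mean(numeric), min=min(numeric), max=max(numeric))
        summary[name] = stats
    return summary


if __name__ == "__main__":
    rows = [
        {"price": 10, "city": "Oslo"},
        {"price": None, "city": "Rome"},
        {"price": 20, "city": "Oslo"},
    ]
    for column, stats in profile(rows).items():
        print(column, stats)
```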

Transformations can also be applied to the data using predefined functions, which are easy to use within the interface. The tool follows a specific process that starts with discovery: taking a first look at the data to get an idea of what you have.




MLBase

MLBase is an open-source project developed by the Algorithms, Machines, and People Lab (AMPLab) at the University of California, Berkeley. Its main idea is to offer a simple way to apply machine learning to large-scale problems. Its components include:

ML Optimizer – Solves search problems over the feature extractors and ML algorithms available in MLlib and MLI. Its task is to automate the construction of ML pipelines.

MLlib – Originally created for the MLBase project, it is now supported by the Spark community and serves as Spark's core distributed ML library.

MLI – An experimental API for feature extraction and algorithm development that provides high-level ML programming abstractions.
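The ML Optimizer's core idea can be sketched as a small grid search. It tries candidate feature extractors and algorithms, then keeps the pipeline that scores best on validation data. This toy version is plain Python and does not use MLBase's or MLlib's API. The feature functions, the nearest-centroid and 1-nearest-neighbour classifiers, and the data are all hypothetical.

```python
from itertools import product

# Hypothetical candidate feature extractors.
FEATURES = {"raw": lambda x: x, "abs": abs}


def nearest_centroid(train):
    """Fit one mean per class; predict the class whose mean is closest."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))


def one_nn(train):
    """Memorize the data; predict the label of the closest training point."""
    return lambda x: min(train, key=lambda pair: abs(x - pair[0]))[1]


MODELS = {"nearest_centroid": nearest_centroid, "one_nn": one_nn}


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


def optimize(train, valid):
    """Try every (feature, model) pipeline; return the best by validation accuracy."""
    results = []
    for f_name, m_name in product(FEATURES, MODELS):
        f = FEATURES[f_name]
        predict = MODELS[m_name]([(f(x), y) for x, y in train])
        score = accuracy(lambda x: predict(f(x)), valid)
        results.append((f_name, m_name, score))
    return max(results, key=lambda r: r[2])  # ties go to the first candidate


TRAIN = [(2.5, "far"), (3.0, "far"), (3.5, "far"),
         (-0.5, "near"), (0.0, "near"), (0.5, "near")]
VALID = [(-3.0, "far"), (-2.8, "far"), (0.2, "near"), (-0.3, "near")]

if __name__ == "__main__":
    print(optimize(TRAIN, VALID))  # ('abs', 'nearest_centroid', 1.0)
```

Here the search discovers that the `abs` feature is needed. The training data only has "far" points on the positive side, so both models score 0.5 on raw values and 1.0 on absolute values.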




Automatic Statistician

Automatic Statistician is not an actual product but a research organization that builds data exploration and analysis tools. It takes in different types of data and uses natural language processing to produce a detailed report. It is still under development, and very little information about the initiative is available. It may be backed by Google.




WEKA

WEKA is data mining software written in Java. It was developed by the Machine Learning Group at the University of Waikato in New Zealand. It is a GUI-based tool, which makes it ideal for novice data scientists, and it is also open source.
It is currently used mostly in academic settings, but it clearly has the potential to become a stepping stone to something bigger.




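Data scientist has been rated one of the highest-paying jobs in the United States, because companies are willing to pay a lot of money to the data scientists they hire. This creates an opportunity for students and mid-career professionals to build the necessary skills and put them into practice.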
