OpenSource Name: airbnb/aerosolve
OpenSource Url: https://github.com/airbnb/aerosolve
OpenSource Language: Scala 51.2%
OpenSource Introduction:

aerosolve
Machine learning for humans.

What is it?
A machine learning library designed from the ground up to be human friendly. It differs from other machine learning libraries in the following ways:
This library is meant to be used with sparse, interpretable features such as those that commonly occur in search (search keywords, filters) or pricing (number of rooms, location, price). It is less interpretable on problems with very dense features that humans cannot read directly, such as raw pixels or audio samples. There are a few reasons to focus on interpretability:
How to get started?
The artifacts for aerosolve are hosted on bintray. If you use Maven, SBT or Gradle, you can point to bintray as a repository and fetch the artifacts automatically.

Check out the Image Impressionism Demo, where you can learn how to teach the algorithm to paint in the pointillist style. There is also an Income Prediction Demo based on a popular machine learning benchmark.

Feature Representation
This section dives into the Thrift-based feature representation.

Features are grouped into logical groups called families of features. This lets us express transformations on an entire feature family at once, or interact two different families of features to create a new feature family.

There are three kinds of features per FeatureVector:
Example Representation
Examples are the basic unit for creating training data and for scoring. A single example is composed of:
The reasons for having this structure are:
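The feature and example structure described above can be sketched with plain Scala case classes. This is only an illustration under stated assumptions: the real classes are generated from Thrift, and the names `SketchFeatureVector`, `SketchExample`, `withContext` and `flatten`, the `floatFeatures` field, and the map layouts below are hypothetical stand-ins rather than the library's schema. The split into one context and several items is inferred from the transform section, which talks about combining a context with each transformed item.

```scala
// Hypothetical stand-ins for the Thrift-generated classes.
// Names and field layouts are assumptions made for illustration only.
object RepresentationSketch {
  // family name -> set of string features in that family
  type StringFamilies = Map[String, Set[String]]
  // family name -> (feature name -> numeric value)
  type FloatFamilies = Map[String, Map[String, Double]]

  final case class SketchFeatureVector(
      stringFeatures: StringFamilies = Map.empty,
      floatFeatures: FloatFamilies = Map.empty)

  // One shared context plus one feature vector per item.
  final case class SketchExample(
      context: SketchFeatureVector,
      items: Seq[SketchFeatureVector])

  // Merge the context into an item. If both define a family with the
  // same name, the item's family replaces the context's family.
  def withContext(
      context: SketchFeatureVector,
      item: SketchFeatureVector): SketchFeatureVector =
    SketchFeatureVector(
      context.stringFeatures ++ item.stringFeatures,
      context.floatFeatures ++ item.floatFeatures)

  // Produce one combined feature vector per item in the example.
  def flatten(example: SketchExample): Seq[SketchFeatureVector] =
    example.items.map(item => withContext(example.context, item))
}
```

Storing the context once and merging it into each item on demand keeps the shared features from being duplicated across items, which is one plausible motivation for this kind of layout.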
Feature Transform language
This section dives into the feature transform language.

Feature transforms are applied with a separate transformer module that is decoupled from the model. This allows the user to break transforms apart, or to transform data ahead of scoring. For example, in an application the items in a corpus may be transformed ahead of time and stored, while the context is not known until runtime. At runtime, one can then transform the context and combine it with each transformed item to get the final feature vector that is fed to the models.

Feature transforms allow us to modify FeatureVectors on the fly. This lets engineers iterate on feature engineering rapidly and in a controlled way.

Here are some examples of feature transforms that are commonly used:
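As a rough illustration of what a transform on feature families can look like, the sketch below implements two simple operations in plain Scala: quantizing a family of numeric features into string features, and crossing two string families into a new family. It does not use the aerosolve transformer or its config format. The object `TransformSketch`, the functions `quantize` and `cross`, the `name=bucket` encoding, and the `^` separator are all assumptions chosen for the example.

```scala
// Illustrative only: not the aerosolve transform API.
object TransformSketch {
  type StringFamilies = Map[String, Set[String]]
  type FloatFamilies = Map[String, Map[String, Double]]

  // Quantize: turn every numeric feature in `family` into a string
  // feature "name=bucket", where bucket = floor(value / bucketSize).
  // Returns a map holding only the new `output` family.
  def quantize(
      floats: FloatFamilies,
      family: String,
      bucketSize: Double,
      output: String): StringFamilies = {
    val quantized = floats
      .getOrElse(family, Map.empty[String, Double])
      .map { case (name, value) =>
        val bucket = math.floor(value / bucketSize).toLong
        s"$name=$bucket"
      }
      .toSet
    Map(output -> quantized)
  }

  // Cross: pair every feature in family `a` with every feature in
  // family `b` and store the pairs as a new family named `output`.
  def cross(
      strings: StringFamilies,
      a: String,
      b: String,
      output: String): StringFamilies = {
    val crossed = for {
      x <- strings.getOrElse(a, Set.empty[String])
      y <- strings.getOrElse(b, Set.empty[String])
    } yield s"$x^$y"
    strings + (output -> crossed)
  }
}
```

Because both functions take and return whole families, they can be applied to stored items ahead of time and to the context at runtime, in line with the decoupling described above.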
Please see the corresponding unit tests for what these transforms do, what kind of features they operate on, and what kind of config they expect.

Models
This section covers debuggable models. Although there are several models in the model directory, only two are the main debuggable models. The rest are experimental, or are sub-models that create transforms for the interpretable models.

Linear model. Supports hinge, logistic, epsilon-insensitive regression, and ranking loss functions. Only operates on stringFeatures. The label for the task is stored in a special feature family and specified by rank_key in the config. See the linear model unit tests for how to set up the models. Note that in conjunction with quantization and crosses you can get a great deal of complexity out of the "linear" model, so it is not really a regular linear model; it can be thought of as a bushy, very wide decision tree with millions of branches.

Spline model. A general additive piecewise linear spline model. Training is done at a higher resolution, specified by num_buckets, between the min and max of a feature's range. At the end of each iteration we attempt to project the piecewise linear spline onto a lower-dimensional function such as a polynomial spline with Dirac delta endpoints. If the RMSE of the projection is above a threshold, we leave the spline in its high-resolution piecewise linear form. This allows us to debug the spline model for features that are buggy or unexpectedly complex (e.g. jumping up and down when we expect some kind of smoothness).
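To make the high-resolution piecewise linear form concrete, here is a minimal sketch of how an additive model of that shape could be evaluated. It is not the aerosolve spline model and covers neither training nor the projection step. The names `SplineSketch`, `Spline`, `evaluate` and `score` are hypothetical, and it assumes that the learned weights are knot values spaced evenly between a feature's min and max, which is one reading of what num_buckets controls.

```scala
// Illustrative only: evaluation of an additive piecewise linear model.
object SplineSketch {
  // weights(i) is the function value at the i-th evenly spaced knot
  // between minVal and maxVal (assumed layout).
  final case class Spline(weights: Vector[Double], minVal: Double, maxVal: Double) {
    require(weights.length >= 2, "need at least two knots")
    require(maxVal > minVal, "range must be non-empty")

    def evaluate(x: Double): Double = {
      val intervals = weights.length - 1
      // Clamp the input to the feature's range.
      val clamped = math.max(minVal, math.min(maxVal, x))
      // Position measured in knot units, from 0 to intervals.
      val pos = (clamped - minVal) / (maxVal - minVal) * intervals
      // Index of the left knot; capped so that lo + 1 stays in bounds.
      val lo = math.min(pos.toInt, intervals - 1)
      val frac = pos - lo
      // Linear interpolation between the two neighbouring knots.
      weights(lo) * (1.0 - frac) + weights(lo + 1) * frac
    }
  }

  // Additive model: sum the spline outputs of every feature that has a
  // spline; features without one contribute nothing.
  def score(splines: Map[String, Spline], features: Map[String, Double]): Double =
    features.collect {
      case (name, x) if splines.contains(name) => splines(name).evaluate(x)
    }.sum
}
```

Since each feature contributes through its own one-dimensional curve, the knot values can be inspected per feature, which is what makes unexpected jumps visible when debugging.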
IDE
If you use IntelliJ, build the project first so that the Thrift classes are available. To fix the Spark compile error inside IntelliJ, type

Support

In the wild
Organizations and projects using