
Understanding and checking NumPy's axis: axes and dimensions in NumPy

Published: 2019-09-22 11:03

    Original text:


    The Basics

    NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes. The number of axes is the rank (not to be confused with the rank of a matrix in linear algebra).


    NumPy’s array class is called ndarray. It is also known by the alias array. Note that numpy.array is not the same as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less functionality. The more important attributes of an ndarray object are:


    • ndarray.ndim: returns an integer, the number of dimensions (axes), i.e. the rank
      the number of axes (dimensions) of the array. In the Python world, the number of dimensions is referred to as rank.
    • ndarray.shape: returns a tuple giving the shape; for example, (2,3) means 2x3 and (3,3,3) means 3x3x3
      the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.
    • ndarray.size: returns an integer, the total number of elements in the ndarray, equal to the product of the entries of shape
      the total number of elements of the array. This is equal to the product of the elements of shape.
    • ndarray.dtype: returns a dtype object describing the type of the elements
      an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.
    • ndarray.itemsize
      the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.
    • ndarray.data
      the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.
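To check the relationships between these attributes directly, here is a small sketch using an arbitrary 2x3x4 array (the shape is chosen only for illustration). It confirms that ndim equals len(shape), that size is the product of shape, and that itemsize comes from the dtype; nbytes, which the list above does not mention, is the total buffer size.

```python
import numpy as np

# An arbitrary example array: 2 x 3 x 4, float64 zeros.
x = np.zeros((2, 3, 4), dtype=np.float64)

# ndim is the number of axes, which always equals len(shape).
assert x.ndim == len(x.shape) == 3

# size is the product of the entries of shape: 2 * 3 * 4.
assert x.size == int(np.prod(x.shape)) == 24

# itemsize depends only on the dtype: float64 uses 64 / 8 = 8 bytes.
assert x.itemsize == x.dtype.itemsize == 8
assert np.dtype(np.int32).itemsize == 4       # 32 / 8
assert np.dtype(np.complex64).itemsize == 8   # two float32 parts

# nbytes is the size of the whole data buffer: size * itemsize.
assert x.nbytes == x.size * x.itemsize == 192

print(x.ndim, x.shape, x.size, x.dtype, x.itemsize, x.nbytes)
# 3 (2, 3, 4) 24 float64 8 192
```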


    For example, the coordinates of a point in 3D space, [1, 2, 1], form an array of rank 1, because the array has one axis. That axis has a length of 3. In the example below, the array has rank 2 (it is 2-dimensional): the first dimension (axis) has a length of 2, and the second has a length of 3.

    [[1, 2, 3],
     [4, 5, 6]]
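Since this page is about understanding axes, a quick check of the two arrays above can help. The sketch below (the names `point` and `grid` are just illustrative) shows that the shape tuple lists the length of each axis, that a single integer index walks along axis 0, and that a reduction with `axis=k` removes axis k from the shape.

```python
import numpy as np

point = np.array([1, 2, 1])           # one axis of length 3
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])          # two axes: lengths 2 and 3

print(point.ndim, point.shape)        # 1 (3,)
print(grid.ndim, grid.shape)          # 2 (2, 3)

# A single integer index selects along axis 0 (the rows here).
print(grid[1])                        # [4 5 6]
# Fixing the axis-1 index instead selects one column.
print(grid[:, 1])                     # [2 5]

# A reduction along an axis removes that axis from the shape.
print(grid.sum(axis=0))               # [5 7 9]   (shape (3,))
print(grid.sum(axis=1))               # [ 6 15]   (shape (2,))
```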


    An example:

    >>> import numpy as np
    >>> a = np.arange(15).reshape(3, 5)
    >>> a
    array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14]])
    >>> a.shape
    (3, 5)
    >>> a.ndim
    2
    >>> a.dtype.name
    'int64'
    >>> a.itemsize
    8
    >>> a.size
    15
    >>> type(a)
    <class 'numpy.ndarray'>
    >>> b = np.array([6, 7, 8])
    >>> b
    array([6, 7, 8])
    >>> type(b)
    <class 'numpy.ndarray'>
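The same session can be extended to verify what `axis` means for reductions on the 3x5 array `a`. This is a minimal sketch; `keepdims` and negative axis numbers are standard NumPy features that the text above does not cover.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)   # same array as in the session above

col_sums = a.sum(axis=0)          # collapse axis 0 (length 3): one sum per column
row_sums = a.sum(axis=1)          # collapse axis 1 (length 5): one sum per row
print(col_sums, col_sums.shape)   # [15 18 21 24 27] (5,)
print(row_sums, row_sums.shape)   # [10 35 60] (3,)

# axis=-1 counts from the end, so for a 2-D array it means axis 1.
assert (a.sum(axis=-1) == row_sums).all()

# keepdims=True keeps the reduced axis with length 1, which helps broadcasting.
row_means = a.mean(axis=1, keepdims=True)   # shape (3, 1)
centered = a - row_means                    # (3, 5) - (3, 1) broadcasts to (3, 5)
print(centered.shape)                       # (3, 5)
```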

    Steering Scatterplot Axes via Observation-Level Interaction

    Abstract: Scatterplots are effective visualization techniques for multidimensional data that use two (or three) axes to visualize data items as points at their corresponding x and y Cartesian coordinates. Typically, each axis is bound to a single data attribute. Interactive exploration occurs by changing the data attributes bound to each of these axes. In the case of using scatterplots to visualize the outputs of dimension reduction techniques, the x and y axes are combinations of the true, high-dimensional data. For these spatializations, the axes present usability challenges in terms of interpretability and interactivity. That is, understanding the axes and interacting with them to make adjustments can be challenging. In this paper, we present InterAxis, a visual analytics technique to properly interpret, define, and change an axis in a user-driven manner. Users are given the ability to define and modify axes by dragging data items to either side of the x or y axes, from which the system computes a linear combination of data attributes and binds it to the axis. Further, users can directly tune the positive and negative contributions to these complex axes by using the visualization of data attributes that correspond to each axis. We describe the details of our technique and demonstrate the intended usage through two scenarios.


    Index Terms—Scatterplots, user interaction, model steering


    Scatterplots are commonly utilized in visualizing relationships between two individual data attributes. The use of two orthogonal axes mapped to data attributes produces a Cartesian space where data objects can be charted. A basic strategy to form these axes in multidimensional data visualization is to assign each axis an individual feature or dimension originally given in a dataset. For example, plotting temperature over time on the y and x axes, respectively, generates a chart that can be used for understanding the relationship between these two data attributes. However, this has a severe scalability issue because two-dimensional (2D) scatterplots can represent only two features out of many at any given point in time.


    Instead, an alternative strategy that better handles this scalability issue is dimension reduction, which uses multiple original features to represent each axis. Dimension reduction [21] is a popular technique used to transform high-dimensional data into lower-dimensional views (typically, 2D scatterplots). While a variety of approaches exist, their fundamental functionality is similar: to solve for distances between data points in a lower-dimensional space that closely represent the true distances between the points in the high-dimensional space. Approaches differ mainly in how they solve for these distances from the data.


    In the visual and perceptual understanding of a scatterplot, the interpretation of its axes plays a crucial role. That is, understanding what it means to have large or small values along the x or y axis significantly helps the users’ reasoning about why data items are close or remote in a scatterplot. In the case of traditional scatterplots, where each axis is directly mapped to a particular data attribute (without any dimension reduction), this process is straightforward. However, this is often not the case for the axes of a 2D scatterplot generated by dimension reduction. One of the primary reasons is that only a limited set of dimension reduction methods provide interpretable axes. Such methods include traditional methods such as principal component analysis (PCA) [27] and linear discriminant analysis [23], which form an axis (or a reduced dimension) explicitly as a linear combination of the original data attributes. Through this linear combination of the original attributes, one can interpret the contribution of each original attribute to the axis. On the other hand, many other dimension reduction methods form each axis only implicitly in terms of the original attributes, and thus they do not give users its clear meaning. Most advanced non-linear dimension reduction methods, such as manifold learning [33], fall into this category. Even worse, some other popular methods, such as multidimensional scaling (MDS) [31] and force-directed graph layout [22], are rotation invariant, which means that the axes are not defined at all. Thus, communicating to users the meaning of the axes produced by dimension reduction techniques is an open challenge.
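To make the point about PCA concrete: each PCA axis is a unit vector of attribute weights, so the contribution of every attribute can be read off directly. The sketch below computes a 2D PCA projection with NumPy's SVD on made-up random data; `pca_axes` is a hypothetical helper name, and this is generic PCA, not code from the paper.

```python
import numpy as np

def pca_axes(X, k=2):
    """Return (scores, loadings) for the top-k principal components of X.

    X has shape (n_items, n_attributes). Each column of `loadings` holds the
    weights that define one scatterplot axis as a linear combination of the
    (centered) original attributes.
    """
    Xc = X - X.mean(axis=0)                          # center each attribute
    # Rows of Vt are orthonormal principal directions, ordered by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:k].T                              # shape (n_attributes, k)
    scores = Xc @ loadings                           # 2-D coordinates, shape (n_items, k)
    return scores, loadings

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                        # toy data: 100 items, 4 attributes
X[:, 1] = 2 * X[:, 0] + 0.1 * rng.normal(size=100)   # attribute 1 tracks attribute 0

scores, W = pca_axes(X)
print(scores.shape, W.shape)                         # (100, 2) (4, 2)
# Each axis is readable: W[j, 0] is attribute j's weight on the x axis.
print(np.round(W[:, 0], 2))
```

The sign of each principal direction is arbitrary, so a different SVD routine may flip an axis; only the relative sizes and signs of the weights within one axis carry meaning.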


    Another issue with the scatterplot generated by dimension reduction lies in the lack of interactivity. Forming the axes via dimension reduction does not typically allow human intervention. In other words, most dimension reduction methods are performed in a fully automated manner on the basis of their own pre-defined mathematical criteria, and thus diverse user needs and task goals are not considered in this process. For instance, the PCA criterion, which maximally preserves the total variance of the data, may not align well with the goal of a user’s task. While MDS attempts to preserve all pairwise distances with equal weights, one may want to focus on a subset of data points at a time, e.g., a local region in a scatterplot.

    Motivated by these challenges, we propose a novel interactive knowledge specification method for multidimensional data visualization, which is an alternative to the purely automatic process of generating a scatterplot via dimension reduction. The proposed method interactively forms an axis, thereby generating a corresponding scatterplot in a user-driven manner. The key novelty of the proposed method lies in the direct and seamless incorporation of user-selected data items for characterizing the axis during the data exploration process. Our technique enables users to create and modify the axes by dragging data objects to the high and low ends of both the x and y axes. The proposed method accordingly defines the meaning of an axis as a linear combination of the original data features, similar to the output of linear dimension reduction methods. This user-driven linear combination of data attributes is visualized on each axis, showing the positive or negative contribution of each attribute to the axis. Finally, users can continually refine the axes by dragging additional data points to the axes, or by directly adjusting the contribution of the data attributes in the linear combination.


    The primary contributions of this work include the following:
    • a visual analytics technique for directly creating, modifying, and visualizing complicated axes formed by a linear combination of data attributes
    • a user interaction technique enabling seamless interactivity via both data objects and data attributes to steer the meaning of the axes
    • a visual analytics technique to help users discover and weigh data attributes


    The rest of this paper is organized as follows: Section 2 discusses related work. Section 3 describes our proof-of-concept visual analytics system along with how the proposed interaction techniques are performed from the perspectives of both the front end and the back end, followed by a discussion about our design rationale. Section 4 presents several usage scenarios showcasing the advantages of the proposed interaction techniques. Section 5 presents in-depth discussions about the limitations of our interaction techniques as well as potential directions for improving them. Finally, Section 6 concludes the paper with some future work.


    1. The official definition:


    Array Creation

    There are several ways to create arrays. For example, the array function creates an array from a regular Python list or tuple.

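    1. np.array(seq, dtype=None): the argument should be a sequence (such as a list or tuple). It can be a single sequence or a sequence of sequences (of sequences, and so on); the nesting depth determines how many dimensions the resulting array has. The element type can also be specified with dtype.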
    >>> b = np.array([1.2, 3.5, 5.1])           # a sequence
    >>> b.dtype
    dtype('float64')

    >>> b = np.array([(1.5, 2, 3), (4, 5, 6)])  # a sequence of sequences
    >>> b
    array([[1.5, 2. , 3. ],
           [4. , 5. , 6. ]])

    >>> c = np.array([[1, 2], [3, 4]], dtype=complex)
    >>> c
    array([[1.+0.j, 2.+0.j],
           [3.+0.j, 4.+0.j]])
    
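    2. Sometimes the elements of an array are unknown but its shape is known. In that case, one of the following three functions can create the array:
    • zeros(shape, dtype): np.zeros((3, 4)) creates a 3x4 array of zeros
    • ones(shape, dtype): np.ones((2, 3, 4), dtype=np.int16) creates a 2x3x4 array of ones
    • empty(shape): np.empty((2, 3)) creates an uninitialized array whose contents are arbitrary (whatever is already in memory), not random numbers in any statistical sense
    3. arange(start, stop, step): similar to Python's built-in range; step is the spacing between consecutive values, and stop is excluded
    4. linspace(start, stop, num): num is the length of the resulting array; stop is included by default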
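The creation functions listed above can be exercised in one short script. This sketch only checks shapes and dtypes; it deliberately never inspects the contents of `np.empty`, since those are undefined. The values passed to `arange` and `linspace` are arbitrary examples.

```python
import numpy as np

# Nesting depth of the input sequence determines ndim.
v = np.array([1.2, 3.5, 5.1])              # sequence -> 1-D
m = np.array([(1.5, 2, 3), (4, 5, 6)])     # sequence of sequences -> 2-D
assert v.ndim == 1 and m.ndim == 2 and m.shape == (2, 3)
# Mixing ints and floats upcasts every element to float64.
assert m.dtype == np.float64

c = np.array([[1, 2], [3, 4]], dtype=complex)
assert c.dtype == np.complex128            # Python complex maps to complex128

# Shape-only constructors: the shape is passed as ONE tuple argument.
z = np.zeros((3, 4))                       # float64 zeros by default
o = np.ones((2, 3, 4), dtype=np.int16)
e = np.empty((2, 3))                       # uninitialized: never rely on its values
assert z.shape == (3, 4) and z.dtype == np.float64
assert o.shape == (2, 3, 4) and o.dtype == np.int16 and o.sum() == 24
assert e.shape == (2, 3)

# arange excludes stop, like range(); linspace includes stop by default.
r = np.arange(10, 30, 5)                   # [10 15 20 25]
s = np.linspace(0, 2, 9)                   # 9 evenly spaced values from 0 to 2
print(r)
print(s)
```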

    2.1 Multiattribute Data Visualization

    Fig. 2. A scatterplot generated by Tableau [41]. Users can interactively explore data by selecting and changing the bindings between data attributes and axes.


    The design space for visualization techniques for representing multiattribute data is large [28]. For example, the existing techniques include iconic displays [6], transforming displays based on geometric characteristics [13], and stacked visual representations [32]. Among these many techniques, one commonly used technique is the scatterplot [12, 20, 45], owing to the visual simplicity and cultural familiarity of such charts [43]. Scatterplots (such as the one shown in Fig. 2) represent data on a Cartesian plane defined by the two graphical axes (the x and the y axes). Three-dimensional scatterplots are also an available option, but their use in information visualization is limited given the perceptual and visual challenges [38, 47]. Systems that enable users to generate scatterplots include Tableau [41], GGobi [40], Matlab [34], Spotfire [1], and Microsoft Excel [19]. One basic user interaction supported by scatterplots is to select and change the mapping of the axes to data attributes (Fig. 2).


    Fig. 3. A scatterplot matrix (adapted from [15]) showing all individual pairwise feature scatterplots of an 8-dimensional dataset


    Fig. 4. A Galaxy View generated by IN-SPIRE [48] showing a scatterplot of documents (dots)


    As datasets grow more complex, the number of data attributes to select from often increases as well. In such situations, directly selecting one out of hundreds or thousands of data attributes can be less than optimal. As such, different types of techniques exist to show more combinations of data attributes simultaneously. For example, multiple scatterplots can be arranged into a single view called a scatterplot matrix [12]. A scatterplot matrix (such as the example shown in Fig. 3, adapted from [15]) binds data attributes to rows and columns so that each cell in the matrix can represent a single scatterplot. As such, users do not have to individually bind data attributes to the axes and interactively choose among a potentially large number of choices.
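The scalability concern can be quantified: a matrix over n attributes has n x n cells, of which n(n - 1)/2 show distinct attribute pairs (the off-diagonal cells come in mirrored pairs). A short sketch with placeholder attribute names; the count of 8 mirrors the dataset in Fig. 3, and the helper names are hypothetical.

```python
from itertools import combinations

def scatterplot_matrix_cells(attributes):
    """Every (row attribute, column attribute) cell of a scatterplot matrix."""
    return [(row, col) for row in attributes for col in attributes]

def distinct_pairs(attributes):
    """Unordered attribute pairs; each appears twice off the diagonal (mirrored)."""
    return list(combinations(attributes, 2))

attrs = [f"attr{i}" for i in range(8)]        # placeholder names for 8 attributes
print(len(scatterplot_matrix_cells(attrs)))   # 64
print(len(distinct_pairs(attrs)))             # 28 = 8 * 7 / 2

# The count grows quadratically: 100 attributes already give 4950 distinct pairs.
print(len(distinct_pairs([f"a{i}" for i in range(100)])))   # 4950
```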


    2.2 Applications of Dimension Reduction in Information Visualization


    When dimension reduction is used for visualization, the goal is to provide a low-dimensional view, typically a 2D scatterplot, in which the original high-dimensional distances between data points are maximally preserved. These views often show spatial clusters or groups of data representing coherent contents. Widely used dimension reduction methods for visualization include PCA [27], MDS [31], self-organizing map (SOM) [29], and generative topographic mapping (GTM) [3]. Recently, t-distributed stochastic neighbor embedding [46] has been proposed as a dimension reduction method that is particularly suitable for generating 2D scatterplots that reveal meaningful insights about data, such as clusters and outliers.
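As one concrete instance of distance-preserving dimension reduction, here is a sketch of classical (Torgerson) MDS in NumPy. It is the classical eigendecomposition variant, chosen for brevity; the MDS cited as [31] may refer to other variants such as stress-minimizing metric MDS. The data is random and the helper names are hypothetical. The final check also illustrates the rotation invariance mentioned earlier: rotating the 2D layout leaves every pairwise distance unchanged, so its x and y axes carry no intrinsic meaning.

```python
import numpy as np

def pairwise_dist(Y):
    """Euclidean distance between every pair of rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def classical_mds(X, k=2):
    """Classical (Torgerson) MDS: k-D coordinates whose distances approximate those of X."""
    sq = pairwise_dist(X) ** 2                  # squared distances, shape (n, n)
    n = sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    B = -0.5 * J @ sq @ J                       # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)              # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]            # indices of the k largest
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))                    # toy data: 30 items, 5 attributes
Y = classical_mds(X, k=2)

D_high, D_low = pairwise_dist(X), pairwise_dist(Y)
# For Euclidean input, classical MDS amounts to an orthogonal projection, so 2-D
# distances never exceed the original ones; the gap is the information loss.
assert np.all(D_low <= D_high + 1e-9)
print("fraction of total distance kept:", round(float(D_low.sum() / D_high.sum()), 2))

# Rotation invariance: rotating the layout changes no pairwise distance.
t = np.pi / 6
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
assert np.allclose(pairwise_dist(Y @ R), D_low)
```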


    To date, these methods have been actively adopted in visual analytics systems. For example, IN-SPIRE [48], a well-known visual analytics system for document analysis, provides a Galaxy View (as shown in Fig. 4) that visualizes text corpora spatially by showing the pairwise similarity between documents as their distance in a 2D space. As a result, groups and clusters emerge, which can be perceived as sets of similar documents, based on the geographic "near=similar" metaphor [39]. More recently, a visual analytics system applicable to more general high-dimensional data types, including documents and images, has been proposed, allowing a user to explore diverse aspects of data by applying various dimension reduction methods to generate different scatterplot visualizations [9]. Other kinds of high-dimensional data have also been visualized in the form of a scatterplot based on dimension reduction, including education performance data, census data [18], wine characteristics [5], facial images [8], and text documents [7].


    2.3 Interactivity for Dimension Reduction in Information Visualization


    In general, the axes created via dimension reduction techniques are defined by linear or non-linear combinations of the original data dimensions. This complexity can lead to trust and interpretation challenges for domain experts exploring their data visually [10]. For example, users may question whether their interpretation of a pattern is trustworthy or whether the pattern is just an artifact of a dimension reduction technique. More fundamentally, using only two dimensions to represent considerably higher-dimensional data inevitably involves significant information loss and distortion. To overcome these issues, various user interactions have been employed in numerous visual analytics systems.
    One approach to user interaction is direct manipulation of dimension reduction model parameters. For example, Jeong et al. presented iPCA, a visual analytics application that visualizes high-dimensional data in a 2D scatterplot using PCA [26]. They use graphical controls (e.g., sliders) to let users directly manipulate the weights on the principal components used in PCA. The user's adjustments then generate a new projection (i.e., a new scatterplot). Similar interaction guidelines have been used by other applications, such as a text visualization system called STREAMIT [2].
    A different set of techniques for incorporating user interactions into such visual analytics systems also exists. Semantic interaction techniques function by inferring model updates from direct interactions performed in the visualization [16, 17]. For example, Endert et al. have shown how directly manipulating the position of points in a 2D scatterplot can be used to infer the parameters of PCA, MDS, and GTM [18]. These inferences can also be used to export the specification of the distance functions computed in the dimension reduction step so that they can be reused, shared, or simply saved [5].
    Other than manipulating data items to interact with scatterplots, researchers have studied interaction techniques that manipulate features or dimensions. Yi et al. have presented a technique called Dust & Magnet that allows users to additionally place features or dimensions on top of a scatterplot themselves to see which data items have large values of these features or dimensions [49]. For text analysis, the VIBE system allows users to perform similar interactions with keywords [35]. In addition, Turkay et al. proposed a technique using dual scatterplots, one of which shows data items while the other shows features [44]. By providing brushing and linking as well as filtering operations on both data items and features in these dual scatterplots, users can check major patterns as well as outliers among data items and among features.

    The technique proposed in this paper follows a similar idea of interacting with both data items and features, but its main novelty relative to existing work lies in the capability of directly defining and interpreting the axes of the 2D scatterplot by assigning data items of interest to the axes. In this respect, our work is related to PivotSlice, a technique recently proposed by Zhao et al. that allows faceted browsing of high-dimensional data [50], as it allows users to specify data attributes on the axes of the scatterplot by directly dragging an attribute to the axis. However, our technique enables users to drag data objects (instead of data attributes) to the axis. Further, the proposed technique does not divide the scatterplot into a multifaceted view.
    Furthermore, a technique called flexible linked axes [11] is related to our work from a different aspect. That is, this technique is a different type of interaction that allows users to draw axes on a canvas, where scatterplots can be generated between any two neighboring axes. However, the main goal of this technique is fundamentally different from ours in that it attempts to flexibly coordinate and place multiple scatterplots on a large canvas, while our focus is on improving a single scatterplot to better support the interactive exploration of data based on a more sophisticated, user-driven axis specification.
    Further, Kondo and Collins have shown how directly interacting with visualizations can be used to reveal temporal trends and relationships between data items [30]. Their work allowed users to manipulate the position of data points in a scatterplot to reveal temporal trends in the data, again enabling interactions directly on the data items in a scatterplot to parameterize a data model.


    3 PROPOSED TECHNIQUE
    To realize the proposed interaction technique, we built a proof-of-concept visual analytics system. In this section, we describe (1) the overall design of the proposed visual analytics system, (2) the proposed interaction to steer the axis in a user-driven manner, (3) the underlying mathematical details that support the proposed user interaction, (4) the design rationale, and (5) the implementation details of the proposed system.


    3.1 System Design
    As shown in Fig. 1, which uses the well-known Car dataset consisting of 387 data items with 18 attributes, the proposed system mainly contains three panels: (1) the scatterplot view (Fig. 1(A)), (2) the axis interaction panel supporting the proposed interaction capabilities (Fig. 1(B-D)), and (3) the data detail view (Fig. 1(E)).
    The user interaction technique presented in this paper fosters a visual data exploration process grounded in the principles of semantic interaction techniques [16, 17]. That is, the system interprets the analytical reasoning behind exploratory user interactions to steer the underlying data model. The generic workflow supported by our user interaction technique is as follows:

    1. The user observes two data points that define the difference between two semantic groupings (e.g., “nice cars” and “bad cars”).
    2. The user drags one data item to each side of the axis.
    3. InterAxis computes the weighting of data attributes that supports these higher-level groupings (Eq. 1). The weights are displayed in the bar chart below the axis.
    4. The scatterplot updates to reflect the newly defined axis, where data items are placed according to their similarity to either side of the axis (Eq. 2).
    5. The user can refine the semantic grouping by adding or removing data points or by directly modifying the weighting in the visualization below the axes.
    6. The user can save the axis for future use and continue to explore the visualization iteratively by using the same interaction concept based on different semantic groupings.
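The workflow above relies on the paper's Eq. 1 and Eq. 2, which are not reproduced in this section. As rough intuition only, the sketch below uses a deliberately simplified, hypothetical rule instead: standardize the attributes, take the difference between the mean of the high-end items and the mean of the low-end items as the axis weight vector, and place every item at the dot product of its attributes with those weights. The car values and function names are made up for illustration; this is not InterAxis's actual computation.

```python
import numpy as np

def axis_weights_from_examples(X, high_idx, low_idx):
    """HYPOTHETICAL simplification (not the paper's Eq. 1): a unit weight vector
    pointing from the low-end examples toward the high-end examples."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)               # guard against constant attributes
    Z = (X - mu) / sd                             # standardize so attributes are comparable
    w = Z[high_idx].mean(axis=0) - Z[low_idx].mean(axis=0)
    norm = np.linalg.norm(w)
    return (w / norm if norm > 0 else w), Z

def axis_positions(Z, w):
    """Position of every item on the new axis: a linear combination of its attributes."""
    return Z @ w

# Made-up cars; columns are (price, horsepower, mpg).
X = np.array([
    [20_000, 120, 35],
    [25_000, 150, 30],
    [60_000, 400, 18],
    [80_000, 500, 15],
    [22_000, 130, 33],
], dtype=float)

# Drag car 3 to the high end and car 0 to the low end of the axis.
w, Z = axis_weights_from_examples(X, high_idx=[3], low_idx=[0])
pos = axis_positions(Z, w)
print(np.round(w, 2))     # positive for price and horsepower, negative for mpg
print(np.argsort(pos))    # item order from the low end to the high end
```

Adding more items to either drop zone simply enlarges `high_idx` or `low_idx`, which loosely mirrors step 5 of the workflow.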

    The scatterplot view provides a 2D overview of the data. By default, the first and the second features of the data, e.g., Retail Price and HP (Horsepower), are assigned to the x and the y axes, respectively, but this initial view can instead be set up by using a dimension reduction method such as PCA [27] to provide another starting point. Data points are represented as semi-transparent circles so that regions with overlapping data points are highlighted. The scatterplot view supports zoom and pan via mouse wheel operations on white space (to zoom on both axes simultaneously) or over a particular axis (to zoom only on that axis). By hovering over or clicking on a data point, one can check the full details (that is, the original high-dimensional information) of the data item in the data detail view (Fig. 1(E)).
    The axis interaction panel consists of two drop zones (the high end and the low end of each axis), into which the user drags data points to steer the axis (Fig. 1(B)); an interactive bar chart (Fig. 1(C)); and a sub-panel (Fig. 1(D)) containing buttons to save the current axis for further use or to clear the data points currently assigned to the axis, plus a combo box to change the axis back to one of the original features or a previously defined axis. The bars in the interactive bar chart represent the contributions (weights) of attributes to the corresponding axis. The longer a bar is, the more strongly its corresponding attribute contributes to the axis. The bars are color-coded by the signs of their weights: positive contributions in blue and negative contributions in red. Data points that are high on the positively weighted (blue) attributes will be placed on the high-end side of the axis. Data points that are high on the negatively weighted (red) attributes will be placed on the low-end side of the axis. For example, in Fig. 1(C), sedans tend to be on the left side of the scatterplot, while sports cars and cars with rear-wheel drive (RWD) tend to be on the right side. In short, the magnitude of a weight indicates how strongly its attribute contributes, and its sign indicates at which end of the axis data points with high values of that attribute will be placed.


    Fig. 1. An overview of the proposed visual analytics system, InterAxis, showing a car dataset, which includes 387 data items with
    18 attributes. The proposed system contains three panels: (A) the scatterplot view to provide a two-dimensional overview of data,
    (B-D) the axis interaction panel to support the proposed interaction capabilities, and (E) the data detail view to show the original
    high-dimensional information of the data items of interest. The axis interaction panel (B-D) consists of (B) two drop zones (the
    high-end and the low-end of each axis), which a user drags data points into in order to steer the axis, (C) an interactive bar chart,
    and a sub-panel containing buttons to save the current axis for future use (D, middle) or to clear the data points currently assigned
    to the axis (D, right) and a combo box to change the axis back to one among the original features or the previously created axes
    via our interaction (D, left).



    3.2 Interactive Axis Steering
    The proposed method provides two types of interactions: (1) data-level
    axis steering and (2) attribute-level axis manipulation. Data-level axis
    steering is triggered by dragging a data point from the scatterplot into
    one of the two drop zones at the high and the low end of the axis. Attribute-level axis manipulation is triggered by directly adjusting the bars in
    the interactive bar chart.
    The main idea of the proposed interaction for steering the axis in
    a user-driven manner lies in an intuitive process of incorporating data
    items seamlessly while exploring data in a scatterplot. For example,
    when a user finds data points that he likes (or dislikes) in the scatterplot, he can drag them to the high-end (or the low-end) drop zone of
    an axis (Fig. 1(B)). Accordingly, a new axis is formed by reflecting
    these choices of data items, which will then update the scatterplot on
    the basis of the newly formed axis. The technical details about how
    we form a new axis will be described in the next section.
    How the axis is formed from this process is summarized and visualized as a bar chart (Fig. 1(C)) so that a user can get an idea about
    how much a particular original feature or dimension is emphasized or
    de-emphasized. Given such a bar chart, a user can further refine the
    meaning of an axis by directly manipulating the length of each bar
    via drag-and-drop operations on the tip of the bar (attribute-level axis
    manipulation).
    The entire interaction process can be dynamic and iterative. That is,
    a user can additionally assign new data items to an axis or remove data
    items that were already assigned to an axis. Furthermore, the above-described direct manipulation on the bar chart can be performed at
    any moment during such an interactive exploration.
    Finally, a user can save the current definition of an axis, and then it is
    registered as a new entry in the combo box (Fig. 1(D, left)) so that a
    user can later recover the axis to a previously saved one.


    3.3 Underlying Techniques
    In this section, we describe the technique underlying the proposed
    user interaction of forming an axis via data items. For the sake of
    brevity, we consider only the x axis (the horizontal axis) of the scatterplot, but the following description generalizes to the y axis in
    the same manner.


    Data preprocessing. As discussed below, the underlying
    model that defines an axis is a linear combination of the original dimensions. To this end, we adopt the data preprocessing steps used in linear regression models [14]. For a categorical variable with c different categories, we use dummy encoding, which converts it to a c-dimensional indicator vector where the value of each dimension is 1
    if the data item is in the category corresponding to that dimension and
    0 otherwise. Next, we scale and translate each dimension (including
    both indicator and numerical variables) so that its values lie exactly in
    the range from 0 to 1.
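    A minimal NumPy sketch of these two preprocessing steps, assuming the numeric attributes arrive as one 2D array and each categorical attribute as a 1D array of labels. The `preprocess` helper and the three toy cars are hypothetical illustrations, not code from InterAxis; mapping constant columns to 0 is also an assumption of this sketch.

```python
import numpy as np

def preprocess(numeric, categorical):
    """Hypothetical helper: dummy-encode categorical variables, then min-max scale every column.

    numeric: 2D array of shape (n_items, n_numeric)
    categorical: list of 1D label arrays, one per categorical variable
    """
    columns = [np.asarray(numeric, dtype=float)]
    for labels in categorical:
        labels = np.asarray(labels)
        categories = np.unique(labels)  # the c distinct categories, sorted
        # c-dimensional indicator vector: 1 where the item is in that category, else 0
        columns.append((labels[:, None] == categories[None, :]).astype(float))
    A = np.hstack(columns)
    lo, hi = A.min(axis=0), A.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns would otherwise divide by zero
    return (A - lo) / span  # every column now lies in [0, 1]

# Toy data: retail price, horsepower, and a categorical drive type
price = [20000, 35000, 50000]
hp = [150, 250, 400]
drive = ["FWD", "RWD", "RWD"]
A = preprocess(np.column_stack([price, hp]), [drive])
print(A)  # columns: price, HP, FWD indicator, RWD indicator
```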
    Linear transformation. Assuming that such data preprocessing is
    done, we denote the set of high-dimensional vectors of the data items that
    the user assigned (via drag-and-drop) to the high end of the x axis
    as $A_{x,h} = \{a^{x,h}_1, \ldots, a^{x,h}_{n_{x,h}}\}$ and the set of those that he dragged into
    the low-end side of the x axis as $A_{x,l} = \{a^{x,l}_1, \ldots, a^{x,l}_{n_{x,l}}\}$, where
    $n_{x,h}$ and $n_{x,l}$ represent the total numbers of points assigned to the
    high end and the low end of the x axis, respectively. Now, we define
    the linear transformation vector for the x axis as follows:

    $$T_x = \frac{1}{n_{x,h}} \sum_{i=1}^{n_{x,h}} a^{x,h}_i - \frac{1}{n_{x,l}} \sum_{i=1}^{n_{x,l}} a^{x,l}_i \qquad (1)$$

    This is then further scaled to have a unit Euclidean norm.
    One can define the linear transformation vector $T_y$ for the y axis
    in the same manner. Every data item is mapped to the x axis (and
    the y axis) via the transformation $T_x$ (and $T_y$). That is, the i-th data
    item, whose high-dimensional vector is represented as $a_i$, is mapped to
    a point in our 2D scatterplot whose 2D coordinates are
    as follows:

    $$(x_i, y_i) = \left(T_x^\top a_i,\; T_y^\top a_i\right) \qquad (2)$$

    Owing to the easy interpretability of this linear model, one can understand the meaning of this transformation in a straightforward manner. That is, the resulting x axis basically emphasizes the features
    or dimensions that have large values on the high-dimensional vectors
    contained in $A_{x,h}$ but low values on those in $A_{x,l}$. On the other
    hand, it de-emphasizes the features that have low values on the vectors
    contained in $A_{x,h}$ but high values on those in $A_{x,l}$. Consequently,
    the larger (or lower) a data item's values on the emphasized dimensions and the lower (or higher) its values on the de-emphasized dimensions,
    the higher (or lower) its x coordinate, placing it further toward
    the right (or left) side of the x axis. The notations used in this section
    are summarized in Table 1.
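    A minimal NumPy sketch of the axis-forming step, assuming Eq. 1 is the mean of the high-end vectors minus the mean of the low-end vectors (then scaled to unit norm) and Eq. 2 is a pair of dot products. The `axis_vector` and `project` helpers and the four toy items are hypothetical; this illustrates the linear model described above rather than reproducing the InterAxis implementation.

```python
import numpy as np

def axis_vector(A_high, A_low):
    """Hypothetical helper: high-end mean minus low-end mean, scaled to unit Euclidean norm."""
    t = np.mean(A_high, axis=0) - np.mean(A_low, axis=0)
    norm = np.linalg.norm(t)
    return t / norm if norm > 0 else t

def project(A, T_x, T_y):
    """Hypothetical helper: item a_i maps to (T_x . a_i, T_y . a_i)."""
    return np.column_stack([A @ T_x, A @ T_y])

# Four preprocessed items with three attributes, each already in [0, 1]
A = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 0.9],
              [0.9, 0.1, 0.8],
              [0.1, 0.9, 0.1]])
T_x = axis_vector(A[[1, 2]], A[[0, 3]])  # items 1, 2 dragged to the high end; 0, 3 to the low end
T_y = np.array([0.0, 0.0, 1.0])          # y axis left as the third original attribute
xy = project(A, T_x, T_y)

print(T_x)  # positive weight on attribute 0 (blue bar), negative on attribute 1 (red bar)
print(xy)   # items 1 and 2 land to the right of items 0 and 3
```

    The sign of each entry of `T_x` is what the bar chart colors, and its magnitude is the bar length.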


    [[ 1., 0., 0.],
     [ 0., 1., 2.]]
    

    Printing Arrays

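    print() displays an array as follows:

    • the last axis is printed from left to right,
    • the second-to-last axis is printed from top to bottom,
    • the remaining axes are also printed from top to bottom, with each slice separated from the next by an empty line.

    If an array is too large to print, NumPy skips the central part and prints only the corners:

    >>> print(np.arange(10000))
    [   0    1    2 ... 9997 9998 9999]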
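    A short sketch of both printing rules. It assumes NumPy's default summarization threshold of 1000 elements and uses the `np.printoptions` context manager to raise it temporarily; the variable names are illustrative.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)
# Last axis runs left to right, the second-to-last top to bottom,
# and the two 3x4 slices along the first axis are separated by an empty line.
print(c)

big = np.arange(10000)
print(big)  # more than the default threshold of 1000 elements, so only the corners are shown

# Temporarily raise the threshold to get the full text (avoid this for huge arrays)
with np.printoptions(threshold=20000):
    text = np.array2string(big)
print(len(text))
```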
    

    In NumPy dimensions are called axes.

    ndarray.ndim

    Basic Operations

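    1. Arithmetic operations (arithmetic operators on arrays apply elementwise)
    • +, -, *, /: the four basic arithmetic operations
    • **: exponentiation
    • >, <, ==: comparisons, which return a Boolean array

    All of these are elementwise operations: they act on the individual elements of the arrays.

    2. Matrix product
    • A.dot(B)
    • np.dot(A, B)
    3. Sums and extrema; the axis argument selects which axis of the array to operate along
    • A.sum(axis)
    • A.min(axis)
    • A.max(axis)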
    >>> b = np.arange(12).reshape(3,4)
    >>> b
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    >>>
    >>> b.sum(axis=0) # sum of each column
    array([12, 15, 18, 21])
    >>>
    >>> b.min(axis=1) # min of each row
    array([0, 4, 8])
    >>>
    >>> b.cumsum(axis=1) # cumulative sum along each row
    array([[ 0,  1,  3,  6],
           [ 4,  9, 15, 22],
           [ 8, 17, 27, 38]])
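    A small sketch contrasting the elementwise operators with the matrix product, plus axis reductions; the arrays are illustrative. It assumes Python 3.5 or later for the `@` operator, which gives the same result as `A.dot(B)` for 2D arrays.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B   # multiplies matching elements
matrix = A @ B        # matrix product; same result as A.dot(B) and np.dot(A, B)
squared = A ** 2      # elementwise power
greater = A > 0       # elementwise comparison -> Boolean array

b = np.arange(12).reshape(3, 4)
print(b.sum(axis=0))  # one sum per column
print(b.max(axis=1))  # one maximum per row
print(b.sum())        # no axis: reduce over every element
```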
    

    For example, the array for the coordinates of a point in 3D space, [1, 2, 1], has one axis. That axis has 3 elements in it, so we say it has a length of 3. In the following example, the array [[1., 0., 0.], [0., 1., 2.]] has 2 axes. The first axis has a length of 2, the second axis has a length of 3.

    ndim: the number of axes of the array. In the Python world, the number of axes is referred to as the rank.

    Universal Functions

    NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal functions” (ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.


    >>> B = np.arange(3)
    >>> B
    array([0, 1, 2])
    >>> np.exp(B)
    array([1.        , 2.71828183, 7.3890561 ])
    >>> np.sqrt(B)
    array([0.        , 1.        , 1.41421356])
    >>> C = np.array([2., -1., 4.])
    >>> np.add(B, C)
    array([2., 0., 6.])
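    Binary ufuncs such as `np.add` and `np.maximum` also provide `reduce` and `accumulate` methods that accept the same `axis` argument as the array methods above. A short sketch; the array is illustrative.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

col_sums = np.add.reduce(b, axis=0)     # same as b.sum(axis=0)
running = np.add.accumulate(b, axis=1)  # same as b.cumsum(axis=1)
row_max = np.maximum.reduce(b, axis=1)  # same as b.max(axis=1)

print(np.exp(b[0]))                     # unary ufuncs still work elementwise
```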
    
    >>> X = np.reshape(np.arange(24), (2, 3, 4))  # i.e., 4 planes, each with 2 rows and 3 columns
    >>> X
    array([[[ 0,  1,  2,  3],
            [ 4,  5,  6,  7],
            [ 8,  9, 10, 11]],

           [[12, 13, 14, 15],
            [16, 17, 18, 19],
            [20, 21, 22, 23]]])

    Indexing, Slicing and Iterating

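    1. One-dimensional arrays can be indexed, sliced and iterated over much like ordinary Python sequences.

    2. Multidimensional arrays have one index per axis. These indices are given as a comma-separated tuple, a notation similar to MATLAB's. An omitted index is treated as a complete slice (:).

    3. When an array has many dimensions, ... (dots) can stand in for as many colons as are needed to complete the index. For an array x with rank 5:

    • x[1,2,...] is equivalent to x[1,2,:,:,:]
    • x[...,3] to x[:,:,:,:,3] and
    • x[4,...,5,:] to x[4,:,:,5,:]
    4. Iteration. Iterating over a multidimensional array is done with respect to the first axis: for row in b: print(row) prints the slices along b's first axis, i.e. its rows. The flat attribute unrolls all elements of a multidimensional array: for element in b.flat: print(element) prints every element of b.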
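    A short sketch of these indexing and iteration rules. The rank-5 array has shape (2, 3, 4, 5, 6), so the third ellipsis example above is adapted to indices that exist for this shape (x[1, ..., 4, :] instead of x[4, ..., 5, :]); the arrays are illustrative.

```python
import numpy as np

b = np.arange(20).reshape(5, 4)
print(b[2, 3])    # one index per axis, separated by commas
print(b[1:3, :])  # rows 1 and 2, every column
print(b[-1])      # omitted trailing index is a full slice: same as b[-1, :]

x = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)  # a rank-5 array
same_1 = np.array_equal(x[1, 2, ...], x[1, 2, :, :, :])
same_2 = np.array_equal(x[..., 3], x[:, :, :, :, 3])
same_3 = np.array_equal(x[1, ..., 4, :], x[1, :, :, 4, :])

rows = [row for row in b]       # iteration walks the first axis
elements = [e for e in b.flat]  # .flat walks every element in C order
```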

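    In fact, you can think of it like this: given the dimensions D and an array A, D[axis] is analogous to A[i]. Does that roughly make sense? axis selects a dimension much as an index selects an array element. There is a difference, though: since axis works like an index, it has a valid range: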

    shape is a function in numpy.core.fromnumeric that reads the length of each dimension of a matrix; for example, shape[0] is the length of the matrix's first dimension.

    Shape Manipulation

    1. Changing the shape of an array
    >>> a = np.floor(10*np.random.random((3,4)))
    >>> a
    array([[2., 8., 0., 6.],
           [4., 5., 1., 1.],
           [8., 9., 3., 6.]])
    >>> a.shape
    (3, 4)
    >>> a.ravel() # flatten the array
    array([2., 8., 0., 6., 4., 5., 1., 1., 8., 9., 3., 6.])
    >>> a.shape = (6, 2)
    >>> a.T
    array([[2., 0., 4., 1., 8., 3.],
           [8., 6., 5., 1., 9., 6.]])
    
    2. Stacking together different arrays

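    The valid range is [-ndim, ndim): for a two-dimensional array, as in the example above, axis takes values in [-2, 2). Remember that 2 itself is excluded.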
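    A sketch that checks the [-ndim, ndim) range on a 2D array. `valid_axes` is a hypothetical helper written for this illustration; the out-of-range case relies on NumPy's AxisError being a subclass of ValueError.

```python
import numpy as np

def valid_axes(arr):
    """Hypothetical helper: every axis value NumPy accepts for arr, i.e. range(-ndim, ndim)."""
    return list(range(-arr.ndim, arr.ndim))

b = np.arange(12).reshape(3, 4)  # ndim == 2
print(valid_axes(b))

# Negative values count backwards from the last axis
assert np.array_equal(b.sum(axis=-1), b.sum(axis=1))
assert np.array_equal(b.sum(axis=-2), b.sum(axis=0))

raised = False
try:
    b.sum(axis=2)                # 2 lies outside [-2, 2)
except ValueError as err:        # NumPy's AxisError subclasses ValueError
    raised = True
    print("rejected:", err)
```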

    shape(x)

    Random sampling (numpy.random)


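    How dimensions correspond to axis values: axis is counted from the outermost []. For a two-dimensional array, axis=0 refers to the outermost brackets (the rows, whose count is shape[0]) and axis=1 to the next level in (the columns, shape[1]).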
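    A sketch of this correspondence on a 3D array: axis k is the (k+1)-th bracket level from the outside, its length is x.shape[k], and reducing along it drops that entry from the shape (unless keepdims=True). The array is illustrative.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

shapes = {}
for axis in range(x.ndim):
    reduced = np.amax(x, axis=axis)
    shapes[axis] = reduced.shape
    # length of the reduced axis, then the shape that remains after reducing it
    print(axis, x.shape[axis], reduced.shape)
```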

    (2,3,4)

    II. Verification:

    shape(x)[0]

    import numpy as np

    # Generate 60 random integers in [0, 60) with 3 dimensions
    x = np.random.RandomState(5).randint(60, size=[3, 4, 5])
    print(x.ndim, x.shape, x.size)
    print("x:\n", x)
    

    2


    or

    Pick a function that takes an axis argument; here we use numpy.amax(), which selects the largest element:

    x.shape[0]

    To make this easier to follow, start from the innermost level:

    2

    print("x[0][0]:\n", x[0][0])
    print("axis=2:\n", np.amax(x, 2))
    

    Next, look at how each plane is composed:


    >>> X[:, :, 0]
    array([[ 0,  4,  8],
           [12, 16, 20]])
    >>> X[:, :, 1]
    array([[ 1,  5,  9],
           [13, 17, 21]])
    >>> X[:, :, 2]
    array([[ 2,  6, 10],
           [14, 18, 22]])
    >>> X[:, :, 3]
    array([[ 3,  7, 11],
           [15, 19, 23]])
    

     

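    That is, when np.arange(24) (0, 1, 2, 3, ..., 23) is rearranged across the axes of a multidimensional array, the last axis is filled first (for a 2D array, values run along each row first; for a 3D array viewed as above, they run across the planes first). This is NumPy's default C (row-major) order.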
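    A sketch comparing the default C order with Fortran order, where the first axis is filled first instead; `order` is a documented parameter of `np.reshape`, and the arrays are illustrative.

```python
import numpy as np

X = np.reshape(np.arange(24), (2, 3, 4))             # default order='C': last axis filled first
F = np.reshape(np.arange(24), (2, 3, 4), order='F')  # order='F': first axis filled first

print(X[0, 0, :])  # consecutive values along the last axis
print(F[0, 0, :])  # steps of 2*3 = 6 along the last axis
print(X[:, :, 0])  # the first plane from the listing above
```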

    print("x[0]:\n", x[0])
    print("axis=1:\n", np.amax(x, 1))
    

    reshape is a method of array objects that changes the shape of an array.


    Two-dimensional array

    print("x:\n", x)
    print("axis=0:\n", np.amax(x, 0))
    
    #!/usr/bin/env python
    # coding=utf-8
    import numpy as np

    a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    print(a)
    d = a.reshape((2, 4))
    print(d)
    


    Clearly, axis=2 and axis=1 are both fairly easy to understand, but axis=0 is a little confusing, and this is only three dimensions. What about four or five?

    Three-dimensional array

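    But look closely: where does 54, the first number in the axis=2 result, come from? It is obtained by comparing x[0][0][0] through x[0][0][4]. Hence there are 3*4 results in total.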

    #!/usr/bin/env python
    # coding=utf-8
    import numpy as np

    a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    print(a)
    f = a.reshape((2, 2, 2))
    print(f)
    

    Similarly, for axis=1 the comparison runs over x[0][0][0] through x[0][3][0], giving 3*5 results in total.


    Similarly, for axis=0 the comparison runs over x[0][0][0] through x[2][0][0], giving 4*5 results in total.
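    A sketch that checks these comparison ranges with plain loops, using the same RandomState(5) array of shape (3, 4, 5) as the code in this post; `by_loop` is an illustrative name.

```python
import numpy as np

x = np.random.RandomState(5).randint(60, size=[3, 4, 5])

# axis=0: for each (j, k), compare x[0][j][k], x[1][j][k] and x[2][j][k]
by_loop = np.empty((4, 5), dtype=x.dtype)
for j in range(4):
    for k in range(5):
        by_loop[j, k] = max(x[i][j][k] for i in range(3))

print(np.array_equal(by_loop, np.amax(x, axis=0)))
# Result shapes: axis=2 -> (3, 4), axis=1 -> (3, 5), axis=0 -> (4, 5)
print(np.amax(x, 2).shape, np.amax(x, 1).shape, np.amax(x, 0).shape)
```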

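    The rule for reshaping is that the total number of elements must not change. For example, the following is wrong, because it would change the number of elements.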

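    Now, do you have a better feel for the shape, or arrangement, of the output for each axis value? And can you see what axis is for? I was thoroughly fed up with all those square brackets!!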

    #!/usr/bin/env python
    # coding=utf-8
    import numpy as np

    a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    print(a)
    print(a.dtype)
    e = a.reshape((2, 2))  # ValueError: cannot reshape array of size 8 into shape (2,2)
    print(e)
    

     III. Summary


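    Most intuitively: the axis value passed to a function tells you which bracket position in x[][][] is involved, counting from 0; axis=0 stands for the first [ ] in x[ ][ ][ ].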

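    Note: whenever possible, reshape returns a view, so the new array shares memory with the original array; changing an element of one then changes the other as well.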

    Corrections for any shortcomings or mistakes are welcome!

    #!/usr/bin/env python
    # coding=utf-8
    import numpy as np

    a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    print(a)
    e = a.reshape((2, 4))
    print(e)
    a[1] = 100  # e is a view of a, so e changes too
    print(a)
    print(e)
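    A sketch that makes the shared memory explicit with `np.shares_memory`, shows how `.copy()` breaks the link, and shows one case where reshape cannot return a view (flattening a transposed array forces a copy). Variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
e = a.reshape((2, 4))           # contiguous input, so reshape can return a view
print(np.shares_memory(a, e))   # True: e and a share one buffer
a[1] = 100
print(e[0, 1])                  # the change shows up through e as well

independent = a.reshape((2, 4)).copy()  # an explicit copy owns its own memory
a[2] = -1
print(independent[0, 2])        # unchanged by the write to a

# A transposed array is not C-contiguous, so flattening it forces a copy
t = np.arange(12).reshape(3, 4).T
t_flat = t.reshape(12)
print(np.shares_memory(t, t_flat))
```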
    


    import numpy as np

    # Generate 60 random integers in [0, 60) with 3 dimensions
    x = np.random.RandomState(5).randint(60, size=[3, 4, 5])
    print(x.ndim, x.shape, x.size)
    print("x:\n", x)
    print("x[0][0]:\n", x[0][0])
    print("axis=2:\n", np.amax(x, 2))

    print("x[0]:\n", x[0])
    print("axis=1:\n", np.amax(x, 1))

    print("x:\n", x)
    print("axis=0:\n", np.amax(x, 0))
    print("\n", np.amin(x, 0))
    for i in range(4):
        print(x[0][i][1])
    

    The meaning of -1 as a reshape argument in Python

    Full code

    >>> a = np.arange(0, 60, 10)
    >>> a
    array([ 0, 10, 20, 30, 40, 50])
    >>> a.reshape(-1, 1)
    array([[ 0],
           [10],
           [20],
           [30],
           [40],
           [50]])
    

     

    Writing a.reshape(1,1) instead raises an error:

    ValueError: cannot reshape array of size 6 into shape (1,1)

    >>> a = np.array([[1,2,3], [4,5,6]])
    >>> np.reshape(a, (3,-1)) # the unspecified value is inferred to be 2
    array([[1, 2],
           [3, 4],
           [5, 6]])
    

    -1 means "I can't be bothered to work out this number": NumPy infers it from the size of a and the other given value, 3.

    # Below are two images of size 2*3 (the number of images is unknown, so -1 stands in for it); how can every 2D image be flattened into 1D?
    >>> image = np.array([[[1,2,3], [4,5,6]], [[1,1,1], [1,1,1]]])
    >>> image.shape
    (2, 2, 3)
    >>> image.reshape((-1, 6))
    array([[1, 2, 3, 4, 5, 6],
           [1, 1, 1, 1, 1, 1]])
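    A sketch of the -1 rules: only one dimension may be -1, and the known dimensions must divide the total size. The image example is flipped here so that the image count stays explicit and -1 stands for the per-image length instead; both forms give shape (2, 6) for this data.

```python
import numpy as np

a = np.arange(0, 60, 10)                             # 6 elements
col = a.reshape(-1, 1)                               # -1 is inferred as 6, giving shape (6, 1)
m = np.array([[1, 2, 3], [4, 5, 6]]).reshape(3, -1)  # -1 is inferred as 2

images = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 1, 1], [1, 1, 1]]])
flat = images.reshape(images.shape[0], -1)           # one row per image, shape (2, 6)

failures = 0
for bad_shape in [(-1, -1), (4, -1)]:  # two unknowns; 6 elements do not split into rows of 4
    try:
        a.reshape(bad_shape)
    except ValueError as err:
        failures += 1
        print(bad_shape, "->", err)
```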
    
