Interactive Visual Search
Interactive visual search with user feedback helps buyers find the perfect item while enjoying the exploratory journey.
A machine may be configured to perform image evaluation of images depicting items for sale and to provide recommendations for improving the images depicting the items to increase the sales of the items depicted in the images. For example, the machine accesses a result of a user behavior analysis. The machine receives an image of an item from a user device. The machine performs an image evaluation of the received image based on an analysis of the received image and the result of the user behavior analysis. The performing of the image evaluation may include determining a likelihood of a user engaging in a desired user behavior in relation to the received image. Then, the machine generates, based on the evaluation of the received image, an output that references the received image and indicates the likelihood of a user engaging in the desired behavior.
Camera platform techniques are described. In an implementation, a plurality of digital images, along with data describing the times at which the digital images were captured, is received by a computing device. Objects of clothing are recognized from the digital images by the computing device using object recognition as part of machine learning. A user schedule is also received by the computing device that describes user appointments and the times at which the appointments are scheduled. A user profile is generated by the computing device by training a model using machine learning based on the recognized objects of clothing, the times at which the corresponding digital images were captured, and the user schedule. From the user profile, a recommendation is generated by processing a subsequent user schedule using the model as part of machine learning by the computing device.
Methods, systems, and computer programs are presented for adding new features to a network service. A method includes receiving an image depicting an object of interest. A category set is determined for the object of interest and an image signature is generated for the image. Using the category set and the image signature, the method identifies a set of publications within a publication database and assigns a rank to each publication. The method causes presentation of the ranked list of publications at a computing device from which the image was received.
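As a rough sketch of how such category-filtered, signature-based ranking could work (the helper names and the cosine-similarity choice are illustrative assumptions, not taken from the disclosure):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two image-signature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_publications(query_signature, category_set, publications):
    """Filter publications to the predicted categories, then rank by
    visual similarity of their stored signatures to the query signature."""
    candidates = [p for p in publications if p["category"] in category_set]
    scored = [(cosine_similarity(query_signature, p["signature"]), p)
              for p in candidates]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored]

# Toy usage with random signatures.
rng = np.random.default_rng(0)
pubs = [{"category": c, "signature": rng.normal(size=128)}
        for c in ("shoes", "bags", "shoes")]
ranked = rank_publications(rng.normal(size=128), {"shoes"}, pubs)
```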
Disclosed are embodiments for facilitating automatic-guided image capturing and presentation. In some embodiments, the method includes capturing an image of an item, removing a background of the image frame, performing manual mask editing, generating an item listing based on the manually edited mask, inferring item information from the image frame and applying the inferred item information to an item listing.
Disclosed are systems, methods, and computer-readable media for using adversarial learning for fine-grained image search. An image search system receives a search query that includes an input image depicting an object. The search system generates, using a generator, a vector representation of the object in a normalized view. The generator was trained based on a set of reference images of known objects in multiple views, and on feedback data received from an evaluator that indicates the performance of the generator at generating vector representations of the known objects in the normalized view. The evaluator includes a discriminator sub-module, a normalizer sub-module, and a semantic embedding sub-module that together generate the feedback data. The image search system identifies, based on the vector representation of the object, a set of other images depicting the object, and returns at least one of the other images in response to the search query.
Vehicles and other items often have corresponding documentation, such as registration cards, that includes a significant amount of informative textual information that can be used in identifying the item. Traditional OCR may be unsuccessful when dealing with non-cooperative images. Accordingly, features such as dewarping, text alignment, and line identification and removal may aid in OCR of non-cooperative images. Dewarping involves determining curvature of a document depicted in an image and processing the image to dewarp the image of the document to make it more accurately conform to the ideal of a cooperative image. Text alignment involves determining an actual alignment of depicted text, even when the depicted text is not aligned with depicted visual cues. Line identification and removal involves identifying portions of the image that depict lines and removing those lines prior to OCR processing of the image.
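The line-removal step can be approximated with standard morphological operations. Below is a minimal OpenCV sketch that detects long, near-horizontal ruled lines and paints them out before OCR; the kernel width, threshold parameters, and input file name are assumptions for illustration:

```python
import cv2
import numpy as np

def remove_horizontal_lines(gray: np.ndarray) -> np.ndarray:
    """Detect long horizontal ruled lines and erase them before OCR."""
    # Binarize so text and lines become white on black.
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 15, 10)
    # A wide, short kernel responds only to near-horizontal strokes.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
    lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    # Paint the detected line pixels back to the page background.
    cleaned = gray.copy()
    cleaned[lines > 0] = 255
    return cleaned

gray = cv2.imread("registration_card.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
if gray is not None:
    cv2.imwrite("registration_card_no_lines.png", remove_horizontal_lines(gray))
```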
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted, the extraction including one-dimensional sampling of the image's pixels in each of a first, a second, and a third dimension of a color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a pre-defined threshold, an orientation histogram of the image is extracted. A dominant color of the image is also identified.
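A minimal NumPy sketch of the per-dimension sampling and the dominant-color step (the bin counts and the coarse 3D histogram used for the dominant color are illustrative choices, not the disclosed parameters):

```python
import numpy as np

def per_channel_histograms(image: np.ndarray, bins: int = 16) -> np.ndarray:
    """One-dimensional histogram per color-space dimension (e.g. H, S, V),
    concatenated into a single feature vector."""
    feats = []
    for channel in range(image.shape[-1]):
        hist, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        feats.append(hist / max(hist.sum(), 1))  # normalize per channel
    return np.concatenate(feats)

def dominant_color(image: np.ndarray, bins: int = 8) -> tuple:
    """Most populated cell of a coarse 3D color histogram, returned as
    the cell's center color."""
    pixels = image.reshape(-1, 3)
    hist, edges = np.histogramdd(pixels, bins=(bins, bins, bins),
                                 range=((0, 256),) * 3)
    idx = np.unravel_index(np.argmax(hist), hist.shape)
    return tuple((edges[d][i] + edges[d][i + 1]) / 2 for d, i in enumerate(idx))

img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
features = per_channel_histograms(img)
print(dominant_color(img))
```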
The disclosed technologies include a robotic selling assistant that receives an item from a seller, automatically generates a posting describing the item for sale, stores the item until it is sold, and delivers or sends the item out for delivery. The item is placed in a compartment that uses one or more sensors to identify the item, retrieve supplemental information about the item, and take pictures of the item for inclusion in the posting. A seller-supplied description of the item may be verified based on the retrieved supplemental information, preventing mislabeled items from being sold.
Apparatus and method for providing contextual recommendations based on user state are disclosed herein. In some embodiments, sensor data corresponding to at least one sensor included in an item worn by a user is received. A user state is determined based on the received sensor data. In response to a state change being satisfied by at least the user state, a recommendation is determined based on the user state and a profile associated with the user. The recommendation may be presented on an electronic mobile device associated with the user.
Various embodiments described herein utilize multiple levels of generative adversarial networks (GANs) to facilitate generation of digital images based on user-provided images. Some embodiments comprise a first generative adversarial network (GAN) and a second GAN coupled to the first GAN, where the first GAN includes an image generator and at least two discriminators, and the second GAN includes an image generator and at least one discriminator. According to some embodiments, the (first) image generator of the first GAN is trained by processing a user-provided image using the first GAN. For some embodiments, the user-provided image and the first generated image, generated by processing the user-provided image using the first GAN, are combined to produce a combined image. For some embodiments, the (second) image generator of the second GAN is trained by processing the combined image using the second GAN.
An image is passed through an image identifier to identify a coarse category for the image and a bounding box for a categorized object. A mask is used to identify the portion of the image that represents the object. Given the foreground mask, the convex hull of the mask is located and an aligned rectangle of minimum area that encloses the hull is fitted. The aligned bounding box is rotated and scaled, so that the foreground object is roughly moved to a standard orientation and size (referred to as calibrated). The calibrated image is used as an input to a fine-grained categorization module, which determines the fine category within the coarse category for the input image.
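The hull-and-rectangle calibration step maps directly onto standard OpenCV primitives. The sketch below assumes a non-empty binary foreground mask and a fixed output size; both are illustrative:

```python
import cv2
import numpy as np

def calibrate(image: np.ndarray, mask: np.ndarray, out_size: int = 224) -> np.ndarray:
    """Rotate and scale the masked foreground object toward a standard
    orientation and size via the hull / minimum-area-rectangle recipe."""
    points = cv2.findNonZero(mask)                   # mask assumed non-empty
    hull = cv2.convexHull(points)                    # convex hull of the mask
    (cx, cy), (w, h), angle = cv2.minAreaRect(hull)  # tightest rotated box
    # Rotate about the box center so the rectangle becomes axis-aligned.
    rot = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    upright = cv2.warpAffine(image, rot, (image.shape[1], image.shape[0]))
    # Crop the now axis-aligned box and scale it to the standard size.
    x0, y0 = max(int(cx - w / 2), 0), max(int(cy - h / 2), 0)
    crop = upright[y0:int(y0 + h), x0:int(x0 + w)]
    return cv2.resize(crop, (out_size, out_size))
```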
A machine may be configured to determine fashion preferences of users and to provide item recommendations to the users based on the users' fashion preferences. For example, the machine receives an image of a user and a set of spatial data indicating a position of the body of the user in a three-dimensional space. The machine may perform an analysis of the image and the set of spatial data. The performing of the analysis may include extracting, from the image, an image swatch that depicts a portion of an item worn by the user. The machine may identify a fashion preference of the user based on the analysis of the image and of the set of spatial data. The machine may identify an item that corresponds to the fashion preference of the user within an inventory of fashion items and may generate a recommendation of the identified fashion item.
In various example embodiments, a system and method are described for a Listing Engine that translates a first listing from a first language to a second language. The first listing includes an image(s) of a first item. The Listing Engine provides as input to an encoded neural network model a portion(s) of the translated first listing and a portion(s) of a second listing in the second language. The second listing includes an image(s) of a second item. The Listing Engine receives from the encoded neural network model a first feature vector for the translated first listing and a second feature vector for the second listing. The first and the second feature vectors both include at least one type of image signature feature and at least one type of listing text-based feature. Based on a similarity score of the first and second feature vectors at least meeting a similarity score threshold, the Listing Engine generates a pairing of the first listing in the first language with the second listing in the second language for inclusion in training data of a machine translation system.
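A minimal sketch of the thresholded pairing step, assuming the feature vectors have already been produced by the encoded model (the threshold value and function names are hypothetical):

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.85  # assumed value for the similarity score threshold

def pair_listings(translated_vecs: np.ndarray, second_lang_vecs: np.ndarray):
    """Pair each translated listing with its most similar second-language
    listing, keeping only pairs whose cosine score meets the threshold."""
    a = translated_vecs / np.linalg.norm(translated_vecs, axis=1, keepdims=True)
    b = second_lang_vecs / np.linalg.norm(second_lang_vecs, axis=1, keepdims=True)
    scores = a @ b.T  # all pairwise cosine similarities
    pairs = []
    for i, row in enumerate(scores):
        j = int(np.argmax(row))
        if row[j] >= SIMILARITY_THRESHOLD:
            pairs.append((i, j, float(row[j])))  # candidate training pair
    return pairs
```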
Systems, methods, and computer program products for identifying a relevant candidate product in an electronic marketplace. Embodiments perform a visual similarity comparison between candidate product image visual content and input query image visual content, process formal and informal natural language user inputs, and coordinate aggregated past user interactions with the marketplace stored in a knowledge graph. Visually similar items and their corresponding product categories, aspects, and aspect values can determine suggested candidate products without discernible delay during a multi-turn user dialog. The user can then refine the search for the most relevant items available for purchase by providing responses to machine-generated prompts that are based on the initial search results from visual, voice, and/or text inputs. An intelligent online personal assistant can thus guide a user to the most relevant candidate product more efficiently than existing search tools.
Systems, methods, and computer program products for identifying a candidate product in an electronic marketplace based on a visual comparison between candidate product image visual content and input query image visual content. Embodiments generate and store descriptive image signatures from candidate product images or selected portions of such images. A subsequently calculated visual similarity measure serves as a visual search result score for the candidate product in comparison to an input query image. Any number of images of any number of candidate products may be analyzed, such as for items available for sale in an online marketplace. Image analysis results are stored in a database and made available for subsequent automated on-demand visual comparisons to an input query image. The embodiments enable substantially real time visual based product searching of a potentially vast catalog of items.
Camera platform and object inventory control techniques are described. In an implementation, a live feed of digital images is output in a user interface by a computing device. A user selection of at least one of the digital images is received through interaction with the user interface. An object included within the at least one digital image is recognized using machine learning. Metadata pertaining to the recognized object is then obtained. Augmented reality digital content is generated based at least in part on the obtained metadata. The augmented reality digital content is displayed as part of the live feed of digital images, associated with the recognized object.
Various embodiments use a neural network to analyze images for aspects that characterize the images, to present locations of those aspects on the images, and, additionally, to permit a user to interact with those locations on the images. For example, a user may interact with a visual cue over one of those locations to modify, refine, or filter the results of a visual search, performed on a publication corpus, that uses an input image (e.g., one captured using a mobile device) as a search query.
In various example embodiments, a system and method for integration of a three-dimensional model are disclosed. In one example embodiment, a method includes receiving a plurality of images, selecting points on the images and triangulating the points to generate a plurality of depth maps, generating a three-dimensional mesh by combining the plurality of depth maps, generating a three-dimensional model of the item by projecting the plurality of images onto the mesh using the points, calibrating colors used in the model using the diffuse properties of the colors in the images, and providing a user interface that allows a user to select one or more user points on the three-dimensional model and provide additional information associated with the selected user points.
A system receives image data associated with an item, where the image data comprises a view of the item from two or more angles; determines physical attributes of the item; generates a base model of the item; samples the base model to generate one or more sampled models, each sampled model comprising a subset of the geometric data determined based on one or more device characteristics of the user devices that interface with the system; receives device characteristics of a user device associated with a request from the user device for the item; selects, based on the received device characteristics, a sampled model of the item; and transmits a data object comprising the selected sampled model to the user device to cause the user device to generate a three-dimensional rendering of the item.
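A small sketch of the device-aware selection logic, assuming sampled models are stored with the triangle budget they were decimated to (the GPU-tier heuristic and all names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class SampledModel:
    max_triangles: int  # triangle budget this model was decimated to
    payload: bytes      # serialized geometric data

def triangle_budget(gpu_tier: int, bandwidth_mbps: float) -> int:
    """Map device characteristics to a triangle budget (illustrative heuristic)."""
    base = {0: 5_000, 1: 50_000, 2: 500_000}[gpu_tier]
    return base if bandwidth_mbps >= 10 else base // 5

def select_model(models, gpu_tier: int, bandwidth_mbps: float) -> SampledModel:
    """Pick the densest sampled model that fits the requesting device's budget."""
    budget = triangle_budget(gpu_tier, bandwidth_mbps)
    fitting = [m for m in models if m.max_triangles <= budget]
    if not fitting:  # device below every budget: fall back to the coarsest model
        return min(models, key=lambda m: m.max_triangles)
    return max(fitting, key=lambda m: m.max_triangles)

models = [SampledModel(4_000, b""), SampledModel(45_000, b""), SampledModel(480_000, b"")]
print(select_model(models, gpu_tier=1, bandwidth_mbps=25).max_triangles)  # 45000
```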
Example embodiments that analyze images to characterize aspects of the images rely on the same neural network to characterize multiple aspects in parallel. Because additional neural networks are not required for additional aspects, this approach scales well as the number of aspects increases.
An apparatus and method to adjust item recommendations are disclosed herein. A first image attribute of a query image is compared to a second image attribute of each of a plurality of inventory images of a plurality of inventory items to identify the inventory items similar to the query image. Item recommendations comprising the identified inventory items in a first listing order are provided for display at a remote device. A second listing order of the identified inventory items is determined based on a user preference for a particular one of the identified inventory items. At least the second listing order is provided to the remote device for re-display of the item recommendations in accordance with the second listing order.
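The reordering step might look like the following sketch, assuming each inventory item has a comparable attribute vector (the Euclidean-distance re-ranking is an assumption, not the disclosed method):

```python
import numpy as np

def reorder_recommendations(items: list, attributes: np.ndarray,
                            preferred_index: int) -> list:
    """Second listing order: items most similar (in attribute space) to the
    user's preferred item come first."""
    distances = np.linalg.norm(attributes - attributes[preferred_index], axis=1)
    return [items[i] for i in np.argsort(distances)]

items = ["red dress", "blue dress", "red skirt"]
attrs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.2]])
print(reorder_recommendations(items, attrs, preferred_index=0))
```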
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
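A compact sketch of the query-time lookup, assuming the descriptor matrices are already computed; the nearest-neighbor mapping shown here is one plausible realization of the generated mapping:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

class DepthTransfer:
    """Correlate visual descriptors with depth descriptors learned from
    reference image/depth pairs; query-time lookup is a nearest-neighbor search."""

    def fit(self, visual_descriptors: np.ndarray, depth_descriptors: np.ndarray):
        self.index = NearestNeighbors(n_neighbors=1).fit(visual_descriptors)
        self.depth = depth_descriptors
        return self

    def estimate(self, query_descriptors: np.ndarray) -> np.ndarray:
        """Return the stored depth descriptor of each query's nearest visual match."""
        _, idx = self.index.kneighbors(query_descriptors)
        return self.depth[idx[:, 0]]

rng = np.random.default_rng(1)
model = DepthTransfer().fit(rng.normal(size=(1000, 64)), rng.normal(size=(1000, 16)))
depth_patches = model.estimate(rng.normal(size=(10, 64)))  # one per query descriptor
```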
A machine may be configured to execute a machine-learning process for identifying and understanding fine properties of various items of various types by using images and associated corresponding annotations, such as titles, captions, tags, keywords, or other textual information applied to these images. By use of a machine-learning process, the machine may perform property identification accurately and without human intervention. These item properties may be used as annotations for other images that have similar features. Accordingly, the machine may answer user-submitted questions, such as "What do rustic items look like?", and items or images depicting items that are deemed to be rustic can be readily identified, classified, ranked, or any suitable combination thereof.
Techniques and systems are described that leverage computer vision as part of search to expand functionality of a computing device available to a user and increase operational computational efficiency as well as efficiency in user interaction. In a first example, user interaction with items of digital content is monitored. Computer vision techniques are used to identify digital images in the digital content, objects within the digital images, and characteristics of those objects. This information is used to assign a user to a user segment of a user population which is then used to control output of subsequent digital content to the user, e.g., recommendations, digital marketing content, and so forth.
Methods for face detection to address privacy in publishing image datasets are described. A method may include face classification in an online marketplace. A server system may receive, from a seller user device, a listing including an image for the online marketplace. The server system may classify, by at least one processor implementing a distribution-balance trained machine learning model, each human face candidate within the image as being either a private human face or a non-private human face. The server system may receive, from a buyer user device, a search query that is mapped to the listing in the online marketplace. The server system may transmit, to the buyer user device, a query response including the listing, in which the image either is determined not to include any private human faces or has any private human faces obscured based on the classifying.
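The obscuring step could be as simple as blurring the boxes that the trained classifier labeled private. The sketch below assumes face boxes and labels are already available from the model:

```python
import cv2
import numpy as np

def obscure_private_faces(image: np.ndarray, boxes, labels) -> np.ndarray:
    """Blur each face box the classifier labeled private, leaving
    non-private faces untouched."""
    out = image.copy()
    for (x, y, w, h), label in zip(boxes, labels):
        if label == "private":
            roi = out[y:y + h, x:x + w]
            out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return out

img = np.zeros((200, 200, 3), dtype=np.uint8)  # stand-in listing image
result = obscure_private_faces(img, [(40, 40, 60, 60)], ["private"])
```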
A machine is configured to determine fashion preferences of users and to provide item recommendations based on the fashion preferences. For example, the machine accesses an indication of a fashion style of a user. The fashion style is determined based on automatically captured data pertaining to the user. The machine identifies, based on the fashion style, one or more fashion items from an inventory of fashion items. The machine generates one or more selectable user interface elements for inclusion in a user interface. The one or more user interface elements correspond to the one or more fashion items. The machine causes generation and display of the user interface that includes the one or more selectable user interface elements. A selection of a selectable user interface element results in display of a combination of an image of a particular fashion item and an image of an item worn by the user.
Systems and methods to fit an image of an inventory item are described. In one aspect, a method includes receiving images of items over a computer network from a server, capturing a live video image of an object using a camera, playing the live video of the object on an electronic display, and continually refitting an image of a first item from the received images onto the object as the object changes perspective in the video by applying an affine transformation to the image of the first item.
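Refitting can be expressed with OpenCV's affine utilities: re-estimate the transform from three tracked points each frame and warp the item image onto the live frame. The point tracking itself is assumed to exist, and the non-black compositing mask is a simplification:

```python
import cv2
import numpy as np

def refit_item(frame: np.ndarray, item: np.ndarray,
               item_pts: np.ndarray, object_pts: np.ndarray) -> np.ndarray:
    """Warp the item image so three reference points land on the three
    tracked object points in the current frame, then composite it."""
    matrix = cv2.getAffineTransform(np.float32(item_pts),    # 3 points on the item
                                    np.float32(object_pts))  # 3 tracked points
    warped = cv2.warpAffine(item, matrix, (frame.shape[1], frame.shape[0]))
    mask = warped.sum(axis=2) > 0  # crude non-black compositing mask
    out = frame.copy()
    out[mask] = warped[mask]
    return out

# Called once per video frame with freshly tracked object_pts, so the item
# keeps following the object as its perspective changes.
```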
In various example embodiments, a system and method for determining an item that has confirmed characteristics are described herein. An image that depicts an object is received from a client device. Structured data that corresponds to characteristics of one or more items is retrieved. A set of characteristics is determined, the set of characteristics being predicted to match with the object. An interface that includes a request for confirmation of the set of characteristics is generated. The interface is displayed on the client device. Confirmation that at least one characteristic from the set of characteristics matches with the object depicted in the image is received from the client device.
Electronic content that has a tactile dimension when presented on a tactile-enabled computing device may be referred to as tactile-enabled content. A tactile-enabled device is a device that is capable of presenting tactile-enabled content in a manner that permits a user to experience tactile quality of electronic content. In one example embodiment, a system is provided for generating content that has a tactile dimension when presented on a tactile-enabled device.
A large synthetic 3D human body model dataset using real-world body size distributions is created. The model dataset may follow real-world body parameter distributions. Depth sensors can be integrated into mobile devices such as tablets, cellphones, and wearable devices. Body measurements for a user are extracted from a single frontal-view depth map using joint location information. Estimates of body measurements are combined with local geometry features around joint locations to form a robust multi-dimensional feature vector. A fast nearest-neighbor search is performed using the feature vector for the user and the feature vectors for the synthetic models to identify the closest match. The retrieved model can be used in various applications such as clothes shopping, virtual reality, online gaming, and others.
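A sketch of the matching step, assuming the synthetic dataset's feature vectors are precomputed (random stand-in data here); scikit-learn's KD-tree provides the fast nearest-neighbor search:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
# Stand-in for the synthetic dataset: one row per 3D body model, each row
# concatenating body-measurement estimates and local joint-geometry features.
synthetic_features = rng.normal(size=(10_000, 40))
index = NearestNeighbors(n_neighbors=1, algorithm="kd_tree").fit(synthetic_features)

def closest_body_model(user_features: np.ndarray) -> int:
    """Index of the synthetic body model nearest to the user's depth-derived
    feature vector."""
    _, idx = index.kneighbors(user_features.reshape(1, -1))
    return int(idx[0, 0])

print(closest_body_model(rng.normal(size=40)))
```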
Methods, systems, and computer programs are presented for adding new features to a network service. An example method includes accessing an image from a user device, determining a salient object count of a plurality of objects in the image, and causing an indicator of the salient object count to be displayed on the user device.
A system comprising a computer-readable medium carrying at least one program and a computer-implemented method for facilitating automatic-guided image capturing and presentation are presented. In some embodiments, the method includes capturing an image of an item, automatically removing a background of the image frame, performing manual mask editing, generating an item listing, inferring item information from the image frame, automatically applying the inferred item information to an item listing form, and presenting the item listing in an augmented reality environment.
Disclosed are methods and systems for displaying items of clothing on a model having a similar body shape to that of an ecommerce user. In one aspect, a system includes one or more hardware processors configured to perform operations comprising: receiving an image representing a user height, user weight, and user gender; causing display of a second image via a computer interface, the second image representing a model selected based on a comparison of the model's height, weight, and gender with the user's height, weight, and gender, respectively; receiving a selection of an item of clothing; and causing display of a representation of the selected model wearing the selected item of clothing.
In various example embodiments, a system and method for projecting visual aspects into a vector space are presented. A query that includes visual data is received by the system from a client device. A visual aspect indicated in the visual data is analyzed. One or more symbols that correspond to the analyzed visual aspect are generated by the system. The analyzed visual aspect is projected into a vector space using the one or more symbols. A group of projections is identified, the group being within a predetermined distance from the projected visual aspect in the vector space. An interface that depicts the visual aspects corresponding to the group of projections is generated. The interface is displayed on the client device.
Computer vision and image characteristic search is described. The described system leverages visual search techniques by determining visual characteristics of objects depicted in images and comparing the determined characteristics to visual characteristics of other images, e.g., to identify similar visual characteristics in the other images. In some aspects, the described system performs searches that leverage a digital image as part of a search query to locate digital content of interest. In some aspects, the described system surfaces multiple user interface instrumentalities that include images of patterns, textures, or materials and that are selectable to initiate a visual search of digital content having a similar pattern, texture, or material. The described aspects also include pattern-based authentication in which the system determines authenticity of an item in an image based on a similarity of its visual characteristics to visual characteristics of known authentic items.
Computer vision for unsuccessful queries and iterative search is described. The described system leverages visual search techniques by determining visual characteristics of objects depicted in images and describing them, e.g., using feature vectors. In some aspects, these visual characteristics are determined for search queries that are identified as not being successful. Aggregated information describing visual characteristics of images of unsuccessful search queries is used to determine common visual characteristics and objects depicted in those images. This information can be used to inform other users about unmet needs of searching users. In some aspects, these visual characteristics are used in connection with iterative image searches where users select an initial query image and then the search results are iteratively refined. Unlike conventional techniques, the described system iteratively refines the returned search results using an embedding space learned from binary attribute labels describing images.
For an input image of a person, a set of object proposals are generated in the form of bounding boxes. A pose detector identifies coordinates in the image corresponding to locations on the person's body, such as the waist, head, hands, and feet of the person. A convolutional neural network receives the portions of the input image defined by the bounding boxes and generates a feature vector for each image portion. The feature vectors are input to one or more support vector machine classifiers, which generate an output representing a probability of a match with an item. The distance between the bounding box and a joint associated with the item is used to modify the probability. The modified probabilities for the support vector machine are then compared with a threshold and each other to identify the item.
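The distance-based modification might take a form like the following sketch; the exponential decay and its scale are assumptions, since the abstract only states that joint distance modifies the probability:

```python
import numpy as np

def modified_probability(svm_prob: float, box_center: np.ndarray,
                         joint_xy: np.ndarray, scale: float = 100.0) -> float:
    """Down-weight an SVM match probability by the distance (in pixels)
    between the proposal's bounding-box center and the item's joint."""
    distance = float(np.linalg.norm(box_center - joint_xy))
    return svm_prob * float(np.exp(-distance / scale))  # assumed decay form

# A proposal 30 px from the associated joint keeps most of its score; one
# 350 px away is heavily penalized before thresholding.
print(modified_probability(0.9, np.array([120.0, 80.0]), np.array([140.0, 102.0])))
print(modified_probability(0.9, np.array([120.0, 80.0]), np.array([400.0, 300.0])))
```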
Systems, methods, and computer program products for identifying a candidate product in an electronic marketplace based on a visual comparison between candidate product image visual text content and input query image visual text content. Unlike conventional optical character recognition (OCR) based systems, embodiments automatically localize and isolate portions of a candidate product image and an input query image that each contain visual text content, and calculate a visual similarity measure between the respective portions. A trained neural network may be re-trained to more effectively find visual text content by using the localized and isolated visual text content portions as additional ground truths. The visual similarity measure serves as a visual search result score for the candidate product. Any number of images of any number of candidate products may be compared to an input query image to enable text-in-image based product searching without resorting to conventional OCR techniques.
Disclosed are embodiments for facilitating automatic-guided image capturing and presentation. In some embodiments, the method includes capturing an image of an item, removing a background of the image frame, performing manual mask editing, generating an item listing based on the manually edited mask, inferring item information from the image frame and applying the inferred item information to an item listing.
An apparatus and method to adjust item recommendations are disclosed herein. A first image attribute of a query image is compared to a second image attribute of each of a plurality of inventory images of a plurality of inventory items to identify the inventory items similar to the query image. Item recommendations comprising the identified inventory items in a first listing order are provided for display at a remote device. A second listing order of the identified inventory items is determined based on a user preference for a particular one of the identified inventory items. At least the second listing order is provided to the remote device for re-display of the item recommendations in accordance with the second listing order.
An image is passed through an image identifier to identify a coarse category for the image and a bounding box for a categorized object. A mask is used to identify the portion of the image that represents the object. Given the foreground mask, the convex hull of the mask is located and an aligned rectangle of minimum area that encloses the hull is fitted. The aligned bounding box is rotated and scaled, so that the foreground object is roughly moved to a standard orientation and size (referred to as calibrated). The calibrated image is used as an input to a fine-grained categorization module, which determines the fine category within the coarse category for the input image.
A system comprising a computer-readable medium carrying at least one program and a computer-implemented method for facilitating automatic-guided image capturing and presentation are presented. In some embodiments, the method includes capturing an image of an item, removing automatically a background of the image frame, performing manual mask editing, generating an item listing, inferring item information from the image frame and automatically applying the inferred item information to an item listing form, and presenting an item listing in an augmented reality environment.
A method, system, and article of manufacture for recommending items for a room. An image of a room is received, a box image is fitted to the image of the room. Information is extracted from the fitted box image and is used for recommending items for the room. The image is a color image and extracting information is done by extracting color histograms from the fitted box image. The color histograms are used to determine items that match the color scheme of the room, the lighting of the room, and/or the decorating style of the room.
In various example embodiments, a system and method for projecting visual aspects into a vector space are presented. A query that includes visual data is received by the system from a client device. A visual aspect indicated in the visual data is analyzed. One or more symbols that correspond to the analyzed visual aspect is generated by the system. The analyzed visual aspect is projected into a vector space using the one or more symbols. A group of projections are identified, the group of projections being within a predetermined distance from the projected visual aspect in the vector space. An interface that depicts the further visual aspects is generated. The interface is displayed on the client device.
An image is passed through an image identifier to identify a coarse category for the image and a bounding box for a categorized object. A mask is used to identify the portion of the image that represents the object. Given the foreground mask, the convex hull of the mask is located and an aligned rectangle of minimum area that encloses the hull is fitted. The aligned bounding box is rotated and scaled, so that the foreground object is roughly moved to a standard orientation and size (referred to as calibrated). The calibrated image is used as an input to a fine-grained categorization module, which determines the fine category within the coarse category for the input image.
In various example embodiments, a system and method for integration of a three-dimensional model is disclosed. In one example embodiment, a method includes receiving a plurality of images, selecting points on the images and triangulating the points to generate a plurality of depth maps, generate a threedimensional mesh by combining the plurality of depth maps, generating a threedimensional model of the item by projecting the plurality of images onto the mesh using the points, calibrating colors used in the model using colors diffuse properties of the colors in the images, and providing a user interface allowing a user to select one or more user points on the three-dimensional model and provide additional information associated with the selected user points.
A system comprising a computer-readable medium carrying at least one program and a computer-implemented method for facilitating automatic-guided image capturing and presentation are presented. In some embodiments, the method includes capturing an image of an item, removing automatically a background of the image frame, performing manual mask editing, generating an item listing, inferring item information from the image frame and automatically applying the inferred item information to an item listing form, and presenting an item listing in an augmented reality environment.
Various embodiments described herein utilize multiple levels of generative adversarial networks (GANs) to facilitate generation of digital images based on user-provided images. Some embodiments comprise a first generative adversarial network (GAN) and a second GAN coupled to the first GAN, where the first GAN includes an image generator and at least two discriminators, and the second GAN includes an image generator and at least one discriminator. According to some embodiments, the (first) image generator of the first GAN is trained by processing a user-provided image using the first GAN. For some embodiments, the user-provided image and the first generated image, generated by processing the user-provided image using the first GAN, are combined to produce a combined image. For some embodiments, the (second) image generator of the second GAN is trained by processing the combined image using the second GAN
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a pre-defined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
Methods, systems, and computer programs are presented for adding new features to a network service. An example method includes accessing an image from a user device to determine a salient object count of a plurality of objects in the image. A salient object count of the plurality of objects in the image is determined. An indicator of the salient object count of the plurality of objects in the image is caused to be displayed on the user device.
Apparatus and method for providing contextual recommendations based on user state are disclosed herein. In some embodiments, sensor data corresponding to at least one sensor included in an item worn by a user is received. A user state is determined based on the received sensor data. In response to a state change being satisfied by at least the user state, a recommendation is determined based on the user state and a profile associated with the user. The recommendation may be presented on an electronic mobile device associated with the user.
Camera platform techniques are described. In an implementation, a plurality of digital images and data describing times, at which, the plurality of digital images are captured is received by a computing device. Objects of clothing are recognized from the digital images by the computing device using object recognition as part of machine learning. A user schedule is also received by the computing device that describes user appointments and times, at which, the appointments are scheduled. A user profile is generated by the computing device by training a model using machine learning based on the recognized objects of clothing, times at which corresponding digital images are captured, and the user schedule. From the user profile, a recommendation is generated by processing a subsequent user schedule using the model as part of machine learning by the computing device.
A machine may be configured to determine fashion preferences of users and to provide item recommendations to the users based on the users' fashion preferences. For example, the machine receives an image of a user and a set of spatial data indicating a position of the body of the user in a three-dimensional space. The machine may perform an analysis of the image and the set of spatial data. The performing of the analysis may include extracting, from the image, an image swatch that depicts a portion of an item worn by the user. The machine may identify a fashion preference of the user based on the analysis of the image and of the set of spatial data. The machine may identify an item that corresponds to the fashion preference of the user within an inventory of fashion items and may generate a recommendation of the identified fashion item.
An apparatus and method to facilitate providing recommendations are disclosed herein. A
networked system receives a query image, and performs image processing to extract a color histogram of an item depicted in the query image. The networked system identifies a dominant color of the item, whereby the dominant color is a color that is present on a most spatial area of the item relative to any other color of the item. The color histogram of the item is compared with a color histogram of an image of each of a plurality of available items, or the dominate color of the item is compared with a dominate color of the image of each of a plurality of available items. Based on the comparing, the networked system identifies item recommendations of the available items that closely match the item depicted in the query image, and the item recommendations are presented.
An image is passed through an image identifier to identify a coarse category for the image and a bounding box for a categorized object. A mask is used to identify the portion of the image that represents the object. Given the foreground mask, the convex hull of the mask is located and an aligned rectangle of minimum area that encloses the hull is fitted. The aligned bounding box is rotated and scaled, so that the foreground object is roughly moved to a standard orientation and size (referred to as calibrated). The calibrated image is used as an input to a fine-grained categorization module, which determines the fine category within the coarse category for the input image.
Products (e.g., books) often include a significant amount of informative textual information that can be used in identifying the item. An input query image is a photo (e.g., a picture taken using a mobile phone) of a product. The photo is taken from an arbitrary angle and orientation, and includes an arbitrary background (e.g., a background with significant clutter). From the query image, the identification server retrieves the corresponding clean catalog image from a database. For example, the database may be a product database having a name of the product, image of the product, price of the product, sales history for the product, or any suitable combination thereof. The retrieval is performed by both matching the image with the images in the database and matching text retrieved from the image with the text in the database.
Electronic content that has a tactile dimension when presented on a tactile-enabled computing device may be referred to as tactile-enabled content. A tactile-enabled device is a device that is capable of presenting tactile-enabled content in a manner that permits a user to experience tactile quality of electronic content. In one example embodiment, a system is provided for generating content that has a tactile dimension when presented on a tactile-enabled device.
Hierarchical branching deep convolutional neural networks (HD-CNNs) improve existing convolutional neural network (CNN) technology. In a HD-CNN, classes that can be easily distinguished are classified in a higher layer coarse category CNN, while the most difficult classifications are done on lower layer fine category CNNs. Multinomial logistic loss and a novel temporal sparsity penalty may be used in HD- CNN training. The use of multinomial logistic loss and a temporal sparsity penalty causes each branching component to deal with distinct subsets of categories.
A method, system, and article of manufacture for recommending items for a room. An image of a room is received, a box image is fitted to the image of the room. Information is extracted from the fitted box image and is used for recommending items for the room. The image is a color image and extracting information is done by extracting color histograms from the fitted box image. The color histograms are used to determine items that match the color scheme of the room, the lighting of the room, and/or the decorating style of the room.
In various example embodiments, a system and method for a Listing Engine that translates a first listing from a first language to a second language. The first listing includes an image(s) of a first item. The Listing Engine provides as input to an encoded neural network model a portion(s) of a translated first listing and a portions(s) of a second listing in the second language. The second listing includes an image(s) of a second item. The Listing Engine receives from the encoded neural network model a first feature vector for the translated first listing and a second feature vector for the second listing. The first and the second feature vectors both include at least one type of image signature feature and at least one type of listing text-based feature. Based on a similarity score of the first and second feature vectors at least meeting a similarity score threshold, the Listing Engine generates a pairing of the first listing in the first language with the second listing in the second language for inclusion in training data of a machine translation system.
An image is passed through an image identifier to identify a coarse category for the image and a bounding box for a categorized object. A mask is used to identify the portion of the image that represents the object. Given the foreground mask, the convex hull of the mask is located and an aligned rectangle of minimum area that encloses the hull is fitted. The aligned bounding box is rotated and scaled, so that the foreground object is roughly moved to a standard orientation and size (referred to as calibrated). The calibrated image is used as an input to a fine-grained categorization module, which determines the fine category within the coarse category for the input image.
A system comprising a computer-readable storage medium storing at least one program and a computer-implemented method for facilitating automatic-guided image capturing and presentation are presented. In some embodiments, the method includes capturing an image of an item, removing automatically a background of the image frame, performing manual mask editing, generating an item listing, inferring item information from the image frame and automatically applying the inferred item information to an item listing form, and presenting an item listing in an augmented reality environment.
A machine may be configured to perform image evaluation of images depicting items for sale and to provide recommendations for improving the images depicting the items to increase the sales of the items depicted in the images. For example, the machine accesses a result of a user behavior analysis. The machine receives an image of an item from a user device. The machine performs an image evaluation of the received image based on an analysis of the received image and the result of the user behavior analysis. The performing of the image evaluation may include determining a likelihood of a user engaging in a desired user behavior in relation to the received image. Then, the machine generates, based on the evaluation of the received image, an output that references the received image and indicates the likelihood of a user engaging in the desired behavior.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a pre-defined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a predefined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
In various example embodiments, a system and method are provided for automated estimation of a saliency map for an image based on a graph structure comprising nodes corresponding to respective superpixels on the image, the graph structure including boundary-connecting nodes that connects each non-boundary node to one or more boundary regions. Each non-boundary node is in some embodiments connected to all boundary nodes by respective boundary-connecting edges forming part of the graph. Edge weights are calculated to generate a weighted graph. Saliency map estimation comprises bringing respective nodes for similarity to a background query. The edge weights of at least some of the edges are in some embodiments calculated as a function of a geodesic distance or shortest path between the corresponding nodes.
An apparatus and method to adjust item recommendations are disclosed herein. A first image attribute of a query image is compared to a second image attribute of each of a plurality of inventory images of a plurality of inventory items to identify the inventory items similar to the query image. Item recommendations comprising the identified inventory items in a first listing order are provided for display at a remote device. A second listing order of the identified inventory items is determined based on a user preference for a particular one of the identified inventory items. At least the second listing order is provided to the remote device for re-display of the item recommendations in accordance with the second listing order.
A machine may be configured to perform image evaluation of images depicting items for online publishing. For example, the machine performing a user behavior analysis based on data pertaining to interactions by a plurality of users with a plurality of images pertaining to a particular type of item. The machine determines, based on the user behavior analysis, that a presentation type associated with one or more images of the plurality of images corresponds to a user behavior in relation to the one or more images. The machine determines that an item included in a received image is of the particular type of item. The machine generates an output for display in a client device. The output includes a reference to the received image and a recommendation of the presentation type for the item included in the received image, for publication by a web server of a publication system.
In various example embodiments, a system and method for determining an item that has confirmed characteristics are described herein. An image that depicts an object is received from a client device. Structured data that corresponds to characteristics of one or more items are retrieved. A set of characteristics is determined, the set of characteristics being predicted to match with the object. An interface that includes a request for confirmation of the set of characteristics is generated. The interface is displayed on the client device. Confirmation that at least one characteristic from the set of characteristics matches with the object depicted in the image is received from the client device.
Systems and methods to fit an image of an inventory part are described. The system receives, over a network, a selection that identifies a part type and further receives, over the network, an image of a vehicle. The system automatically identifies an image of a first inventory part based on the selection of the part type and the image of the vehicle. The system automatically positions two boundaries of a rectangle over the image of the vehicle based on the part type, the rectangle including an image of a first vehicle part. The system fits the image of the first inventory part over the image of the first vehicle part based on the rectangle. The system communicates, over the network, a user interface including the image of the vehicle including the image of the first inventory part fitted over the image of the first vehicle part.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
Vehicles and other items often have corresponding documentation, such as registration cards, that includes a significant amount of informative textual information that can be used in identifying the item. Traditional OCR may be unsuccessful when dealing with non-cooperative images. Accordingly, features such as dewarping, text alignment, and line identification and removal may aid in OCR of non-cooperative images. Dewarping involves determining curvature of a document depicted in an image and processing the image to dewarp the image of the document to make it more accurately conform to the ideal of a cooperative image. Text alignment involves determining an actual alignment of depicted text, even when the depicted text is not aligned with depicted visual cues. Line identification and removal involves identifying portions of the image that depict lines and removing those lines prior to OCR processing of the image.
A machine may be configured to determine fashion preferences of users and to provide item recommendations to the users based on the users? fashion preferences. For example, the machine receives an image of a user and a set of spatial data indicating a position of the body of the user in a three-dimensional space. The machine may perform an analysis of the image and the set of spatial data. The performing of the analysis may include extracting, from the image, an image swatch that depicts a portion of an item worn by the user. The machine may identify a fashion preference of the user based on the analysis of the image and of the set of spatial data. The machine may identify an item that corresponds to the fashion preference of the user within an inventory of fashion items and may generate a recommendation of the identified fashion item.
Apparatus and method for providing contextual recommendations based on user state are disclosed herein. In some embodiments, sensor data corresponding to at least one sensor included in an item worn by a user is received. A user state is determined based on the received sensor data. In response to a state change being satisfied by at least the user state, a recommendation is determined based on the user state and a profile associated with the user. The recommendation may be presented on an electronic mobile device associated with the user.
A machine may be configured to perform image evaluation of images depicting items for sale and to provide recommendations for improving the images depicting the items to increase the sales of the items depicted in the images. For example, the machine accesses a result of a user behavior analysis. The machine receives an image of an item from a user device. The machine performs an image evaluation of the received image based on an analysis of the received image and the result of the user behavior analysis. The performing of the image evaluation may include determining a likelihood of a user engaging in a desired user behavior in relation to the received image. Then, the machine generates, based on the evaluation of the received image, an output that references the received image and indicates the likelihood of a user engaging in the desired behavior.
A large synthetic 3D human body model dataset using real-world body size distributions is created. The model dataset may follow real-world body parameter distributions. Depth sensors can be integrated into mobile devices such as tablets, cellphones, and wearable devices. Body measurements for a user are extracted from a single frontal-view depth map using joint location information. Estimates of body measurements are combined with local geometry features around joint locations to form a robust multi-dimensional feature vector. A fast nearest-neighbor search is performed using the feature vector for the user and the feature vectors for the synthetic models to identify the closest match. The retrieved model can be used in various applications such as clothes shopping, virtual reality, online gaming, and others.
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a predefined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a pre-defined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a predefined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
Apparatus and method for performing accurate text recognition of non- simplistic images (e.g., images with clutter backgrounds, lighting variations, font variations, non-standard perspectives, and the like) may employ a machine-learning approach to identify a discriminative feature set selected from among features computed for a plurality of irregularly positioned, sized, and/or shaped (e.g., randomly selected) image sub-regions.
An image is passed through an image identifier to identify a coarse category for the image and a bounding box for a categorized object. A mask is used to identify the portion of the image that represents the object. Given the foreground mask, the convex hull of the mask is located and an aligned rectangle of minimum area that encloses the hull is fitted. The aligned bounding box is rotated and scaled, so that the foreground object is roughly moved to a standard orientation and size (referred to as calibrated). The calibrated image is used as an input to a fine-grained categorization module, which determines the fine category within the coarse category for the input image.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
An apparatus and method to facilitate finding complementary recommendations are disclosed herein. One or more fashion trend or pleasing color combination rules are determined based on data obtained from one or more sources. One or more template images and rule triggers corresponding to the fashion trend or pleasing color combination rules are generated, each of the rule triggers associated with at least one of the template images. A processor compares a first image attribute of a particular one of the template images to a second image attribute of each of a plurality of inventory images corresponding to the plurality of inventory items to identify the inventory items complementary to the query image. The particular one of the template images is selected based on the rule trigger corresponding to the particular one of the template images being applicable for a query image.
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a predefined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
Electronic content that has a tactile dimension when presented on a tactile-enabled computing device may be referred to as tactile-enabled content. A tactile-enabled device is a device that is capable of presenting tactile-enabled content in a manner that permits a user to experience tactile quality of electronic content. In one example embodiment, a system is provided for generating content that has a tactile dimension when presented on a tactile-enabled device.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
An apparatus and method to facilitate finding recommendations for a query image are disclosed herein. A color histogram is determined corresponding to the query image. Determining at least one of a visual pattern included in the query image, a dominant color of the query image, or an orientation histogram c01Tesponding to the query image. Performing comparison of a first image attribute of the query image to a second image attribute of an inventory image corresponding to an inventory item, wherein the first image attribute used in the comparison is selected fhm1 among the color histogram, the dominant color, and the orientation histogram. The selection of the first image attribute is based on a confidence score associated with the visual pattern, the dominant color, or a directionality present in the query image.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a predefined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
Apparatus and method for providing contextual recommendations based on user state are disclosed herein. In some embodiments, sensor data corresponding to at least one sensor included in an item worn by a user is received. A user state is determined based on the received sensor data. In response to a state change being satisfied by at least the user state, a recommendation is determined based on the user state and a profile associated with the user. The recommendation may be presented on an electronic mobile device associated with the user.
A method, system, and article of manufacture for recommending items for a room. An image of a room is received, a box image is fitted to the image of the room. Information is extracted from the fitted box image and is used for recommending items for the room. The image is a color image and extracting information is done by extracting color histograms from the fitted box image. The color histograms are used to determine items that match the color scheme of the room, the lighting of the room, and/or the decorating style of the room.
A machine may be configured to perform image evaluation of images depicting items for sale and to provide recommendations for improving the images depicting the items to increase the sales of the items depicted in the images. For example, the machine accesses a result of a user behavior analysis. The machine receives an image of an item from a user device. The machine performs an image evaluation of the received image based on an analysis of the received image and the result of the user behavior analysis. The performing of the image evaluation may include determining a likelihood of a user engaging in a desired user behavior in relation to the received image. Then, the machine generates, based on the evaluation of the received image, an output that references the received image and indicates the likelihood of a user engaging in the desired behavior.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
In various example embodiments, a system and method for sketch based queries are presented. A sketch corresponding to a search item may be received from a user. At least a portion of the sketch may be generated by the user. An item attribute may be extracted from the sketch. The item attributed may correspond to a physical attribute of the search item. A set of inventory items similar to the search item may be identified based on the extracted item attribute and a search scope. The identified set of inventory items may be presented to the user.
An apparatus and method to facilitate finding complementary recommendations are disclosed herein. One or more fashion trend or pleasing color combination rules are determined based on data obtained from one or more sources. One or more template images and rule triggers corresponding to the fashion trend or pleasing color combination rules are generated, each of the rule triggers associated with at least one of the template images. A processor compares a first image attribute of a particular one of the template images to a second image attribute of each of a plurality of inventory images corresponding to the plurality of inventory items to identify the inventory items complementary to the query image. The particular one of the template images is selected based on the rule trigger corresponding to the particular one of the template images being applicable for a query image.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
During a training phase, a machine accesses reference images with corresponding depth information. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors. After the training phase, the machine may perform depth estimation based on a single query image devoid of depth information. The machine may calculate one or more visual descriptors from the single query image and obtain a corresponding depth descriptor for each visual descriptor from the generated mapping. Based on obtained depth descriptors, the machine creates depth information that corresponds to the submitted single query image.
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a predefined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
Apparatus and method for performing accurate text recognition of non- simplistic images (e.g., images with clutter backgrounds, lighting variations, font variations, non-standard perspectives, and the like) may employ a machine-learning approach to identify a discriminative feature set selected from among features computed for a plurality of irregularly positioned, sized, and/or shaped (e.g., randomly selected) image sub-regions.
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a pre-defined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a predefined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
An apparatus and method to adjust item recommendations are disclosed herein. A first image attribute of a query image is compared to a second image attribute of each of a plurality of inventory images of a plurality of inventory items to identify the inventory items similar to the query image. Item recommendations comprising the identified inventory items in a first listing order are provided for display at a remote device. A second listing order of the identified inventory items is determined based on a user preference for a particular one of the identified inventory items. At least the second listing order is provided to the remote device for re-display of the item recommendations in accordance with the second listing order.
An apparatus and method to facilitate finding recommendations for a query image are disclosed herein. A color histogram is determined corresponding to the query image. Determining at least one of a visual pattern included in the query image, a dominant color of the query image, or an orientation histogram corresponding to the query image. Performing comparison of a first image attribute of the query image to a second image attribute of an inventory image corresponding to an inventory item, wherein the first image attribute used in the comparison is selected from among the color histogram, the dominant color, and the orientation histogram. The selection of the first image attribute is based on a confidence score associated with the visual pattern, the dominant color, or a directionality present in the query image.
An apparatus and method for obtaining image feature data of an image are disclosed herein. A color histogram of the image is extracted from the image, the extraction of the color histogram including performing one-dimensional sampling of pixels comprising the image in each of a first dimension of a color space, a second dimension of the color space, and a third dimension of the color space. An edge map corresponding to the image is analyzed to detect a pattern included in the image. In response to a confidence level of the pattern detection being below a predefined threshold, extracting from the image an orientation histogram of the image. And identify a dominant color of the image.
An apparatus and method to facilitate finding complementary recommendations are disclosed herein. One or more fashion trend or pleasing color combination rules are determined based on data obtained from one or more sources. One or more template images and rule triggers corresponding to the fashion trend or pleasing color combination rules are generated, each of the rule triggers associated with at least one of the template images. A processor compares a first image attribute of a particular one of the template images to a second image attribute of each of a plurality of inventory images corresponding to the plurality of inventory items to identify the inventory items complementary to the query image. The particular one of the template images is selected based on the rule trigger corresponding to the particular one of the template images being applicable for a query image.
An apparatus and method to adjust item recommendations are disclosed herein. A first image attribute of a query image is compared to a second image attribute of each of a plurality of inventory images of a plurality of inventory items to identify the inventory items similar to the query image. Item recommendations comprising the identified inventory items in a first listing order are provided for display at a remote device. A second listing order of the identified inventory items is determined based on a user preference for a particular one of the identified inventory items. At least the second listing order is provided to the remote device for re-display of the item recommendations in accordance with the second listing order.