Wednesday, October 18, 2017

AWS S3 Transferring Data Across Accounts

  Today I successfully transferred some data on AWS S3 from one account to another.
  In the process I resolved an encryption-related permission issue that is hard to google, because the error message is misleading.
  So I decided to write the steps down for anyone who needs them.

Goal:
Copy data from one S3 bucket to another S3 bucket.

Resources:
 - source account: <src_account>
 - source bucket: <src_bucket>
 - destination account: <dst_account>
 - destination bucket: <dst_bucket>
 - an instance with the AWS CLI installed (your laptop works too)


Steps (high level):
 - create a user on the destination account.
 - grant the user permissions (via an IAM policy on the destination account, using the "Resource" field and bucket ARNs) to:
   - read from the source bucket.
   - write to the destination bucket.
 - grant this user read access to the source bucket via the source bucket's policy, using the "Principal" field and the user's ARN.
 - (if encryption is required) grant the destination account access to the encryption key on the source account.
 - (if encryption is required) grant the user permission to use the key for:
   - decryption, required for reading.
   - encryption, required for writing.


Steps (detailed):

 - create a user on the destination account: <sync_user>
  Keep its access key and secret key to set up the CLI later (a CLI sketch is below).
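
  If you'd rather script this step than use the console, here's a rough CLI sketch (run with admin credentials on the destination account; everything in angle brackets is a placeholder):
    # create the user and an access key pair; note the AccessKeyId and SecretAccessKey in the output
    aws iam create-user --user-name <sync_user>
    aws iam create-access-key --user-name <sync_user>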

 - under IAM on the destination account, attach a policy to <sync_user> with these statements:
        {
            "Sid": "AllowReadSource",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::<src_bucket>/*",
                "arn:aws:s3:::<src_bucket>"
            ]
        },
        {
            "Sid": "AllowWriteDestination",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3: PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::<dst_bucket>/*",
                "arn:aws:s3:::<dst_bucket>"
            ]
        }
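
  (Optional) The same policy can be attached from the CLI. A hedged sketch, assuming the statements above are wrapped in the usual {"Version": "2012-10-17", "Statement": [...]} envelope and saved locally as sync-user-policy.json (the file name and policy name are placeholders of my own):
    # attach the inline policy to <sync_user> on the destination account
    aws iam put-user-policy \
        --user-name <sync_user> \
        --policy-name cross-account-s3-sync \
        --policy-document file://sync-user-policy.json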

 - under the <src_bucket> Permissions tab (on the source account), add these statements to the bucket policy:
        {
            "Sid": "AllowReadOnlyOnFileForUser",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<dst_account>:user/<sync_user>"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<src_bucket>/*"
        },
        {
            "Sid": "AllowReadOnlyOnDirectoryForUser",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<dst_account>:user/<sync_user>"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<src_bucket>/*"
        }
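
  (Optional) Likewise for the bucket policy: with credentials on the source account and the statements above saved into src-bucket-policy.json (again wrapped in a full policy document), something like this should work. Note that put-bucket-policy replaces the entire existing policy, so merge any existing statements into the file first:
    aws s3api put-bucket-policy \
        --bucket <src_bucket> \
        --policy file://src-bucket-policy.json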

Additional steps if server-side encryption (SSE-KMS) is required for the bucket:

 - On the source account, add the destination account as an external account on each encryption key used:
  IAM -> Encryption Keys -> choose the right region -> select the key -> Add External Account -> <dst_account>
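
  You'll need the full key ARN (not just the key id) in the steps below; if you don't have it handy, this should print it (run with credentials on the source account):
    # the "Arn" field in KeyMetadata is the value used in the policies and the copy command
    aws kms describe-key --key-id <key_id> --region <region>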

 - On the destination account, attach another policy to <sync_user> with the following statement:
        {
            "Sid": "AllowUseOfTheKey",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:GenerateDataKey*"
            ],
            "Resource": [
                "arn:aws:kms:<region>:<src_account>:key/<key_id>"
            ]
        }
 - add another statement to the destination bucket's bucket policy:
        {
            "Sid": "Ensure config is encrypted on upload",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::<dst_bucket>/*",
            "Condition": {
                "StringNotLike": {
                    "s3:x-amz-server-side-encryption-aws-kms-key-id": "arn:aws:kms:<region>:<src_account>:key/<key_id>"
                }
            }
        }
CLI:
 - add the created <sync_user> to the CLI as a profile (quick checks below):
  aws configure --profile <sync_user>
Note: the default region needs to match the region of the KMS key used (if any).
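
  Before running the full copy, two quick sanity checks I'd suggest (optional): confirm the profile resolves to the user we created, and that it can list both buckets.
    aws sts get-caller-identity --profile <sync_user>
    aws s3 ls s3://<src_bucket> --profile <sync_user>
    aws s3 ls s3://<dst_bucket> --profile <sync_user>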

Command:
If the files inside the bucket require server-side encryption:
  aws s3 cp s3://<src_bucket> s3://<dst_bucket> --recursive --sse aws:kms --sse-kms-key-id arn:aws:kms:<region>:<src_account>:key/<key_id> --profile=<aws-cli-sync-user-profile>

otherwise:
  aws s3 cp s3://<src_bucket> s3://<dst_bucket> --recursive --profile=<aws-cli-sync-user-profile>
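
If the copy gets interrupted or you need to re-run it, "aws s3 sync" accepts the same flags and only transfers objects that are missing or different on the destination, e.g. for the encrypted case:
  aws s3 sync s3://<src_bucket> s3://<dst_bucket> --sse aws:kms --sse-kms-key-id arn:aws:kms:<region>:<src_account>:key/<key_id> --profile=<aws-cli-sync-user-profile>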


Summary:
  This is quite a simple process, and some online documents explain the steps better than I just did.
  I've added some of my own understanding of why each step is needed. The permissions above are the absolute minimum required for the copy; you can find templates that work with broader permissions, but I don't feel it's necessary to grant this user more than it needs.
  I spent a lot of time on the encryption permission issue, since it isn't documented anywhere and it surfaces as yet another permission denial, so it took me a long time to figure out what was actually causing it. I really hope this helps if you run into something similar.

Note: encryption is applied per object and can be heterogeneous within a single bucket. If you run into a permission error on the "GetObject" action, double check whether the specific file causing it has encryption enabled.

Tuesday, May 30, 2017

Performance comparison between BoneCP and HikariCP



I've been assessing HikariCP as a replacement for BoneCP for my server over the past week, and the result somewhat surprised me.
I'm sharing it here in case other people are doing the same thing.

The short conclusion is: BoneCP is slightly faster than HikariCP.

Test environment:
 - BoneCP version: 0.8.0-RELEASE
 - HikariCP version: 2.6.1
 - Tested on 2 groups of servers located on Amazon AWS, with DB servers in the same availability zone.
 - The only difference between the deployed jars is the connection pooling library (and its corresponding configuration).

The variation in configuration does not seem to make a big difference.

I tested using the recommended configuration from each library:
1st test: map the configurations by meaning
2nd test: match the number of connections per host

Both tests give me the same result: the measured wall time for HikariCP is consistently 1~2 ms slower than with BoneCP.

This was tested on a live product that has >100K concurrent users at all times, and the range of tested queries covers a few benchmark cases.

For fast database queries this can be pretty significant: an indexed select from a table costs 1 ms on average for the BoneCP group but 2 ms for HikariCP.

Similarly this affects other queries, including inserts, updates, and deletes; across the range of queries I have, the difference is 1~2 ms.

After reading through this:
https://github.com/brettwooldridge/HikariCP/wiki/Pool-Analysis

I started to wonder whether the validation overhead was what caused the performance difference.

And the awesome developer for HikariCP told me there are ways to configure that:
https://github.com/brettwooldridge/HikariCP/issues/900

So I did a 3rd & 4th test (a sketch of how I set these is below):
- increased the validation check window from 500 ms to 5 s
- overrode the connection test query from the JDBC isValid() method to a simple "SELECT 1"
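
For reference, this is roughly how I set those two knobs. The property names come from my reading of the HikariCP source and the issue linked above, so treat this as an approximation and double-check them against the version you run; server.jar and the properties path are just placeholders:

  # hikari.properties (pointed to by -Dhikaricp.configurationFile) contains:
  #   connectionTestQuery=SELECT 1

  # raise the aliveness-check bypass window from its 500 ms default to 5 s
  java -Dcom.zaxxer.hikari.aliveBypassWindowMs=5000 \
       -Dhikaricp.configurationFile=/path/to/hikari.properties \
       -jar server.jar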

Unfortunately the result is the same, and the gap is roughly the same too.
So the connection test is not the culprit for the performance difference, at least in my environment.

Although I think the validation check is a good feature and should stay, I've stopped at this point, since BoneCP will probably remain my go-to option given the performance result.

For now I'm unable to explain the performance difference, and that's the part that still needs work. I'll dig into the source code a bit further when I have more time to spend on this.

Tuesday, May 7, 2013

Progressive Photon Mapper

During the past few weeks I've been trying to write a new tracer. After careful consideration I chose to implement a progressive photon mapping integrator within my own architecture, which I simplified and customized from the one in the PBRT book.

Right now I have only a simple scene to show the result of the integrator; later I'll focus more on the other parts of the tracer (BSDFs, weighting system, sampling, performance, etc.).

Here's a comparison image of the PPM integrator

The first has only one photon gathering pass, and the second has 10 photon gathering passes. Each pass uses 200K photons.

Direct and indirect lighting are not decoupled, which makes the most mathematical sense to me.
pass 1

pass 10

Well, I just realized this is not convincing enough. I should compare 200K photons per pass over 10 passes against 2M photons in a single pass; that will be my next post, along with other features added.


Monday, February 18, 2013

New demo reel

I made a new demo reel yesterday, adding the projects I've been working on recently.

Here's my new reel:



Thursday, February 14, 2013

the game: Penguin Planet

Last semester (Fall 2012) I took CIS 568: Game Design Practicum and worked with Nop on a game in Unity3D.

We both love arcade games, so we made a game called "Penguin Planet", which is similar to the arcade game "Fill It".

We collaborated on designing every aspect of this game, and quite a few techniques went into implementing all of its features. We like it a lot and we're proud of it.


Here's a demo for our game:



And here's the link for downloading our game:
http://dl.dropbox.com/u/122536698/Penguin%20Planet%20Game.rar

Hope you enjoy it!

Implemented accurate solid-fluid interaction for my FLIP solver

For the past few days I've been working on my fluid simulation project.
I've incorporated Chris Batty's SIGGRAPH 2007 paper into this project.

Here's a demo about the result:

Generated a level set for the Stanford bunny and applied fast Poisson disk sampling to get 128K particles.
Grid size: 100^3.
Used an anisotropic kernel for surface reconstruction.
This pretty much includes everything I've done for fluid simulation, except marching cubes.

Monday, February 4, 2013

Triangle mesh to level set

I've had the idea for this project since last summer, but I didn't put it into practice until today.

Level set field data is extremely useful in all kinds of simulation, especially fluid simulation. A level set is also a good interface for blue noise sampling techniques.

Generating a level set for an implicit surface is easy: all you have to do is evaluate the function value, which is always related in some way to the (signed) minimum distance.

However, things are not that easy in the general case. You'll usually be given a triangle mesh (an OBJ or PLY file) as input, and the problem with converting a triangle mesh to a level set is that the mesh's normal field is not continuous.

For a single point-triangle minimum distance, it's sometimes ambiguous how to determine the sign. For points within the prism of the triangle, judging the sign is easy, but for points whose closest point lies on an edge or a vertex, it's hard to determine.

So the idea I had is to compute normals for all vertices (if not given in the input) and all edges, as a weighted sum of the normals of the faces incident to that vertex/edge, where the weight is the incident angle.

With this method, the sign of the distance from an arbitrary point to a triangle is unambiguous and easy to compute.

No one wants to compute the signed distance for every sample point against every triangle. Two possible solutions:
1. use a spatial subdivision data structure such as a KD-tree, and compute the signed distance for each sample point only against triangles in a local region.
2. go the other way around: splat each triangle onto a neighborhood of grid cells, forming a narrow-band level set, then propagate the data to the whole field.

The second one is obviously faster but also technically harder. Since I had already spent a lot of time implementing fast sweeping, this approach fit me better; in fact it took only a few hours to finish.

It takes 3.6 s to compute the level set of a Stanford bunny on a 105*104*82 grid, running on a single laptop core without any optimization. I'm pretty satisfied with the performance, since this conversion only has to be done once, offline.

Here's a demo showing the result of the level set. To show the correctness of the data, I shrank the whole field by a certain rate.